In [1] and [2] we discussed the need for a Result.results property and how we
can avoid unclear code. This patch implements a class for the result lists of
engines::
searx.result_types.EngineResults
A simple example for the usage in engine development::
from searx.result_types import EngineResults
...
def response(resp) -> EngineResults:
    res = EngineResults()
    ...
    res.add(res.types.Answer(answer="lorem ipsum ..", url="https://example.org"))
    ...
    return res
[1] https://github.com/searxng/searxng/pull/4183#pullrequestreview-257400034
[2] https://github.com/searxng/searxng/pull/4183#issuecomment-2614301580
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Typification of SearXNG
=======================
This patch introduces typing for the results. The why and how are described
in the documentation; please generate the documentation::
$ make docs.clean docs.live
and read the following articles in the "Developer documentation":
- result types --> http://0.0.0.0:8000/dev/result_types/index.html
The result types are available from the `searx.result_types` module. The
following have been implemented so far:
- base result type: `searx.result_types.Result`
--> http://0.0.0.0:8000/dev/result_types/base_result.html
- answer results
--> http://0.0.0.0:8000/dev/result_types/answer.html
including the type for translations (inspired by #3925). For all other
types (which still need to be set up in subsequent PRs), template documentation
has been created for the transition period.
Doc of the fields used in Templates
===================================
The template documentation is the basis for the typing and is the first complete
documentation of the results (needed for engine development). It is the
"working paper" (the plan) with which further typifications can be implemented
in subsequent PRs.
- https://github.com/searxng/searxng/issues/357
Answer Templates
================
With the new (sub) types for `Answer`, the templates for the answers have also
been revised; `Translation` results are now displayed with collapsible entries
(inspired by #3925).
!en-de dog
Plugins & Answerer
==================
The implementation for `Plugin` and `Answerer` has been revised, see the
documentation:
- Plugin: http://0.0.0.0:8000/dev/plugins/index.html
- Answerer: http://0.0.0.0:8000/dev/answerers/index.html
`PluginStorage` and `AnswerStorage` manage those items (in follow-up PRs,
`ArticleStorage`, `InfoStorage` and .. will be implemented).
Autocomplete
============
The autocompletion had a bug: the results from `Answer` were not shown. To
test, activate autocompletion and try search terms for which we have
answerers:
- statistics: type `min 1 2 3` .. in the completion list you should find an
entry like `[de] min(1, 2, 3) = 1`
- random: type `random uuid` .. in the completion list, the first item is a
random UUID
Extended Types
==============
SearXNG extends e.g. the request and response types of flask and httpx. A
module has been set up for these type extensions:
- Extended Types
--> http://0.0.0.0:8000/dev/extended_types.html
Unit-Tests
==========
The unit tests have been completely revised. In the previous implementation,
the runtime (the global variables such as `searx.settings`) was not initialized
before each test, so the runtime environment with which a test ran was always
determined by the tests that ran before it. This was also the reason why we
sometimes saw non-deterministic errors in the tests in the past:
- https://github.com/searxng/searxng/issues/2988 is one example of the runtime
issues, with non-deterministic behavior ..
- https://github.com/searxng/searxng/pull/3650
- https://github.com/searxng/searxng/pull/3654
- https://github.com/searxng/searxng/pull/3642#issuecomment-2226884469
- https://github.com/searxng/searxng/pull/3746#issuecomment-2300965005
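The isolation problem described above can be sketched with the standard
`unittest` module. The names `DEFAULTS`, `SETTINGS` and `init_runtime` are
hypothetical stand-ins for global runtime state such as `searx.settings`; this
is a minimal illustration, not the project's actual test harness:

```python
import copy
import unittest

# Hypothetical stand-in for a global runtime object such as `searx.settings`.
DEFAULTS = {"search": {"autocomplete": ""}}
SETTINGS = copy.deepcopy(DEFAULTS)


def init_runtime():
    """Reset the global state to known defaults (assumed helper)."""
    SETTINGS.clear()
    SETTINGS.update(copy.deepcopy(DEFAULTS))


class RuntimeIsolationTest(unittest.TestCase):
    def setUp(self):
        # Runs before *each* test, so no test inherits state from another.
        init_runtime()

    def test_a_mutates_global_state(self):
        SETTINGS["search"]["autocomplete"] = "duckduckgo"
        self.assertEqual(SETTINGS["search"]["autocomplete"], "duckduckgo")

    def test_b_sees_defaults(self):
        # Without setUp() this would fail whenever test_a ran first.
        self.assertEqual(SETTINGS["search"]["autocomplete"], "")


def run_isolation_suite():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RuntimeIsolationTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Removing the `setUp` method makes the outcome of `test_b_sees_defaults` depend
on which tests ran before it, which is the kind of order dependence described
above.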
Why msgspec.Struct
==================
We have already discussed typing based on e.g. `TypedDict` or `dataclass` in the past:
- https://github.com/searxng/searxng/pull/1562/files
- https://gist.github.com/dalf/972eb05e7a9bee161487132a7de244d2
- https://github.com/searxng/searxng/pull/1412/files
- https://github.com/searxng/searxng/pull/1356
In my opinion, `TypedDict` is unsuitable because the objects are still
dictionaries and not instances of classes / `dataclass` objects are classes but ...
`msgspec.Struct` combines the advantages of typing and runtime behaviour and
also offers the option of (fast) serialization (incl. type check) of the objects.
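The difference in runtime behaviour can be illustrated with the standard
library alone. The classes `AnswerDict` and `AnswerData` are hypothetical and
exist only for this sketch. The unchecked field shown for the dataclass is one
commonly cited limitation, not necessarily the one the author had in mind:

```python
from dataclasses import dataclass
from typing import TypedDict


class AnswerDict(TypedDict):  # hypothetical result shape
    answer: str
    url: str


@dataclass
class AnswerData:  # hypothetical result shape
    answer: str
    url: str


as_dict = AnswerDict(answer="lorem ipsum ..", url="https://example.org")
as_data = AnswerData(answer="lorem ipsum ..", url="https://example.org")

# A TypedDict "instance" is a plain dict at runtime; the type only exists for
# static type checkers.
is_plain_dict = type(as_dict) is dict

# A dataclass instance is a real object, but the annotations are not enforced
# at runtime: the wrong type below is accepted without an error.
unchecked = AnswerData(answer=42, url="https://example.org")
```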
Not possible today, but conceivable with `msgspec`: moving the engines into
separate processes. Which possibilities this opens up in the future is left to
the imagination!
Internally, we have already agreed that it is desirable to decouple the
development of the engines from the development of the SearXNG core. The
serialization of the `Result` objects is a prerequisite for this.
HINT: The threads listed above were the template for this PR, even though the
implementation here is based on msgspec. They should also be an inspiration for
the following PRs of typification, as the models and implementations can provide
a good direction.
Why just one commit?
====================
I tried to create several (thematically separated) commits, but gave up at some
point ... there are too many things to tackle at once / The comprehensibility of
the commits would not be improved by a thematic separation. On the contrary, we
would have to make multiple changes at the same places and the goal of a change
would be only vaguely recognizable in the fog of the commits.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Adds [1] to the searxng.min.js and horizontal swipe events to the image gallery.
[1] https://www.npmjs.com/package/swiped-events
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The migration was done in the following steps. First, prepare the node
environment and open a bash in this environment::
$ make clean nvm.nodejs
...
$ ./manage nvm.bash
$ which npx
searxng/.nvm/versions/node/v23.5.0/bin/npx
In this environment the migration command from [1] is started::
$ npx @eslint/migrate-config .eslintrc.json
Need to install the following packages:
@eslint/migrate-config@1.3.5
Migrating .eslintrc.json
Wrote new config to ./eslint.config.mjs
You will need to install the following packages to use the new config:
- globals
- @eslint/js
- @eslint/eslintrc
You can install them using the following command:
npm install globals @eslint/js @eslint/eslintrc -D
The following messages were generated during migration:
- The 'node' environment is used, so switching sourceType to 'commonjs'.
[1] https://eslint.org/docs/latest/use/configure/migration-guide#migrate-your-config-file
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
To avoid issues like [1], versions from now on are pinned in::
searx/static/themes/simple/package-lock.json
To test, nodejs v23 or newer is needed (will be installed by nvm). To drop a
possibly existing installation::
$ make clean
Install nodejs in nvm::
$ make nvm.nodejs
INFO: install (update) NVM at searxng/.nvm
...
Now using node v23.5.0 (npm v10.9.2)
Creating default alias: default -> v23.5 (-> v23.5.0)
INFO: Node.js is installed at searxng/.nvm/versions/node/v23.5.0/bin/node
INFO: Node.js is version v23.5.0
INFO: npm is installed at searxng/.nvm/versions/node/v23.5.0/bin/npm
INFO: npm is version 10.9.2
INFO: NVM is installed at searxng/.nvm
To test npm checks and builds::
$ make static.build.commit
Related:
[1] https://github.com/searxng/searxng/issues/4143
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
$ ./manage pyenv.cmd pybabel compile --statistics -d searx/translations/
reports:
catalog searx/translations/ga/LC_MESSAGES/messages.po is marked as fuzzy, skipping
This commit removes the ``fuzzy`` tag and BTW reverts commit 655e41f27
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The value of `params['api_key']` isn't read anywhere.
Writing directly into the header object solves this quite easily though.
> [Users can authenticate by including their API key either in a request URL by appending `?apikey=<API KEY>`, or by including the `X-API-Key: <API KEY>` header with the request.](https://wallhaven.cc/help/api)
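A minimal sketch of the header-based approach, assuming the usual engine hook
`request(query, params)` with a `params["headers"]` dict. The module-level
`api_key` value and the `q` URL parameter are assumptions made for this
illustration:

```python
from urllib.parse import urlencode

# Hypothetical engine setting; an empty string means "no key configured".
api_key = "my-secret-key"


def request(query, params):
    """Sketch of an engine request hook that sends the key as a header."""
    params["url"] = "https://wallhaven.cc/api/v1/search?" + urlencode({"q": query})
    if api_key:
        # Written directly into the header object, so the key is really sent
        # (unlike a value stored under params['api_key'], which nobody reads).
        params["headers"]["X-API-Key"] = api_key
    return params


params = request("night sky", {"headers": {}})
```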
The GitHub action "Update data - update_engine_traits" [1] had issues in the
annas archive, bing-* and zlibrary engines:
./manage pyenv.cmd python ./searxng_extra/update/update_engine_traits.py
[1] https://github.com/searxng/searxng/actions/runs/12530827768/job/34953392587
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
A SearXNG maintainer on Matrix reported a traceback::
File "searxng-src/searx/engines/xpath.py", line 272, in response
dom = html.fromstring(resp.text)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "searx-pyenv/lib/python3.11/site-packages/lxml/html/__init__.py", line 850, in fromstring
doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "searx-pyenv/lib/python3.11/site-packages/lxml/html/__init__.py", line 738, in document_fromstring
raise etree.ParserError(
lxml.etree.ParserError: Document is empty
I don't have an example to reproduce the issue, but the cause and this fix are
clearly recognizable even without one.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In some results, Google returns a <script> tag that must be removed before
extracting the content.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The WEB page (PL only) has changed and there is now also a kind of CAPTCHA.
There is currently no possibility to restore the function of this engine.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The engines do not have / do not need a property `base_url`; let's remove it
from the settings.yml
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The properties `item.service_medium` and `item.thumb_gallery` are not given for
every result item. It is more reliable to use the first (thumb) and
last (image) URL in the list of URLs in `image_url`.
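The selection rule can be sketched as follows. The helper name `pick_urls` is
hypothetical, and returning `None` for an empty list is an assumption of this
sketch:

```python
def pick_urls(image_urls):
    """Return (thumbnail_url, image_url) from a list of image URLs.

    The first entry is used as thumbnail and the last entry as full image.
    """
    if not image_urls:
        # Assumption: items without any URL yield no thumbnail and no image.
        return None, None
    return image_urls[0], image_urls[-1]
```

With a single URL in the list, thumbnail and image are the same URL.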
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The engine has been revised; there is now the option ``adobe_content_types``
with which it is possible to configure engines for video and audio from
Adobe Stock. BTW this patch adds documentation to the engine.
To test all three engines at once, use a search term like::
!asi !asv !asa sound
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In order to be able to implement error handling, it is necessary to know which
URL triggered the exception. Until now, the URL was not logged.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Remove 'Autodetect search language', which is no longer valid, from settings,
and add 'Unit converter plugin', which is now default enabled, to settings.
The entire source code of the duckduckgo engine has been reengineered and
cleaned up.
1. DDG uses the URL https://html.duckduckgo.com/html for no-JS requests; its
response is also easier to parse than that of the previous
https://lite.duckduckgo.com/lite/ URL
2. the bot detection of DDG has so far caused problems and often led to a
CAPTCHA; this can be circumvented by setting the request header
`Sec-Fetch-Mode` to `navigate`
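The two points above can be sketched as a request hook. The POST method and
the `q` form field are assumptions made for this illustration; only the URL
and the `Sec-Fetch-Mode` header are taken from the description:

```python
def request(query, params):
    """Sketch of a request hook for the no-JS HTML endpoint."""
    params["url"] = "https://html.duckduckgo.com/html"
    params["method"] = "POST"      # assumption, not stated in the description
    params["data"] = {"q": query}  # assumption, not stated in the description
    # Present the request as a top-level navigation.
    params["headers"]["Sec-Fetch-Mode"] = "navigate"
    return params


params = request("searxng", {"headers": {}})
```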
Closes: https://github.com/searxng/searxng/issues/3927
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The previous implementation could not distinguish a CAPTCHA response from an
ordinary result list: a CAPTCHA was treated as a result list with no items.
DDG does not block IPs. Instead, it places a CAPTCHA wall in front of requests
it considers dubious.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This patch adds an additional *isinstance* check within the ast parser to check
for float along with int, fixing the underlying issue.
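A minimal sketch of such a check in an `ast`-based evaluator. The function
names and the set of supported operators are hypothetical and not the actual
implementation:

```python
import ast
import operator

# Hypothetical whitelist of supported binary operators.
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node):
    # Accept float constants as well as int constants.
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
        return _OPERATORS[type(node.op)](_eval(node.left), _eval(node.right))
    raise TypeError(f"unsupported expression: {ast.dump(node)}")


def calculate(expression):
    """Evaluate a simple arithmetic expression without using eval()."""
    return _eval(ast.parse(expression, mode="eval").body)
```

With a check for `int` only, an input such as `1.5 + 2` would end up in the
`TypeError` branch.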
Co-Authored: Markus Heiser <markus.heiser@darmarit.de>
Improve region and language detection / all locale
Testing has shown the following behaviour for the different
default and empty values of Mojeek's parameters:
| param | idx | value | behaviour |
| -------- | --- | ------ | ------------------------- |
| region | 0 | '' | detect region based on IP |
| region | 1 | 'none' | all regions |
| language | 0 | '' | all languages |
Without this patch, the Gitea search engine is only partially compatible with
modern Gitea or Forgejo. The patch:
- fixes some JSON fields
- uses the repository avatar when available
To verify my results, you can look at the modern API documentation and results;
it is available on all Gitea and Forgejo instances by default. Here is a search
API result of mine:
- https://git.euph.dev/api/v1/repos/search?q=ccna
All favicons implementations have been documented and moved to the Python
package:
searx.favicons
There is a configuration (based on Pydantic) for the favicons and all its
components:
searx.favicons.config
A solution for caching favicons has been implemented:
searx.favicons.cache
If the favicon is already in the cache, the returned URL is a data URL [1]
(something like `data:image/png;base64,...`). By generating a data URL from
the FaviconCache, additional HTTP round trips via the favicon_proxy are saved:
favicons.proxy.favicon_url
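How a data URL is built from cached bytes and a mime type can be sketched with
the standard library. The helper name `data_url` is hypothetical and not the
name used in the package:

```python
import base64


def data_url(data: bytes, mime: str) -> str:
    """Build a base64 data URL from raw image bytes and their mime type."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The browser can render such a URL directly from the HTML, so no further
request to the favicon proxy is needed for that icon.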
The favicon proxy service now sets an HTTP header "Cache-Control: max-age=...":
favicons.proxy.favicon_proxy
The resolvers now also provide the mime type (data, mime):
searx.favicons.resolvers
[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- for tests which perform the same arrange/act/assert pattern but with different
data, the data portion has been moved to the ``parameterized.expand`` fields
- monolithic tests which performed multiple arrange/act/asserts
have been broken up into different unit tests.
- when possible, generic assert statements have been changed to more concise
asserts (e.g. ``assertIsNone``)
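The idea of one arrange/act/assert body driven by several data rows can be
sketched with the standard library. Note that this sketch uses
`unittest.TestCase.subTest` as a stdlib stand-in and not the
``parameterized.expand`` decorator used in the patch. The function
`parse_page` is a hypothetical function under test:

```python
import unittest


def parse_page(value):
    """Hypothetical function under test: a positive page number or None."""
    try:
        page = int(value)
    except (TypeError, ValueError):
        return None
    return page if page > 0 else None


class ParsePageTest(unittest.TestCase):
    def test_valid_values(self):
        # One test body, several data rows.
        for value, expected in [("1", 1), ("42", 42), (7, 7)]:
            with self.subTest(value=value):
                self.assertEqual(parse_page(value), expected)

    def test_invalid_values_return_none(self):
        for value in ["", "abc", None, "0", "-3"]:
            with self.subTest(value=value):
                # assertIsNone states the intent more directly than
                # assertEqual(result, None).
                self.assertIsNone(parse_page(value))


def run_parse_page_suite():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParsePageTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```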
This work ultimately is focused on creating smaller and more concise tests.
While parameterized may make adding new configurations for existing tests
easier, that is just a beneficial side effect. The main benefit is that smaller
tests are easier to reason about, meaning they are easier to debug when they
start failing. This improves the developer experience in debugging what went
wrong when refactoring the project.
Total number of tests went from 192 -> 259; or, broke apart larger tests into 69
more concise ones.
add Cloudflare AI Gateway engine
add settings for Cloudflare AI Gateway engine
set UTF-8 encoding for data, fix 500 error caused by non-English characters
format json data
fixed indentation and config format error
fix line-length limitation in CI
reformatted code for CI
reformatted code for CI
limit system prompts to less than 120 chars
cleanup unused variable & format code
In its previous implementation, the macro ``checkbox_onoff_reversed`` always
created an ``aria-labelledby`` attribute, even if there was no descriptive tag
with the generated ID (used as the value of the ``aria-labelledby``).
Before this patch, the Nu-HTML-Checker [1] reported 255 issues of this type::
The aria-labelledby attribute must point to an element in the same document. (255)
[1] https://validator.w3.org/nu/
Signed-off-by: Markus <markus@venom.fritz.box>
The ``aria-labelledby`` [1] attribute identifies the element that labels the
element it is applied to. The templates ``infinite_scroll.html`` and
``search_on_category_select.html`` define an ``aria-labelledby`` at the <input>
tag but miss the id in the <div> with the description.
[1] https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA/Attributes/aria-labelledby
Signed-off-by: Markus <markus@venom.fritz.box>
So far a CAPTCHA was not recognized in the response of the qwant engine and a
SearxEngineAPIException was raised by mistake. With this patch a CAPTCHA
redirect is recognized and the correct SearxEngineCaptchaException is raised.
Closes: https://github.com/searxng/searxng/issues/3806
Signed-off-by: Markus <markus@venom.fritz.box>