The zero-click result from DuckDuckGo for IP should not be displayed. It
returns the IP of the SearXNG server, not the user's IP, and looks a bit strange
when the `self_info` plugin is enabled, as two different IPs get returned.
The use of img_src AND thumbnail in the default results makes no sense (only a
thumbnail is needed). In the current state this is rather confusing, because
img_src is displayed like a thumbnail (small) and thumbnail is displayed like an
image (large).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The names of the links are tags rather than real names, and they sometimes vary
greatly in their spelling:
- GitHub: github, Github
- Source code: Repository, SCM, Project Source Code
- Documentation: docs, Documentation
It was standardized to terms such as 'Source code' and 'Documentation', as
translations already exist for these terms.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This patch is a leftover from [1], in which the WIKIDATA_UNITS values became
a dictionary.
[1] https://github.com/searxng/searxng/pull/3378
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Startpage has changed its HTML layout: classes like ``w-gl__result__main`` no
longer exist, and the structure of the result items has changed slightly.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
CCC media serves several recording formats, to name a few:
- application/x-subrip
- video/mp4
- video/webm
- audio/mpeg
- audio/opus
Not all of them are suitable for a video frame. If available, we should prefer
video/mp4 due to its minimal data rates.
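The preference described above could be sketched as follows. This is a minimal illustration only: the helper name and the ``mime_type`` field are assumptions about the shape of a recording entry, and the fallback to video/webm is an assumption as well.

```python
# Hypothetical helper: prefer video/mp4, fall back to video/webm (assumption).
PREFERRED_MIME_TYPES = ("video/mp4", "video/webm")


def pick_video_recording(recordings):
    """Return the recording best suited for a video frame, or None.

    ``recordings`` is assumed to be a list of dicts with a "mime_type" key.
    The order of PREFERRED_MIME_TYPES defines the preference.
    """
    for mime_type in PREFERRED_MIME_TYPES:
        for recording in recordings:
            if recording.get("mime_type") == mime_type:
                return recording
    # only subtitles or audio formats are available
    return None
```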
Closes: https://github.com/searxng/searxng/issues/3431
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
To test this patch I used .. and checked the diff of the `messages.pot` file::
$ ./manage pyenv.cmd pybabel extract -F babel.cfg \
-o ./searx/translations/messages.pot searx/
$ git diff ./searx/translations/messages.pot
----
hint from @dalf: f-strings are not supported [1], but no error is raised [2].
[1] python-babel/babel#594
[2] python-babel/babel#715
Closes: https://github.com/searxng/searxng/issues/3412
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
`youtube_api.py` throws an exception if the search results contain a channel, as
channels have no videoId. This PR adds a key check when parsing the JSON response.
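A minimal sketch of such a key check, not the actual engine code: the function name and the shape of the result dict are assumptions, and the response layout (``items`` / ``id`` / ``snippet``) is assumed for illustration.

```python
def parse_items(search_results):
    """Collect video results and skip items without a videoId (e.g. channels)."""
    results = []
    for item in search_results.get("items", []):
        video_id = item.get("id", {}).get("videoId")
        if video_id is None:
            # channels have no videoId, so there is no video URL to build
            continue
        results.append(
            {
                "url": "https://www.youtube.com/watch?v=" + video_id,
                "title": item.get("snippet", {}).get("title", ""),
            }
        )
    return results
```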
In the past, some files were tested with the standard profile, others with a
profile in which most of the messages were switched off ... some files were not
checked at all.
- ``PYLINT_SEARXNG_DISABLE_OPTION`` has been abolished
- the distinction ``# lint: pylint`` is no longer necessary
- the pylint tasks have been reduced from three to two
1. ./searx/engines -> lint engines with additional builtins
2. ./searx ./searxng_extra ./tests -> lint all other python files
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In commit 8af181533 in PR:
- https://github.com/searxng/searxng/pull/3321
the category `journal_article` was removed; `book_any` was removed a longer
time ago.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Paging is broken in searchcode.com's API, and it is not clear whether it will
ever be fixed. This commit disables paging in the engine and, BTW, lints
`searchcode_code.py` with pylint.
Closes: https://github.com/searxng/searxng/issues/3287
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Parse the result list from ask.com given in the variable named
window.MESON.initialState::
<script nonce="..">
window.MESON = window.MESON || {};
window.MESON.initialState = {"siteConfig": ...
...}};
window.MESON.loadedLang = "en";
</script>
The result list is in field::
json_resp['search']['webResults']['results']
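One way to pull this JSON out of the page is sketched below. The helper name is hypothetical, and the approach (locating the assignment and decoding one JSON value with the standard library) is an assumption, not necessarily what the engine does.

```python
import json


def extract_initial_state(html_text):
    """Return the object assigned to window.MESON.initialState, or None."""
    pos = html_text.find("window.MESON.initialState")
    if pos == -1:
        return None
    start = html_text.find("{", pos)
    if start == -1:
        return None
    # raw_decode parses exactly one JSON value starting at `start` and
    # tolerates the JavaScript that follows it (";", further statements)
    json_resp, _end = json.JSONDecoder().raw_decode(html_text, start)
    return json_resp
```

The result list would then be read from ``json_resp['search']['webResults']['results']``.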
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In Presearch there are languages for the UI and regions for narrowing down the
search. With this change the SearXNG engine supports a search by region. The
details can be found in the documentation of the source code.
To test, you can search terms like::
!presearch bmw :zh-TW
!presearch bmw :en-CA
1. You should get results corresponding to the region (Taiwan, Canada)
2. and in the language (Chinese, English).
3. The context in info box content is in the same language.
Exceptions:
1. Region or language is not supported by Presearch or
2. SearXNG user did not select a region tag, example::
!presearch bmw :en
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
DDG's bot detection is sensitive to the vqd value. For some search terms (such
as extremely long search terms that are often sent by bots), no vqd value can be
determined.
If SearXNG cannot determine a vqd value, then no request should go out to
DDG (WEB): a request with a wrong vqd value leads to DDG temporarily putting
SearXNG's IP on a block list.
Requests from IPs in this block list run into timeouts.
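The rule "no vqd, no request" can be sketched like this. Everything here is illustrative: ``get_vqd`` is a hypothetical callable, the URL is a placeholder, and signalling "skip the request" by setting ``params["url"]`` to None is an assumption about how the caller interprets the params.

```python
def build_request(query, params, get_vqd):
    """Fill ``params`` for a DDG request, or mark the request as skipped."""
    vqd = get_vqd(query)
    if not vqd:
        # Without a vqd value no request must go out: a wrong vqd gets the
        # IP onto DDG's block list (assumption: a None URL means "skip").
        params["url"] = None
        return params
    params["url"] = "https://ddg.example/html"  # placeholder URL
    params["data"] = {"q": query, "vqd": vqd}
    return params
```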
Not sure, but it seems the block list is a sliding window: to get my IP off the
block list I had to cool down my IP for 1h (send no requests from that IP to
DDG).
Since such issues can't be reproduced in a local instance, I tested this patch
for 24h on my public SearXNG instance: there are still errors (rare), but the
reliability is still 100%.
Related:
- https://github.com/searxng/searxng/pull/2922
- https://github.com/searxng/searxng/pull/2923
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Some search terms do not have results and therefore no vqd value
BTW: remove a leftover from 9197efa
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
We have had problems with this before: the bot protection from ddg-lite seems to
have included this referer in the rating [1][2].
From reverse engineering:
- The Referer ``https://google.com/`` was set in commit 257dc7d6c4 --> DDG lite
does not like this referer anymore!
- The 'Referer' header is only set on second and follow up pages but not on the
first page
- The vqd value is not needed on the first page, the ddg-lite client sets this
value only on follow up pages / this can help to reduce the vqd requests from
SearXNG.
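The observed client behaviour could be mimicked roughly as below. This is a sketch under assumptions: the function name, the ``get_vqd`` callable and the Referer value are hypothetical and not taken from the patch.

```python
def build_ddg_lite_request(query, pageno, get_vqd):
    """Return (headers, form data) for a ddg-lite request.

    The Referer header and the vqd value are only sent on follow-up pages,
    so the first page needs no vqd request at all.
    """
    headers = {}
    data = {"q": query}
    if pageno > 1:
        headers["Referer"] = "https://lite.duckduckgo.com/"  # assumed value
        data["vqd"] = get_vqd(query)
    return headers, data
```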
Related to 'Referer' header & ddg requests:
[1] https://github.com/searxng/searxng/pull/2161
[2] https://github.com/searxng/searxng/pull/2081
Closes: https://github.com/searxng/searxng/issues/2796
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Instead of thumbnail use img_src in the result item, otherwise the "movies"
category looks clunky.
Related:
- b4e0d2eedc (r128785388)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Anna’s Archive has cleaned up their languages, available file extensions and
changed the HTML form.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Crossref was broken on result types journal-issue and component. The old code
had lots of assumptions and broke during parsing. Now the assumptions are more
explicit and have been checked against the API.
Remove the usage of searx.network.multi_requests
The results from Bing contain the target URL encoded in base64.
See the u parameter: remove the first two characters "a1", and done.
Also add a comment on the check of the result_len / pageno
( from https://github.com/searx/searx/pull/1387 )
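The decoding of the u parameter might look like the sketch below. The helper name is hypothetical; the use of the URL-safe base64 alphabet and the missing padding are assumptions about how the value is encoded.

```python
import base64
from urllib.parse import parse_qs, urlparse


def decode_bing_url(redirect_url):
    """Return the target URL hidden in the ``u`` parameter of a Bing link."""
    query = parse_qs(urlparse(redirect_url).query)
    encoded = query.get("u", [""])[0]
    if not encoded.startswith("a1"):
        # not the expected format, keep the URL as it is
        return redirect_url
    encoded = encoded[2:]  # remove the first two characters "a1"
    encoded += "=" * (-len(encoded) % 4)  # restore base64 padding (assumption)
    return base64.urlsafe_b64decode(encoded).decode("utf-8")
```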
It seems there is an API change:
extratags can be either a dictionary or None.
This commit avoids a crash when extratags is None.
Test query "!osm gare du nord"
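A minimal sketch of the guard, assuming the result is a plain dict from the JSON response (the helper name is hypothetical):

```python
def get_extratags(result):
    """Return the extratags of a result as a dict, even if the API sent None."""
    # "or {}" covers both a missing key and an explicit None value
    return result.get("extratags") or {}
```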
The method EngineTraits.get_region(..) returns the engine's region string
that **best fits** SearXNG's locale. This means it returns a
region (country) even if only a language is set in the locale. For example, the
method returns the region `ES` for the locale tag `es`.
Google's search parameter `cr` restricts search results to documents originating
in a particular country. In the case of a language-only locale tag as described
above, this argument should be unset in the query sent to Google.
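The intended behaviour can be illustrated as follows. This is a sketch only: the function name is hypothetical, detecting a region by a "-" in the locale tag is a simplification, and the ``country<CODE>`` value format for `cr` is an assumption.

```python
def build_google_params(locale_tag, region_code):
    """Return Google query arguments; ``cr`` is only set for a real region.

    ``region_code`` stands for the best-fit region, which is also returned
    for language-only tags (e.g. "ES" for "es").
    """
    params = {}
    if "-" in locale_tag:
        # the user selected a region (e.g. "es-MX"), restrict by country
        params["cr"] = "country" + region_code
    # language-only tag (e.g. "es"): leave ``cr`` unset
    return params
```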
Closes: https://github.com/searxng/searxng/issues/2672
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Show the URL of the ddg-search page, not the URL of a (generic) JavaScript. The
latter is not useful for the user.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This patch adds some more fields to the result items and changes paging to the
``nextResultSet`` given in seekr's JSON response.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Disable btdigg because on most SearXNG instances, SearXNG is blocked by btdigg
(Cloudflare reports too many requests).
This implementation does not parse the HTML page because there is an API in
XML (RSS). The RSS feed provides less data, like the number of seeders/leechers
and the files in the torrent file. It's a tradeoff for a "stable" engine, as the
XML from the RSS content will change far less often than the HTML page.
Closes: https://github.com/searxng/searxng/issues/2553
SearXNG does not allow a None value in the content field of a result item.
If the key (shortDescription, uploaderName) in the JSON response from piped
exists but is set to None, SearXNG ignores this result item::
DEBUG searx : result: invalid content: { .., 'content': None, ..}
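A possible guard is sketched below; the helper name is hypothetical and the item is assumed to be a dict from the JSON response.

```python
def text_field(item, key):
    """Return the value of ``key`` as a string, never None."""
    # covers a missing key as well as a key that exists but is set to None
    return item.get(key) or ""
```

For example, ``text_field(item, "shortDescription")`` yields an empty string instead of None, so the result item is no longer dropped.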
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>