Internet search engines

Tips, tricks, confidentiality

August 2, 2021 — April 26, 2024

computers are awful together
faster pussycat
information provenance
NLP
search

Finding things on the internet! At one point this felt like a solved problem, but it seems to have gotten unsolved.

Famously, Google seems not to be good at search any longer. Speculated reasons include that Google is losing the SEO battle, that human-friendly content is being squeezed aside in general, that Google is spending down its credibility in order to bring in advertising revenue, or that some other, more complicated mechanisms and incentives are just making things terrible or boring.

For some quantifiable data on this theme, see webis-de/ecir24-seo-spam-in-search-engines (Bevendorff et al. 2024).

Regardless of the details or reasons, it does seem to be true for me that search results are bad right now.

In addition, I am uncomfortable with the surveillance and tracking involved in search engines. Insofar as they are the way I access the world, they can potentially know too much about me.

I am interested in solving these problems.

1 Better commercial search providers

I don’t want large search businesses to know what I am searching for.

Here are some links to search engines which may reduce the degree of user surveillance, or at least, diffuse the surveillance across a few different players, or provide added value over the classic searches such as Google and Bing.

Many of these make strong claims to protect user privacy, although few offer substantive guarantees in excess of inspecting tracking headers. Some of them repackage other searches; some run their own indices. Most of them have very unclear business models.

1.1 Kagi

An exception to the opaque-business-model rule is Kagi. Their value proposition is, they claim, to be credibly user-centric:

Kagi has no ads and is fully supported only by its users. We worked very hard to provide high quality, fast and tracking-free results at a minimum cost to ensure sustainability of our operation.

By choosing a paid Kagi plan, you are also helping accelerate our mission of humanizing the web.

The free plan is pretty good, and they will happily sell you extra features/more searches:

  • Kagi search features | Kagi Blog

  • No ads

  • Ability to block/boost domains

  • Bangs allow you to quickly jump to all popular sites on the web.

  • zero telemetry, zero tracking

  • See how fast is a website or how many ads/trackers it has before clicking the result.

They have been criticised for being chaos pants. These criticisms seem reasonable to me, but not fatal.

Obviously if I become a subscriber, they can in principle track me, so the privacy angle hinges upon some trust.

1.2 Marginalia

File under quirky/quixotic/small web, Marginalia Search:

This is an independent DIY search engine that focuses on non-commercial content, and attempts to show you sites you perhaps weren’t aware of in favor of the sort of sites you probably already knew existed.

The software for this search engine is all custom-built, and all crawling and indexing is done in-house. The project is open source. Feel free to poke about in the source code or contribute to the development!

The search engine is currently serving about 107 queries/minute.

1.3 Startpage

Startpage claims to repackage Google search results anonymously, AFAIK, although I cannot see much information about why I should believe them on this. Dutch company. To use them as a search bar in Firefox I needed to add a browser extension, for some tedious reason.

1.4 DuckDuckGo

Perennial favourite DuckDuckGo is a search engine run by strident privacy advocates, which is laudable I s’pose. The search is… OK. Usually not as good as Google. Every now and again it is serendipitously wonderful, but not reliably.

1.5 Brave

Brave Search recently launched, backed by the creators of the Brave browser. TBC.

1.6 Mojeek

Mojeek/Mojeek Focus (Bookmark) Search Engine

Mojeek was created to provide a globally competitive and genuine alternative search engine based in the UK, and from the outset one that didn’t track its users nor simply retrieve its results from another engine (i.e. to provide real alternative results).

Mojeek’s technology has been developed entirely from scratch by Marc Smith, mostly using the C programming language, and uses no pre-existing search or web crawler technology. All technology and IP is fully owned by Mojeek Limited.

1.7 Qwant

Qwant promises to forget user data rapidly. French company.

1.8 Runnaroo

Similar to Qwant? See Runnaroo. Promises to aggregate many other search engines and review sites. Their business model is opaque.

1.9 Search encrypt

Search Encrypt claims to provide additional privacy via encryption with perfect forward secrecy. Presumably this is supposed to prevent them from assembling a history of my searches?

1.10 Suppressing spam in search results

2 DIY search proxies

A.k.a. meta-searching. I suspect these imply maintenance overhead as the search companies attempt to circumvent this circumvention of their business model. Effectively, you would be participating in an arms race.

2.1 searx

The searx family is a network of metasearch engine portals with the aim of protecting the privacy of users. Searx does not share users’ IP addresses or search history with the search engines from which it gathers results. Tracking cookies served by the search engines are blocked, etc. The flagship instance is searx.me. There are many user-operated instances, and it is open source. Advanced: run your own DIY search anonymiser!
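To make the metasearch idea concrete, here is a minimal sketch of the merging step: given ranked URL lists from several upstream engines, deduplicate them and produce one combined ranking. Everything in it is an assumption for illustration. The function names (`normalise_url`, `merge_results`) are hypothetical, the scoring rule is reciprocal rank fusion (a generic technique, not necessarily what searx does), and the engine names and URLs are made up. Fetching results from real engines is deliberately left out.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalise_url(url: str) -> str:
    """Crude canonical form so the same page from two engines deduplicates.

    Lowercases the host, strips a trailing slash, and drops the fragment.
    Real deduplication would need more care; this is illustrative only.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


def merge_results(results_by_engine: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge ranked URL lists using reciprocal rank fusion.

    Each engine contributes 1 / (k + rank) to a URL's score, so pages that
    several engines rank highly float to the top of the merged list.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked_urls in results_by_engine.values():
        seen = set()
        for rank, url in enumerate(ranked_urls, start=1):
            key = normalise_url(url)
            if key in seen:
                # Count each page once per engine.
                continue
            seen.add(key)
            scores[key] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda u: (-scores[u], u))


if __name__ == "__main__":
    # Hypothetical upstream results; engine names and URLs are placeholders.
    example = {
        "engine_a": ["https://example.org/a", "https://example.org/b"],
        "engine_b": ["https://EXAMPLE.org/b/", "https://example.org/c"],
    }
    for url in merge_results(example):
        print(url)
```

The privacy benefit comes from where this runs, not from the merging itself: the upstream engines see requests from the proxy rather than from my browser, so a shared instance diffuses my queries among those of its other users.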

2.2 mysearch

mysearch: a local search engine portal designed to anonymise search requests and display search results better. A public instance is available at search.jesuislibre.net. Dead AFAICT.

5 Incoming

  • Vicki Boykis, How I search in 2024

    We are now in a very weird liminal space in information retrieval for consumers, particularly those attuned to trends in search and working on the bleeding edge of LLMs.

    […]we have the fall of old companies. Broadcast-based centralized social media, which steadily served as a newsfeed and realtime search for a small, vocal minority, is basically dead, or on its last legs. Search, namely Google, is basically a useless pile of ads and SEO gamification at this point and a stopping point for Reddit results. Everyone has written about it and covered this extensively. […]

    … on the heels of the large companies of the last 15 years declining, we have a new indie search engine scene emerging, hungry, armed with AI tooling, and ready to take back quality on the web.

    She picks Kagi, Marginalia, and Perplexity.

  • Introducing Simple Search

    Simple Search is an extension that highlights the “traditional” or “ten blue link” search results provided by the search engine, laying them over the info boxes and other content. Close the window to view the full results page. Compatible with Bing and Google search engines.

  • Internet Search Tips · Gwern.net

  • Google Search Really Has Gotten Worse, Researchers Find

7 References

Bevendorff, Wiegmann, Potthast, et al. 2024. “Is Google Getting Worse? A Longitudinal Investigation of SEO Spam in Search Engines.” In Advances in Information Retrieval.