Putting aside what happens if one allows Javascript and uses "modern" browsers, Startpage generally does not seem to require any more data from users than DDG. A small shell script can be used to search Startpage or DDG (or almost any other search engine) from the command line without sending any unnecessary data, such as unnecessary headers, cookies or hidden form variables. The best part is that by not using a "modern" browser to send the search, one can easily automate editing the results page before viewing it in a browser, discarding all the cruft. I like to just return the URLs. (I notice that Startpage also (a) supports HTTP/1.1 pipelining, i.e., multiple page requests over a single TCP connection; Google does not, and (b) allows bans to be overcome by solving an easily read captcha, and this seems to prevent further bans; Google imposes automatic temporary bans that cannot be overcome by solving a captcha.)
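A minimal sketch of the kind of script described above: fetch a results page with curl, sending no cookies and only a minimal header set, then strip everything but the URLs. The DDG HTML endpoint and the crude URL-encoding here are assumptions; check the engine's own search form for the current endpoint and parameter names before relying on them.

```shell
#!/bin/sh
# Search from the command line and keep only the URLs, discarding the cruft.

search() {
    # Crude URL-encoding (spaces only); real queries may need more escaping.
    q=$(printf '%s' "$1" | sed 's/ /+/g')
    # No cookies, no hidden form variables, one plain User-Agent header.
    # Endpoint is an assumption -- substitute Startpage or another engine.
    curl -s -H 'User-Agent: Mozilla/5.0' \
        "https://html.duckduckgo.com/html/?q=$q" | extract_urls
}

extract_urls() {
    # Pull href values out of anchor tags, one per line.
    grep -o 'href="[^"]*"' | sed 's/^href="//; s/"$//'
}
```

Usage would be something like `search 'http pipelining'`; the result-page filtering in `extract_urls` is deliberately naive and will also pick up navigation links, which one can filter further with grep.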
The biggest problem I see with the major search engines, and with the minor search engines that repackage results from the major ones, is that they too often limit the number of results returned. For example, Google caps results at something like 200-300. In the early days of the web, search engines used to brag about how many pages were searched, and they proved the claim by how many results they returned. Today search engines want to localise and limit the results, not to mention promote their own websites. I also notice that repeated searches, where one is collecting the total results rather than simply the first page, yield different results each time.
Not every query is a question and not every user is interested in an instantaneous "answer" or the most popular website. That type of quick searching certainly has its place but it is not "research" and will not lead users to learn much about what actually exists on the web, or how to think critically about the web's content. Some users may want to search for pages and then evaluate the pages themselves. Exploration and discovery. Those users are treated as "bots" in order to justify what can only be anti-competitive practices. The sad consequence of this "limiting" behaviour is to keep curious users from ever learning what actually exists on the web (versus what a "search engine" decides to promote, or demote).
I have been playing around with Common Crawl data and it seems woefully circumscribed in its scope. A web index should be public information, but these search engines sure as heck do not treat it as such.
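For anyone wanting to poke at Common Crawl themselves, the project exposes a public URL index that can be queried with the same sort of shell script. The collection name below is an example of the naming scheme, not a recommendation; the current list of collections is published at index.commoncrawl.org/collinfo.json, and the `output=json` parameter is assumed from the index server's CDX-style API.

```shell
#!/bin/sh
# Sketch: look up one domain's entries in the public Common Crawl URL index.

cc_lookup() {
    # One JSON record per line for each captured page matching the URL.
    curl -s "https://index.commoncrawl.org/CC-MAIN-2024-10-index?url=$1&output=json"
}

cc_urls() {
    # Extract just the "url" field from each JSON record line.
    sed -n 's/.*"url": "\([^"]*\)".*/\1/p'
}
```

Something like `cc_lookup 'example.com/*' | cc_urls` would then list the captured URLs for a domain, in the same spirit as filtering a results page down to bare links.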
But with or without a shell script, Startpage protects your privacy. Our web app acts as a proxy between your browser and the rest of the web, so we couldn't collect user data even if we wanted to. You deserve an explanation, so here is some more detail:
Startpage is delighted that users are conscientious about our privacy practices, as they should be. Privacy-aware users are the kind of users we enjoy serving. Asking questions is always a good idea.
As we’ve stated, System1 is interested in Startpage’s anonymous contextual advertising revenue, not in user data, mainly because we don’t store any.
Even if they wanted to change our privacy policies, it wouldn’t be possible. Our co-owners and Surfboard Holding BV still have authority in our company. Our infrastructure is all in the European Union, where the strict GDPR legislation applies and the US CLOUD Act doesn’t.
Maintaining user privacy is our reason for being. We thank you for your curiosity and your vigilance. We hope you continue to ask questions and enjoy Startpage. We will gladly answer all of them.