> I used to work at a startup that aggregated apartment listings. Long gone now, we couldn't compete with Zillow or Apartments.com. But what we were doing is just aggregating all of the rental websites we could possibly scrape into one interface.
[...]
> We couldn't have the most up to date data, that was determined by how fast we could go back to scrape a listing. And often times we were knee deep in a battle to avoid being blocked by these companies. Often times, after we had exhausted our proxies, the only thing left to use was Tor.
The flip side is that if you are looking at listings from small property management companies, they are probably low on resources. Their websites aren't well optimized to serve thousands of concurrent users: every inventory search hits SQL Server, which is probably one small box with no automatic failover. I don't like it, but unless we somehow help these people better optimize their websites (how?), they will keep using heavy-handed tactics like blocking scrapers.
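One cheap optimization in that spirit: put a short-lived cache in front of the inventory search so repeated hits (including scraper traffic) reuse one database query instead of each reaching SQL Server. A minimal sketch, assuming a hypothetical `search_inventory` stand-in for the real DB round trip:

```python
import time

class TTLCache:
    """Tiny in-process cache: each key's value expires after `ttl_seconds`."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = time.monotonic()
        hit = self.store.get(key)
        if hit and hit[0] > now:
            return hit[1]          # fresh cached value, no DB query
        value = compute()          # cache miss: do the expensive work once
        self.store[key] = (now + self.ttl, value)
        return value

db_queries = 0

def search_inventory(query):
    # Stand-in for the real SQL Server round trip (hypothetical).
    global db_queries
    db_queries += 1
    return [f"listing for {query}"]

cache = TTLCache(ttl_seconds=60)
a = cache.get_or_compute("2br downtown", lambda: search_inventory("2br downtown"))
b = cache.get_or_compute("2br downtown", lambda: search_inventory("2br downtown"))
# Both calls return the same result, but only one DB query was issued.
```

Even a 60-second TTL collapses a burst of identical scraper requests into a single query, which is often enough to keep a single small DB box alive without resorting to blocking.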