Hacker News
- Robots.txt meant for search engines don’t work well for web archives http://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/ 143 comments
Linking pages
- With the rise of AI, web crawlers are suddenly controversial - The Verge https://www.theverge.com/24067997/robots-txt-ai-text-file-web-crawlers-spiders 101 comments
- A Curious Case of Disregarded Robots.txt – mike.pub https://mike.pub/20170425-disregarded-robots-txt 10 comments
- Robots.txt is 25 years old – Martijn Koster's Pages https://www.greenhills.co.uk/posts/robotstxt-25/ 2 comments
- Common Crawl And Unlocking Web Archives For Research https://www.forbes.com/sites/kalevleetaru/2017/09/28/common-crawl-and-unlocking-web-archives-for-research/ 1 comment
- What is the Internet Archive doing with our books? | NWU https://nwu.org/what-is-the-internet-archive-doing-with-our-books/ 0 comments
- 2018-04-24: Why we need multiple web archives: the case of blog.reidreport.com https://ws-dl.blogspot.com/2018/04/2018-04-24-why-we-need-multiple-web.html 0 comments
- Internet Archive to ignore robots.txt directives | Boing Boing http://boingboing.net/2017/04/22/internet-archive-to-ignore-rob.html 0 comments
- What Celine Dion–Fan Dreams Say About the Early Internet - The Atlantic https://www.theatlantic.com/technology/archive/2020/01/celine-dreams-fan-site-geocities-internet-archive/604750/ 0 comments
- GitHub - buren/wayback_archiver: Ruby gem to send URLs to Wayback Machine https://github.com/buren/wayback_archiver 0 comments
- On Robots and Text – Pixel Envy https://pxlnv.com/blog/on-robots-and-text/ 0 comments