Hacker News
- [Python] How can I clean up Wikipedia's XML backup dump to create dictionaries of commonly used words for multiple languages? https://dumps.wikimedia.org/ 3 comments learnprogramming
- Download the entire wikipedia (if you are so inclined) http://dumps.wikimedia.org/#wikipedia 3 comments reddit.com
Linking pages
- What every software engineer should know about search https://scribe.rip/p/what-every-software-engineer-should-know-about-search-27d1df99f80d 132 comments
- GitHub - martinblech/xmltodict: Python module that makes working with XML feel like you are working with JSON https://github.com/martinblech/xmltodict 51 comments
- Transforming Wikipedia into an accurate cultural knowledge quiz | by Michael Baldwin | Medium https://medium.com/@mjbaldwin/transforming-wikipedia-into-an-accurate-cultural-knowledge-quiz-b0a0f74877c#hn 46 comments
- Reinventing Enterprise Search – Amazon Kendra is Now Generally Available | AWS News Blog https://aws.amazon.com/blogs/aws/reinventing-enterprise-search-amazon-kendra-is-now-generally-available/ 17 comments
- GitHub - pirate/wikipedia-mirror: 🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump https://github.com/pirate/wikipedia-mirror#how-to-self-host-a-mirror-of-wikipediaorgwith-nginx-kimix-or-mediawikixowa--docker 16 comments
- Data Competition: Announcing the Wikipedia Participation Challenge – Diff http://blog.wikimedia.org/2011/06/28/data-competition-announcing-the-wikipedia-participation-challenge/ 9 comments
- The ALDE booming on Wikipedia Italia | by Elif Lab | Medium https://medium.com/@eliflab/the-alde-booming-on-wikipedia-italia-76bf2271106e#.i1ih5gm62 7 comments
- How to Extract and Analyze Data from Wikipedia - Mixnode https://www.mixnode.com/tutorials/how-to-extract-and-analyze-data-from-wikipedia 6 comments
- Companies in Multilingual Wikipedia: Articles Quality and Important Sources of Information | SpringerLink https://link.springer.com/chapter/10.1007/978-3-031-29570-6_3 6 comments
- GitHub - jon-edward/wiki_dump: A library that assists in traversing and downloading from Wikimedia Data Dumps and their mirrors. https://github.com/jon-edward/wiki_dump 4 comments
- Mapping a million new sister cities | Little Short Bulletins https://www.leebutterman.com/2020/01/11/mapping-a-million-new-sister-cities.html 1 comment
- Iterating on how we do NFS at Wikimedia Cloud Services – [[WM:TECHBLOG]] https://techblog.wikimedia.org/2021/10/19/iterating-on-how-we-do-nfs-at-wikimedia-cloud-services/ 0 comments
- GitHub - chiphuyen/lazynlp: Library to scrape and clean web pages to create massive datasets. https://github.com/chiphuyen/lazynlp 0 comments
- A Fast WordPiece Tokenization System – Google AI Blog https://ai.googleblog.com/2021/12/a-fast-wordpiece-tokenization-system.html 0 comments
- MEMEX - Rendered static HTML [2021-08-13] https://memex.marginalia.nu/log/13-static-html.gmi 0 comments
- GitHub - open-guides/og-search-engineering: Want to build or improve a search experience? Start here. https://github.com/open-guides/og-search-engineering 0 comments
- Data Sources for Cool Data Science Projects: Part 1 - Guest Post - Ryan Swanstrom http://101.datascience.community/2014/10/17/data-sources-for-cool-data-science-projects-part-1-guest-post/ 0 comments
- Explore wiki project data faster with mwsql – [[WM:TECHBLOG]] https://techblog.wikimedia.org/2022/06/13/explore-wiki-project-data-faster-with-mwsql/ 0 comments
- Anki Scripting: Automate your flashcards https://www.juliensobczak.com/write/2016/12/26/anki-scripting.html 0 comments
- Your own Wikidata Query Service, with no limits - addshore https://addshore.com/2019/10/your-own-wikidata-query-service-with-no-limits-part-1/ 0 comments
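Several of the links above deal with parsing Wikipedia's XML backup dumps (the learnprogramming question, xmltodict). As a minimal sketch of the streaming approach those discussions describe, here is a word-frequency counter using only the standard library's `xml.etree.ElementTree` (not the xmltodict library linked above); the dump filename and the tiny inline sample are illustrative stand-ins, not real data:

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from io import StringIO

def count_words(xml_file, top_n=10):
    """Stream <text> elements from a MediaWiki XML export and count words.

    iterparse processes the file incrementally, so a multi-gigabyte dump
    is never held in memory at once. Matching tags with endswith() skips
    the MediaWiki export namespace prefix, which varies between dumps.
    """
    counts = Counter()
    for _, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag.endswith("text") and elem.text:
            counts.update(re.findall(r"[a-zà-ÿ]+", elem.text.lower()))
        elem.clear()  # free memory used by already-processed elements
    return counts.most_common(top_n)

# Tiny inline sample standing in for a real dump file such as
# "enwiki-latest-pages-articles.xml" (hypothetical path).
sample = StringIO(
    "<mediawiki><page><title>Test</title>"
    "<revision><text>the quick brown fox the fox</text></revision>"
    "</page></mediawiki>"
)
print(count_words(sample, top_n=2))  # → [('the', 2), ('fox', 2)]
```

For a real dump, pass the path of the decompressed XML file instead of the `StringIO` sample; the same streaming loop applies unchanged.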