Hacker News
- [Python] How can I clean up Wikipedia's XML backup dump to create dictionaries of commonly used words for multiple languages? https://dumps.wikimedia.org/ 3 comments learnprogramming
- Download the entire wikipedia (if you are so inclined) http://dumps.wikimedia.org/#wikipedia 3 comments reddit.com
Linking pages
- What every software engineer should know about search https://scribe.rip/p/what-every-software-engineer-should-know-about-search-27d1df99f80d 132 comments
- GitHub - martinblech/xmltodict: Python module that makes working with XML feel like you are working with JSON https://github.com/martinblech/xmltodict 51 comments
- Transforming Wikipedia into an accurate cultural knowledge quiz | by Michael Baldwin | Medium https://medium.com/@mjbaldwin/transforming-wikipedia-into-an-accurate-cultural-knowledge-quiz-b0a0f74877c#hn 46 comments
- Reinventing Enterprise Search – Amazon Kendra is Now Generally Available | AWS News Blog https://aws.amazon.com/blogs/aws/reinventing-enterprise-search-amazon-kendra-is-now-generally-available/ 17 comments
- GitHub - pirate/wikipedia-mirror: 🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump https://github.com/pirate/wikipedia-mirror#how-to-self-host-a-mirror-of-wikipediaorgwith-nginx-kimix-or-mediawikixowa--docker 16 comments
- Data Competition: Announcing the Wikipedia Participation Challenge – Diff http://blog.wikimedia.org/2011/06/28/data-competition-announcing-the-wikipedia-participation-challenge/ 9 comments
- The ALDE booming on Wikipedia Italia | by Elif Lab | Medium https://medium.com/@eliflab/the-alde-booming-on-wikipedia-italia-76bf2271106e#.i1ih5gm62 7 comments
- How to Extract and Analyze Data from Wikipedia - Mixnode https://www.mixnode.com/tutorials/how-to-extract-and-analyze-data-from-wikipedia 6 comments
- Companies in Multilingual Wikipedia: Articles Quality and Important Sources of Information | SpringerLink https://link.springer.com/chapter/10.1007/978-3-031-29570-6_3 6 comments
- GitHub - jon-edward/wiki_dump: A library that assists in traversing and downloading from Wikimedia Data Dumps and their mirrors. https://github.com/jon-edward/wiki_dump 4 comments
- Mapping a million new sister cities | Little Short Bulletins https://www.leebutterman.com/2020/01/11/mapping-a-million-new-sister-cities.html 1 comment
- Iterating on how we do NFS at Wikimedia Cloud Services – [[WM:TECHBLOG]] https://techblog.wikimedia.org/2021/10/19/iterating-on-how-we-do-nfs-at-wikimedia-cloud-services/ 0 comments
- GitHub - chiphuyen/lazynlp: Library to scrape and clean web pages to create massive datasets. https://github.com/chiphuyen/lazynlp 0 comments
- A Fast WordPiece Tokenization System – Google AI Blog https://ai.googleblog.com/2021/12/a-fast-wordpiece-tokenization-system.html 0 comments
- MEMEX - Rendered static HTML [2021-08-13] https://memex.marginalia.nu/log/13-static-html.gmi 0 comments
- GitHub - open-guides/og-search-engineering: Want to build or improve a search experience? Start here. https://github.com/open-guides/og-search-engineering 0 comments
- Data Sources for Cool Data Science Projects: Part 1 - Guest Post - Ryan Swanstrom http://101.datascience.community/2014/10/17/data-sources-for-cool-data-science-projects-part-1-guest-post/ 0 comments
- Explore wiki project data faster with mwsql – [[WM:TECHBLOG]] https://techblog.wikimedia.org/2022/06/13/explore-wiki-project-data-faster-with-mwsql/ 0 comments
- Anki Scripting: Automate your flashcards https://www.juliensobczak.com/write/2016/12/26/anki-scripting.html 0 comments
- Your own Wikidata Query Service, with no limits - addshore https://addshore.com/2019/10/your-own-wikidata-query-service-with-no-limits-part-1/ 0 comments
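Several of the links above deal with parsing Wikipedia's XML backup dumps (the learnprogramming question, xmltodict). As a minimal sketch of the streaming approach those discussions describe, here is a word-frequency counter using only the standard library's `xml.etree.ElementTree` (not the xmltodict library linked above); the dump filename and the tiny inline sample are illustrative stand-ins, not real data:

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from io import StringIO

def count_words(xml_file, top_n=10):
    """Stream <text> elements from a MediaWiki XML export and count words.

    iterparse processes the file incrementally, so a multi-gigabyte dump
    is never held in memory at once. Matching tags with endswith() skips
    the MediaWiki export namespace prefix, which varies between dumps.
    """
    counts = Counter()
    for _, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag.endswith("text") and elem.text:
            counts.update(re.findall(r"[a-zà-ÿ]+", elem.text.lower()))
        elem.clear()  # free memory used by already-processed elements
    return counts.most_common(top_n)

# Tiny inline sample standing in for a real dump file such as
# "enwiki-latest-pages-articles.xml" (hypothetical path).
sample = StringIO(
    "<mediawiki><page><title>Test</title>"
    "<revision><text>the quick brown fox the fox</text></revision>"
    "</page></mediawiki>"
)
print(count_words(sample, top_n=2))  # → [('the', 2), ('fox', 2)]
```

For a real dump, pass the path of the decompressed XML file instead of the `StringIO` sample; the same streaming loop applies unchanged.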