Hacker News
- Common Crawl https://commoncrawl.org/ 7 comments
- Common Crawl https://commoncrawl.org/ 61 comments
- Navigating the WARC file format http://commoncrawl.org/navigating-the-warc-file-format/ 2 comments
- Lexalytics Text Analysis Work with Common Crawl Data http://commoncrawl.org/lexalytics-text-analysis-work-with-common-crawl-data/ 2 comments
- 102TB of New Crawl Data Available http://commoncrawl.org/new-crawl-data-available/ 37 comments
- SwiftKey’s Head Data Scientist on the Value of Common Crawl’s Open Data [video] http://commoncrawl.org/startup-profile-swiftkeys-head-data-scientist-on-the-value-of-common-crawls-open-data/ 2 comments
- A Look Inside Our 210TB 2012 Web Corpus http://commoncrawl.org/a-look-inside-common-crawls-210tb-2012-web-corpus/ 36 comments
- Share code that uses new URL Search tool and win AWS credit http://commoncrawl.org/url-search-tool/ 16 comments
- Triv.io donates URL index to Common Crawl http://commoncrawl.org/common-crawl-url-index/ 16 comments
- Common Crawl announces Open Source Big Data code contest winners http://commoncrawl.org/announcing-the-winners-of-the-code-contest/ 9 comments
- Spend the 3 day weekend hacking big data - win $1000 cash + other prizes http://commoncrawl.org/common-crawl-code-contest-extended-through-the-holiday-weekend/ 4 comments
- Common Crawl code contest - with fresh crawl of 3.2 billion web pages http://commoncrawl.org/common-crawls-brand-spanking-new-video-and-first-ever-code-contest/ 5 comments
- Bored in grad school? Learn Hadoop http://commoncrawl.org/learn-hadoop-and-get-a-paper-published/ 21 comments
- Common Crawl http://commoncrawl.org/ 5 comments
- MapReduce for the Masses: Zero to Hadoop in 5 minutes with Common Crawl http://www.commoncrawl.org/mapreduce-for-the-masses/ 26 comments
- CommonCrawl: an open repository of web crawl data that is universally accessible http://www.commoncrawl.org/ 8 comments
Lobsters
- MapReduce for the Masses: Zero to Hadoop in Five Minutes with Common Crawl | CommonCrawl http://www.commoncrawl.org/mapreduce-for-the-masses/ 4 comments programming
Linking pages
- What Is ChatGPT Doing … and Why Does It Work?—Stephen Wolfram Writings https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/ 989 comments
- Understanding ChatGPT - Atmosera https://www.atmosera.com/ai/understanding-chatgpt/ 232 comments
- psuter.net https://psuter.net/2019/07/07/z-index 136 comments
- What every software engineer should know about search https://scribe.rip/p/what-every-software-engineer-should-know-about-search-27d1df99f80d 132 comments
- A look at search engines with their own indexes - Seirdy https://seirdy.one/posts/2021/03/10/search-engines-with-own-indexes/ 125 comments
- Microsoft unveils AI model that understands image content, solves visual puzzles | Ars Technica https://arstechnica.com/?p=1920920 102 comments
- Index 1,600,000,000 Keys with Automata and Rust - Andrew Gallant's Blog https://blog.burntsushi.net/transducers/ 93 comments
- OpenAI: Copy, steal, paste | Computerworld https://www.computerworld.com/article/3712540/openai-copy-steal-paste.html 79 comments
- A decoder-only foundation model for time-series forecasting – Google Research Blog https://blog.research.google/2024/02/a-decoder-only-foundation-model-for.html 78 comments
- Why You (Probably) Don't Need to Fine-tune an LLM - Tidepool by Aquarium https://www.tidepool.so/2023/08/17/why-you-probably-dont-need-to-fine-tune-an-llm/ 73 comments
- Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer – Google AI Blog https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html 66 comments
- GitHub - ashvardanian/StringZilla: Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc 🦖 https://github.com/ashvardanian/Stringzilla 57 comments
- Microsoft unveils AI model that understands image content, solves visual puzzles | Ars Technica https://arstechnica.com/information-technology/2023/03/microsoft-unveils-kosmos-1-an-ai-language-model-with-visual-perception-abilities/ 54 comments
- How to turn an ordinary gzip archive into a database | Artem Golubin https://rushter.com/blog/gzip-indexing/ 47 comments
- Fun and Dystopia With AI-Based Code Generation Using GPT-J-6B | Max Woolf's Blog https://minimaxir.com/2021/06/gpt-j-6b/ 41 comments
- Bigger data; same laptop http://www.frankmcsherry.org/graph/scalability/cost/2015/02/04/COST2.html 38 comments
- Language-Agnostic BERT Sentence Embedding – Google AI Blog https://ai.googleblog.com/2020/08/language-agnostic-bert-sentence.html 35 comments
- Minority Voices 'Filtered' Out of Google Natural Language Processing Models - Unite.AI https://www.unite.ai/minority-voices-filtered-out-of-google-natural-language-processing-models/ 34 comments
- GitHub - openvenues/libpostal: A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data. https://github.com/openvenues/libpostal 32 comments
- Lookism in TikTok. Intro | by Enryu | Sep, 2022 | Medium https://medium.com/@enryu9000/lookism-in-tiktok-3def0f20cf78 31 comments