Hacker News
- Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples https://arxiv.org/abs/1903.03096 2 comments
- Georgia Tech, Meta create open dataset to advance solutions for carbon capture https://news.gatech.edu/news/2024/05/02/georgia-tech-and-meta-create-massive-open-dataset-advance-ai-solutions-carbon 174 comments
- BeeTrove – OpenAI GPTs Open-Source Dataset https://beetrove.com/ 12 comments
- The Splitgraph Data Delivery Network – query over 40k public datasets https://www.splitgraph.com/blog/data-delivery-network-launch 95 comments
- Audio Datasets for Machine Learning https://lionbridge.ai/datasets/12-best-audio-datasets-for-machine-learning/ 3 comments
- Covid-19 Open Research Dataset https://pages.semanticscholar.org/coronavirus-research 47 comments
- Tips for making an Object Tracking Dataset https://sci-hub.se/https:/ieeexplore.ieee.org/document/7001050 9 comments computervision
- Build dataflows with larger-than-memory datasets. Use two Python open source libraries in this hands-on guide to create Big Data pipelines https://medium.com/@marine.gosselin/big-data-models-vs-computer-memory-b345814ece9f 8 comments programming
- Help needed regarding data privacy! I have this dataset; how can I determine its k-anonymity? If I wish to make it 3-anonymous and 2-diverse, what can I generalize and suppress? https://drive.google.com/file/d/1Nn1kKscKfUyBxsX6gbj9kUbEOTPp6r3-/view?usp=sharing 2 comments privacy
- [P] ChatGPT Survey: Performance on NLP datasets http://opensamizdat.com/posts/chatgpt_survey/ 4 comments machinelearning
- Collecting datasets for CV: any wishlists? https://www.tictag.io 14 comments computervision
- Having trouble loading a custom dataset into YOLOv5 https://learnopencv.com/custom-object-detection-training-using-yolov5/?ck_subscriber_id=1373562521#Custom-Object-Detection-Training-using-YOLOv5 3 comments learnmachinelearning
- Air pollution causes nearly 2 million asthma cases, and a similar number of excess deaths, per year. The researchers examined an existing dataset that looked at nitrogen dioxide concentrations in 58 countries during 2010–12. https://cosmosmagazine.com/earth/climate/air-pollution-asthma-excess-deaths/ 9 comments science
- [R] AnimeCeleb: Large-Scale Animation CelebFaces Dataset via Controllable 3D Synthetic Models https://arxiv.org/abs/2111.07640 2 comments machinelearning
- DeepMind open-sources protein structure dataset generated by AlphaFold 2 https://venturebeat.com/2021/07/22/deepmind-open-sources-protein-structure-dataset-generated-by-alphafold-2/ 3 comments science
- Handling large datasets using pandas under memory constraints https://www.kaggle.com/c/avazu-ctr-prediction/overview 6 comments learnmachinelearning
- What colors should I prepare for use for graphs with unknown datasets? https://www.reddit.com/r/web_design/comments/n4f6gb/what_colors_should_i_prepare_for_use_for_graphs/ 7 comments web_design
- survivoR R package: "a collection of datasets detailing events and the cast across all 40 seasons of the US Survivor, including castaway information, vote history, immunity and reward challenge winners, jury votes, and viewers" http://gradientdescending.com/survivor-now-on-cran/ 4 comments rstats
- Rush Limbaugh downplaying hurricane Irma may have decreased evacuations. Phone-location dataset shows correlation with election results. https://arstechnica.com/science/2020/09/rush-limbaugh-downplaying-hurricane-irma-may-have-decreased-evacuations/ 5 comments politics
- Rust Notebooks: Loading Datasets from CSV into NDArray https://shahinrostami.com/posts/programming/rust-notebooks/loading-datasets-from-csv-into-ndarray/ 3 comments rust
- Google just published 25 million free datasets https://towardsdatascience.com/google-just-published-25-million-free-datasets-d83940e24284 89 comments technews
- Not sure if this is the correct subreddit, but this is a dataset of "Good" and "Evil" chat messages in a video game and I want someone to train a classifier on it https://drive.google.com/file/d/1bxAQrt-Nomj4npg7GLLtA3FTs374kK6l/view?usp=sharing 4 comments learnmachinelearning
- Training a YOLOv3 Object Detection Model with a Custom Dataset https://blog.roboflow.ai/training-a-yolov3-object-detection-model-with-a-custom-dataset/ 3 comments computervision
- Twelve Million Phones, One Dataset, Zero Privacy https://www.nytimes.com/interactive/2019/12/19/opinion/location-tracking-cell-phone.html 28 comments firefox
- 70+ Machine Learning Datasets - Gain real-world experience with Data Science projects! https://data-flair.training/blogs/machine-learning-datasets/ 3 comments programming
- What Does ‘Broken’ Sound Like? First-Ever Audio Dataset of Malfunctioning Industrial Machines https://medium.com/syncedreview/what-does-broken-sound-like-first-ever-audio-dataset-of-malfunctioning-industrial-machines-b4f8f6d81dd7 3 comments artificial
- GitHub Releases Dataset Of Six Million Methods From Open Source Projects For CodeSearchNet Challenge https://github.blog/2019-09-26-introducing-the-codesearchnet-challenge/?utm_campaign=1569513857&utm_medium=social&utm_source=twitter&utm_content=1569513857 13 comments programming
- 300+ Free Datasets for Machine Learning divided into 10 Use Cases https://lionbridge.ai/business-resources/open-datasets-for-machine-learning/ 11 comments datascience
- Training a chatbot with dialogues dataset https://www.reddit.com/r/artificial/comments/an337w/training_a_chatbot_with_dialogues_dataset/ 10 comments artificial
- I Compiled a Dataset of ~2.5 Million /r/WallStreetBets Comments for an Algorithmic Trading Strategy Based on Market Volatility [X-Post /r/AlgoTrading] https://www.kaggle.com/theriley106/wallstreetbetscomments 29 comments wallstreetbets
- Linux Game Compatibility Checker update - new datasets, grouping & sorting & filtering options, more statistics http://lgc.lysioneer.nl/ 5 comments linux_gaming
- MNIST Tutorial with TensorFlow Dataset API http://cjalmeida.net/post/tensorflow-mnist/ 3 comments programming
- Unstable MLP's accuracy on train dataset https://datascience.stackexchange.com/questions/19763/unstable-mlps-accuracy-on-train-dataset 3 comments learnmachinelearning
- Scraping a Craft Beer dataset http://www.jeannicholashould.com/python-web-scraping-tutorial-for-craft-beers.html 7 comments datascience
- Where to find a large dataset of Indian song lyrics combined into one text file? https://www.reddit.com/r/india/comments/5kis4r/where_to_find_a_large_datasets_of_indian_songs/ 4 comments india
- Advice on checksums for very large datasets https://www.reddit.com/r/crypto/comments/5j5gxr/advice_on_checksums_for_very_large_datasets/ 15 comments crypto
- At how large of a dataset should I be using something like haystack/elasticsearch rather than using the built in ORM? https://www.reddit.com/r/django/comments/5b5sgd/at_how_large_of_a_dataset_should_i_be_using/ 3 comments django
- The UK has been using massive datasets to spy on innocent civilians for years http://uk.businessinsider.com/the-uk-has-been-using-datasets-to-spy-on-innocent-civilians-2016-4 18 comments worldnews
- Joining big public datasets: How much attention does a Hacker News frontpage post drive to a GitHub project? https://www.reddit.com/r/bigquery/comments/3qpyor/joining_hacker_news_and_github_how_much_attention/ 3 comments programming
- Benchmarking BDB, CDB and Tokyo Cabinet on large datasets http://www.dmo.ca/blog/benchmarking-hash-databases-on-large-data/ 18 comments programming