Hacker News
- Search for Insights from Public Datasets https://demo.milou.app/ 0 comments
- Fine tune LLAMA3 on million scale dataset in consumer GPU using QLora, DeepSpeed https://medium.com/@sumandas0/fine-tune-llama3-on-million-scale-dataset-in-consumer-gpu-using-qlora-deepspeed-3ae8ad75299a 25 comments
- US brokers selling overlapping datasets with people as “actively pregnant” https://gizmodo.com/data-brokers-selling-pregnancy-roe-v-wade-abortion-1849148426 4 comments
- Dataset and Model for “I built a system to take photos of planes over my house” https://universe.roboflow.com/skybot-cam/overhead-plane-detector 27 comments
- Turning petabytes of raw video data into a high-quality ML dataset https://medium.com/@mvoodarla/curating-a-dataset-from-raw-images-and-videos-c8b962eca9ba 2 comments
- The Pandora Papers – leaked dataset of 11.9M financial documents https://twitter.com/ICIJorg/status/1444474822797545476 5 comments
- Hobbling computer vision datasets against unauthorized use https://www.unite.ai/hobbling-computer-vision-datasets-against-unauthorized-use/ 4 comments
- Ethical issues in research using datasets of illicit origin https://www.lightbluetouchpaper.org/2017/11/07/ethical-issues-in-research-using-datasets-of-illicit-origin/ 19 comments
- Build dataflows with larger-than-memory datasets. Use two Python open source libraries in this hands-on guide to create Big Data pipelines https://medium.com/@marine.gosselin/big-data-models-vs-computer-memory-b345814ece9f 8 comments programming
- Help needed regarding data privacy! I have this dataset; how can I identify its k-anonymity? If I wish to make it 3-anonymous and 2-diverse, what can I generalize and suppress? https://drive.google.com/file/d/1Nn1kKscKfUyBxsX6gbj9kUbEOTPp6r3-/view?usp=sharing 2 comments privacy
- Collecting datasets for CV: any wishlists? https://www.tictag.io 14 comments computervision
- Having troubles with loading a custom dataset into yoloV5 https://learnopencv.com/custom-object-detection-training-using-yolov5/?ck_subscriber_id=1373562521#Custom-Object-Detection-Training-using-YOLOv5 3 comments learnmachinelearning
- Air pollution causes nearly 2 million asthma cases, and a similar number of excess deaths, per year. The researchers examined an existing dataset that looked at nitrogen dioxide concentrations in 58 countries during 2010–12. https://cosmosmagazine.com/earth/climate/air-pollution-asthma-excess-deaths/ 9 comments science
- [R] AnimeCeleb: Large-Scale Animation CelebFaces Dataset via Controllable 3D Synthetic Models https://arxiv.org/abs/2111.07640 2 comments machinelearning
- DeepMind open-sources protein structure dataset generated by AlphaFold 2 https://venturebeat.com/2021/07/22/deepmind-open-sources-protein-structure-dataset-generated-by-alphafold-2/ 3 comments science
- Handling large datasets using pandas under memory constraints https://www.kaggle.com/c/avazu-ctr-prediction/overview 6 comments learnmachinelearning
- What colors should I prepare for use for graphs with unknown datasets? https://www.reddit.com/r/web_design/comments/n4f6gb/what_colors_should_i_prepare_for_use_for_graphs/ 7 comments web_design
- survivoR R package: "a collection of datasets detailing events and the cast across all 40 seasons of the US Survivor, including castaway information, vote history, immunity and reward challenge winners, jury votes, and viewers" http://gradientdescending.com/survivor-now-on-cran/ 4 comments rstats
- Rush Limbaugh downplaying hurricane Irma may have decreased evacuations. Phone-location dataset shows correlation with election results. https://arstechnica.com/science/2020/09/rush-limbaugh-downplaying-hurricane-irma-may-have-decreased-evacuations/ 5 comments politics
- Rust Notebooks: Loading Datasets from CSV into NDArray https://shahinrostami.com/posts/programming/rust-notebooks/loading-datasets-from-csv-into-ndarray/ 3 comments rust
- Google just published 25 million free datasets https://towardsdatascience.com/google-just-published-25-million-free-datasets-d83940e24284 89 comments technews
- Not sure if this is the correct subreddit, but this is a dataset of "Good" and "Evil" chat messages in a video game and I want someone to train a classifier on it https://drive.google.com/file/d/1bxAQrt-Nomj4npg7GLLtA3FTs374kK6l/view?usp=sharing 4 comments learnmachinelearning
- Training a YOLOv3 Object Detection Model with a Custom Dataset https://blog.roboflow.ai/training-a-yolov3-object-detection-model-with-a-custom-dataset/ 3 comments computervision
- Twelve Million Phones, One Dataset, Zero Privacy https://www.nytimes.com/interactive/2019/12/19/opinion/location-tracking-cell-phone.html 28 comments firefox
- 70+ Machine Learning Datasets - Gain real-world experience with Data Science projects! https://data-flair.training/blogs/machine-learning-datasets/ 3 comments programming
- What Does ‘Broken’ Sound Like? First-Ever Audio Dataset of Malfunctioning Industrial Machines https://medium.com/syncedreview/what-does-broken-sound-like-first-ever-audio-dataset-of-malfunctioning-industrial-machines-b4f8f6d81dd7 3 comments artificial
- Github Releases Dataset Of Six Million Methods From Open Source Projects For CodeSearchNet Challenge https://github.blog/2019-09-26-introducing-the-codesearchnet-challenge/?utm_campaign=1569513857&utm_medium=social&utm_source=twitter&utm_content=1569513857 13 comments programming
- 300+ Free Datasets for Machine Learning divided into 10 Use Cases https://lionbridge.ai/business-resources/open-datasets-for-machine-learning/ 11 comments datascience
- Training a chatbot with dialogues dataset https://www.reddit.com/r/artificial/comments/an337w/training_a_chatbot_with_dialogues_dataset/ 10 comments artificial
- I Compiled a Dataset of ~2.5 Million /r/WallStreetBets Comments for an Algorithmic Trading Strategy Based on Market Volatility [X-Post /r/AlgoTrading] https://www.kaggle.com/theriley106/wallstreetbetscomments 29 comments wallstreetbets
- Linux Game Compatibility Checker update - new datasets, grouping & sorting & filtering options, more statistics http://lgc.lysioneer.nl/ 5 comments linux_gaming
- MNIST Tutorial with Tensorflow Dataset API http://cjalmeida.net/post/tensorflow-mnist/ 3 comments programming
- Unstable MLP's accuracy on train dataset https://datascience.stackexchange.com/questions/19763/unstable-mlps-accuracy-on-train-dataset 3 comments learnmachinelearning
- Scraping a Craft Beer dataset http://www.jeannicholashould.com/python-web-scraping-tutorial-for-craft-beers.html 7 comments datascience
- Where to find a large dataset of Indian song lyrics combined into one text file? https://www.reddit.com/r/india/comments/5kis4r/where_to_find_a_large_datasets_of_indian_songs/ 4 comments india
- Advice on checksums for very large datasets https://www.reddit.com/r/crypto/comments/5j5gxr/advice_on_checksums_for_very_large_datasets/ 15 comments crypto
- At how large of a dataset should I be using something like haystack/elasticsearch rather than using the built in ORM? https://www.reddit.com/r/django/comments/5b5sgd/at_how_large_of_a_dataset_should_i_be_using/ 3 comments django
- The UK has been using massive datasets to spy on innocent civilians for years http://uk.businessinsider.com/the-uk-has-been-using-datasets-to-spy-on-innocent-civilians-2016-4 18 comments worldnews
- Joining big public datasets: How much attention does a Hacker News frontpage post drive to a GitHub project? https://www.reddit.com/r/bigquery/comments/3qpyor/joining_hacker_news_and_github_how_much_attention/ 3 comments programming
- Benchmarking BDB, CDB and Tokyo Cabinet on large datasets http://www.dmo.ca/blog/benchmarking-hash-databases-on-large-data/ 18 comments programming