Dataset Training and Testing

World's largest open-source multimodal dataset delivers 17x training efficiency, unlocking enterprise AI that connects documents, audio and video

AI models are only as good as the data they're trained on. That data generally needs to be labeled, curated and organized ...

Medical Xpress

Test of 'poisoned dataset' shows vulnerability of LLMs to medical misinformation

For their study published in the journal Nature Medicine, the group generated thousands of articles containing misinformation, inserted them into an AI training dataset, and conducted general LLM ...

VentureBeat

Getty Images drops ‘cleanest’ visual dataset for training foundation models

Getty Images is going all in to establish itself as a trusted data ...

BBC-Pair Dataset: A dataset for training and evaluating detection of AI-generated media

EleutherAI releases massive AI training dataset of licensed and open domain text

Nearly 12,000 API keys and passwords found in AI training dataset

Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft