Common Crawl

Assess

Tools

A large-scale public web crawl commonly used to train language models.

Why it's here

Placed in Assess: 1 article of evidence from 1 source, a model release published in the last 30 days. Confidence 24%. With little accumulated evidence, the placement defaults conservatively pending more signal.

Evidence (1)

7 · Simon Willison · 6/2/2026 · model_release
Microsoft launches MAI-Thinking-1 and MAI-Code-1-Flash
Microsoft announced two new text LLMs: MAI-Thinking-1, a reasoning model, and MAI-Code-1-Flash, a code-focused model for GitHub Copilot and Visual Studio Code. The announcement emphasizes low active parameter counts and claims enterprise-grade, licensed training data, though the accompanying technical paper shows MAI-Thinking-1 was trained on large-scale web and Common Crawl data after filtering.