Data quality engineering
Assess · Techniques
Practices for filtering, validating, and improving data before it is used in model training.
Why it's here
Placed in Assess: 1 article of evidence from 1 source, led by research-stage coverage, with 1 in the last 30 days. Confidence: 24%. Accumulated evidence is low, so the placement defaults conservatively pending more signal.
Evidence (1)
- 6 · The New Stack · 6/11/2026 · research · SonarSweep aims to clean AI training data and reduce code bugs
The article explains how Sonar’s SonarSweep technology is designed to filter and improve code used to train large language models. It argues that public code repositories often contain insecure, outdated, or low-quality examples that can cause models to generate bugs, vulnerabilities, and maintainability issues in production code.
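As a rough illustration of what filtering code before training can look like, here is a minimal Python sketch. It is not SonarSweep's method, which the article summary does not detail. The rule set `RISKY_CALLS` and the helpers `is_clean` and `filter_samples` are hypothetical names chosen for this example, and the two checks (the sample must parse, and it must not call `eval` or `exec` directly) stand in for the much richer quality and security rules a real pipeline would presumably apply.

```python
import ast

# Hypothetical rule set for this sketch only: direct calls treated as risky.
RISKY_CALLS = {"eval", "exec"}


def is_clean(source: str) -> bool:
    """Return True if the sample parses and makes no direct call in RISKY_CALLS."""
    try:
        tree = ast.parse(source)
    except (SyntaxError, ValueError):
        # Samples that do not parse as Python 3 are dropped.
        return False
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in RISKY_CALLS
        ):
            return False
    return True


def filter_samples(samples):
    """Keep only the samples that pass is_clean, preserving order."""
    return [s for s in samples if is_clean(s)]


samples = [
    "def add(a, b):\n    return a + b\n",  # parses, no risky calls
    "result = eval(input())\n",            # direct eval call
    "def broken(:\n",                      # syntax error
]
kept = filter_samples(samples)
```

In this sketch only the first sample survives. A name-based check like this is easy to evade (for example by aliasing `eval`), so it should be read as a placeholder for a proper static-analysis step rather than as a sufficient filter.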