IT-Bench

Hold

Tools

A benchmark for evaluating how enterprise AI agents perform on IT tasks.

Why it's here

Placed in Hold: 1 article of evidence from 1 source, led by research-stage coverage, with none published in the last 30 days. Confidence 24%. With so little accumulated evidence, the entry defaults to a conservative placement pending more signal.

Evidence (1)

6 · Hugging Face Blog · 2/18/2026 · research
IBM and UC Berkeley Study Why Enterprise Agents Fail
IBM and UC Berkeley present IT-Bench and MAST to evaluate enterprise AI agents and diagnose where they break down in realistic business workflows. The work focuses on identifying failure modes in agent performance rather than introducing a new consumer product.