Anthropic runs 200-attempt attack campaigns. OpenAI reports single-attempt metrics. A 16-dimension comparison reveals what ...
Evalite is a TypeScript-native eval runner designed for AI applications, enabling developers to create reproducible evals ...