APEX TESTING_
Find out which AI coding models actually deliver and which are just hype.
by HauhauCS
| Metric | Value |
|---|---|
| Models Tested | 68 |
| Tasks | 70 |
| Total Runs | 6333 |
| Avg Score | 69.1 |
| Capital Spent | $6050.79 |
Top Models
| # | Model | ELO |
|---|---|---|
| 1 | Claude Opus 4.7 | 1904 |
| 2 | GPT 5.5 | 1867 |
| 3 | GPT 5.4 Mini | 1790 |
| 4 | Claude Opus 4.6 | 1781 |
| 5 | Claude Sonnet 4.6 | 1770 |
Recent Activity
- Qwen3.6 27b [Q4_K_XL] → Find and patch all OWASP Top 10 vulnerabilities (2.3s)
- Qwen3.6 27b [Q4_K_XL] → Build multi-tool LLM agent runtime (3.5s)
- Qwen3.6 27b [Q4_K_XL] → Implement multi-tenant row-level security in Postgres (1.2s)
- Qwen3.6 27b [Q4_K_XL] → Debug race condition in worker pool (2.3s)
- Qwen3.6 27b [Q4_K_XL] → Split 1100-line god file into proper modules (1.2s)