APEX

GPT 5.1 Codex Mini

OpenAI

400K context | $3.00/M input | $15.00/M output
1624

Avg Score: 76.6
Avg Cost: $0.60
Score/$: 127.7
Runs: 77
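
The Score/$ figure above is consistent with dividing the average score by the average cost. The page does not state the formula, so the following is only a minimal sketch under that assumption; the function name `score_per_dollar` is hypothetical and not part of APEX.

```python
def score_per_dollar(avg_score: float, avg_cost_usd: float) -> float:
    """Assumed definition: average score divided by average cost in USD."""
    if avg_cost_usd <= 0:
        raise ValueError("average cost must be positive")
    return avg_score / avg_cost_usd


# Values taken from the summary stats on this page.
ratio = score_per_dollar(76.6, 0.60)
print(round(ratio, 1))  # 127.7
```

Rounded to one decimal place, the result matches the reported 127.7, which supports (but does not confirm) the assumed definition.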

[Charts not reproduced: Win/Loss/Draw, Scoring Dimensions, Score Distribution]

Category ELOs

Category (difficulty) | ELO
multi-language (expert) | 3237
from-scratch (medium) | 2573
frontend (expert) | 2372
from-scratch (easy) | 2244
frontend (hard) | 2200
backend (easy) | 2173
from-scratch (expert) | 2109
code-review (hard) | 2103
from-scratch (hard) | 2047
from-scratch | 1936
multi-language (hard) | 1926
refactoring (expert) | 1860
multi-language | 1858
debugging (medium) | 1792
backend (expert) | 1714
frontend (easy) | 1714
code-review | 1683
code-review (medium) | 1655
frontend | 1647
backend (hard) | 1641
backend | 1615
full-stack (medium) | 1613
full-stack | 1588
debugging (expert) | 1582
frontend (medium) | 1580
full-stack (hard) | 1570
debugging | 1544
backend (medium) | 1536
refactoring | 1427
debugging (hard) | 1427
refactoring (medium) | 1359
frontend (master) | 0

All Results

Task | Category | Score
Migrate callback-hell Express app to async/await | refactoring | 62.3
Fix and extend Chrome browser extension | frontend |
Build interactive data visualization dashboard | frontend |
Add WebSocket real-time updates | full-stack | 63.1
Build 3D browser game with physics and multiplayer sync | frontend | 33.1
Fix Node.js stream backpressure causing OOM on large files | backend | 47.0
Build multi-tool LLM agent runtime | backend |
Implement JWT auth middleware | backend |
Migrate Express monolith to modular architecture | backend | 29.3
Migrate callback-hell Express app to async/await | refactoring | 70.2
Fix broken GitHub Actions CI pipeline | debugging | 92.2
Harden insecure Docker setup with 12 vulnerabilities | code-review | 88.7
Split 1100-line god file into proper modules | refactoring | 0.0
Write Kubernetes manifests for Node.js microservice | full-stack | 85.7
Replace console.log with structured logging | refactoring | 55.1
Add file upload with S3 presigned URLs | backend | 49.4
Fix auth bypass vulnerability | debugging | 72.4
Code review: identify security vulns | code-review | 58.4
Add retry logic and dead letter queue to Python task queue | backend | 82.0
Add caching layer to eliminate slow SSR page loads | full-stack | 82.5
Add WebSocket real-time updates | full-stack | 84.8
Fix broken responsive layout | frontend | 76.1
Add streaming SSE endpoint for LLM chat | backend | 48.7
Harden insecure Docker setup with 12 vulnerabilities | code-review | 88.0
Build codebase indexer for LLM context windows | from-scratch | 83.6
Code review: identify security vulns | code-review | 78.9
Dockerize Node.js monorepo | full-stack | 79.7
Split 1100-line god file into proper modules | refactoring | 84.6
Build RAG pipeline with vector search | backend | 85.0
Add i18n with locale routing to Next.js app | full-stack | 81.5
Remove AI slop and over-engineering from codebase | refactoring | 90.9
Write Kubernetes manifests for Node.js microservice | full-stack | 91.0
Optimize bloated React bundle under 500KB | frontend | 79.4
Implement multi-tenant row-level security in Postgres | backend | 82.5
Implement zero-trust API authentication layer | backend | 77.7
Fix broken GitHub Actions CI pipeline | debugging | 88.3
Find and patch all OWASP Top 10 vulnerabilities | debugging | 77.8
Build distributed node cluster with gossip protocol | from-scratch | 72.5
Convert React app to PWA with offline support | frontend | 79.7
Find and fix 4 hidden backdoors in Flask app | debugging | 87.0
Build production website with auth and members area | frontend | 79.3
Build SaaS admin dashboard from scratch | from-scratch | 86.4
Build MCP server for database management | backend | 69.2
Implement transformer inference engine with KV cache | from-scratch | 86.6
Implement background job scheduler with persistence | backend | 82.0
Build CLI tool with subcommands and config | from-scratch | 82.9
Build LLM evaluation harness with structured grading | backend | 61.5
Build real-time portfolio risk calculator | backend | 72.2
Add Redis caching layer to Express API | backend | 80.0
Fix race conditions in order matching engine | backend | 90.9
Debug and fix 6 broken database triggers and constraints | debugging | 79.8
Fix data integrity bugs in denormalized e-commerce schema | debugging | 80.8
Write complex SQL report with window functions | backend | 79.5
Build materialized view refresh pipeline for analytics | backend | 75.3
Fix deadlocking transaction patterns in Flask app | backend | 79.8
Fix hallucination and context window bugs in RAG agent | backend | 77.3
Write tests for untested legacy Flask service | code-review | 92.5
Add Google OAuth2 login to Express app | full-stack | 82.0
Optimize slow Postgres queries in Flask app | backend | 81.0
Add slash commands and moderation to Discord bot | backend | 83.8
Add virtual scrolling to table rendering 5000 rows | frontend | 81.8
Fix 12 WCAG accessibility violations in checkout form | frontend | 90.4
Add GraphQL layer over REST API | multi-language | 80.5
Write integration tests for payment flow | code-review | 81.0
Zero-downtime schema migration | full-stack | 76.5
Add rate limiting middleware | backend | 83.5
Implement Stripe webhook handler | backend | 80.5
Port Python CLI to Rust | multi-language | 90.3
Fix flaky test suite | debugging | 88.4
Add cursor-based pagination to REST API | backend | 87.9
Fix N+1 query in dashboard | backend | 82.9
Fix memory leak in event handler | debugging | 85.8
Refactor monolithic handler to CQRS | refactoring | 72.7
Debug race condition in worker pool | debugging | 76.6
Fix React hydration mismatch | frontend | 69.2
Build terminal UI dashboard | from-scratch | 82.2
Build REST API from scratch | from-scratch | 90.5