APEX

Gemini 2.5 Pro

OpenRouter

1049K context, $1.25/M input, $10.00/M output
1436 (peak 1437)

Avg Score: 68.4
Avg Cost: $0.27
Score/$: 257.6
Runs: 116

[Charts: Win/Loss/Draw, Scoring Dimensions, Score Distribution]

Category ELOs

Category | Difficulty | ELO
frontend | hard | 1979
from-scratch | expert | 1884
debugging | medium | 1839
frontend | easy | 1638
full-stack | medium | 1582
backend | easy | 1567
debugging | - | 1536
backend | medium | 1497
debugging | expert | 1493
from-scratch | hard | 1493
multi-language | expert | 1490
frontend | - | 1489
debugging | hard | 1474
frontend | expert | 1467
from-scratch | - | 1464
full-stack | - | 1443
backend | - | 1443
backend | expert | 1427
frontend | medium | 1419
backend | hard | 1378
refactoring | medium | 1367
refactoring | - | 1315
full-stack | hard | 1276
multi-language | - | 1253
from-scratch | medium | 1235
code-review | medium | 1182
code-review | - | 1159
from-scratch | easy | 1105
code-review | hard | 193
refactoring | expert | 0
multi-language | hard | 0

All Results

Task | Category | Score
Build SaaS admin dashboard from scratch | from-scratch | 50.1
Split 1100-line god file into proper modules | refactoring | 62.7
Implement JWT auth middleware | backend | 52.0
Convert React app to PWA with offline support | frontend | 52.1
Add file upload with S3 presigned URLs | backend | 74.2
Implement multi-tenant row-level security in Postgres | backend | 66.7
Code review: identify security vulns | code-review | 73.0
Build terminal UI dashboard | from-scratch | 59.3
Build production website with auth and members area | frontend | 65.9
Add retry logic and dead letter queue to Python task queue | backend | 70.5
Add slash commands and moderation to Discord bot | backend | 78.9
Dockerize Node.js monorepo | full-stack | 75.5
Fix 12 WCAG accessibility violations in checkout form | frontend | 86.5
Add virtual scrolling to table rendering 5000 rows | frontend | 85.6
Implement zero-trust API authentication layer | backend | 68.8
Write integration tests for payment flow | code-review | 41.4
Fix React hydration mismatch | frontend | 38.5
Add WebSocket real-time updates | full-stack | 58.7
Harden insecure Docker setup with 12 vulnerabilities | code-review | 65.2
Build distributed node cluster with gossip protocol | from-scratch | 58.0
Implement background job scheduler with persistence | backend | 40.2
Add caching layer to eliminate slow SSR page loads | full-stack | 85.2
Write complex SQL report with window functions | backend | 40.0
Fix hallucination and context window bugs in RAG agent | backend | 63.8
Fix N+1 query in dashboard | backend | 69.7
Zero-downtime schema migration | full-stack | 48.1
Build real-time portfolio risk calculator | backend | 58.6
Refactor monolithic handler to CQRS | refactoring | 28.3
Optimize bloated React bundle under 500KB | frontend | 79.7
Build CLI tool with subcommands and config | from-scratch | 73.3
Implement transformer inference engine with KV cache | from-scratch | 82.6
Fix broken GitHub Actions CI pipeline | debugging | 88.8
Add GraphQL layer over REST API | multi-language | 34.6
Implement Stripe webhook handler | backend | 66.5
Find and patch all OWASP Top 10 vulnerabilities | debugging | 31.8
Replace console.log with structured logging | refactoring | 60.8
Add streaming SSE endpoint for LLM chat | backend | 82.4
Fix race conditions in order matching engine | backend | 79.1
Build materialized view refresh pipeline for analytics | backend | 72.8
Fix Node.js stream backpressure causing OOM on large files | backend | 43.4
Build MCP server for database management | backend | 82.4
Build codebase indexer for LLM context windows | from-scratch | 52.9
Fix flaky test suite | debugging | 93.0
Find and fix 4 hidden backdoors in Flask app | debugging | 90.9
Add i18n with locale routing to Next.js app | full-stack | 75.7
Add rate limiting middleware | backend | 73.5
Debug and fix 6 broken database triggers and constraints | debugging | 88.8
Write tests for untested legacy Flask service | code-review | 33.4
Optimize slow Postgres queries in Flask app | backend | 86.3
Fix auth bypass vulnerability | debugging | 78.5
Debug race condition in worker pool | debugging | 88.0
Fix broken responsive layout | frontend | 75.0
Build RAG pipeline with vector search | backend | 37.8
Fix memory leak in event handler | debugging | 34.3
Migrate callback-hell Express app to async/await | refactoring | 64.4
Build LLM evaluation harness with structured grading | backend | 68.3
Add Redis caching layer to Express API | backend | 50.5
Remove AI slop and over-engineering from codebase | refactoring | 75.3
Port Python CLI to Rust | multi-language | 52.2
Build REST API from scratch | from-scratch | 70.2
Convert React app to PWA with offline support | frontend | 66.8
Dockerize Node.js monorepo | full-stack | 69.0
Implement multi-tenant row-level security in Postgres | backend | 56.5
Remove AI slop and over-engineering from codebase | refactoring | 84.5
Write Kubernetes manifests for Node.js microservice | full-stack | 82.3
Implement JWT auth middleware | backend | 75.0
Harden insecure Docker setup with 12 vulnerabilities | code-review | 78.3
Build codebase indexer for LLM context windows | from-scratch | 35.0
Add caching layer to eliminate slow SSR page loads | full-stack | 88.1
Add streaming SSE endpoint for LLM chat | backend | 72.7
Fix broken responsive layout | frontend | 68.8
Add i18n with locale routing to Next.js app | full-stack | 63.7
Split 1100-line god file into proper modules | refactoring | 75.0
Optimize bloated React bundle under 500KB | frontend | 70.1
Replace console.log with structured logging | refactoring | 40.9
Implement zero-trust API authentication layer | backend | 70.5
Find and patch all OWASP Top 10 vulnerabilities | debugging | 69.4
Add Redis caching layer to Express API | backend | 62.7
Implement background job scheduler with persistence | backend | 73.2
Implement transformer inference engine with KV cache | from-scratch | 84.0
Build MCP server for database management | backend | 51.9
Build SaaS admin dashboard from scratch | from-scratch | 68.0
Build real-time portfolio risk calculator | backend | 53.7
Fix hallucination and context window bugs in RAG agent | backend | 74.1
Build production website with auth and members area | frontend | 54.3
Build LLM evaluation harness with structured grading | backend | 58.0
Build CLI tool with subcommands and config | from-scratch | 70.5
Fix race conditions in order matching engine | backend | 67.2
Build materialized view refresh pipeline for analytics | backend | 69.9
Fix deadlocking transaction patterns in Flask app | backend | 65.5
Debug and fix 6 broken database triggers and constraints | debugging | 81.8
Write complex SQL report with window functions | backend | 72.1
Fix data integrity bugs in denormalized e-commerce schema | debugging | 82.2
Build RAG pipeline with vector search | backend | 72.8
Find and fix 4 hidden backdoors in Flask app | debugging | 82.0
Write tests for untested legacy Flask service | code-review | 60.5
Fix 12 WCAG accessibility violations in checkout form | frontend | 81.8
Optimize slow Postgres queries in Flask app | backend | 85.9
Add retry logic and dead letter queue to Python task queue | backend | 72.8
Add slash commands and moderation to Discord bot | backend | 81.8
Fix Node.js stream backpressure causing OOM on large files | backend | 63.9
Build distributed node cluster with gossip protocol | from-scratch | 79.8
Write integration tests for payment flow | code-review | 68.5
Add GraphQL layer over REST API | multi-language | 73.0
Fix auth bypass vulnerability | debugging | 95.0
Add rate limiting middleware | backend | 78.7
Zero-downtime schema migration | full-stack | 76.5
Add cursor-based pagination to REST API | backend | 90.0
Fix flaky test suite | debugging | 83.1
Fix N+1 query in dashboard | backend | 77.7
Refactor monolithic handler to CQRS | refactoring | 79.9
Fix memory leak in event handler | debugging | 62.0
Fix React hydration mismatch | frontend | 76.4
Build terminal UI dashboard | from-scratch | 73.4
Debug race condition in worker pool | debugging | 88.8
Build REST API from scratch | from-scratch | 86.0