APEX
GPT 5.5

OpenAI

400K context, $5.00/M input, $30.00/M output
1866 (peak 1867)

Avg Score: 86.8
Avg Cost: $1.61
Score/$: 53.7
Runs: 70

Category ELOs

Category | Difficulty | ELO
multi-language | expert | 3114
code-review | hard | 2756
frontend | easy | 2671
from-scratch | medium | 2639
multi-language | hard | 2415
frontend | expert | 2395
backend | easy | 2290
from-scratch | hard | 2271
frontend | hard | 2255
from-scratch | easy | 2215
multi-language | - | 2121
code-review | - | 2116
code-review | medium | 2109
from-scratch | expert | 2075
from-scratch | - | 2023
backend | expert | 2009
frontend | medium | 2006
refactoring | medium | 1986
backend | hard | 1960
full-stack | medium | 1958
frontend | - | 1923
full-stack | hard | 1922
full-stack | - | 1921
backend | - | 1877
refactoring | - | 1850
debugging | expert | 1815
backend | medium | 1791
debugging | medium | 1775
backend | master | 1762
refactoring | expert | 1752
debugging | - | 1695
debugging | hard | 1633
frontend | master | 1588

All Results

Task | Category | Score
Build multi-tool LLM agent runtime | backend | 86.0
Build interactive data visualization dashboard | frontend | 65.2
Migrate Express monolith to modular architecture | backend | 83.0
Fix and extend Chrome browser extension | frontend | 67.0
Build 3D browser game with physics and multiplayer sync | frontend | 87.1
Convert React app to PWA with offline support | frontend | 87.0
Add virtual scrolling to table rendering 5000 rows | frontend | 89.7
Build REST API from scratch | from-scratch | 89.7
Debug and fix 6 broken database triggers and constraints | debugging | 90.4
Remove AI slop and over-engineering from codebase | refactoring | 91.8
Add streaming SSE endpoint for LLM chat | backend | 87.5
Fix 12 WCAG accessibility violations in checkout form | frontend | 91.3
Find and patch all OWASP Top 10 vulnerabilities | debugging | 91.3
Add slash commands and moderation to Discord bot | backend | 86.0
Split 1100-line god file into proper modules | refactoring | 88.5
Build terminal UI dashboard | from-scratch | 82.5
Refactor monolithic handler to CQRS | refactoring | 69.3
Build CLI tool with subcommands and config | from-scratch | 87.2
Find and fix 4 hidden backdoors in Flask app | debugging | 77.3
Debug race condition in worker pool | debugging | 93.0
Fix React hydration mismatch | frontend | 87.0
Replace console.log with structured logging | refactoring | 91.8
Add caching layer to eliminate slow SSR page loads | full-stack | 91.8
Add file upload with S3 presigned URLs | backend | 85.3
Build MCP server for database management | backend | 90.9
Add i18n with locale routing to Next.js app | full-stack | 82.4
Add retry logic and dead letter queue to Python task queue | backend | 88.9
Build RAG pipeline with vector search | backend | 83.7
Fix race conditions in order matching engine | backend | 90.5
Fix broken GitHub Actions CI pipeline | debugging | 89.5
Add Redis caching layer to Express API | backend | 84.1
Build production website with auth and members area | frontend | 80.7
Build distributed node cluster with gossip protocol | from-scratch | 87.7
Build materialized view refresh pipeline for analytics | backend | 86.0
Implement transformer inference engine with KV cache | from-scratch | 85.8
Build codebase indexer for LLM context windows | from-scratch | 81.8
Implement zero-trust API authentication layer | backend | 90.7
Implement JWT auth middleware | backend | 64.0
Build real-time portfolio risk calculator | backend | 84.0
Add Google OAuth2 login to Express app | full-stack | 87.2
Write Kubernetes manifests for Node.js microservice | full-stack | 91.8
Fix data integrity bugs in denormalized e-commerce schema | debugging | 85.6
Implement Stripe webhook handler | backend | 88.0
Add cursor-based pagination to REST API | backend | 87.7
Optimize slow Postgres queries in Flask app | backend | 89.8
Fix hallucination and context window bugs in RAG agent | backend | 85.8
Fix deadlocking transaction patterns in Flask app | backend | 86.0
Build SaaS admin dashboard from scratch | from-scratch | 86.5
Fix flaky test suite | debugging | 89.5
Fix N+1 query in dashboard | backend | 91.7
Harden insecure Docker setup with 12 vulnerabilities | code-review | 94.6
Add rate limiting middleware | backend | 86.8
Write integration tests for payment flow | code-review | 93.2
Build LLM evaluation harness with structured grading | backend | 87.2
Write complex SQL report with window functions | backend | 88.9
Fix Node.js stream backpressure causing OOM on large files | backend | 94.1
Fix broken responsive layout | frontend | 91.7
Zero-downtime schema migration | full-stack | 86.4
Fix auth bypass vulnerability | debugging | 93.7
Optimize bloated React bundle under 500KB | frontend | 91.7
Port Python CLI to Rust | multi-language | 85.8
Write tests for untested legacy Flask service | code-review | 89.3
Fix memory leak in event handler | debugging | 91.7
Add WebSocket real-time updates | full-stack | 89.3
Add GraphQL layer over REST API | multi-language | 89.7
Code review: identify security vulns | code-review | 91.8
Implement multi-tenant row-level security in Postgres | backend | 85.7
Implement background job scheduler with persistence | backend | 84.1
Dockerize Node.js monorepo | full-stack | 83.8
Migrate callback-hell Express app to async/await | refactoring | 87.2