APEX
GPT 5.4 Mini

OpenAI

400K context · $0.75/M input · $4.50/M output
ELO 1790 (peak 1791)

Avg Score: 84.3
Avg Cost: $0.38
Score/$: 220.1
Runs: 70
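
The page does not say how Score/$ is derived. A minimal sketch, assuming it is the average score divided by the average cost in dollars (the function name `score_per_dollar` is hypothetical, not part of APEX):

```python
def score_per_dollar(avg_score: float, avg_cost_usd: float) -> float:
    """Assumed definition: average score divided by average cost in USD."""
    if avg_cost_usd <= 0:
        raise ValueError("average cost must be positive")
    return avg_score / avg_cost_usd


# Using the rounded figures shown on the page.
ratio = score_per_dollar(84.3, 0.38)
print(round(ratio, 1))  # 221.8

# Average cost implied by the displayed Score/$ of 220.1.
implied_cost = 84.3 / 220.1
print(round(implied_cost, 3))  # 0.383
```

Under this assumption the rounded inputs give about 221.8 rather than the displayed 220.1. The gap is consistent with an unrounded average cost near $0.383, although the site may equally compute the metric per run and then average.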

[Charts not reproduced: Win/Loss/Draw, Scoring Dimensions, Score Distribution]

Category ELOs

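| Category | Difficulty | ELO |
|---|---|---|
| from-scratch | medium | 3237 |
| multi-language | expert | 3019 |
| code-review | hard | 2385 |
| multi-language | hard | 2377 |
| frontend | expert | 2350 |
| backend | easy | 2290 |
| from-scratch | easy | 2215 |
| from-scratch | hard | 2100 |
| multi-language | - | 2076 |
| from-scratch | expert | 2041 |
| frontend | medium | 2032 |
| code-review | medium | 2021 |
| code-review | - | 1984 |
| refactoring | expert | 1968 |
| from-scratch | - | 1967 |
| full-stack | medium | 1947 |
| refactoring | medium | 1946 |
| backend | hard | 1896 |
| full-stack | - | 1883 |
| backend | expert | 1883 |
| full-stack | hard | 1865 |
| refactoring | - | 1854 |
| frontend | - | 1843 |
| debugging | medium | 1839 |
| frontend | hard | 1806 |
| backend | - | 1783 |
| frontend | master | 1773 |
| debugging | expert | 1729 |
| backend | medium | 1705 |
| frontend | easy | 1670 |
| debugging | - | 1598 |
| debugging | hard | 1456 |
| backend | master | 1365 |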

All Results

| Task | Category | Score |
|---|---|---|
| Build interactive data visualization dashboard | frontend | 70.5 |
| Build 3D browser game with physics and multiplayer sync | frontend | 86.4 |
| Migrate Express monolith to modular architecture | backend | 46.3 |
| Build multi-tool LLM agent runtime | backend | 85.1 |
| Fix and extend Chrome browser extension | frontend | 77.6 |
| Add streaming SSE endpoint for LLM chat | backend | 88.8 |
| Add file upload with S3 presigned URLs | backend | 86.0 |
| Build REST API from scratch | from-scratch | 89.7 |
| Implement zero-trust API authentication layer | backend | 82.4 |
| Optimize slow Postgres queries in Flask app | backend | 90.4 |
| Refactor monolithic handler to CQRS | refactoring | 74.2 |
| Fix Node.js stream backpressure causing OOM on large files | backend | 74.8 |
| Build RAG pipeline with vector search | backend | 84.1 |
| Add Redis caching layer to Express API | backend | 83.4 |
| Find and fix 4 hidden backdoors in Flask app | debugging | 74.0 |
| Add cursor-based pagination to REST API | backend | 86.8 |
| Add virtual scrolling to table rendering 5000 rows | frontend | 87.5 |
| Fix memory leak in event handler | debugging | 58.8 |
| Add rate limiting middleware | backend | 87.0 |
| Code review: identify security vulns | code-review | 92.1 |
| Write integration tests for payment flow | code-review | 87.0 |
| Replace console.log with structured logging | refactoring | 86.8 |
| Add caching layer to eliminate slow SSR page loads | full-stack | 89.4 |
| Fix broken GitHub Actions CI pipeline | debugging | 91.7 |
| Port Python CLI to Rust | multi-language | 84.7 |
| Implement multi-tenant row-level security in Postgres | backend | 76.8 |
| Zero-downtime schema migration | full-stack | 82.8 |
| Remove AI slop and over-engineering from codebase | refactoring | 89.4 |
| Add slash commands and moderation to Discord bot | backend | 87.0 |
| Implement Stripe webhook handler | backend | 87.0 |
| Build materialized view refresh pipeline for analytics | backend | 81.8 |
| Migrate callback-hell Express app to async/await | refactoring | 88.2 |
| Build LLM evaluation harness with structured grading | backend | 84.5 |
| Fix deadlocking transaction patterns in Flask app | backend | 86.8 |
| Fix React hydration mismatch | frontend | 88.2 |
| Fix hallucination and context window bugs in RAG agent | backend | 88.2 |
| Harden insecure Docker setup with 12 vulnerabilities | code-review | 91.8 |
| Write tests for untested legacy Flask service | code-review | 89.0 |
| Fix 12 WCAG accessibility violations in checkout form | frontend | 83.2 |
| Write Kubernetes manifests for Node.js microservice | full-stack | 91.3 |
| Implement JWT auth middleware | backend | 63.4 |
| Fix race conditions in order matching engine | backend | 89.3 |
| Fix flaky test suite | debugging | 91.7 |
| Find and patch all OWASP Top 10 vulnerabilities | debugging | 91.3 |
| Implement background job scheduler with persistence | backend | 80.0 |
| Write complex SQL report with window functions | backend | 88.9 |
| Build MCP server for database management | backend | 86.0 |
| Build codebase indexer for LLM context windows | from-scratch | 80.9 |
| Optimize bloated React bundle under 500KB | frontend | 91.8 |
| Build real-time portfolio risk calculator | backend | 85.4 |
| Implement transformer inference engine with KV cache | from-scratch | 85.3 |
| Convert React app to PWA with offline support | frontend | 87.5 |
| Build distributed node cluster with gossip protocol | from-scratch | 80.1 |
| Build terminal UI dashboard | from-scratch | 87.7 |
| Fix auth bypass vulnerability | debugging | 89.7 |
| Debug and fix 6 broken database triggers and constraints | debugging | 82.4 |
| Build CLI tool with subcommands and config | from-scratch | 79.1 |
| Add WebSocket real-time updates | full-stack | 87.0 |
| Add GraphQL layer over REST API | multi-language | 89.3 |
| Fix N+1 query in dashboard | backend | 90.4 |
| Build production website with auth and members area | frontend | 78.9 |
| Dockerize Node.js monorepo | full-stack | 86.2 |
| Split 1100-line god file into proper modules | refactoring | 87.8 |
| Fix broken responsive layout | frontend | 75.5 |
| Add i18n with locale routing to Next.js app | full-stack | 84.4 |
| Add retry logic and dead letter queue to Python task queue | backend | 88.3 |
| Fix data integrity bugs in denormalized e-commerce schema | debugging | 86.1 |
| Build SaaS admin dashboard from scratch | from-scratch | 88.9 |
| Add Google OAuth2 login to Express app | full-stack | 86.1 |
| Debug race condition in worker pool | debugging | 89.7 |