APEX
Gemini 3 Flash Preview

Google

1049K context · $0.50/M input · $3.00/M output
1513 (peak 1514)

Avg Score: 72.4
Avg Cost: $0.02
Score/$: 4150.6
Runs: 46

[Charts not shown: Win/Loss/Draw, Scoring Dimensions, Score Distribution]

Category ELOs

Category (difficulty) | ELO
refactoring (expert) | 2404
backend (easy) | 2173
from-scratch (expert) | 2169
from-scratch (easy) | 2084
frontend (hard) | 1858
debugging (medium) | 1819
code-review (medium) | 1670
refactoring | 1644
from-scratch | 1629
from-scratch (hard) | 1625
full-stack (hard) | 1614
full-stack | 1595
backend (medium) | 1551
multi-language (hard) | 1549
full-stack (medium) | 1544
refactoring (medium) | 1533
backend | 1502
code-review | 1490
backend (hard) | 1457
frontend (medium) | 1457
multi-language | 1451
frontend | 1449
debugging | 1409
multi-language (expert) | 1397
backend (expert) | 1380
debugging (hard) | 1335
debugging (expert) | 1288
from-scratch (medium) | 1235
frontend (expert) | 926
code-review (hard) | 0

All Results

Task | Category | Score
Build codebase indexer for LLM context windows | from-scratch | 60.7
Optimize bloated React bundle under 500KB | frontend | 72.0
Migrate callback-hell Express app to async/await | refactoring | 72.7
Add WebSocket real-time updates | full-stack | 81.0
Port Python CLI to Rust | multi-language | 51.3
Harden insecure Docker setup with 12 vulnerabilities | code-review | 88.8
Write Kubernetes manifests for Node.js microservice | full-stack | 84.5
Split 1100-line god file into proper modules | refactoring | 68.5
Implement transformer inference engine with KV cache | from-scratch | 87.7
Build MCP server for database management | backend | 70.3
Build SaaS admin dashboard from scratch | from-scratch | 75.0
Implement background job scheduler with persistence | backend | 65.5
Build production website with auth and members area | frontend | 56.8
Build CLI tool with subcommands and config | from-scratch | 53.3
Fix hallucination and context window bugs in RAG agent | backend | 55.5
Write complex SQL report with window functions | backend | 71.9
Find and fix 4 hidden backdoors in Flask app | debugging | 71.5
Fix 12 WCAG accessibility violations in checkout form | frontend | 84.8
Fix race conditions in order matching engine | backend | 78.1
Build real-time portfolio risk calculator | backend | 62.4
Build LLM evaluation harness with structured grading | backend | 72.0
Fix deadlocking transaction patterns in Flask app | backend | 44.3
Fix data integrity bugs in denormalized e-commerce schema | debugging | 78.4
Debug and fix 6 broken database triggers and constraints | debugging | 66.6
Add Redis caching layer to Express API | backend | 79.0
Optimize slow Postgres queries in Flask app | backend | 86.3
Add Google OAuth2 login to Express app | full-stack | 80.9
Write tests for untested legacy Flask service | code-review | 47.3
Add retry logic and dead letter queue to Python task queue | backend | 72.0
Add GraphQL layer over REST API | multi-language | 73.8
Fix Node.js stream backpressure causing OOM on large files | backend | 90.4
Build distributed node cluster with gossip protocol | from-scratch | 72.9
Write integration tests for payment flow | code-review | 35.8
Add rate limiting middleware | backend | 82.9
Implement Stripe webhook handler | backend | 70.9
Zero-downtime schema migration | full-stack | 73.0
Fix flaky test suite | debugging | 91.2
Add cursor-based pagination to REST API | backend | 69.0
Fix N+1 query in dashboard | backend | 91.7
Fix memory leak in event handler | debugging | 66.0
Refactor monolithic handler to CQRS | refactoring | 80.3
Code review: identify security vulns | code-review | 88.1
Debug race condition in worker pool | debugging | 84.0
Fix React hydration mismatch | frontend | 76.7
Build terminal UI dashboard | from-scratch | 59.0
Build REST API from scratch | from-scratch | 85.7