
Reproducible 2026 Benchmarks Put Toyota’s Fluorite to the Test at 60/90/120 Hz Across Android, iOS, Web, Desktop, and Automotive

A standardized Impeller‑vs‑Skia, Metal/Vulkan/OpenGL methodology uses Flutter’s own tooling and platform profilers to separate engine, renderer, and driver effects for Toyota’s Flutter‑based Fluorite game engine

By AI Research Team

In 2026, performance claims without airtight attribution don’t fly—especially for a cross‑platform engine built atop Flutter and deployed from phones to instrument clusters. Toyota’s Fluorite is an ambitious Flutter‑based game engine with limited public technical disclosures, making it easy to misread wins or regressions as “engine behavior” when they’re often renderer, driver, or compositor effects. A new, reproducible benchmarking blueprint confronts that head‑on. It isolates Fluorite’s engine‑layer impact from Flutter’s renderer/runtime and platform GPU drivers, hits every major platform from Android to embedded Linux and Android Automotive OS (AAOS), and does it at the refresh rates that matter now: sustained 60, 90, and 120 Hz.

Rather than chase headline average FPS, the framework leans on Flutter’s own profiling stack plus platform‑native tools to capture what developers trust: frametime distributions including 1%/0.1% lows, input‑to‑render latency, shader and asset stutter, startup times, memory/GC behavior, CPU/GPU utilization, power/thermal dynamics, binary size, and throughput ceilings. Every run is pinned to exact Flutter engine hashes, renderer backends (Impeller vs Skia), graphics APIs (Metal/Vulkan/OpenGL), and device OS/driver builds. Raw traces and configuration manifests ship alongside any charts. The goal isn’t a single score; it’s a playbook that survives scrutiny, enables apples‑to‑apples comparisons, and reveals where Fluorite truly moves the needle.
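For illustration, a configuration manifest of the kind described might be assembled like this minimal Python sketch. Every field name and example value here is hypothetical, not a published Fluorite or Flutter schema; a real manifest would also pin tool versions (Perfetto config, AGI, Instruments) per run.

```python
import json

# Minimal run-manifest sketch. All field names and values are illustrative,
# not a published Fluorite or Flutter schema.
def build_manifest(device, os_build, gpu_driver, engine_hash,
                   renderer, graphics_api, refresh_hz):
    """Collect the configuration pins a run must ship alongside its traces."""
    return {
        "device": device,
        "os_build": os_build,
        "gpu_driver": gpu_driver,
        "flutter_engine_hash": engine_hash,
        "renderer": renderer,          # "impeller" or "skia"
        "graphics_api": graphics_api,  # "metal", "vulkan", "opengl", ...
        "refresh_hz": refresh_hz,      # 60, 90, or 120
    }

# Hypothetical example values:
manifest = build_manifest("Pixel-class device", "UD1A.000000.000",
                          "vendor-driver-1.2", "abc123def",
                          "impeller", "vulkan", 120)
print(json.dumps(manifest, indent=2))
```

Shipping this manifest next to the raw traces is what makes a chart auditable later.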

Fluorite’s Identity and the Attribution Problem

Fluorite targets cross‑platform deployment by building on Flutter’s embedders and rendering pipeline. Public, authoritative details are sparse in widely available English‑language sources, so rigorous attribution is essential. The framework treats performance as the sum of three separable layers:

  • Engine layer (Fluorite): sprite batching and scene orchestration, asset and audio pipelines, scheduling and isolates, and optional physics/3D abstractions.
  • Flutter renderer/runtime: Impeller vs Skia behavior; Metal/Vulkan/OpenGL/ANGLE/CanvasKit/HTML paths; pipeline creation, rasterization, frame scheduling, and first‑use shader costs.
  • Platform/driver/compositor: Vulkan/Metal/OpenGL drivers, compositor vsync policies, OS scheduling and frame pacing.

To attribute correctly, the same scenes run across permutations:

  • Fluorite + Impeller vs Fluorite + Skia
  • Stock Flutter equivalents (CustomPainter/SceneBuilder/widgets)
  • Optional Flame/Forge2D equivalents for 2D/physics context

If Fluorite outperforms stock Flutter with the same renderer/API, that delta is credibly engine‑layer. If a delta flips or vanishes when switching renderer/API, it’s renderer/runtime or driver territory—not a Fluorite trait. Hidden fallbacks (e.g., Skia slipping in where Impeller was intended) are explicitly verified via logs and GPU traces on every run.
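That attribution rule can be expressed as a toy classifier. The function and its threshold are illustrative, not part of any published Fluorite tooling; the inputs are percent deltas of Fluorite over stock Flutter on identical scenes, one per renderer/API combo.

```python
# Illustrative attribution rule: a delta that persists across backends is
# credibly engine-layer; one that flips with the backend is not.
def attribute_delta(deltas_by_backend, threshold_pct=5.0):
    """deltas_by_backend maps backend name (e.g. "impeller_metal") to the
    percent improvement of Fluorite over stock Flutter on that backend."""
    wins = [d for d in deltas_by_backend.values() if d > threshold_pct]
    losses = [d for d in deltas_by_backend.values() if d < -threshold_pct]
    if wins and not losses and len(wins) == len(deltas_by_backend):
        return "engine-layer"     # delta survives a renderer/API switch
    if wins and losses:
        return "renderer/driver"  # delta flips with the backend
    return "inconclusive"

# Example: a win under both Impeller/Metal and Skia/OpenGL points at the engine.
print(attribute_delta({"impeller_metal": 12.0, "skia_opengl": 9.5}))
```

A real analysis would of course weigh confidence intervals, not point deltas, before labeling anything.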

Methodology Pillars: Control the Stack, Pin the Versions, Repeat the Runs

Reproducibility starts with locked environments and transparent configurations.

  • Control the renderer/API:
    • iOS/macOS: Impeller (Metal) with Skia fallback where available.
    • Android/AAOS: Impeller (Vulkan) where supported; Skia (OpenGL/Vulkan) for comparison.
    • Web (Chrome Stable): CanvasKit (WebGL/WebGPU when exposed via CanvasKit) and HTML renderers.
    • Windows/Linux: Skia via ANGLE/D3D and OpenGL or Vulkan where exposed; macOS also exercises Impeller/Metal.
  • Pin the toolchains and OS:
    • Flutter stable channel with the exact flutter --version output, engine commit hash, and Dart SDK noted.
    • Toolchains captured: NDK/Xcode/Chrome build; per‑device OS build and GPU driver versions; display mode and resolution.
  • Honor refresh targets—then compress them:
    • Default vsync adheres to device configuration; tests explicitly cover 60/90/120 Hz when panels and OS permit. Adaptive refresh is disabled where possible, and compositor/vsync modes are documented to avoid frame‑pacing ambiguity.
  • Structure the runs:
    • Profile and release builds only (debug excluded).
    • Warm‑up: 30–60 seconds to fill caches and settle pipelines.
    • Measure: 2–5 minutes per scenario, plus 10–15 minute sustained runs for power/thermals.
    • Repeat: at least five runs per scenario/renderer/API combo; report means and confidence intervals.
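The repeat-and-report step can be sketched in a few lines of analysis code. This is illustrative only and uses a normal approximation for the interval; at five runs a t-distribution would be stricter.

```python
import math
import statistics

# Sketch: mean and 95% confidence interval over repeated runs of one
# scenario/renderer/API combo. Normal approximation for brevity.
def mean_ci95(samples):
    n = len(samples)
    m = statistics.mean(samples)
    half = 1.96 * statistics.stdev(samples) / math.sqrt(n)
    return m, (m - half, m + half)

fps_runs = [118.2, 117.6, 118.9, 117.1, 118.4]  # five runs, hypothetical values
m, (lo, hi) = mean_ci95(fps_runs)
print(f"mean={m:.2f} FPS, 95% CI=({lo:.2f}, {hi:.2f})")
```

If the intervals of two configurations overlap heavily, the honest report is "no measurable difference", not a winner.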

Every artifact that shapes a result—Perfetto configs, AGI versions, Instruments versions, device thermal modes—is recorded and published with traces.

Workloads That Matter: From Sprite Batching to Optional 3D Shader Sweeps

A single harness app drives all scenarios, toggled at runtime to ensure identical content across platforms and backends. Each workload sweeps parameters to expose both typical and stress behaviors:

  • Sprite batching and animation:
    • Variable sprite counts, multiple atlases, transform/opacity animations, overdraw stress.
    • Throughput target: maximum sustained sprites at 60/90/120 Hz with a fixed jank threshold.
  • Particle systems:
    • Multiple emitters, blending and heavy overdraw; CPU‑ vs GPU‑driven updates where applicable.
    • Designed to surface fill‑rate, bandwidth, and thermal throttling dynamics.
  • Physics (conditional):
    • If Fluorite bundles physics, collision‑heavy scenes and constraints measure CPU and scheduling costs.
    • If not, Flame/Forge2D provides an optional baseline for physics overhead in Flutter content.
  • Input handling:
    • High‑frequency multi‑touch/gesture streams (up to 120 Hz) overlaying animated content to produce end‑to‑end latency distributions and reveal coalescing behavior under load.
  • Audio mixing:
    • Continuous multi‑channel mixing, including sample‑rate conversion, to test scheduling contention against frame deadlines.
  • Asset loading/streaming:
    • Cold/warm loading of images/audio; incremental streaming; compressed formats; background‑isolate decoding to quantify stutter, GC activity, and I/O impacts.
  • Optional 3D/shader complexity:
    • If Fluorite exposes 3D features, animated skinned meshes, multipass materials, and fragment complexity sweeps exercise pipeline caches and GPU saturation.
Microbenchmarks isolate hot subsystems (layout, painting, compositing, raster), following established Flutter macro/microbenchmark patterns. Real‑world scenes then combine sprites, particles, input, audio, and streaming for content‑like behavior.

A 2023–2026 Device and Platform Matrix That Mirrors Reality

Coverage aims to generalize across modern hardware and drivers:

  • Android phones/tablets:
    • Snapdragon 8 Gen 2/Gen 3 class, Tensor G3, a Dimensity flagship, and Snapdragon 7‑series midrange.
    • Multiple refresh settings (60/90/120 Hz). GPU driver versions and vendor frame pacing are documented. Perfetto with FrameTimeline is enabled.
  • iOS/iPadOS:
    • iPhone 14 Pro/15 Pro (ProMotion) and iPad Pro M2/M4 for 120 Hz Metal/Impeller.
    • Core Animation and Metal System Trace capture per‑stage times.
  • Web (Chrome Stable):
    • Desktop Chrome on Windows (ANGLE/D3D), macOS (Metal‑backed via Chrome on Apple Silicon), Linux (Mesa), and Android Chrome.
    • Both CanvasKit and HTML renderers are measured.
  • Desktop:
    • Windows 11 (ANGLE/D3D, Vulkan/OpenGL where available), macOS (Metal with Impeller/Skia), Linux (Wayland/X11 with OpenGL/Vulkan).
    • Compositor and vsync settings are captured.
  • Embedded Linux:
    • Jetson Orin (Vulkan) and Raspberry Pi 5 (OpenGL ES via flutter‑pi) as exemplars.
    • Kernel/driver/compositor stacks and power modes (e.g., nvpmodel) are recorded.
  • Automotive:
    • AAOS reference hardware or OEM IVI, with SurfaceFlinger/FrameTimeline and vendor frame pacing verified.

For each device, the model, SoC/GPU, RAM/storage, OS build, display mode, GPU driver version, and thermal/performance mode are logged. Model‑specific metric values are not published in this overview; where a figure cannot be substantiated, it is marked “specific metrics unavailable.”

Metrics That Developers Trust: Frametime Percentiles, 1%/0.1% Lows, Latency, and Power

Frame budgets compress as refresh rises: 16.67 ms at 60 Hz, 11.11 ms at 90 Hz, and 8.33 ms at 120 Hz. All pipeline stages—UI, raster, GPU submission, present—must collectively clear these deadlines to avoid jank. The framework reports:

  • FPS and frametimes:
    • Mean FPS and frametime distributions (P50/P90/P99); 1%/0.1% low FPS converted from worst‑percentile frametimes (FPS = 1000/ms).
    • Dropped‑frame and jank rates from platform counters (SurfaceFlinger/Choreographer, Core Animation, DevTools timelines).
  • Input‑to‑render latency:
    • P50/P90/P99 from input delivery through vsync, UI/raster, GPU submission, to present.
    • Measured via Perfetto FrameTimeline on Android and Metal System Trace/Core Animation on Apple platforms.
  • Shader compilation and asset stutter:
    • First‑use stalls are compared across Skia (with/without SkSL warm‑up) and Impeller’s precompiled pipelines on Metal/Vulkan.
  • Startup time:
    • Cold/warm “time to first frame” and “first interactive frame” via flutter run --trace-startup and platform traces.
  • Memory footprint and GC:
    • Dart heap and RSS over time, peaks/transients, isolate overhead, GC pauses; correlated with frametime spikes.
  • CPU/GPU utilization:
    • CPU core loads, GPU active/idle per frame, per‑stage timings, and frequency scaling.
  • Power/thermal:
    • Energy counters and battery drain per minute where available; thermal states and throttling onset/impact over sustained runs.
  • Binary and asset sizes:
    • App size breakdown via flutter build --analyze-size and store tooling.
  • Throughput limits:
    • Maximum sustained sprites/draw calls that maintain target refresh with predefined jank thresholds, per renderer/backend.
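The conversions behind these headline numbers (frame budgets, percentiles, and FPS = 1000/ms for the lows) are simple enough to sketch. Nearest-rank percentiles are an illustrative choice here, not the framework’s mandated estimator:

```python
# Sketch: frame budgets and worst-percentile "lows" from per-frame times
# in milliseconds, using nearest-rank percentiles for transparency.
def frame_budget_ms(hz):
    return 1000.0 / hz  # 16.67 ms at 60 Hz, 11.11 at 90 Hz, 8.33 at 120 Hz

def percentile_ms(frametimes_ms, p):
    """Nearest-rank percentile over the frametime sample."""
    s = sorted(frametimes_ms)
    idx = min(len(s) - 1, max(0, int(round(p / 100 * len(s))) - 1))
    return s[idx]

def summarize(frametimes_ms):
    return {
        "p50_ms": percentile_ms(frametimes_ms, 50),
        "p99_ms": percentile_ms(frametimes_ms, 99),
        "low_1pct_fps": 1000.0 / percentile_ms(frametimes_ms, 99),    # worst 1%
        "low_01pct_fps": 1000.0 / percentile_ms(frametimes_ms, 99.9), # worst 0.1%
    }
```

The lows reward consistency: a run averaging 120 FPS with a 30 FPS 0.1% low feels worse than a steady 90.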

These are captured consistently across platforms using Flutter DevTools’ timeline and FrameTiming APIs for in‑app redundancy.

Instrumentation by Platform: Perfetto, AGI, Instruments, and Chrome DevTools

Tooling parity is central to credible attribution:

  • Android and AAOS:
    • Perfetto traces capture SurfaceFlinger, FrameTimeline, input, CPU/GPU schedulers and frequency domains.
    • Android GPU Inspector (AGI) supplies GPU frame timing, pipeline stats, and hardware counters on supported Vulkan devices (Impeller) and for OpenGL/Skia comparisons.
    • dumpsys gfxinfo framestats snapshots validate jank and dropped frames.
    • Energy Profiler and Battery Historian characterize power; thermal state APIs log throttling transitions.
  • iOS/iPadOS/macOS:
    • Xcode Instruments’ Core Animation shows frame pacing and render server times.
    • Metal System Trace exposes GPU workload and pipeline stalls for Impeller/Metal and the Skia fallback.
    • Time Profiler attributes CPU hot paths. ProMotion refresh is fixed where possible for clean 60/120 Hz tests.
  • Web (Chrome):
    • The DevTools Performance panel provides FPS, main/raster/compositor timelines, and WebGL/CanvasKit activity.
    • Chrome build and flags are recorded.
  • Desktop:
    • Windows captures include ANGLE/D3D or Vulkan/OpenGL; optional RenderDoc captures validate draw calls/batching.
    • macOS uses Instruments; Linux uses Perfetto and driver tools, with Wayland/X11 and compositor vsync specifics documented.
  • Embedded Linux:
    • Custom embedders use Perfetto where available; flutter‑pi specifics on Raspberry Pi and Vulkan stacks on Jetson are recorded; RenderDoc assists with GPU inspection.

All platforms also log in‑app FrameTiming for cross‑checking with system traces.
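That cross-check can be as simple as bounding the divergence between the two frametime series. The helper names and tolerance below are hypothetical; a real harness would first align frames by vsync timestamp.

```python
# Sketch: compare in-app FrameTiming totals against system-trace frametimes
# for the same run. Large divergence usually indicates a capture or
# alignment problem, not a real performance signal.
def max_divergence_ms(app_frame_ms, trace_frame_ms):
    if len(app_frame_ms) != len(trace_frame_ms):
        raise ValueError("frame counts must match after alignment")
    return max(abs(a - t) for a, t in zip(app_frame_ms, trace_frame_ms))

def capture_is_consistent(app_frame_ms, trace_frame_ms, tolerance_ms=1.0):
    return max_divergence_ms(app_frame_ms, trace_frame_ms) <= tolerance_ms
```

Runs that fail this sanity check get re-captured rather than analyzed.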

Analysis Framework: Engine vs Renderer vs Platform

Attribution rules keep conclusions honest:

  1. Engine layer (Fluorite)
     • Signs of improvement: higher sustained throughput and lower UI/raster main‑thread times with the same renderer/API; more consistent frametimes under identical GPU backends.
     • Likely mechanisms: sprite batching and culling efficacy; asset/audio pipeline efficiency; isolate/threading orchestration.
  2. Flutter renderer/runtime (Impeller vs Skia; Metal/Vulkan/OpenGL/ANGLE/CanvasKit/HTML)
     • Signs of improvement: fewer first‑use shader stalls; more stable P90/P99 frametimes during cold runs; lower draw‑submission overheads.
     • Verification: Metal System Trace and AGI pipeline events confirming precompiled pipelines (Impeller) vs runtime compilation (Skia).
  3. Platform/driver/compositor
     • Signs of impact: device‑specific variability in Vulkan/Metal/OpenGL drivers; compositor vsync and OS scheduling effects; frame pacing differences visible in SurfaceFlinger/Core Animation timelines and driver counters.

Expected Behaviors by Renderer and Platform

Certain patterns should emerge as runs scale across devices and refresh budgets:

  • Impeller/Metal on iOS/iPadOS/macOS should exhibit minimal first‑use shader stalls and tighter P90/P99 frametimes compared with Skia paths, especially on cold starts. Metal System Trace should show fewer pipeline creation blocks and steadier GPU submission cadence.
  • On Android/AAOS, Impeller/Vulkan should reduce shader‑related jank, but outcomes remain driver‑dependent; AGI traces identify hotspots such as pipeline creation, descriptor updates, or barriers on specific GPUs.
  • SkSL warm‑up improves Skia’s cold‑start stutter but adds operational complexity and may not cover all shader variants.
  • Higher refresh rates (90/120 Hz) compress budgets to 11.11 ms and 8.33 ms, exposing batching limits and CPU/GPU contention that are less visible at 60 Hz.
  • Particle scenes with heavy transparent overdraw typically tax fill‑rate/bandwidth and provoke thermal throttling over 10–15 minute sustained runs; frequency scaling and rising frametimes should make the onset visible in traces.
  • On web, CanvasKit (WebGL) tends to outperform the HTML renderer for graphics‑heavy scenes, albeit with larger binaries and sensitivity to driver quirks. Measuring both is necessary for a balanced view across desktop and mobile browsers.
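The throttling pattern described above can be spotted mechanically in a trace. A minimal sketch, assuming a chronological list of per-frame times; the window size and margin are arbitrary illustration values, not framework constants:

```python
# Illustrative throttling-onset detector: flag the first sliding window
# whose mean frametime exceeds the refresh budget by a margin. Real
# analysis would also consult CPU/GPU frequency domains in the trace.
def throttle_onset(frametimes_ms, budget_ms, window=120, margin=1.10):
    threshold = budget_ms * margin
    for i in range(len(frametimes_ms) - window + 1):
        if sum(frametimes_ms[i:i + window]) / window > threshold:
            return i  # index of the first frame in the offending window
    return None

# Example: steady 8 ms frames that degrade to 12 ms against a 120 Hz budget.
trace = [8.0] * 600 + [12.0] * 600
print(throttle_onset(trace, 1000.0 / 120))
```

On a stable run the function returns None; a non-None index marks where sustained load started missing the budget.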

Snapshot of expectations

| Platform | Renderer/API | First‑use stutter | Frametime stability | Notes |
| --- | --- | --- | --- | --- |
| iOS/iPadOS/macOS | Impeller/Metal | Low | High | Verify via Metal System Trace |
| iOS/iPadOS | Skia/Metal or OpenGL fallback | Medium without SkSL warm‑up | Improves with warm‑up | Sensitive to shader variety |
| Android/AAOS | Impeller/Vulkan | Reduced; driver‑dependent | Good on robust drivers | Validate with AGI events |
| Android/AAOS | Skia/OpenGL or Vulkan | Common without SkSL warm‑up | Driver‑sensitive | Compare warm vs cold |
| Web (Chrome) | CanvasKit/WebGL | Lower CPU pressure | Stable on desktop | Larger binary; driver quirks |
| Web (Chrome) | HTML | Smaller binary | Lower throughput in heavy scenes | Measure both paths |
| Desktop (Win/macOS/Linux) | ANGLE/D3D, Metal, Vulkan/OpenGL | Varies by path | Generally stable | Compositor/vsync matter |

Risks, Validity Threats, and How This Framework Mitigates Them

Cross‑platform measurement is fraught with traps; the blueprint anticipates the big ones:

  • Hidden fallbacks:
    • Risk: the intended Impeller/Vulkan path silently falls back to Skia/OpenGL, corrupting comparisons.
    • Mitigation: validate renderer/API per run via logs and GPU traces; publish captures.
  • Compositor and vsync interference:
    • Risk: desktop compositors and Linux display servers can alter frame pacing; browser flags can skew web results.
    • Mitigation: document compositor/vsync modes; prefer exclusive full‑screen where feasible; record Chrome versions/flags; capture platform timelines.
  • Thermal drift and background load:
    • Risk: sustained runs heat devices and trigger throttling; background services introduce noise.
    • Mitigation: control ambient conditions; precondition devices; randomize scenario order; use airplane mode and restrained background services; verify via sched/frequency domains in Perfetto.
  • Version churn:
    • Risk: browser, driver, OS, and Flutter updates shift baselines.
    • Mitigation: pin software stacks; re‑run a subset after updates to assess impact; publish hashes and versions.
  • Incomplete shader warm‑up:
    • Risk: SkSL warm‑up doesn’t cover all shader variants, leaving residual jank.
    • Mitigation: compare cold/warm runs explicitly; contrast with Impeller’s precompiled pipelines.

Why This Blueprint Matters for Studios, OEMs, and the Flutter Community

This is more than a testing checklist. It’s a contract for trust between engine developers, studios, and platform stakeholders:

  • For studios shipping on Fluorite:
    • Clear attribution means faster root‑cause analysis. If a 120 Hz regression appears only under Skia/OpenGL and vanishes with Impeller/Metal, the fix path changes radically.
    • Frametime percentiles, 1%/0.1% lows, and input latency distributions reflect player experience better than averages. Shader/asset stutter and GC spikes are captured with concrete evidence.
  • For OEMs and automotive platforms:
    • AAOS and embedded Linux runs reveal driver, compositor, and thermal behaviors under real game‑like pressure. That feedback loop informs driver updates, power modes, and refresh‑rate policies in vehicles and devices.
  • For the Flutter community:
    • Controlled Impeller‑vs‑Skia runs clarify where Flutter’s renderer/runtime advances reduce jank—especially first‑use shader stalls on Metal/Vulkan—and where ecosystem guidance (e.g., SkSL warm‑up) helps.
    • Publishing harness code, raw traces, and full manifests invites independent replication and longitudinal tracking as Flutter, drivers, and devices evolve.

When benchmark data travels with its Perfetto traces, Instruments captures, AGI sessions, and build logs, it stops being anecdote and becomes engineering evidence. That’s the bar this blueprint aims to set. 🎯

Conclusion

A modern cross‑platform engine lives at the intersection of app logic, a shared renderer/runtime, and a fickle universe of GPU drivers and compositors. Fluorite’s ambition—and the opacity that comes with limited public disclosures—demands a methodology that controls the stack, pins versions, and repeats runs until confidence intervals mean something. By standardizing scenes that matter (sprites, particles, input, audio, streaming, and optional 3D), running them across Android, iOS, web, desktop, embedded Linux, and AAOS at 60/90/120 Hz, and instrumenting with Flutter’s own tools plus platform profilers, this blueprint breaks performance down into what the engine controls and what it doesn’t.

Expect Impeller’s precompiled pipelines on Metal/Vulkan to curb shader‑induced jank relative to Skia, especially in cold‑start paths. Expect 90/120 Hz budgets to expose batching limits and CPU/GPU contention that never show at 60 Hz. Expect heavy overdraw to reveal thermal management policies over 10–15 minute sustained runs. Most of all, expect the results—frametime percentiles, input latency, memory/GC, power/thermals, throughput—to be independently verifiable because the traces and manifests ship with them.

Studios can make sharper trade‑offs. OEMs can tune drivers and policies for real workloads. The Flutter ecosystem can see where renderer/runtime work pays off. Reproducibility isn’t a nice‑to‑have anymore; it’s the only way to tell whether a silky frame came from Fluorite, from Flutter, or from the GPU driver doing its job. And that clarity is precisely what 2026 demands.

Sources & References

  • docs.flutter.dev: Impeller (Flutter’s new rendering runtime). Supports claims about Impeller’s precompiled pipeline design on Metal/Vulkan and the expected reduction of shader‑induced jank.
  • docs.flutter.dev: Flutter performance best practices and profiling overview. Backs the use of Flutter performance guidance, profiling flow, and best‑practice metrics collection.
  • docs.flutter.dev: Flutter DevTools Performance (timeline, CPU, memory). Supports timeline capture, frame analysis, and memory/GC instrumentation used across platforms.
  • docs.flutter.dev: Flutter build modes (debug/profile/release) and versioning. Justifies running benchmarks in profile/release and pinning Flutter versions/hashes.
  • docs.flutter.dev: Analyze Flutter app size. Supports reporting binary/app bundle size and analyzing contributors.
  • docs.flutter.dev: Flutter web renderers (CanvasKit vs HTML). Backs measuring the CanvasKit and HTML renderers on the web and trade‑offs such as size and performance.
  • github.com: Flutter macrobenchmarks (reference structure). Informs macrobenchmark patterns for workload/harness design and measurement structure.
  • github.com: Flutter microbenchmarks (reference structure). Supports isolating subsystems (layout, painting, raster) to attribute time budgets accurately.
  • docs.flutter.dev: Reduce shader compilation jank (SkSL warm‑up and shader behavior). Supports the discussion of SkSL warm‑up trade‑offs and shader‑jank mitigation versus Impeller.
  • api.flutter.dev: Flutter FrameTiming API. Backs collection of frametime statistics, FPS, and jank calculations at the app level.
  • perfetto.dev: Perfetto (system‑wide tracing for Android/Linux/Chrome). Supports system‑wide tracing, FrameTimeline capture, and sched/frequency analysis on Android/Linux.
  • developer.android.com: Android GPU Inspector. Backs GPU pipeline‑event analysis and counters for Vulkan/OpenGL paths on Android/AAOS.
  • developer.android.com: Android FrameTimeline and frame pacing/jank attribution. Supports end‑to‑end frame pacing, jank attribution, and input‑to‑render latency breakdowns.
  • developer.android.com: Android dumpsys gfxinfo framestats. Supports validation of jank and dropped frames outside app‑level instrumentation.
  • developer.android.com: Android Studio Energy Profiler. Backs power‑draw/energy measurement on Android during sustained runs.
  • developer.android.com: Android Battery Historian. Supports battery‑drain analysis and power characterization over time on Android.
  • developer.android.com: Android thermal guidance for games/apps. Backs the methodology for detecting and interpreting thermal throttling and performance decay.
  • developer.apple.com: Xcode Instruments (profiling on Apple platforms). Supports use of Instruments for Core Animation, Time Profiler, and energy analysis on iOS/macOS.
  • developer.apple.com: Core Animation instrument (diagnosing frame rendering). Backs per‑stage timing, frame pacing, and jank analysis on Apple platforms.
  • developer.apple.com: Metal System Trace instrument. Supports verification of pipeline creation, GPU workload cadence, and Impeller vs Skia behavior on Metal.
  • dart.dev: Dart DevTools Memory. Backs memory/GC profiling, heap/RSS tracking, and correlation with frametime spikes.
  • dart.dev: Dart isolates and concurrency model. Supports analysis of isolate usage, scheduling, and their impact on frame deadlines.
  • docs.flutter.dev: Flutter Embedder (C/C++ custom embedders). Backs references to custom embedders on desktop/embedded and control over rendering paths.
  • github.com: flutter‑pi (Flutter on Raspberry Pi). Supports inclusion of Raspberry Pi 5 / OpenGL ES via flutter‑pi in the embedded matrix.
  • renderdoc.org: RenderDoc (GPU frame capture and analysis). Backs validation of draw calls, batching, and pipeline state via frame captures on desktop/embedded.
  • developer.chrome.com: Chrome DevTools Performance (web tracing). Supports FPS/timeline capture on the web and analysis of HTML vs CanvasKit renderer behavior.
  • source.android.com: Android Automotive OS overview. Backs inclusion of AAOS in the target matrix and the need to verify SurfaceFlinger/FrameTimeline on automotive hardware.
