core: Optimize CPU bitmap blending and copy operations by jarca0123 · Pull Request #23006 · ruffle-rs/ruffle

jarca0123 · 2026-02-11T15:53:30Z

Per future CONTRIBUTING.md, I want to state that this code was generated by Claude Code, specifically Claude Opus 4.6. My workflow is iteratively profiling Ruffle with various SWFs, feeding the profiler output to the LLM, having it identify hotspots and generate optimizations, then re-profiling to verify the improvement. This is one of many PRs I intend to submit that were produced this way.

I have reviewed the code to the best of my ability, though in full transparency, I have not audited every individual line in detail.

EDIT: Here is a description of what it does: This replaces per-pixel get/set_pixel32_raw loops with row-based slice operations: memcpy for copies, and a two-lane u32 blend with div255 bit-trick for alpha compositing.
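A rough sketch of what "row-based slice operations" can look like for the copy case. This is not the code from the PR: `copy_rect_rows` and all of its parameters are invented for illustration, the buffers are assumed to be row-major `u32` pixels, and the rectangle is assumed to be clipped to both buffers already.

```rust
/// Copy a rectangle between two row-major `u32` pixel buffers, one row at a time.
/// Hypothetical helper: origins are (x, y), size is (width, height), and the
/// caller is assumed to have clipped the rectangle to both buffers.
fn copy_rect_rows(
    src: &[u32],
    src_stride: usize,
    src_origin: (usize, usize),
    dst: &mut [u32],
    dst_stride: usize,
    dst_origin: (usize, usize),
    size: (usize, usize),
) {
    let (width, height) = size;
    for row in 0..height {
        let s = (src_origin.1 + row) * src_stride + src_origin.0;
        let d = (dst_origin.1 + row) * dst_stride + dst_origin.0;
        // One bulk copy per row instead of one get/set call per pixel.
        dst[d..d + width].copy_from_slice(&src[s..s + width]);
    }
}
```

`copy_from_slice` panics if the two slices differ in length, so each row copy is also an implicit bounds check on the rectangle.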

Here are some benchmarks:

| Benchmark | This PR (ms) | Nightly (ms) | Change |
| --- | ---: | ---: | ---: |
| copypixels_merge_alpha | 468 | 2459 | -81.0% |
| copypixels_merge_alpha_small | 33 | 168 | -80.4% |
| colortransform_alpha_tint | 388 | 360 | +7.8% |
| sprite_render_pipeline | 420 | 534 | -21.3% |
| draw_alpha_blend | 1076 | 1694 | -36.5% |
| draw_alpha_matrix_blend | 1499 | 1639 | -8.5% |
| particle_fade_composite | 113 | 131 | -13.7% |
| multi_sprite_composite | 195 | 817 | -76.1% |
| locked_buffer_composite | 32 | 158 | -79.7% |

Excerpt from my "comprehensive" SWF benchmarks (however I can share the full SWF if you want):

  private function benchCopyPixelsMergeAlpha(n:int):void {
      var src:BitmapData = new BitmapData(256, 256, true, 0x80FF0000); // 50% alpha red
      var dst:BitmapData = new BitmapData(256, 256, true, 0xFF000000); // opaque black
      var rect:Rectangle = new Rectangle(0, 0, 256, 256);
      var pt:Point = new Point(0, 0);
      for (var i:int = 0; i < n; i++) {
          dst.copyPixels(src, rect, pt, null, null, true); // mergeAlpha=true
      }
      src.dispose();
      dst.dispose();
  }

kjarosh · 2026-02-11T18:49:12Z

core/src/bitmap/bitmap_data.rs

+        // Fast division by 255: (x + 1 + (x >> 8)) >> 8
+        // Exact for all values in 0..=65025 (255*255).
+        #[inline(always)]
+        fn div255(x: u16) -> u8 {


Why do you think this is better? It produces more complicated assembly:

regular_div:
    movzx eax, di
    imul eax, eax, 32897
    shr eax, 23
    ret

div255:
    mov eax, edi
    movzx ecx, ah
    add eax, ecx
    inc eax
    shr eax, 8
    ret

I would imagine the compiler knows how to optimize a division by 255, why is it an issue? Which architecture are you optimizing for?

jarca0123 · 2026-02-12

I am not optimizing for any specific architecture; I want to take every target into account. However, this truly is an oversight, and I am reverting it.
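To make the comparison in this thread concrete, here is a small sketch of the three forms being discussed. The function names are chosen for illustration only; `div255_trick` follows the formula in the diff comment but widens to `u32` so the intermediate sum cannot overflow, and `div255_magic` mirrors the multiply-and-shift constants visible in the `regular_div` listing.

```rust
/// Plain division; the compiler is free to lower this to a multiply and a shift.
fn regular_div(x: u16) -> u8 {
    (x / 255) as u8
}

/// The bit trick from the diff comment: (x + 1 + (x >> 8)) >> 8.
/// Only claimed to be exact for x in 0..=65025 (255 * 255).
fn div255_trick(x: u16) -> u8 {
    let x = u32::from(x);
    ((x + 1 + (x >> 8)) >> 8) as u8
}

/// Multiply-and-shift form using the constants from the assembly listing
/// (multiply by 32897, shift right by 23).
fn div255_magic(x: u16) -> u8 {
    ((u32::from(x) * 32897) >> 23) as u8
}
```

All three agree on the range that matters for blending (a product of two 8-bit values), which is consistent with the reviewer's point that the hand-written trick buys nothing over plain `/ 255`.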

kjarosh · 2026-02-12

If you really want to extract a function, you can do it in a way that avoids casting integers on every line, e.g.

let r = source.red() + scale(self.red(), inv_sa);
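A minimal sketch of how such a `scale` helper might look. Only the name `scale` and the call shape come from the comment above; the signature, the `blend_channel` wrapper, and the convention that a factor of 255 means 1.0 are assumptions made for illustration.

```rust
/// Hypothetical helper: scale an 8-bit channel by an 8-bit factor,
/// treating a factor of 255 as 1.0. The widening casts live here
/// so that callers do not need any.
fn scale(channel: u8, factor: u8) -> u8 {
    (u16::from(channel) * u16::from(factor) / 255) as u8
}

/// Src-over for a single premultiplied channel, written without casts.
/// Assumes `source` is premultiplied by the alpha that `inv_sa` was derived from.
fn blend_channel(source: u8, dest: u8, inv_sa: u8) -> u8 {
    source + scale(dest, inv_sa)
}
```

With the casts confined to `scale`, each channel line in the blend reads as plain arithmetic on `u8` values.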

kjarosh · 2026-02-11T18:51:40Z

core/src/bitmap/bitmap_data.rs

+            .wrapping_add(div255(self.red() as u16 * inv_sa));
+        let g = source
+            .green()
+            .wrapping_add(div255(self.green() as u16 * inv_sa));


Why are you changing it to wrapping_add? Are we expecting an overflow? The assembly is identical: regular addition wraps in release mode and only gains overflow checks in debug mode.

jarca0123 · 2026-02-12

Ooh, didn't know that. Reverting this too.
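For reference, a small sketch of the addition variants touched on in this thread. The helper names are invented; the standard-library methods `wrapping_add` and `checked_add` are real. `worst_case_sum` illustrates, under the assumption of valid premultiplied input (a source channel never exceeds the source alpha), why an overflow would not be expected in a src-over blend in the first place.

```rust
/// Explicitly wrapping addition: same result in every build profile.
fn add_wrapping(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}

/// Checked addition: reports overflow as `None` instead of wrapping or panicking.
fn add_checked(a: u8, b: u8) -> Option<u8> {
    a.checked_add(b)
}

/// Largest possible channel sum for premultiplied src-over at source alpha `sa`:
/// the source channel is at most `sa`, and the scaled destination channel is
/// at most `255 - sa`.
fn worst_case_sum(sa: u8) -> u16 {
    let inv_sa = 255 - u16::from(sa);
    u16::from(sa) + 255 * inv_sa / 255
}
```

Plain `+` behaves like `wrapping_add` when overflow checks are disabled (the default for release builds) and panics on overflow when they are enabled (the default for debug builds), so keeping `+` lets debug builds catch a violated invariant.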

Lord-McSweeney · 2026-02-12T16:01:33Z

core/src/bitmap/operations.rs

    }
 }

+/// Blend a single pixel (src-over, premultiplied alpha, two-lane u32 trick).


Is there a difference between this method and Color::blend_over?
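For readers unfamiliar with the "two-lane u32 trick" named in the doc comment, here is a minimal sketch of the general technique next to a per-channel reference. It is an illustration only, not the PR's code and not `Color::blend_over`: the function names are invented, pixels are assumed to be premultiplied `0xAARRGGBB` values, and the division uses the `(x + 1 + (x >> 8)) >> 8` formula applied to two 16-bit lanes at once.

```rust
/// Divide both 16-bit lanes of `x` by 255 at once.
/// Each lane must hold a value in 0..=65025 (255 * 255).
fn div255_two_lanes(x: u32) -> u32 {
    let t = x + 0x0001_0001 + ((x >> 8) & 0x00FF_00FF);
    (t >> 8) & 0x00FF_00FF
}

/// Src-over for premultiplied 0xAARRGGBB pixels, two channels per multiply.
/// Assumes every source channel is <= the source alpha, so lane sums stay <= 255.
fn blend_over_two_lane(src: u32, dst: u32) -> u32 {
    let inv_sa = 255 - (src >> 24);
    // Red and blue share one u32, alpha and green share another.
    let dst_rb = dst & 0x00FF_00FF;
    let dst_ag = (dst >> 8) & 0x00FF_00FF;
    let rb = (src & 0x00FF_00FF) + div255_two_lanes(dst_rb * inv_sa);
    let ag = ((src >> 8) & 0x00FF_00FF) + div255_two_lanes(dst_ag * inv_sa);
    (ag << 8) | rb
}

/// Straightforward per-channel version used as a reference.
fn blend_over_reference(src: u32, dst: u32) -> u32 {
    let inv_sa = 255 - (src >> 24);
    let mut out = 0u32;
    for shift in [0u32, 8, 16, 24] {
        let s = (src >> shift) & 0xFF;
        let d = (dst >> shift) & 0xFF;
        out |= (s + d * inv_sa / 255) << shift;
    }
    out
}
```

Under these assumptions the two functions produce the same pixel; the two-lane form just performs two multiplies per pixel instead of four. Whether that is measurably faster than the existing method is exactly the kind of question the benchmarks in this PR would need to answer.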

n0samu · 2026-02-14T07:51:39Z

> (however I can share the full SWF if you want):

I do want, yes 😄

jarca0123 · 2026-02-15T03:48:39Z

> > (however I can share the full SWF if you want):
>
> I do want, yes 😄

Well, in order to keep everything in one place, here are the benchmarks:
test.zip

Also, I found a quicker algorithm, so I'll update the PR.

kjarosh reviewed Feb 11, 2026


kjarosh added the T-perf (Type: Performance Improvements), A-core (Area: Core player, where no other category fits), llm (The PR contains mostly LLM-generated code), and waiting-on-author (Waiting on the PR author to make the requested changes) labels on Feb 11, 2026

core: Optimize CPU bitmap blending and copy operations

9d5796f

jarca0123 force-pushed the optimize-cpu-bitmap-blending branch from 1fcddcd to 9d5796f on February 12, 2026 06:27

Lord-McSweeney reviewed Feb 12, 2026

jarca0123 wants to merge 1 commit into ruffle-rs:master from jarca0123:optimize-cpu-bitmap-blending


4 participants
