Reverse Engineering Binaries With AI

# Getting Into Security

In middle school the gateway drug known as Halo 3 and unapproved mods shared on the game's File Share system got me interested in programming and security.

Through these mods and YouTube tutorials I discovered the (long defunct) Xbox-Tampers forum, followed tutorials on how to edit map files in a hex editor and make very basic MSN Messenger nudge bomb applications, and met some great people through that site who I'm still friends with.

I remember the struggles in the late 2000s and early 2010s of using ~~cracked~~ expensive copies of IDA Pro to reverse engineer bits of the Xbox 360 to better write modding tools, and begging my friends who were much better than me (like @carrot_c4k3, Xenomega, and @Grimdoomer) to help me reverse some Xbox 360 OS functions that I could not understand.

I've invested a lot of time and energy to try to get better at these things over the past 18 or so years. It's not what I live and breathe anymore and I'm not insanely cracked at RE, but I feel I've gotten pretty decent at throwing a binary into a static analysis tool and, from the disassembly/pseudocode, reasonably understanding what's going on in an application.

I've worked professionally as a security engineer for the past 10 years. Of course, AI "happened" and it's shaken up a lot of things, but I've still felt fairly comfortable about my skills not being terribly threatened. I've seen people try traditional static analysis tools, data flow analysis, etc., and all tooling falls short somehow, so surely this won't be too different in the near-term -- especially for such a niche skill -- right?

I think I was wrong.

# Programming w/ AI

For reasons that have nothing to do with ethics, I've felt pretty anti-LLM for coding, mostly because it made me feel disconnected and dumb, and made me lose context. When Copilot first came out I gave it a shot, but I felt like a monkey just smashing tab to have it complete some code. I soon cancelled my subscription because I felt like I was losing my edge whenever AI assistance wasn't readily available.

At my job, however, AI adoption is being driven fairly hard and I've taken to using Claude Code for some of my side projects to understand AI usage better.

A couple years ago I created a program called WoWs Toolkit which for about the last year went without any major updates because I just lost the steam to context switch all the time.

It's a fairly straightforward application designed to datamine the naval warfare videogame World of Warships, but has some slightly complicated internals which are used to read game files, understand the match replay format, etc:

WoWs Toolkit

I've had people in the community ask for various features over time which were mostly filed away in the "Nice To Have Eventually" folder as some require significant time investment.

One requested feature was the ability to read the game's proprietary 3D model format to understand the armor layout of ships in the game better. I started to reverse engineer the format about a year ago, lost steam, and just kind of dropped it.

There are various moving parts for that, some of which I'd already done:

- Reading the game's virtual filesystem (VFS) without intermediate tools.
- Understanding the game's "GameParams" database which holds meta information about ships, upgrades, each component's 3D model path, etc.

And parts I'd not done:

- The main VFS contains a file called "assets.bin" in a format very similar to the VFS it's contained within, but different enough to break my parser. It also had a second string table that I hadn't yet understood.
- Any research on the .geometry, .visual, or .model files -- all of which are required to paint a complete picture of the mesh, model positions, world transform, etc.
- Any understanding of how ship armor was represented and how it works in relation to the mesh.

I was making some changes to the app and thought, "Screw it, why not see if Claude can figure it out?"

# Claude Code + Binary Ninja MCP

I installed one of the MCP plugins for Binary Ninja, set it up in Claude Code, and gave it the following prompt:

We have the Binary Ninja MCP running with WorldOfWarships64.exe running. There's some code I'd like you to reverse engineer in order to figure out a file format. Please document findings in G:\dev\wowsunpack\MODELS.md

We're going to be reverse engineering the WoWs .geometry file format. An example file can be found here: G:\wows_dump\res\content\gameplay\uk\ship\aircarrier\BSA013_Colossus_1945\BSA013_Colossus_1945.geometry

It's a 3D model format that I believe is proprietary to Wargaming's BigWorld engine. There are some magic constants in the file like 45 4E 43 44 that I've used to find a possible candidate function to start at: sub_140a5a940

This has some interesting strings such as: bufferHeader->m_magicVal == EncodedBufferHeader::EncodedMagicVal

There are other functions with this magic value too that I have not looked at. Please remember if you search the binary in binary ninja for this magic value, it will need to be little-endian encoded.

If you want to write code to test, feel free to add a new command to the wowsunpack CLI and start adding code to a new models module.

We should use winnow for our binary parsing. Do not commit any changes.
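
To make the endianness note in the prompt concrete, here is a minimal sketch using only the Rust standard library (not winnow). The bytes `45 4E 43 44` are the ASCII string "ENCD" as they sit on disk, but x86-64 code that loads them as a 32-bit integer compares against the little-endian constant. The helper names `magic_as_u32` and `find_all` are made up for illustration and are not part of wowsunpack.

```rust
/// Interpret four on-disk bytes the way a little-endian CPU would when
/// loading them as a single 32-bit integer.
fn magic_as_u32(bytes: [u8; 4]) -> u32 {
    u32::from_le_bytes(bytes)
}

/// Return every offset at which `needle` occurs in `haystack`.
/// Useful for locating a magic value in a dumped file.
fn find_all(haystack: &[u8], needle: &[u8]) -> Vec<usize> {
    // `windows(0)` panics, so reject an empty needle up front.
    if needle.is_empty() || haystack.len() < needle.len() {
        return Vec::new();
    }
    haystack
        .windows(needle.len())
        .enumerate()
        .filter(|(_, window)| *window == needle)
        .map(|(offset, _)| offset)
        .collect()
}
```

With this, `magic_as_u32([0x45, 0x4E, 0x43, 0x44])` evaluates to `0x44434E45`, which is the form the constant would take as an immediate operand in the disassembly, while `find_all` searches for the raw byte order seen in a hex editor.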

Then came some early results:

Early AI analysis

Note: I'm using Claude Desktop here for no good reason. Normally I'd use the CLI or ACP interfaces.

And after processing for a bit:

Additional AI analysis

After about an hour or two (part of that time was waiting on me to accept some action prompt) I had some 3D models extracting into a format I could load into Blender piece-by-piece:

First 3D Models

And after more prompting, it was able to export the entire ship with its paint textures:

Ship Model

And as a side note, it did label things/create structs!

Binja Labeling

As far as code is concerned, it tried to do some very dumb things along the way. For example, one of these files contains an embedded XML blob. Claude generated code that, after parsing a section header, would scan forward looking for the XML blob's opening tag and then scan again from that position to find the closing tag. I had to curse and yell, telling it, "You're screwing this up, stop taking shortcuts. Do this the right way. No heuristics."
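
As a rough illustration of the difference between scanning and parsing, the sketch below assumes a purely hypothetical section header made of a little-endian `u32` offset followed by a `u32` length. The real layout of the game's files is not described here, and the actual project uses winnow rather than hand-rolled slicing. The point is only that the blob's bounds come from the header instead of from a search for `<`.

```rust
/// Hypothetical section header: where the blob starts and how long it is.
/// This layout is an assumption for illustration, not the real format.
#[derive(Debug, PartialEq)]
struct SectionHeader {
    offset: u32,
    len: u32,
}

/// Read a little-endian u32 at `at`, returning None if out of bounds.
fn read_u32_le(input: &[u8], at: usize) -> Option<u32> {
    let bytes = input.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// Parse the (assumed) 8-byte header at the start of `input`.
fn parse_header(input: &[u8]) -> Option<SectionHeader> {
    Some(SectionHeader {
        offset: read_u32_le(input, 0)?,
        len: read_u32_le(input, 4)?,
    })
}

/// Slice out exactly the bytes the header describes. No scanning for
/// opening or closing tags, so stray '<' bytes elsewhere cannot
/// confuse it, and a bad header yields None instead of garbage.
fn read_blob<'a>(file: &'a [u8], header: &SectionHeader) -> Option<&'a [u8]> {
    let start = header.offset as usize;
    let end = start.checked_add(header.len as usize)?;
    file.get(start..end)
}
```

A tag-scanning approach would latch onto the first `0x3C` byte it sees, which in a binary file may be unrelated data; the header-driven version either returns the declared range or fails.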

Now obviously this binary has checked assertions (which I hope remain in the binary after this blog post :) ) and those make analysis overall easier. So maybe this is an extremely positive case.

But honestly, I'm impressed with what Claude was able to do. From this very dumb prompt I was able to go from a bit of info from myself + a very surface-level parser for a dependency of the model loading pipeline (not even the 3D model itself!) to code which is able to export 3D models of ships with their textures and armor models, at any LOD. And it built a custom 3D model viewer into WoWs Toolkit.

It even did some weird things I never thought of, like cracking hashes from a set of known strings in the binary to figure out items in the tree structure.
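
The hash-cracking trick amounts to a dictionary attack: hash every candidate string pulled from the binary and look up the unknown hashes in the resulting table. The sketch below uses 32-bit FNV-1a purely as a stand-in, since the hash function the game actually uses is not stated here; `resolve_names` is likewise a hypothetical helper.

```rust
use std::collections::HashMap;

/// 32-bit FNV-1a, used here only as a placeholder hash function.
fn fnv1a_32(data: &[u8]) -> u32 {
    let mut hash: u32 = 0x811C_9DC5; // FNV offset basis
    for &byte in data {
        hash ^= u32::from(byte);
        hash = hash.wrapping_mul(0x0100_0193); // FNV prime
    }
    hash
}

/// Map each unknown hash to a candidate string that produces it, if any.
/// Hashes with no matching candidate are simply left out of the result.
fn resolve_names<'a>(candidates: &[&'a str], unknown: &[u32]) -> HashMap<u32, &'a str> {
    // Build the reverse table once: hash -> original string.
    let table: HashMap<u32, &'a str> = candidates
        .iter()
        .map(|s| (fnv1a_32(s.as_bytes()), *s))
        .collect();

    unknown
        .iter()
        .filter_map(|hash| table.get(hash).map(|name| (*hash, *name)))
        .collect()
}
```

In practice the candidate list would be every string extracted from the executable, and the unknown hashes would be the identifiers found in the file's tree structure. Collisions are possible, so a match is a strong hint rather than proof.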

I barely wrote a line of code for this and it cost maybe ~$150 in tokens:

Early AI analysis

If you would like to read its full analysis of the 3D model format, here is its braindump. And here is the code.

# Learnings

While I really don't have much to share about the specific thing being reverse engineered, I did learn quite a bit about interacting with Claude. I'm sure people who live and breathe LLMs have their own advice, but what I found personally valuable:

- Developing custom CLIs for rapid testing/iteration. I already had a CLI for interacting with game files which Claude was able to easily extend to test its analysis of the binary.
- The existing tooling beyond just that single CLI was critical. Other CLI tools for examining related file formats or deobfuscating data were crucial. This wasn't just a "point it at Binja and profit" sort of deal to achieve the polished result.
- Watching the code as it streams in, or ensuring a mostly comprehensive review, is very important for reasons I've already mentioned, like the AI trying to take shortcuts. I interrupted Claude many times when I noticed it was taking a subpar approach.
- My project is in Rust, and Rust has a strong type system. Having the AI use strong newtypes for things like identifiers helped quite a bit with avoiding bugs.
- My code was originally split into a couple different repos (one for the CLI tool and libraries, one for the UI). Swapping editor windows before all of this was kind of annoying, and it was even more annoying when trying to have the agents do cross-repo changes.
- If you want to sniff out AI-assisted or generated code changes, look for Unicode symbols such as ↔ or → in place of <-> or ->, comments in Rust starting with //!, or the sudden introduction of section dividers like the following:

// ---------------------------------------------------------------------------
// Helper: parse an array at a given offset, wrapping winnow errors
// ---------------------------------------------------------------------------

And most importantly: there were still bugs along the way. Trying to debug some of these things and prompt for fixes without knowing specifics of how the pieces were interconnected was really annoying (e.g. ship turrets sometimes did not face their correct default orientation). For things I really care about being done right, I think it's still worthwhile to go through the pains myself first and bring in AI assistance later, if for no other reason than to save time correcting its mistakes.
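
For readers unfamiliar with the newtype point in the list above, here is a minimal sketch of the idea. The type names (`MeshId`, `TransformId`, `Scene`) are invented for illustration and are not taken from the actual codebase. Two identifiers share the same underlying `u32`, but because they are distinct types the compiler rejects any attempt to use one where the other is expected.

```rust
use std::collections::HashMap;

/// Hypothetical identifier for a mesh section.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct MeshId(pub u32);

/// Hypothetical identifier for a transform node. Same representation
/// as `MeshId`, but a different type as far as the compiler cares.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TransformId(pub u32);

/// Translation-only stand-in for a full transformation matrix.
pub type Translation = [f32; 3];

pub struct Scene {
    mesh_to_transform: HashMap<MeshId, TransformId>,
    transforms: HashMap<TransformId, Translation>,
}

impl Scene {
    pub fn new() -> Self {
        Scene {
            mesh_to_transform: HashMap::new(),
            transforms: HashMap::new(),
        }
    }

    /// Register a mesh together with the transform it should use.
    pub fn attach(&mut self, mesh: MeshId, transform: TransformId, value: Translation) {
        self.mesh_to_transform.insert(mesh, transform);
        self.transforms.insert(transform, value);
    }

    /// Look up the transform for a mesh. Passing a `TransformId` here
    /// would be a compile error rather than a silent wrong lookup.
    pub fn transform_for(&self, mesh: MeshId) -> Option<Translation> {
        let id = self.mesh_to_transform.get(&mesh)?;
        self.transforms.get(id).copied()
    }
}
```

With plain `u32` keys, swapping the two identifiers would compile and fail at runtime (or worse, return the wrong data). With newtypes, that whole class of mistake is caught before the code ever runs, which matters when much of the code is being generated faster than it can be read.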

# Mixed Feelings

I would consider myself a low-level person who thinks about allocations in my application, the performance cost of various approaches, and I like knowing the details of how something works. I might do the wrong thing at times for certain tradeoffs, but I like to be aware of that tradeoff existing and which decision was made. For this task though?

I have no idea how it works.

Before starting this project I had zero knowledge of 3D rendering or model formats. While I did manually reverse engineer the game's general virtual filesystem format for its packed files, I only got about halfway through reversing the similar virtual filesystem used for 3D model assets. I have no idea how it works beyond what I started.

I read the code as Claude went along, pointing out things that I thought were obvious anti-patterns or incorrect... but I couldn't tell you how the different ship mesh sections are read, then paired to their transformation matrix (which is located in another file).

That kind of sucks.

Don't get me wrong -- I am happy that I'm giving people something they want and it otherwise may not have ever gotten done without Claude helping accelerate things. I'm also happy that the barrier to these types of tasks is lowered for a modest fee. There's just no "my code" and no iteration over failure to be proud of, and I want people to feel confident that the person behind the software fully understands what it's doing. Or what it's supposed to be doing.

Even back in the days of asking my friends for help, I was still building a relationship with them in some capacity and they could help guide me. In some cases they would even just reverse engineer things, hand me the completed package, and explain to me the bullshit they had to understand, and I'd happily give them a shoutout for helping with the project. Now it's a one-way trip with Claude and asking it to do a brain dump.

I suppose that just like I told the AI not to take shortcuts though, I find it hard to accept <Insert This Month's Best Model> doing these things at the cost of building my own context, expanding my skillset to new areas, and keeping my existing skills sharp.