Introduction: A Tiny Wasm File Walks Into wasm2c
While working at Seyra Online, I discovered a pretty amazing behavior exhibited by wasm2c.
At first glance, it looks like a harmless readability choice. wasm2c takes WebAssembly modules and emits C code, so when it encounters Wasm export names that are not valid C identifiers, it needs to turn those names into something C can legally compile. That makes sense. The funny part is how it does that.
A small mistake in how wasm2c achieves readable symbol names causes a very aggressive expansion behavior inside the generated C output. With the right input, a tiny Wasm module can turn into an absolutely cursed C file.
Depending on how hard you push the behavior, this can result in:
- Core pinning: even at relatively low amplification levels.
- RAM exhaustion / RAM pressure: scales with the size of the generated output.
- Massive dumped files: in some cases, thousands of times larger than the input.
- Downstream tool pain: editors, compilers, language servers, and formatters all get dragged into the blast radius.
This is not a browser escape, Wasm runtime exploit, VM bug, or memory corruption issue. It is much dumber and, honestly, much funnier. It is a tooling bomb. A small binary asks a translation tool to print a gigantic amount of text, and the translation tool says: “sure, absolutely, no problem.”
Heaven: wasm2c Is Actually Pretty Nice
One of the first tools I learned about while working on the WASM VM for Seyra Online was wasm2c. Overall, it is a genuinely useful tool. It takes a WebAssembly module and turns it into C, which makes it easier to inspect, compile, instrument, or reason about in environments where raw Wasm is annoying to work with. For reverse engineering, sandbox work, compiler testing, and VM development, that is extremely useful. You can take a compact binary format and get something much closer to a normal systems programming representation.
When you dump a file and inspect the exports, you may see a list like this:
Export[7]:
- memory[0] -> "memory"
- table[0] -> "__indirect_function_table"
- func[19] <_get_exports> -> "_get_exports"
- func[20] <_opaque> -> "_opaque"
- func[21] <_opaque_invoke> -> "_opaque_invoke"
- global[1] -> "__data_end"
- global[2] -> "__heap_base"
This is roughly what you get when you are working with freestanding Wasm.
The important thing to notice here is that _opaque sits at function index 20. In the wasm2c output, the generated C symbol for it looks something like this:
w2c_guest_0x5Fopaque_0
There are a few fun parts to this symbol that are worth breaking down:
- w2c → generated by wasm2c.
- guest → the name of the module we dumped.
- 0x5Fopaque → the original export name, with invalid C-token characters turned into hexadecimal-looking text.
- _0 → part of the uniqueness / suffixing scheme.
The funny part is this section:
0x5F
That is the escaped representation of _. So a single character can become four visible characters in the generated C symbol.
That does not sound dangerous by itself. It sounds like normal escaping. If you have one weird character, expanding it into 0xNN is fine. But if the original name is attacker-controlled, very long, full of C-hostile characters, and referenced many times, this stops being a readability feature and starts becoming an amplification primitive.
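To make the expansion concrete, here is a minimal Python sketch of the escaping as described above. It is a toy model, not wasm2c's actual implementation: the function names `escape_name` and `mangle` are made up for illustration, and the real mangling rules may treat some characters differently.

```python
def escape_name(name: str) -> str:
    """Toy model of the escaping described in this post (not wasm2c's code).

    ASCII letters and digits pass through unchanged; every other byte of the
    UTF-8 encoding is replaced by the four characters 0xNN.
    """
    out = []
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append(f"0x{byte:02X}")
    return "".join(out)


def mangle(module: str, export: str, suffix: int = 0) -> str:
    # Hypothetical assembly of the pieces seen in w2c_guest_0x5Fopaque_0.
    return f"w2c_{module}_{escape_name(export)}_{suffix}"


print(mangle("guest", "_opaque"))   # w2c_guest_0x5Fopaque_0
print(len(escape_name("+" * 10)))   # 40: ten hostile bytes become forty characters
```

Under this model, a name made entirely of C-hostile bytes grows by exactly four times, which is the multiplier used in the rest of this post.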
Interestingly, as observed in this export name, when you name your export with certain C-hostile tokens, those bytes can become significantly larger in the output.
What’s even better? The Wasm spec and surrounding APIs allow export names to be strings, not C identifiers. That means you can put a lot of inconvenient data in an export name, including characters that are horrible for C symbol generation. That is great for turning export names into garbage for anyone disassembling or translating the module.
But surely there is a nice, clean limit to this, right?
Earth: Where Is The Export Name Limit?
I’m going to take this one step further. Just how many weird tokens can we inject into an export name? Let’s take a look at the official documentation regarding this exact question.
There is not a neat little rule saying: “Export names must be short, cute, and friendly to C compilers.”
In the WebAssembly binary format, names are length-prefixed byte strings. They are not designed around C’s idea of valid identifiers. That distinction matters a lot.
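As a rough illustration of what "length-prefixed byte string" means in practice, the sketch below encodes a single function export entry by hand: a LEB128 length, the UTF-8 bytes of the name, a kind byte, and the function index. The helper names are my own, and this only builds one entry, not a complete module.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_name(name: str) -> bytes:
    """A Wasm name: byte length as LEB128, then the UTF-8 bytes themselves."""
    raw = name.encode("utf-8")
    return uleb128(len(raw)) + raw


def encode_func_export(name: str, func_index: int) -> bytes:
    """One export entry: name, kind byte 0x00 (function), function index."""
    return encode_name(name) + b"\x00" + uleb128(func_index)


normal = encode_func_export("_opaque", 20)
hostile = encode_func_export("+" * 10240, 20)
print(len(normal))   # 10
print(len(hostile))  # 10244
```

Nothing in this encoding cares whether the bytes form a valid C identifier. A 10 KiB name costs the module about 10 KiB once, plus a two-byte length prefix.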
A normal export name might be something like:
_opaque
That is boring, readable, and cheap.
An adversarial export name can instead be made from characters that are valid as Wasm string data but terrible as C identifier data.
For example, conceptually:
" + + + + + + + + + + "
That is not a meaningful C symbol, but as a string name in Wasm metadata, it is the kind of thing a translator has to deal with somehow. wasm2c deals with this by escaping the bad bytes into a readable representation.
Again, escaping is not automatically wrong. For normal names, it is actually nice. It means the generated C still has some relationship to the original Wasm module. If you are debugging normal code, that helps. The issue is that this readable escaped name remains proportional to the attacker-controlled name. Even worse, it does not just appear once. It can get repeated throughout the generated C output wherever the translated function, export, wrapper, or helper path needs to refer to that symbol.
This means the expansion happens before the C compiler, syntax highlighter, formatter, language server, or editor ever gets involved.
The output is already cursed by the time those tools see it.
Hell: Turning 10KiB Into A Problem
Now let’s scale this behavior. Let’s say we make our export name around 10 KiB. Then let’s make it appear 1,000 times in the generated output.
If we are abusing characters that expand roughly four times in the emitted C symbol, then a 10 KiB export name can become something closer to a 40 KiB generated symbol. By itself, that is already gross. But the real issue is repetition.
If that expanded symbol is printed one thousand times, then just the symbol text alone contributes around:
10 KiB * 4 * 1,000 = 40,000 KiB
Which is roughly:
39 MiB
That is not counting surrounding syntax, generated wrappers, declarations, tables, helper functions, whitespace, comments, or any other C scaffolding.
So the rough model becomes:
expanded_name_size = original_name_size * escape_multiplier
output_cost = expanded_name_size * reference_count
Or, simplified:
small Wasm input + long escaped name + repeated references = stupidly large C output
This is the core of Wasm2Bomb.
The Wasm file stores compact data. The C backend emits verbose text. The input pays for short references and binary structure. The output pays for the fully expanded escaped symbol over and over again. That asymmetry is the bug.
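The rough model above is easy to check with a few lines of Python. This is only the arithmetic from this section, counting symbol text alone; the function name is made up, and the real output will be larger because of the surrounding C scaffolding.

```python
KIB = 1024
MIB = 1024 * KIB


def symbol_text_bytes(name_bytes: int, escape_multiplier: int, reference_count: int) -> int:
    """Bytes of generated text spent on the escaped symbol alone."""
    expanded_name_size = name_bytes * escape_multiplier
    return expanded_name_size * reference_count


cost = symbol_text_bytes(10 * KIB, 4, 1000)
print(cost)                      # 40960000 bytes
print(round(cost / MIB, 2))      # 39.06 MiB
print(cost // (10 * KIB))        # 4000: output bytes per byte of export name
```

The last number is the interesting one: every byte spent on the name in the module comes back thousands of times over in the generated text.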
At larger sizes, this stops being a cute file-size trick and starts breaking tools:
- The translator can pin a CPU core while generating output.
- The output file can become absurdly large.
- Editors can freeze trying to open the C file.
- Syntax highlighters can eat memory tokenizing huge identifiers.
- Language servers can become useless.
- Compilers can spend excessive time parsing and storing symbol data.
- CI systems can waste disk, RAM, and CPU on something generated from a tiny artifact.
This is how a module of around 5.7 KiB can turn into something completely unreasonable.
Impact
The obvious impact is disk usage.
A small Wasm input can produce a huge C file. If this happens in an automated analysis environment, that alone is enough to matter. The more interesting impact is toolchain denial of service.
Any system that accepts Wasm and automatically runs wasm2c can be forced into an extremely wasteful translation path.
That includes:
- Reverse engineering pipelines.
- Malware analysis sandboxes.
- CI systems.
- Wasm compatibility testing systems.
- Automated decompilation services.
- Developer tools that generate C output for inspection.
This also makes the behavior useful as an anti-analysis trick.
A malicious or intentionally annoying Wasm module does not need to exploit the runtime. It can simply make the analysis tooling generate output so large that nobody wants to open it. That is a different kind of failure, but still a real one.
The module may validate cleanly. It may execute normally. It may not contain any runtime exploit at all. But the moment someone tries to translate it for analysis, the tooling gets punished.
The funniest part is that this behaves like a decompression bomb, but it is not really decompression. A classic decompression bomb uses compression ratios to hide a huge output behind a small compressed file. Wasm2Bomb does something dumber.
It abuses the gap between a compact binary format and a verbose human-readable translation target. The Wasm module is small because Wasm is binary and references are compact. The C file is huge because generated source code is text, and text gets very expensive when you print hostile identifiers repeatedly.
The other funny part is that the problem comes from a feature meant to be helpful. Readable escaped names are nice for normal people doing normal work. They preserve context. They make generated output easier to follow. But when the name is hostile, readability turns into an attack surface. A tool trying to be helpful ends up printing the world’s worst identifier thousands of times.
Mitigations
This class of issue is not hard to mitigate. The main rule is simple:
Do not let untrusted names scale directly into generated source code forever.
The easiest mitigation is to cap generated symbol length. For normal names, preserve readability:
w2c_guest_opaque_0
For weird but still reasonable names, escape them:
w2c_guest_0x5Fopaque_0
For pathological names, stop preserving the whole thing and switch to a digest:
w2c_guest_export_ab12cd34
A better version keeps a side mapping so the original export name is still recoverable:
export_ab12cd34 -> original Wasm export name
That keeps the generated C bounded while preserving debugging information somewhere safer.
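Here is a minimal sketch of that three-tier scheme in Python. Everything in it is an assumption for illustration: the cap of 64 characters, the 8-digit SHA-256 prefix, the helper names, and the toy escaping rule. It also leaves out the uniqueness suffix, and a real implementation would need to handle digest collisions.

```python
import hashlib

MAX_ESCAPED_LEN = 64  # assumed cap; pick whatever suits the toolchain


def escape_name(name: str) -> str:
    """Toy escaping: keep ASCII letters and digits, turn other bytes into 0xNN."""
    return "".join(
        chr(b) if chr(b).isascii() and chr(b).isalnum() else f"0x{b:02X}"
        for b in name.encode("utf-8")
    )


def bounded_symbol(module: str, export: str, side_map: dict) -> str:
    """Return a C symbol whose length no longer scales with the export name."""
    escaped = escape_name(export)
    if len(escaped) <= MAX_ESCAPED_LEN:
        # Normal or mildly weird name: keep the readable form.
        return f"w2c_{module}_{escaped}"
    # Pathological name: emit a short digest and remember the original elsewhere.
    digest = hashlib.sha256(export.encode("utf-8")).hexdigest()[:8]
    key = f"export_{digest}"
    side_map[key] = export
    return f"w2c_{module}_{key}"


side_map = {}
print(bounded_symbol("guest", "_opaque", side_map))           # w2c_guest_0x5Fopaque
print(len(bounded_symbol("guest", "+" * 10240, side_map)))    # 25
print(len(side_map))                                          # 1
```

The side map could be written to a comment block or a separate metadata file once, so the original name appears a single time instead of at every reference.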
Other useful mitigations include:
- Hash long names instead of fully escaping them.
- Preserve only a short prefix plus a stable digest.
- Intern generated strings so the translator does not repeatedly allocate the same giant symbol internally.
- Emit original names in comments or metadata, not in every identifier.
- Add output-size amplification tests to CI.
- Reject pathological names in C-output mode if preserving them would create unreasonable output.
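For the CI item in the list above, an amplification test can be as simple as comparing file sizes after translation. The sketch below is one possible shape for such a check; the 50x budget and the function names are assumptions, and the temporary files stand in for a real .wasm fixture and the C file a translator produced from it.

```python
import os
import tempfile

MAX_RATIO = 50.0  # assumed budget: output may be at most 50x the input size


def amplification_ratio(input_path: str, output_path: str) -> float:
    """Size of the generated file divided by the size of the input file."""
    return os.path.getsize(output_path) / max(1, os.path.getsize(input_path))


def check_amplification(input_path: str, output_path: str,
                        max_ratio: float = MAX_RATIO) -> float:
    """Raise if the generated output exceeds the allowed ratio."""
    ratio = amplification_ratio(input_path, output_path)
    if ratio > max_ratio:
        raise RuntimeError(
            f"output is {ratio:.0f}x the input (limit {max_ratio:.0f}x)"
        )
    return ratio


# Stand-in files; in CI these would be the fixture and the generated C file.
with tempfile.TemporaryDirectory() as tmp:
    wasm_path = os.path.join(tmp, "fixture.wasm")
    c_path = os.path.join(tmp, "fixture.c")
    with open(wasm_path, "wb") as f:
        f.write(b"\0" * 1_000)
    with open(c_path, "wb") as f:
        f.write(b"x" * 20_000)
    print(check_amplification(wasm_path, c_path))  # 20.0
```

A fixed ratio is crude, since legitimate modules also expand when turned into C, so the budget should be calibrated against ordinary inputs before it is used to fail builds.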
The important thing is to avoid this cost model:
untrusted_name_length * reference_count * escape_multiplier
At least one of those values needs a hard ceiling.
Conclusion: Readability Needs Limits
Wasm2Bomb is a small example of a larger tooling problem. Translators often take compact binary formats and turn them into verbose human-readable formats. That translation step is useful, but it can also become an attack surface. Here, the issue comes from preserving arbitrary Wasm export names inside generated C identifiers. For normal programs, this is convenient. For hostile programs, it becomes amplification.
A tiny Wasm module can force wasm2c to generate a C file wildly disproportionate to the original input size. Once that happens, every downstream tool inherits the mess. The fix is straightforward:
- Bound generated symbol length.
- Hash pathological names.
- Preserve original names separately.
- Test for output amplification.
The real lesson is even simpler:
Human-readable output is still output. If it contains untrusted data, it needs limits.
A 5.7 KiB Wasm file should not be able to casually ask a toolchain to write gigabytes of C.
