Reverse engineering SBF file format

If you look in the game assets of Delta Force: Black Hawk Down, you'll see some SBF files (menumus.sbf, gamemus.sbf, EXP1.sbf) that store the music tracks. I wrote a program that converts the tracks inside these files to mp3. This page discusses the structure of these files. You can listen to the results here. Note that all the information on this page was acquired through reverse engineering, so some of it might be wrong. If you find any glitches in the output music files, create an issue.

Header and index

Each SBF file starts with a header of this format:

```rust
struct SBFHeader {
    magic: [u8; 4],
    i1: u32,
    i2: u32,
    i3: u32,
    index_offset: u32,
    index_count: u32,
}
```

The magic is always "SBF0" in ASCII. I didn't find any significance in i1, i2, and i3. index_offset and index_count declare the location of the index of music tracks in this file. Each index entry has this format:

```rust
struct SBFIndexEntryBin {
    ident: [u8; 8],
    z1: u32,
    z2: u32,
    start: u32,
    size: u32,
    block_size: u32,
    z3: u32,
}
```

ident is again an ASCII identifier. z1, z2, and z3 are always zero. start and size declare the location of that music entry inside the file. block_size is always 4104 (= 4096 + 8), and size is always a multiple of block_size, so the segment of the file between start and start + size is a list of 4104-byte chunks.

The track identifiers are named like this: MARKA001, MARKA002, ... Marka breakdown is the first mission of the game. So all the segments of a mission need to be extracted and grouped, and the sound data of these segments concatenated.
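A minimal Rust sketch of reading the header and the index from a file already loaded into memory. It ASSUMES all integers are little-endian (what you would expect from a Windows PC game, but not confirmed above) and that the structs are packed with no padding, which gives a 24-byte header and 32-byte index entries. `read_u32`, `parse_header`, `parse_index`, and `chunk_count` are hypothetical helper names, not part of the original converter.

```rust
use std::convert::TryInto;

// Field layout copied from the structs above.
// ASSUMPTION: little-endian integers, no padding between fields.
#[allow(dead_code)]
#[derive(Debug)]
struct SBFHeader {
    magic: [u8; 4],
    i1: u32,
    i2: u32,
    i3: u32,
    index_offset: u32,
    index_count: u32,
}

#[allow(dead_code)]
#[derive(Debug)]
struct SBFIndexEntryBin {
    ident: [u8; 8],
    z1: u32,
    z2: u32,
    start: u32,
    size: u32,
    block_size: u32,
    z3: u32,
}

const INDEX_ENTRY_SIZE: usize = 32; // 8-byte ident + six u32 fields

/// Reads a little-endian u32 at `pos`, or None if the buffer is too short.
fn read_u32(buf: &[u8], pos: usize) -> Option<u32> {
    let bytes: [u8; 4] = buf.get(pos..pos + 4)?.try_into().ok()?;
    Some(u32::from_le_bytes(bytes))
}

/// Parses the header, rejecting data whose magic is not "SBF0".
fn parse_header(buf: &[u8]) -> Option<SBFHeader> {
    let magic: [u8; 4] = buf.get(0..4)?.try_into().ok()?;
    if &magic != b"SBF0" {
        return None;
    }
    Some(SBFHeader {
        magic,
        i1: read_u32(buf, 4)?,
        i2: read_u32(buf, 8)?,
        i3: read_u32(buf, 12)?,
        index_offset: read_u32(buf, 16)?,
        index_count: read_u32(buf, 20)?,
    })
}

/// Reads `index_count` consecutive entries starting at `index_offset`.
fn parse_index(buf: &[u8], header: &SBFHeader) -> Option<Vec<SBFIndexEntryBin>> {
    let mut entries = Vec::new();
    for i in 0..header.index_count as usize {
        let base = header.index_offset as usize + i * INDEX_ENTRY_SIZE;
        let ident: [u8; 8] = buf.get(base..base + 8)?.try_into().ok()?;
        entries.push(SBFIndexEntryBin {
            ident,
            z1: read_u32(buf, base + 8)?,
            z2: read_u32(buf, base + 12)?,
            start: read_u32(buf, base + 16)?,
            size: read_u32(buf, base + 20)?,
            block_size: read_u32(buf, base + 24)?,
            z3: read_u32(buf, base + 28)?,
        });
    }
    Some(entries)
}

/// Number of chunks in a segment, or None if size is not a multiple of block_size.
fn chunk_count(entry: &SBFIndexEntryBin) -> Option<u32> {
    if entry.block_size == 0 || entry.size % entry.block_size != 0 {
        return None;
    }
    Some(entry.size / entry.block_size)
}
```

Because `chunk_count` returns None whenever size is not a multiple of block_size, it doubles as a cheap sanity check on the 4104-byte layout.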

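Grouping the segments by mission could look like the following sketch. It ASSUMES the mission name is the identifier minus its trailing decimal digits and that shorter identifiers are padded with NUL bytes; both are guesses based on the MARKA001 naming. `split_ident` and `group_by_mission` are hypothetical names. The result maps each mission name to positions in the index table, ordered by segment number, which is the order in which the segments' sound data would be concatenated.

```rust
use std::collections::BTreeMap;

/// Splits an identifier such as b"MARKA001" into ("MARKA", 1).
/// ASSUMPTION: name = everything before the trailing digits; unused bytes are NUL.
fn split_ident(ident: &[u8; 8]) -> Option<(String, u32)> {
    let text = std::str::from_utf8(ident).ok()?.trim_end_matches('\0');
    let name_len = text.trim_end_matches(|c: char| c.is_ascii_digit()).len();
    let (name, number) = text.split_at(name_len);
    if name.is_empty() || number.is_empty() {
        return None;
    }
    Some((name.to_string(), number.parse::<u32>().ok()?))
}

/// Maps each mission name to index-table positions, sorted by segment number.
fn group_by_mission(idents: &[[u8; 8]]) -> BTreeMap<String, Vec<usize>> {
    let mut groups: BTreeMap<String, Vec<(u32, usize)>> = BTreeMap::new();
    for (pos, ident) in idents.iter().enumerate() {
        if let Some((name, number)) = split_ident(ident) {
            groups.entry(name).or_default().push((number, pos));
        }
    }
    groups
        .into_iter()
        .map(|(name, mut segments)| {
            segments.sort_unstable(); // sorts by segment number first
            let order: Vec<usize> = segments.into_iter().map(|(_, pos)| pos).collect();
            (name, order)
        })
        .collect()
}
```

Identifiers that don't fit the assumed pattern are silently skipped, so a real converter would probably want to log them instead.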
Chunk format

Each segment is a list of chunks of this format:

```rust
struct SBFChunkData {
    size: u32,
    scale1: u8,
    scale2: u8,
    two_fifty: u8,
    zero: u8,
    content: [u8; 4096],
}
```

The fields before content take 4 + 4 × 1 = 8 bytes, which accounts for the 8 in block_size = 4096 + 8. content is the actual PCM data. size is the count of valid bytes inside content: it is always 4096, except for the chunks at the tail of a segment.
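Splitting a segment into chunks might look like this sketch. It again ASSUMES a little-endian size field, ignores two_fifty and zero, and borrows only the first size bytes of content so that tail chunks don't contribute leftover bytes. `Chunk` and `parse_chunks` are hypothetical names.

```rust
const CHUNK_CONTENT: usize = 4096;
const CHUNK_SIZE: usize = 8 + CHUNK_CONTENT; // 8-byte chunk header + content = 4104

/// One chunk of a segment; `pcm` borrows only the valid bytes of `content`.
struct Chunk<'a> {
    scale1: u8,
    scale2: u8,
    pcm: &'a [u8],
}

/// Splits the bytes between `start` and `start + size` into chunks.
fn parse_chunks(segment: &[u8]) -> Option<Vec<Chunk<'_>>> {
    if segment.len() % CHUNK_SIZE != 0 {
        return None;
    }
    segment
        .chunks_exact(CHUNK_SIZE)
        .map(|raw| {
            // ASSUMPTION: `size` is little-endian, like the header fields.
            let size = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]) as usize;
            if size > CHUNK_CONTENT {
                return None; // corrupt chunk: claims more bytes than it holds
            }
            // raw[6] (two_fifty) and raw[7] (zero) are not needed for decoding.
            Some(Chunk { scale1: raw[4], scale2: raw[5], pcm: &raw[8..8 + size] })
        })
        .collect()
}
```

Collecting into `Option<Vec<_>>` makes a single bad chunk reject the whole segment, which is the conservative choice for a format that is only partly understood.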

Experimentation reveals that the PCM data is unsigned 8-bit, 2-channel, at a 22050 Hz sample rate. If you convert it to mp3 based on this spec, you get something like this:

It has the right rhythm, but the amplitude is all over the place. It turns out the scale fields in the chunk header are modifiers that need to be applied to the PCM data to get proper-sounding audio: the digital signal needs to be divided by $2^{\text{scale}}$. I scaled the signal up to signed 16-bit and applied this modifier to get the result linked above.
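A sketch of that conversion: each unsigned sample (128 = silence) is widened to signed 16-bit by subtracting 128 and multiplying by 256, then divided by $2^{\text{scale}}$. The text above doesn't say which scale field applies to which samples, so this sketch ASSUMES the stereo samples are interleaved left/right and that scale1 applies to the left channel and scale2 to the right. If the output sounds wrong, that mapping is the first thing to revisit. `decode_sample` and `decode_chunk` are hypothetical names.

```rust
/// Widens one unsigned 8-bit sample to signed 16-bit, then divides by 2^scale.
fn decode_sample(sample: u8, scale: u8) -> i16 {
    let widened = (i32::from(sample) - 128) * 256; // range -32768..=32512
    let divisor = 1i32 << scale.min(16); // dividing by 2^16 or more yields 0 anyway
    (widened / divisor) as i16
}

/// Decodes one chunk's valid PCM bytes and appends them to `out`.
/// ASSUMPTION: interleaved L/R samples; scale1 -> left, scale2 -> right.
fn decode_chunk(pcm: &[u8], scale1: u8, scale2: u8, out: &mut Vec<i16>) {
    for (i, &sample) in pcm.iter().enumerate() {
        let scale = if i % 2 == 0 { scale1 } else { scale2 };
        out.push(decode_sample(sample, scale));
    }
}
```

Appending into a caller-owned `Vec` lets all chunks of all segments of a mission be decoded into one continuous buffer.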

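The article doesn't say how the decoded audio was encoded to mp3. One possible route, shown here as a hedged sketch, is to write the 16-bit samples to a standard WAV file and hand that file to any mp3 encoder. `write_wav` is a hypothetical helper that writes the canonical 44-byte RIFF header followed by little-endian samples.

```rust
use std::io::{self, Write};

/// Writes 16-bit integer PCM as a canonical WAV (RIFF) stream.
fn write_wav<W: Write>(out: &mut W, samples: &[i16], channels: u16, sample_rate: u32) -> io::Result<()> {
    let bits_per_sample: u16 = 16;
    let block_align = channels * bits_per_sample / 8; // bytes per frame
    let byte_rate = sample_rate * u32::from(block_align);
    let data_len = (samples.len() * 2) as u32;

    out.write_all(b"RIFF")?;
    out.write_all(&(36 + data_len).to_le_bytes())?; // file size minus 8
    out.write_all(b"WAVE")?;
    out.write_all(b"fmt ")?;
    out.write_all(&16u32.to_le_bytes())?; // fmt chunk size
    out.write_all(&1u16.to_le_bytes())?; // format tag 1 = integer PCM
    out.write_all(&channels.to_le_bytes())?;
    out.write_all(&sample_rate.to_le_bytes())?;
    out.write_all(&byte_rate.to_le_bytes())?;
    out.write_all(&block_align.to_le_bytes())?;
    out.write_all(&bits_per_sample.to_le_bytes())?;
    out.write_all(b"data")?;
    out.write_all(&data_len.to_le_bytes())?;
    for s in samples {
        out.write_all(&s.to_le_bytes())?;
    }
    Ok(())
}
```

For the SBF music, it would be called with `channels = 2` and `sample_rate = 22050`; wrapping a `File` in a `BufWriter` avoids one system call per sample.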
Why did they encode it like this?

The decision to choose 8-bit PCM was clearly a space optimization. But the problem with 8-bit audio is precision: there are only 256 values that the amplitude can have. For parts of the track where the volume is relatively loud, this isn't a big problem, but it adds a lot of noise to the quieter parts and makes them sound worse, because adjacent levels are relatively much further apart at low amplitudes (going from 1 to 2 is a 100% change, while going from 115 to 116 is under 1%). For example, this is a downscaled 8-bit version of the first track in the list. Notice the hissing and cracking in the first few seconds.

This is unacceptable because this game has a lot of ambient music (e.g. the GASA track).

Their solution was to divide the tracks into 4096-byte chunks, each storing around 100 ms of sound (4096 bytes of 8-bit stereo is 2048 frames, which at 22050 Hz is about 93 ms), use the full 256-level precision for quiet parts as well, and add metadata to each chunk describing how it must be scaled in a higher-precision space to get the final result.
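The idea can be illustrated with a toy round trip. The article only documents the decoding side, so the encoder here (`pick_scale`, `encode`) is HYPOTHETICAL: it picks the largest scale at which the chunk's peak still fits in 8 bits and multiplies by $2^{\text{scale}}$ before quantizing. `decode` divides by $2^{\text{scale}}$ as described above, and `max_error` reports the worst round-trip error in 16-bit units.

```rust
/// HYPOTHETICAL encoder step: largest scale (capped at `max_scale`) such that
/// the boosted peak still fits in 127 steps of 256.
fn pick_scale(samples: &[i16], max_scale: u8) -> u8 {
    let max_scale = max_scale.min(16); // cap so the shift can never reach the i32 width
    let peak = samples.iter().map(|&s| (s as i32).abs()).max().unwrap_or(0);
    let mut scale = 0;
    while scale < max_scale && (peak << (scale + 1)) <= 127 * 256 {
        scale += 1;
    }
    scale
}

/// HYPOTHETICAL: multiply by 2^scale, then quantize to unsigned 8-bit (128 = silence).
fn encode(sample: i16, scale: u8) -> u8 {
    let boosted = f64::from(sample) * 2f64.powi(i32::from(scale)) / 256.0;
    (boosted.round() + 128.0).clamp(0.0, 255.0) as u8
}

/// Decoding as described above: widen to 16-bit, then divide by 2^scale.
fn decode(sample: u8, scale: u8) -> f64 {
    (f64::from(sample) - 128.0) * 256.0 / 2f64.powi(i32::from(scale))
}

/// Largest absolute round-trip error over a chunk, in 16-bit units.
fn max_error(samples: &[i16], scale: u8) -> f64 {
    samples
        .iter()
        .map(|&s| (decode(encode(s, scale), scale) - f64::from(s)).abs())
        .fold(0.0, f64::max)
}
```

For a quiet chunk with samples [-200, -150, -37, 0, 5, 99, 200], `pick_scale` chooses 7, and the worst-case error drops from 106 with plain 8-bit quantization (scale 0) to 1 with the scaled chunk.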

Why didn't they just use mp3?

mp3 was under patent until 2017. Maybe they didn't want to bother with the licensing. Ogg Vorbis is a free alternative, but it came out in 2000. Considering the game came out in 2003, they might have already implemented this solution by the time Ogg became popular.