The WAD file format of WipEout Pure and Pulse (PSP)

The two PSP entries in the WipEout series use "WAD" files to store their game data. Now, "WAD" usually stands for "where's all the data" and is associated with the game DOOM.

The WipEout "WAD" files, however, have nothing in common with the Doom WAD file format (or any other WAD file format), except that they serve the same purpose: each is a container for all the data that the game uses.

Googling for "wipeout pure wad" results in some useful resources such as this thread from all the way back in 2005. This post tries to add some missing documentation on the file format and the meaning of fields.

The Header

The WAD file format is very simple. At the beginning of the file there is a short header:

struct WADHeader {
    uint32_t version;
    uint32_t nfiles;
    struct WADEntry entries[];
};

All values are stored in little-endian (the PSP uses a MIPS processor in little endian mode; the PS3 uses PowerPC in big endian mode; your x86 or ARM machine also uses little endian, so you can ignore endianness issues when loading PSP data files on your little endian machine).

The version field is always 0x00000001.

nfiles is just the number of files contained within the WAD file.
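
As a minimal sketch of reading these two fields, assuming the whole file is already in memory as a bytes object: the function name read_wad_header is made up for this illustration, and the "<II" format string simply means "two little-endian 32-bit unsigned integers".

```python
import struct


def read_wad_header(data: bytes) -> tuple[int, int]:
    """Return (version, nfiles) from the first 8 bytes of a WAD file.

    Hypothetical helper for illustration; not taken from the game or any tool.
    """
    version, nfiles = struct.unpack_from("<II", data, 0)
    return version, nfiles


# Synthetic 8-byte header: version 1, three files.
sample = struct.pack("<II", 1, 3)
print(read_wad_header(sample))  # (1, 3)
```

A reader might additionally reject files whose version is not 1, since that is the only value described here.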

Index Entries

Immediately after the two 4-byte header fields comes an array of entries (there will be nfiles entries in total):

struct WADEntry {
    uint32_t name;
    uint32_t start_offset;
    uint32_t length;
    uint32_t compressed_length;
};

As far as I know, the meaning of name was never documented anywhere, so here's some potential new information: It is a modified CRC32-like hash of the filename (more on that in a future post).

The start_offset is relative to the beginning of the file. In my tests, the WAD file is sometimes padded between payload entries, which might be required for alignment. Looking at the different file formats, a 16-byte alignment might be good enough, but I haven't found conclusive evidence for how the padding and alignment are calculated.
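
One way to investigate the padding is to list the gap after each payload together with the alignment of the next start offset. The sketch below is only that, a sketch: payload_gaps is a hypothetical helper, and it assumes that the number of bytes a payload occupies in the file equals its compressed_length.

```python
def payload_gaps(entries):
    """Report padding between consecutive payloads.

    entries: iterable of (start_offset, stored_size) pairs, where stored_size
    is assumed to be the entry's compressed_length.
    Returns a list of (start_offset, gap_after_payload, next_start % 16).
    """
    ordered = sorted(entries)
    gaps = []
    for (start, size), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - (start + size)
        gaps.append((start, gap, next_start % 16))
    return gaps


# Synthetic offsets, not taken from a real WAD file.
print(payload_gaps([(40, 10), (64, 20), (96, 5)]))
# [(40, 14, 0), (64, 12, 0)]
```

If the last column is always 0 for a real file, that would be consistent with (but not proof of) 16-byte alignment.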

The length field contains the uncompressed size of the WAD entry data.

In case the file isn't compressed at all, compressed_length will contain the same value as length. If they differ, it means that the payload is compressed in the WAD file and needs to be uncompressed.
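
Putting the index description together, here is a hedged sketch of reading the entry table and applying the "lengths differ" rule. The Python names (WADEntry as a NamedTuple, read_entries, is_compressed) are invented for this example; only the field order and sizes come from the struct above.

```python
import struct
from typing import NamedTuple


class WADEntry(NamedTuple):
    name: int               # hash of the filename
    start_offset: int       # relative to the beginning of the file
    length: int             # uncompressed size
    compressed_length: int  # equals length when stored uncompressed

    @property
    def is_compressed(self) -> bool:
        # Rule from the text: differing lengths mean a compressed payload.
        return self.length != self.compressed_length


ENTRY = struct.Struct("<IIII")  # four little-endian uint32 fields


def read_entries(data: bytes) -> list[WADEntry]:
    """Read nfiles entries starting right after the 8-byte header."""
    _version, nfiles = struct.unpack_from("<II", data, 0)
    return [
        WADEntry(*ENTRY.unpack_from(data, 8 + i * ENTRY.size))
        for i in range(nfiles)
    ]


# Synthetic index with two entries (values are made up).
sample = struct.pack("<II", 1, 2)
sample += ENTRY.pack(0x11111111, 40, 100, 100)
sample += ENTRY.pack(0x22222222, 144, 500, 321)
for entry in read_entries(sample):
    print(hex(entry.name), entry.is_compressed)
```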

Based on my research, two compression algorithms seem to be available: LZSS and zlib.

For the zlib variant, the uncompressed length can be calculated as length & 0x7fffffff, since the MSB is used as a flag and (obviously) not part of the uncompressed size.

Decompressing the zlib-compressed data is straightforward: just pass the compressed bytes to inflate() from zlib or use zlib.decompress() in Python.
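
A short sketch of the zlib case in Python follows. It assumes the payload is a regular zlib stream (which is what zlib.decompress() expects by default) and that a set MSB in length marks the zlib variant, as described above. The helper name unpack_zlib_entry and the sanity checks are additions for this example.

```python
import zlib

ZLIB_FLAG = 0x80000000  # MSB of the length field
SIZE_MASK = 0x7FFFFFFF  # remaining 31 bits hold the uncompressed size


def unpack_zlib_entry(payload: bytes, length_field: int) -> bytes:
    """Decompress a zlib payload and verify the size stored in length.

    Hypothetical helper; payload is the compressed_length bytes found at
    start_offset, length_field is the raw length value from the index.
    """
    if not length_field & ZLIB_FLAG:
        raise ValueError("MSB of length is not set; not the zlib variant")
    expected = length_field & SIZE_MASK
    raw = zlib.decompress(payload)
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    return raw


# Round trip with synthetic data standing in for a real WAD payload.
original = b"wipeout" * 10
payload = zlib.compress(original)
assert unpack_zlib_entry(payload, ZLIB_FLAG | len(original)) == original
```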

Interestingly, the zlib compression seems to only be available in Pure EUR, but not in the USA and JPN releases. According to the release dates listed on Wikipedia, the EUR release came out later, so it might very well be that the developers had a bit more time for it.

This will become important once we start talking about converting DLC between different regions, as the JPN and USA releases will not be able to read zlib-compressed data. Thankfully, if you have the EUR version, you do not have to worry about this, since it has the widest selection of compression algorithms (LZSS or zlib instead of just LZSS), and the USA/JPN DLCs only use LZSS.

The LZSS Compression

The LZSS compression variant used in the WAD files has an 8192-byte circular lookback buffer, with 1 bit determining whether we read a verbatim byte or look back, and for the lookback case, a 13-bit offset for the lookback and a 4-bit lookback length (the lookback length is stored "minus 3", so that up to 18 bytes can be repeated).

Here is some pseudo-code to illustrate the decompression algorithm:

while (remaining_output_bytes > 0) {
    control_bit = read 1 bit
    if (control_bit == 1) {
        verbatim_byte = read 8 bits
        output verbatim_byte

        insert verbatim_byte into lookback_buffer (possibly rollover)
    } else {
        lookback_buffer_offset = read 13 bits
        repeat_count = 3 + (read 4 bits)

        repeat (repeat_count) times {
            lookback_byte = lookback_buffer[lookback_buffer_offset++] (possibly rollover)
            output lookback_byte

            insert lookback_byte into lookback_buffer (possibly rollover)
        }
    }
}
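
The pseudo-code translates fairly directly into Python. The sketch below fills in several details that the description leaves open, so treat each of them as an assumption rather than a fact about the game's format: bits are read MSB-first within each byte, multi-bit fields are MSB-first, the lookback buffer starts out zero-filled, and the first byte is written at buffer position 0. BitReader and lzss_decompress are names made up for this example.

```python
class BitReader:
    """Reads bits from a bytes object, MSB-first (assumed bit order)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def lzss_decompress(data: bytes, out_len: int, window_size: int = 8192) -> bytes:
    """Decode out_len bytes following the pseudo-code above."""
    reader = BitReader(data)
    window = bytearray(window_size)  # assumed to start zero-filled
    write_pos = 0                    # assumed initial write position
    out = bytearray()

    def emit(byte: int) -> None:
        nonlocal write_pos
        out.append(byte)
        window[write_pos] = byte
        write_pos = (write_pos + 1) % window_size  # rollover

    while len(out) < out_len:
        if reader.read(1) == 1:
            emit(reader.read(8))             # verbatim byte
        else:
            offset = reader.read(13)         # absolute index into the window
            count = 3 + reader.read(4)       # 3..18 bytes
            for _ in range(count):
                if len(out) >= out_len:
                    break
                emit(window[offset])
                offset = (offset + 1) % window_size  # rollover
    return bytes(out)


# Hand-built stream: literals "abc", then a lookback to offset 0 of length 3.
bits = "".join("1" + format(b, "08b") for b in b"abc")
bits += "0" + format(0, "013b") + format(0, "04b")
bits += "0" * (-len(bits) % 8)  # pad to a whole number of bytes
stream = int(bits, 2).to_bytes(len(bits) // 8, "big")
print(lzss_decompress(stream, 6))  # b'abcabc'
```

If real files decode to garbage with this sketch, the bit order and the initial write position are the first assumptions to revisit.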

With this, it should be possible to read all WAD files from WipEout Pure and WipEout Pulse, plus the WipEout Pulse DLC files (EDAT file extension).

Encrypted and Signed WAD files

The WipEout Pure DLC files named "PI.WAD" (for "plugin" WAD) also use the WAD file format, but they have a region-specific encrypted 256-byte signature at the end, and the payload itself is encrypted using a key from the signature. More on that in a future post as well. Once the signature and encryption are removed, the file can be treated like any other WAD file as described above.

Thomas Perl · 2020-12-19