Kaitai Struct is a general-purpose declarative language for describing binary data structures. With it we can parse binary file formats, in-memory data structures, network packets, etc.

The target format is first described in the Kaitai Struct language (KSY) and then compiled into source files that can be imported as libraries in one of the several supported programming languages, such as Python, C++, Java, Lua, JavaScript, etc. The generated file(s), together with the Kaitai Struct runtime library for the chosen language, allow us to easily access the fields and structures of the parsed binary format.

Quick introduction to the KSY format

The full user guide to the KSY format can be found on the Kaitai website. We’ll only outline the very basics here, in simplified form:

KSY is a YAML-based format describing types and structures, organized into several sections:

meta: Contains metadata about the target binary format we are parsing, such as identifiers or the default endianness.

seq: Describes an ordered sequence of elements (attributes), each defined by properties such as its identifier, type and size (or literal contents, e.g. magic numbers).

enums: Maps integer constants to symbolic names for clarity; elements can then refer to them through the enum key.

types: Declares user-defined named types, each of which can contain any of the sections above (its own seq, enums, instances) as well as nested types.

instances: Describes structures that lie outside the normal sequential parsing flow or that should only be parsed on demand.

Practical example: ESP8266 firmware image

To demonstrate how an arbitrary binary format can be described in KSY, we will use the ESP8266 firmware image format. A brief file format description by the manufacturer (Espressif) can be found here:

The firmware file consists of a header, a variable number of data segments and a footer. Multi-byte fields are little-endian.

However, that’s not entirely accurate: a firmware image can actually contain multiple sections, each one made up of a header, one or more segments and a footer, as depicted below:

For example, when an image supports OTA upgrades (which is pretty common these days), a bootloader must be flashed in the first 4 KB (0x1000 bytes) of flash memory.

Each section header contains the information needed to map the segment(s) it contains into RAM (see the memory layout). The first segment starts right after the section header, beginning with a small header that holds the memory address where the segment should be mapped and its size. Any additional segments are laid out immediately afterwards.

When the last segment ends, the whole section is padded with zeros until its size is one byte short of a multiple of 16 bytes. The final byte (which makes the section size a multiple of 16) is the checksum of the data of all segments, defined as the XOR of every segment data byte (segment headers excluded), starting from the seed value 0xEF.
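The two footer rules can be written as a pair of small Python helpers. This is a minimal sketch based only on the description above: the names esp_checksum and padding_length are hypothetical, and pos is assumed to be the offset right after the last segment's data, where the padding starts.

```python
# Hypothetical helpers illustrating the section footer rules described above.
CHECKSUM_SEED = 0xEF  # XOR seed for the ESP8266 image checksum


def esp_checksum(segments_data):
    """XOR every data byte of every segment, starting from 0xEF."""
    checksum = CHECKSUM_SEED
    for data in segments_data:
        for byte in data:
            checksum ^= byte
    return checksum


def padding_length(pos):
    """Zero bytes needed after offset `pos` so that the checksum byte
    written right after them completes a 16-byte block."""
    return 15 - (pos % 16)


# Example: one 4-byte segment whose data ends at offset 20.
print(hex(esp_checksum([b"\x01\x02\x03\x04"])))  # 0xeb
print(padding_length(20))  # 11 zero bytes, then the checksum at offset 31
```

With a single 4-byte segment the section ends at offset 20 (8-byte section header, 8-byte segment header, 4 data bytes), so 11 zero bytes bring us to offset 31, where the checksum byte completes a 32-byte section.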

Kaitai Struct Language

As we saw in the section above, a firmware image is made up of sections. First, we describe the section header, which always starts with a magic number followed by a number of fields:

section_header:
  seq:
    - id: magic
      contents: [0xE9]
    - id: num_segments
      type: u1
    - id: flash_interface
      type: u1
      enum: e_spi_flash_mode
    - id: flash_size
      type: b4
      enum: e_flash_size
    - id: flash_speed
      type: b4
      enum: e_flash_speed
    - id: entrypoint
      type: u4
  enums:
    e_spi_flash_mode:
      0: qio
      1: qout
      2: dio
      3: dout
    e_flash_size:
      0: size_512k
      1: size_256k
      2: size_1m
      3: size_2m
      4: size_4m
    e_flash_speed:
      0: speed_40mhz
      1: speed_26mhz
      2: speed_20mhz
      0xf: speed_80mhz

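We use the contents key to check for the expected magic value (0xE9) at the start of the section header. Both num_segments and flash_interface are single unsigned bytes, so we use u1, while flash_size and flash_speed are 4-bit fields packed into a single byte, so we use b4 for each. Since the flash interface, size and speed are named integer constants, we map them to enums for clarity. Finally, entrypoint is a 32-bit unsigned integer (u4), read as little-endian thanks to the default endianness declared in meta.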
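To see what these declarations amount to at the byte level, here is a hand-written decoder for the same 8-byte header using Python's struct module. It is an illustrative sketch, not the code Kaitai generates; parse_section_header is a hypothetical name, and it relies on Kaitai's default of reading bit fields most-significant bit first, which puts flash_size in the high nibble.

```python
import struct

# Hypothetical equivalent of the section_header type using the struct module:
# magic (u1), num_segments (u1), flash_interface (u1), one byte holding
# flash_size and flash_speed (b4 + b4), entrypoint (u4, little-endian).
HEADER_FORMAT = "<BBBBI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 bytes, no alignment padding


def parse_section_header(buf, offset=0):
    """Decode an 8-byte ESP8266 section header starting at `offset`."""
    magic, num_segments, flash_interface, size_speed, entrypoint = (
        struct.unpack_from(HEADER_FORMAT, buf, offset)
    )
    # Equivalent of `contents: [0xE9]`: reject anything else.
    if magic != 0xE9:
        raise ValueError(f"bad magic 0x{magic:02X} at offset {offset}")
    return {
        "num_segments": num_segments,
        "flash_interface": flash_interface,
        # Bit fields are read MSB first, so the first b4 (flash_size)
        # is the high nibble and flash_speed is the low nibble.
        "flash_size": size_speed >> 4,
        "flash_speed": size_speed & 0x0F,
        "entrypoint": entrypoint,
    }
```

For header bytes E9 03 02 20 followed by a little-endian entrypoint, this reports 3 segments, flash interface 2 (dio), flash size 2 (size_1m) and flash speed 0 (speed_40mhz), matching the enums above.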

Now, a segment has three different fields: the memory offset where it should be located, its size, and the actual segment data:

segment:
  seq:
    - id: memory_offset
      type: u4
    - id: segment_size
      type: u4
    - id: data
      size: segment_size

We can just back-reference the segment_size field using the size key, and Kaitai will do all the heavy lifting when we compile our KSY file to our target programming language!
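To make the back-reference concrete, this is roughly what a parser has to do for each segment, written by hand with Python's struct module. It is a sketch for illustration only; parse_segment is a hypothetical name, and the fields are assumed to be little-endian as stated in Espressif's description.

```python
import struct


def parse_segment(buf, offset):
    """Hypothetical hand-written equivalent of the `segment` type.

    Returns the decoded fields and the offset right after the segment data.
    """
    memory_offset, segment_size = struct.unpack_from("<II", buf, offset)
    start = offset + 8  # two u4 fields precede the data
    # The `size: segment_size` step: slice exactly segment_size bytes.
    data = buf[start:start + segment_size]
    if len(data) != segment_size:
        raise ValueError(f"segment at {offset} truncated: "
                         f"{len(data)} of {segment_size} bytes")
    fields = {
        "memory_offset": memory_offset,
        "segment_size": segment_size,
        "data": data,
    }
    return fields, start + segment_size
```

Returning the next offset is what lets consecutive segments be read back to back, which is exactly what Kaitai's sequential stream position handles for us.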

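The last part of a section is the footer: zero padding up to the next 16-byte boundary, with the last byte holding the checksum of all segments in that section. The amount of padding depends on the current stream position, which Kaitai exposes as _io.pos: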

section_footer:
  seq:
    - id: padding
      type: u1
      repeat: expr
      repeat-expr: 15 - (_io.pos % 16)
    - id: checksum
      type: u1
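The repeat-expr above can be mirrored by a small function that takes the position right after the last segment (the role _io.pos plays in the KSY) and reads the padding and checksum. This is a hypothetical sketch; unlike the KSY, it also checks that the padding bytes are actually zero.

```python
def parse_section_footer(buf, pos):
    """Hypothetical equivalent of the section_footer type.

    `pos` is the stream position right after the last segment, playing the
    role of `_io.pos` in the repeat-expr. Returns (checksum, next_pos).
    """
    pad_len = 15 - (pos % 16)
    padding = buf[pos:pos + pad_len]
    # Extra check not present in the KSY: the padding should be all zeros.
    if padding != bytes(pad_len):
        raise ValueError(f"non-zero padding at offset {pos}")
    checksum = buf[pos + pad_len]  # raises IndexError if the buffer is short
    return checksum, pos + pad_len + 1
```

When pos is already one byte short of a 16-byte boundary, pad_len is 0 and the next byte is read directly as the checksum.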

Now, we can easily glue all three parts together (header, segments, footer) as follows:

section:
  seq:
    - id: header
      type: section_header
    - id: segments
      type: segment
      repeat: expr
      repeat-expr: header.num_segments
    - id: footer
      type: section_footer

As we did in segment, we reference a previously parsed value, here header.num_segments, as the repeat-expr target, so Kaitai reads exactly that many segment elements.
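Going the other way can be handy when experimenting with the KSY in the Web IDE, since a tiny synthetic section is easy to reason about. The builder below is a hypothetical helper, not part of Kaitai Struct; it simply lays out a header, as many segments as num_segments announces, and the footer, following the rules described earlier. The start_pos parameter is an assumption standing for the section's offset in the image, which determines the padding.

```python
import struct


def build_section(segments, entrypoint, flash_interface=0,
                  flash_size=0, flash_speed=0, start_pos=0):
    """Assemble header + segments + footer into bytes (hypothetical helper).

    `segments` is a list of (memory_offset, data) tuples and `start_pos` is
    the offset at which the section will be placed in the image.
    """
    out = bytearray()
    # Section header: magic, segment count, flash mode, size/speed nibbles,
    # entrypoint (little-endian u4).
    out += struct.pack("<BBBBI", 0xE9, len(segments), flash_interface,
                       (flash_size << 4) | flash_speed, entrypoint)
    checksum = 0xEF
    for memory_offset, data in segments:
        # Segment header followed by its raw data.
        out += struct.pack("<II", memory_offset, len(data))
        out += data
        for byte in data:
            checksum ^= byte
    # Footer: zero padding so that the checksum byte ends a 16-byte block.
    pos = start_pos + len(out)
    out += bytes(15 - (pos % 16))
    out.append(checksum)
    return bytes(out)
```

For instance, a single 4-byte segment yields a 32-byte section: 8 header bytes, 12 segment bytes, 11 padding bytes and the checksum.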

Inspecting and parsing a binary file

The Kaitai Web IDE is a handy tool for inspecting our target, because it allows us to write the KSY file and watch how the fields and structures we declare get parsed in real time. We will use the Tasmota firmware for the ESP8266-based Sonoff devices.

As we mentioned before, ESP8266 firmware images can contain more than one section; in this case a bootloader occupies the first 4 KB (0x1000 bytes), followed by the rest of the code and data:

seq:
  - id: bootloader
    type: section
    size: 0x1000
  - id: code
    type: section
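The size: 0x1000 key means the bootloader section is parsed from its own fixed-size window, and the code section starts right after it. The split can be sketched in plain Python as follows; split_image is a hypothetical helper for illustration, not generated code.

```python
BOOTLOADER_SIZE = 0x1000  # `size: 0x1000` in the top-level seq


def split_image(image):
    """Split a firmware image into the bootloader window and the code part.

    Mirrors the top-level seq: the bootloader is parsed from a fixed 4 KB
    window (a substream in Kaitai terms) and the code section follows it.
    """
    if len(image) <= BOOTLOADER_SIZE:
        raise ValueError("image too small to contain a code section")
    bootloader = image[:BOOTLOADER_SIZE]
    code = image[BOOTLOADER_SIZE:]
    # Both parts must start with a section header magic byte.
    for name, part in (("bootloader", bootloader), ("code", code)):
        if part[0] != 0xE9:
            raise ValueError(f"{name} does not start with 0xE9")
    return bootloader, code
```

One subtlety: inside the bootloader's substream, _io.pos is relative to that 4 KB window, while the code section is read from the main stream, so its _io.pos is an absolute file offset. Because 0x1000 is a multiple of 16, the footer padding works out the same either way, and any bytes left in the window after the bootloader's footer are simply skipped.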

And that’s it! If we load both our target file and KSY code in the Web IDE, we can see how the parsing gets done:

Compiling the KSY source to a Python class

Now that we have a fully functional KSY source, we can compile it to any supported language of our choice.

We will use Python to demonstrate how it works:

kaitai-struct-compiler --target python esp8266-image.ksy

The command above generates an esp8266_image.py file containing an Esp8266Image class, which lets us parse and access all the fields we described from a Python script (the Kaitai Struct Python runtime, available as the kaitaistruct package, must be installed):

from esp8266_image import Esp8266Image

# Parse the firmware image with the generated Esp8266Image class
target = Esp8266Image.from_file("sonoff-basic.bin")

# Print several values from a parsed Section object
def print_section_info(section):
    print(f"[+] Flash size: {section.header.flash_size}")
    print(f"[+] Flash speed: {section.header.flash_speed}")
    print(f"[+] Entrypoint: {hex(section.header.entrypoint)}")
    print(f"[+] Num. segments: {section.header.num_segments}")

    for i, segment in enumerate(section.segments):
        print(f" |----[Segment {i}]")
        print(f" |----> Offset: {hex(segment.memory_offset)}")
        print(f" |----> Size: {segment.segment_size} bytes")
        print(" .")

# Print bootloader section info
print("\nBootloader")
print("----------")
print_section_info(target.bootloader)

# Print code section info
print("\nCode")
print("----")
print_section_info(target.code)
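Since the parsed objects expose every field, it is also straightforward to check each section's integrity. The function below is a minimal sketch: verify_section_checksum is a hypothetical name, and it assumes the section object has the attributes named after the KSY ids (segments, data, footer.checksum), which is how the generated Python class exposes them.

```python
def verify_section_checksum(section):
    """Recompute a parsed section's checksum and compare it to the footer.

    Assumes `section` exposes the attributes named in the KSY ids
    (segments, data, footer.checksum). Returns (matches, computed).
    """
    computed = 0xEF  # checksum seed
    for segment in section.segments:
        for byte in segment.data:  # `data` is a bytes object in Python
            computed ^= byte
    return computed == section.footer.checksum, computed


# Usage with the parsed image, e.g.:
# ok, computed = verify_section_checksum(target.code)
```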

Running the script above produces output similar to the following:

Bonus: compiling to graphviz

It’s also possible to compile the KSY file to a Graphviz diagram as follows:

kaitai-struct-compiler --target graphviz esp8266-image.ksy

The resulting .dot file can then be rendered with any Graphviz tool, such as dot, producing images similar to the following:

Full KSY file

meta:
  id: esp8266_image
  file-extension: esp8266_image
  endian: le
seq:
  - id: bootloader
    type: section
    size: 0x1000
  - id: code
    type: section
types:
  section:
    seq:
      - id: header
        type: section_header
      - id: segments
        type: segment
        repeat: expr
        repeat-expr: header.num_segments
      - id: footer
        type: section_footer
  section_header:
    seq:
      - id: magic
        contents: [0xE9]
      - id: num_segments
        type: u1
      - id: flash_interface
        type: u1
        enum: e_spi_flash_mode
      - id: flash_size
        type: b4
        enum: e_flash_size
      - id: flash_speed
        type: b4
        enum: e_flash_speed
      - id: entrypoint
        type: u4
    enums:
      e_spi_flash_mode:
        0: qio
        1: qout
        2: dio
        3: dout
      e_flash_size:
        0: size_512k
        1: size_256k
        2: size_1m
        3: size_2m
        4: size_4m
      e_flash_speed:
        0: speed_40mhz
        1: speed_26mhz
        2: speed_20mhz
        0xf: speed_80mhz
  segment:
    seq:
      - id: memory_offset
        type: u4
      - id: segment_size
        type: u4
      - id: data
        size: segment_size
  section_footer:
    seq:
      - id: padding
        type: u1
        repeat: expr
        repeat-expr: 15 - (_io.pos % 16)
      - id: checksum
        type: u1
