I’ve recently been working a lot with parsing Mach-O files, so I’m beginning to understand in a fair bit of detail how they are structured and how they work. I’ve been developing a library, called libhelper, which can parse Mach-O files. Libhelper-macho also powers Img4helper and HTool.
This is not a complete writeup or documentation covering everything about Mach-O files, and I appreciate this has probably been covered to death. It’s not aimed at those who already have an advanced knowledge of how Mach or Darwin works; rather, it’s aimed at those who are in the position I was in a few weeks ago, with limited knowledge of how Mach-O files are structured. Still, I felt this would be a useful resource, and a good way to kick off my blog.
There are multiple types of Mach-O, such as Executable or KEXT Bundles, so I can’t cover them all. My aim for this post is to discuss the basics - namely Header, Load Commands and Segment Commands. I may discuss other areas in the future but this is a start.
What are Mach-O files
Mach-O files, or Mach Object files, are an executable format used on operating systems based on the Mach kernel. This includes Apple’s Darwin-based systems such as iOS, macOS and watchOS. There are multiple types of Mach-O file, such as executables, object code, shared and dynamic libraries, kernel extension (KEXT) bundles and even debug companion files.
Mach-O Format
Mach-O files are simply binary files; there isn’t anything particularly special about them in that regard. You can read some bytes into a C structure and boom, you’ve parsed a Mach-O (or at least part of it). Natively, they can only be run on Mach/Darwin/XNU-based systems, although there are some implementations for loading and executing Mach-O files on Linux. Simple applications can run this way, but the majority will not work because they rely on certain macOS libraries, such as /usr/lib/libSystem.B.dylib.
A Mach-O is made up of one Mach header, a number of load commands (the count is specified in the header) and the data. The data is organised into Segments, which are made up of 0 to 255 Sections, and there are special load commands to describe them. Mach-O files are organised as follows:
- Mach-O Header
- Load Commands
- Data
The purpose of this article is to discuss, at a higher level, each of these areas of a Mach-O file, how data is organised and how to load this data from a given Mach-O file into relevant C structures.
Header
Starting with the Mach Header. Its purpose is to describe what the file contains, and how the Kernel and Dynamic Linker should handle it. The first 4 bytes are, as with many file formats, its “Magic Number”. A Magic Number is used to identify a file format. In the case of Mach-O’s there are three Magic Numbers that one may come across: 0xfeedface for 32-bit, 0xfeedfacf for 64-bit and 0xcafebabe for Mach Universal Binaries / Object files.
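As a rough sketch, checking the magic can look like the following. The constants match loader.h and fat.h, including the byte-swapped "CIGAM" forms you see when the file's byte order differs from the host's (universal binary headers are stored big-endian, so a little-endian machine reads the swapped value). The macho_kind enum and classify_magic function are hypothetical names for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values as found in loader.h and fat.h; the CIGAM forms are the same
 * numbers seen with the opposite byte order. */
#define MH_MAGIC    0xfeedfaceu
#define MH_CIGAM    0xcefaedfeu
#define MH_MAGIC_64 0xfeedfacfu
#define MH_CIGAM_64 0xcffaedfeu
#define FAT_MAGIC   0xcafebabeu
#define FAT_CIGAM   0xbebafecau

/* Hypothetical enum, for this example only. */
enum macho_kind { MACHO_UNKNOWN, MACHO_32, MACHO_64, MACHO_FAT };

/* Classify a buffer by its first four bytes. */
enum macho_kind classify_magic(const unsigned char *data, size_t len)
{
    uint32_t magic;

    assert(data != NULL);
    if (len < sizeof(magic))
        return MACHO_UNKNOWN;

    /* memcpy avoids unaligned reads and strict-aliasing problems. */
    memcpy(&magic, data, sizeof(magic));

    switch (magic) {
    case MH_MAGIC:
    case MH_CIGAM:
        return MACHO_32;
    case MH_MAGIC_64:
    case MH_CIGAM_64:
        return MACHO_64;
    case FAT_MAGIC:
    case FAT_CIGAM:
        return MACHO_FAT;
    default:
        return MACHO_UNKNOWN;
    }
}
```

Accepting both byte orders keeps the check independent of the machine it runs on; a real parser would also remember which form it saw so it knows whether to swap the remaining fields.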
Other properties of a Mach-O Header include the CPU type and subtype, which define the architecture the Mach-O is built for (e.g. arm64, x86_64, arm64_32), the number of Load Commands and the size of that area, and flags to be passed to the Dynamic Linker. The layout of the header is shown below:
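The following is a sketch of both header layouts, mirroring the definitions in XNU's loader.h. To keep it buildable on any system, plain int32_t stands in for cpu_type_t and cpu_subtype_t (an assumption of this example); on macOS you would include <mach-o/loader.h> instead of redefining anything.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit header: seven 4-byte fields. */
struct mach_header {
    uint32_t magic;       /* 0xfeedface */
    int32_t  cputype;     /* cpu_type_t in loader.h */
    int32_t  cpusubtype;  /* cpu_subtype_t in loader.h */
    uint32_t filetype;    /* e.g. MH_EXECUTE */
    uint32_t ncmds;       /* number of load commands */
    uint32_t sizeofcmds;  /* total size of all load commands, in bytes */
    uint32_t flags;       /* flags for the dynamic linker */
};

/* 64-bit header: the same fields plus one reserved word. */
struct mach_header_64 {
    uint32_t magic;       /* 0xfeedfacf */
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

/* Compile-time checks that the layouts have the expected sizes. */
static_assert(sizeof(struct mach_header) == 28, "32-bit header is 28 bytes");
static_assert(sizeof(struct mach_header_64) == 32, "64-bit header is 32 bytes");
```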
The Mach-O header takes up 32 bytes for 64-bit files, and 28 bytes for 32-bit files. You can populate the header structure by using memcpy() to copy the correct number of bytes into a mach_header or mach_header_64 structure, and you’ll be able to access the header elements as normal.
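To make the memcpy() approach concrete, here is a minimal sketch. It assumes the whole file is already in memory, handles only 64-bit files in the host's byte order, and uses a local copy of the struct so it builds anywhere. read_header_64 is a hypothetical name, not something from XNU or libhelper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MH_MAGIC_64 0xfeedfacfu

/* Local mirror of loader.h's mach_header_64 (int32_t for the cpu types). */
struct mach_header_64 {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

/* Copy the header out of `data`. Returns 0 on success, or -1 if the buffer
 * is too small or does not start with the 64-bit magic. */
int read_header_64(const unsigned char *data, size_t len,
                   struct mach_header_64 *out)
{
    assert(data != NULL && out != NULL);
    if (len < sizeof(*out))
        return -1;

    memcpy(out, data, sizeof(*out));
    return out->magic == MH_MAGIC_64 ? 0 : -1;
}
```

After a successful call, out->ncmds and out->sizeofcmds tell you how many Load Commands follow and how many bytes they occupy, starting at offset sizeof(struct mach_header_64).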
Load Commands
Load Commands are placed directly after the Mach-O header in the file. They specify the logical structure of the file and the layout of the file in virtual memory.
All Load Commands have a common 8 byte structure which identifies the type of the command and its size. This common structure is defined as follows:
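A sketch of that structure, matching struct load_command in loader.h, together with a small walker that hops from one command to the next using cmdsize. count_load_commands is a hypothetical helper written for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct load_command {
    uint32_t cmd;      /* type of the command, e.g. LC_SEGMENT_64 (0x19) */
    uint32_t cmdsize;  /* total size of the command, including these 8 bytes */
};

/* Walk `ncmds` load commands starting at `offset` within `data` and count
 * those whose type is `want`. Returns -1 if a command runs past the buffer. */
int count_load_commands(const unsigned char *data, size_t len, size_t offset,
                        uint32_t ncmds, uint32_t want)
{
    int found = 0;

    assert(data != NULL);
    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_command lc;

        if (offset > len || len - offset < sizeof(lc))
            return -1;
        memcpy(&lc, data + offset, sizeof(lc));

        /* A command must at least hold its own header and fit the buffer. */
        if (lc.cmdsize < sizeof(lc) || lc.cmdsize > len - offset)
            return -1;

        if (lc.cmd == want)
            found++;
        offset += lc.cmdsize;  /* the next command starts right after */
    }
    return found;
}
```

For a 64-bit file, the starting offset is the size of the header (32 bytes) and ncmds comes from the header itself.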
There are over a dozen Load Commands; some are common across all Mach-O’s and some are only found in certain cases. Load Commands are placed after the Mach-O header, with the first usually being Segment Commands. These are discussed further under Segment Commands.
But Segment Commands are not the only commands that are included in the majority of Mach-O files. The LC_DYLD_INFO and LC_LOAD_DYLINKER commands specify, respectively, the rebase, bind, weak bind, lazy bind and export information for the Dynamic Linker, and the path of the Dynamic Linker the Kernel should use to execute the binary. Mach-O’s frequently require Dynamic Libraries, especially /usr/lib/libSystem.B.dylib. The LC_LOAD_DYLIB command defines the path where the Linker can find the Dylib, and there are as many of these commands as there are Dynamic Libraries required.
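As an illustration of how the Dylib path is stored, here is a sketch that pulls it out of a single LC_LOAD_DYLIB command. The struct is a flattened version of loader.h's dylib_command (the nested struct dylib and its lc_str union are written out as plain fields, which gives the same 24-byte layout), and dylib_path is a hypothetical helper. The path lives inside the command itself, at name_offset bytes from the command's start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LC_LOAD_DYLIB 0xcu

/* Flattened mirror of loader.h's dylib_command. */
struct dylib_command {
    uint32_t cmd;                    /* LC_LOAD_DYLIB */
    uint32_t cmdsize;                /* includes the path string */
    uint32_t name_offset;            /* path offset from start of command */
    uint32_t timestamp;
    uint32_t current_version;
    uint32_t compatibility_version;
};

/* Return a pointer to the NUL-terminated path inside the command at `cmd`,
 * or NULL if the command is not LC_LOAD_DYLIB or is malformed.
 * `avail` is the number of readable bytes starting at `cmd`. */
const char *dylib_path(const unsigned char *cmd, size_t avail)
{
    struct dylib_command dc;

    assert(cmd != NULL);
    if (avail < sizeof(dc))
        return NULL;
    memcpy(&dc, cmd, sizeof(dc));

    if (dc.cmd != LC_LOAD_DYLIB || dc.cmdsize > avail)
        return NULL;
    if (dc.name_offset < sizeof(dc) || dc.name_offset >= dc.cmdsize)
        return NULL;

    /* Make sure the string terminates inside the command. */
    const char *path = (const char *)cmd + dc.name_offset;
    if (memchr(path, '\0', dc.cmdsize - dc.name_offset) == NULL)
        return NULL;
    return path;
}
```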
The offsets and sizes for both the symbol table and the string table are defined with LC_SYMTAB, and offsets for local, external, undefined and other types of dynamic symbols are defined with LC_DYSYMTAB.
The last command that I will discuss here is LC_MAIN, which defines the offset of the entry point, that is, where the Kernel should start executing the binary from. This is only used for MH_EXECUTE filetypes.
Below is output from an experimental version of htool showing all of the Load Commands from itself. I’ve omitted some parts because the output is rather long.
Going back to struct load_command. From the perspective of trying to parse Mach-O’s, having a constant format for the first 8 bytes of each Load Command makes detecting and parsing them easier. The following is an example of how we can parse a command, using LC_MAIN as an example. The code is based on XNU’s loader.h rather than libhelper.
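A minimal sketch of that idea: read the 8-byte load_command first, and only when cmd is LC_MAIN copy the full entry_point_command. The struct and constants mirror loader.h; find_entry_offset is a hypothetical helper that assumes the file is already in memory and in the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LC_REQ_DYLD 0x80000000u
#define LC_MAIN     (0x28u | LC_REQ_DYLD)

struct load_command {
    uint32_t cmd;
    uint32_t cmdsize;
};

struct entry_point_command {
    uint32_t cmd;        /* LC_MAIN */
    uint32_t cmdsize;    /* 24 */
    uint64_t entryoff;   /* file offset of the entry point */
    uint64_t stacksize;  /* initial stack size, if non-zero */
};

/* Scan `ncmds` load commands starting at `offset`. On finding LC_MAIN,
 * store its entryoff in *entryoff and return 0; otherwise return -1. */
int find_entry_offset(const unsigned char *data, size_t len, size_t offset,
                      uint32_t ncmds, uint64_t *entryoff)
{
    assert(data != NULL && entryoff != NULL);
    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_command lc;

        if (offset > len || len - offset < sizeof(lc))
            return -1;
        memcpy(&lc, data + offset, sizeof(lc));
        if (lc.cmdsize < sizeof(lc) || lc.cmdsize > len - offset)
            return -1;

        if (lc.cmd == LC_MAIN) {
            struct entry_point_command ep;

            if (lc.cmdsize < sizeof(ep))
                return -1;
            /* Same starting offset, just a larger structure. */
            memcpy(&ep, data + offset, sizeof(ep));
            *entryoff = ep.entryoff;
            return 0;
        }
        offset += lc.cmdsize;
    }
    return -1;
}
```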
If you are interested in learning more about the different types of Load Commands, you can either check out EXTERNAL_HEADERS/mach-o/loader.h in the XNU sources, or include/libhelper-macho/macho-command-types.h from Libhelper.
Segment Commands
Going back to Segment Commands, the first couple of Load Commands in a Mach-O are either LC_SEGMENT for 32-bit, or LC_SEGMENT_64 for 64-bit. These define an object file’s Segments.
If you are unfamiliar with how object files work, they are divided into a number of these segments. The __TEXT segment contains the instructions that will be executed by the CPU, and the __DATA segment contains both static local variables and global variables. These are both standard, however you may find additional segments such as __PAGEZERO and __LINKEDIT, and in XNU Kernelcaches, you’ll get even more funky segment names like __PRELINK_INFO and __LAST.
Segments are further divided into sections. A common example is __cstring in the __TEXT segment, written as __TEXT.__cstring.
The Segment Commands in a Mach-O define which regions of the binary data should be mapped into memory, and how. So looking at the segment_command_64 struct, there’s the segment’s name as segname, but then we have two sets of addresses and sizes.
The vmaddr and vmsize define the virtual memory address and size of the segment, while fileoff and filesize give the segment’s location and size within the file. maxprot and initprot define the maximum and initial virtual memory protection for the segment, which can, for example, prevent it from being both writable and executable at the same time. Finally there are the flags, which are just a way of giving the Kernel options for loading the segment into memory.
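To show how the two address/size pairs relate, here is a sketch of segment_command_64 as laid out in loader.h (with int32_t standing in for vm_prot_t) and a few hypothetical helpers: segment_named, vmaddr_to_fileoff and segment_is_wx are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VM_PROT_READ    0x01
#define VM_PROT_WRITE   0x02
#define VM_PROT_EXECUTE 0x04

struct segment_command_64 {
    uint32_t cmd;          /* LC_SEGMENT_64 */
    uint32_t cmdsize;      /* includes the section_64 structs that follow */
    char     segname[16];  /* e.g. "__TEXT" */
    uint64_t vmaddr;       /* virtual memory address of the segment */
    uint64_t vmsize;       /* virtual memory size */
    uint64_t fileoff;      /* offset of the segment's data in the file */
    uint64_t filesize;     /* number of bytes mapped from the file */
    int32_t  maxprot;      /* maximum VM protection (vm_prot_t) */
    int32_t  initprot;     /* initial VM protection (vm_prot_t) */
    uint32_t nsects;       /* number of sections in the segment */
    uint32_t flags;
};

/* segname is not NUL-terminated when the name is exactly 16 bytes long,
 * so compare with a length limit. */
int segment_named(const struct segment_command_64 *seg, const char *name)
{
    assert(seg != NULL && name != NULL);
    return strncmp(seg->segname, name, sizeof(seg->segname)) == 0;
}

/* Translate a virtual address inside the segment to a file offset.
 * Returns -1 if the address is not backed by file data. */
int vmaddr_to_fileoff(const struct segment_command_64 *seg, uint64_t addr,
                      uint64_t *out)
{
    assert(seg != NULL && out != NULL);
    if (addr < seg->vmaddr || addr - seg->vmaddr >= seg->filesize)
        return -1;
    *out = seg->fileoff + (addr - seg->vmaddr);
    return 0;
}

/* Non-zero if the segment starts out both writable and executable. */
int segment_is_wx(const struct segment_command_64 *seg)
{
    assert(seg != NULL);
    return (seg->initprot & VM_PROT_WRITE) && (seg->initprot & VM_PROT_EXECUTE);
}
```

A segment such as __PAGEZERO has a large vmsize but a filesize of 0, so no address inside it translates to a file offset.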
Like I said, we have segments which are divided into sections. The section structures are placed directly after the segment command, are included in its cmdsize and are counted with nsects. Again, sections essentially divide segments up into more meaningful chunks, for example __TEXT.__text or __TEXT.__const.
To load these, we must take the offset of the segment command in the file, add the size of the segment structure, and then loop nsects times, incrementing the offset by the size of the section struct each time.
To start, the section structure is defined as follows. Again, there are both section_64 and section structures, with the difference being that the 64-bit section_64 struct uses uint64_t for both addr and size, and has a third reserved property at the end of the structure, although it is not designated for any particular purpose:
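A sketch of both structures, following the field order in loader.h. The compile-time checks are an addition for this example and simply confirm the sizes that the layouts imply.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit section header. */
struct section {
    char     sectname[16];  /* e.g. "__text" */
    char     segname[16];   /* segment this section belongs to */
    uint32_t addr;          /* virtual memory address */
    uint32_t size;          /* size in bytes */
    uint32_t offset;        /* file offset of the section's data */
    uint32_t align;         /* alignment as a power of 2 */
    uint32_t reloff;        /* file offset of relocation entries */
    uint32_t nreloc;        /* number of relocation entries */
    uint32_t flags;         /* section type and attributes */
    uint32_t reserved1;
    uint32_t reserved2;
};

/* 64-bit section header: wider addr/size and an extra reserved word. */
struct section_64 {
    char     sectname[16];
    char     segname[16];
    uint64_t addr;
    uint64_t size;
    uint32_t offset;
    uint32_t align;
    uint32_t reloff;
    uint32_t nreloc;
    uint32_t flags;
    uint32_t reserved1;
    uint32_t reserved2;
    uint32_t reserved3;
};

static_assert(sizeof(struct section) == 68, "32-bit section is 68 bytes");
static_assert(sizeof(struct section_64) == 80, "64-bit section is 80 bytes");
```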
As I just stated, we can load the correct data into that structure by adding sizeof(segment_command_64) to the offset of the command in the file, then adding sizeof(section_64) for each of segment->nsects. Here is an example of what I mean (note this time I am using libhelper code to demonstrate):
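For readers following along with plain loader.h structures, the same walk can be sketched without libhelper. This is only an illustration: load_sections and its parameters are made up for this example and are not libhelper's mach_segment_info_load, and the structs are local mirrors of the loader.h definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct segment_command_64 {
    uint32_t cmd;
    uint32_t cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    int32_t  maxprot, initprot;
    uint32_t nsects;
    uint32_t flags;
};

struct section_64 {
    char     sectname[16];
    char     segname[16];
    uint64_t addr, size;
    uint32_t offset, align, reloff, nreloc, flags;
    uint32_t reserved1, reserved2, reserved3;
};

/* Copy up to `max_out` section headers of the segment command located at
 * `offset` (relative to the start of the file) into `out`.
 * Returns the number of sections copied. */
size_t load_sections(const unsigned char *data, size_t len, size_t offset,
                     struct section_64 *out, size_t max_out)
{
    struct segment_command_64 seg;
    size_t n = 0;

    assert(data != NULL && out != NULL);
    if (offset > len || len - offset < sizeof(seg))
        return 0;
    memcpy(&seg, data + offset, sizeof(seg));

    /* The first section header sits directly after the segment command. */
    size_t sectoff = offset + sizeof(seg);

    for (uint32_t i = 0; i < seg.nsects && n < max_out; i++) {
        if (len - sectoff < sizeof(struct section_64))
            break;  /* truncated file */
        memcpy(&out[n++], data + sectoff, sizeof(struct section_64));
        sectoff += sizeof(struct section_64);  /* step to the next header */
    }
    return n;
}
```

Writing into a caller-supplied array keeps the sketch free of allocation; a real parser would more likely append each section to a list, as libhelper does.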
The mach_segment_info_t struct is not implemented in XNU’s standard loader.h, so if you’re writing your own Mach-O parser, please ignore references to Libhelper structs.
Looking at this function in more detail, two arguments are passed to mach_segment_info_load: an unsigned char *data pointer to the Mach-O loaded in memory, and a uint32_t offset which points to the start of the segment command within that data pointer. This offset is relative to the start of the Mach-O, not the start of the load commands.
Ignoring the code that checks and sets up the mach_segment_command_t, it starts by calculating the offset of the first section. This is done by adding the offset passed to the function to the sizeof() of the segment command structure.
The segment command has nsects containing the number of sections placed after the command. So, we loop over the number of sections from segment->nsects and create a mach_section_64_t for each one. We can use memcpy() to copy the ssize bytes we need. We set the start point for the copy by adding the offset to the data pointer. By doing this, we are advancing the pointer by the offset, so that it points to, in this case, the start of the current section struct.
Calling h_slist_append() can be ignored. This is simply adding the section to a singly-linked list in a libhelper macho_t structure.
The last bit of interest here: make sure to increment sectoff by the size of the mach_section_64_t struct, so sectoff will point to the next section structure.
If you are interested, please take a look at libhelper. It has a Mach-O parser that I wrote, and you’ll find the example above.
Data
The actual data, that is the instructions and variables, in a Mach-O is stored after the Load Commands region. Depending on the type of Mach-O, the way this region is used varies.
So, for example, an executable - meaning a Mach-O with the filetype of MH_EXECUTE - would have the segment commands laying out the data region, and an LC_MAIN command specifying the offset of the entry point instruction the Kernel should jump to when loading. The Kernel will also start the Dynamic Linker specified in the LC_LOAD_DYLINKER command, which links any dylibs specified with LC_LOAD_DYLIB.
This entire region is mapped out by the segment commands. We can inspect this mapping with Mash, or Mach-O Shell, which is part of HTool. Loading the file, we can inspect a particular segment like so.
To print a segment, we use p seg __TEXT. This is the short version; if you prefer, print segment __TEXT would also work fine. The first line of the output displays the start and end addresses of the __TEXT segment, and its total size in bytes.
Underneath, slightly indented, are each of the sections contained within the segment. For example, we can see that the __TEXT.__stubs section is 390 bytes, and is located from 0x10000f4f0 to 0x10000f676.
Two things to note about these addresses: first, they are virtual memory addresses, and second, they are not offsets from the start of the Mach-O file (a section’s position in the file comes from the fileoff and offset fields instead). Before this __TEXT segment is a __PAGEZERO segment ranging from 0x000000000 to 0x100000000.
Summary
This is only an introduction to Mach-O files. I’d like to continue writing about them and maybe even write a Mach-O loader for Linux.
I hope I covered this fairly well; any feedback would be greatly appreciated. I aim to write these blog posts more often and hopefully they’ll improve over time - both in quality and technical accuracy. For now, you can download Img4helper, which you can use to extract Apple Image4 files, from the Downloads page linked above. Libhelper sources are available here if you’d like to look at my Mach-O parser, and htool will be available soon.
You can contact me either via Twitter (@h3adsh0tzz), Email (me@h3adsh0tzz.com), my iOS Security Discord server (https://discord.gg/CfNnCs8) or on irc.cracksby.kim :-).