Suppose you want to understand how one of Darwin’s binaries works by reading its source code. I want to show you the technique I used to find that source code quickly.
Apple publishes part of its open-source code at https://github.com/apple-oss-distributions, and some Darwin components are available there.
The open-source code is split into “projects.” For instance, the source code of the mv binary lives in the file_cmds project.
Once we know the project, we can quickly look it up in the Apple OSS distribution repositories. The workflow is the following: locate the binary, read the project name embedded in it, then search for that project on GitHub.
Let’s try it with the cat command.
cat
For this first example, I’ll detail the whole workflow so you can see how we get to the final result.
First, we locate the binary using the where command (a zsh builtin).
➜ ~ where cat
/bin/cat
You will notice that a lot of binaries live in /bin, but that is not always the case.
Now that we know where the binary is, let’s inspect its content. The __TEXT segment of the binary has a __const section that holds constant data needing no relocation. This is where you will find the name of the project associated with this binary.
💡 On macOS, executable files are stored in the Mach-O format. The __TEXT segment contains the executable code, and __const is a section within it designated for constant data. A “segment” is a broad division of a program’s memory space, while a “section” is a subdivision within a segment.
Let’s inspect the content of the binary with otool:
➜ ~ otool -v -s __TEXT __const /bin/cat
/bin/cat:
Contents of (__TEXT,__const) section
0000000100003ef0 40 28 23 29 50 52 4f 47 52 41 4d 3a 63 61 74 20
0000000100003f00 20 50 52 4f 4a 45 43 54 3a 74 65 78 74 5f 63 6d
0000000100003f10 64 73 2d 31 35 34 0a 00 00 00 00 00 00 40 63 40
0000000100003f20 24 46 72 65 65 42 53 44 24 00
Not very useful, right? Let’s try to convert this:
➜ ~ otool -v -s __TEXT __const /bin/cat | xxd -r -p
>?@(#)PROGRAM:cat ? PROJECT:text_cm?ds-154
@c@? $FreeBSD$%
Still ugly, right? This is not a proper conversion of the binary content. A better option is the strings command, whose manual describes it as “find the printable strings in an object, or other binary, file.”
➜ ~ strings /bin/cat | grep \#
@(#)PROGRAM:cat PROJECT:text_cmds-154
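A note on why the xxd attempt went wrong: xxd -r -p most likely treated every hex digit it met as data, including the address column (the address bytes 3e and 3f are > and ? in ASCII, which matches the stray characters above). As a rough sketch, the dump can be decoded in Python by dropping the address column first. The function name and the embedded sample are illustrative only, and the parser assumes the one-byte-per-column layout shown in the otool output above; other layouts are skipped.

```python
import re

# A data row: a hexadecimal address, then space-separated two-digit bytes.
# Assumption: otool printed one byte per column, as in the dump above.
_ROW = re.compile(r"^[0-9a-fA-F]{8,16}((?:\s+[0-9a-fA-F]{2})+)\s*$")


def decode_otool_dump(dump: str) -> bytes:
    """Turn `otool -s` hex output back into raw bytes, ignoring addresses."""
    data = bytearray()
    for line in dump.splitlines():
        match = _ROW.match(line.strip())
        if match:  # header lines such as "/bin/cat:" do not match
            data += bytes.fromhex(match.group(1))
    return bytes(data)


# The dump from this post, copied verbatim.
SAMPLE_DUMP = """\
/bin/cat:
Contents of (__TEXT,__const) section
0000000100003ef0 40 28 23 29 50 52 4f 47 52 41 4d 3a 63 61 74 20
0000000100003f00 20 50 52 4f 4a 45 43 54 3a 74 65 78 74 5f 63 6d
0000000100003f10 64 73 2d 31 35 34 0a 00 00 00 00 00 00 40 63 40
0000000100003f20 24 46 72 65 65 42 53 44 24 00
"""

decoded = decode_otool_dump(SAMPLE_DUMP)
print(decoded)
```

The decoded bytes start with the same @(#)PROGRAM:cat string that strings reports, followed by a newline and NUL padding.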
💡 What does @(#) mean? It comes from SCCS (Source Code Control System), an early version control system, where it marks an identification string (an SCCS identifier or keyword). While SCCS is not as widely used today, its conventions, such as these identifiers, can still be found in older codebases or systems that evolved from Unix-based environments.
Fortunately, there is a more straightforward way to do it :) using the what command:
The what utility searches each specified file for sequences of the form
“@(#)” as inserted by the SCCS source code control system. It prints the
remainder of the string following this marker, up to a NUL character,
newline, double quote, ‘>’ character, or backslash.
Let’s use it:
➜ ~ what -s /bin/cat
/bin/cat:
PROGRAM:cat PROJECT:text_cmds-154
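To make the man page description concrete, here is a minimal Python sketch of the same search. It is not the real what implementation: the function names are made up for this post, it returns every match rather than stopping at the first one as -s does, and the PROGRAM:/PROJECT: key:value layout is only an assumption based on the outputs shown here.

```python
import re

# "@(#)" followed by everything up to a NUL, newline, double quote,
# '>' or backslash, as described in the what(1) excerpt above.
_IDENT = re.compile(rb'@\(#\)([^\x00\n">\\]*)')


def find_sccs_idents(data: bytes) -> list[str]:
    """Return the text following each @(#) marker found in data."""
    return [
        m.group(1).decode("ascii", errors="replace")
        for m in _IDENT.finditer(data)
    ]


def parse_fields(ident: str) -> dict[str, str]:
    """Split 'PROGRAM:cat PROJECT:text_cmds-154' into a dict.

    Assumption: fields are whitespace-separated KEY:value pairs.
    """
    fields = {}
    for token in ident.split():
        key, sep, value = token.partition(":")
        if sep:
            fields[key] = value
    return fields


# Bytes equivalent to the __const section dumped earlier in this post.
sample = (
    b"@(#)PROGRAM:cat  PROJECT:text_cmds-154\n"
    b"\x00\x00\x00\x00\x00\x00@c@$FreeBSD$\x00"
)
for ident in find_sccs_idents(sample):
    print(parse_fields(ident))
```

On a Mac, the same functions could be fed the whole file, for example with pathlib.Path("/bin/cat").read_bytes(); a universal binary may then yield one match per architecture slice.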
Now we have the project name. Let’s find it on GitHub:
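The lookup itself can be sketched in a few lines of Python. This assumes the repository is named after the project and that the trailing number is a version; the helper names are hypothetical, and whether a tag matching the full string exists is something to check on GitHub rather than take for granted.

```python
GITHUB_ORG = "https://github.com/apple-oss-distributions"


def split_project(project: str) -> tuple[str, str]:
    """Split 'text_cmds-154' into ('text_cmds', '154').

    Assumption: the version follows the last hyphen.
    """
    name, sep, version = project.rpartition("-")
    if not sep:  # no hyphen at all: treat the whole string as the name
        return project, ""
    return name, version


def repo_url(project: str) -> str:
    """Build the candidate repository URL for a PROJECT value."""
    name, _version = split_project(project)
    return f"{GITHUB_ORG}/{name}"


print(repo_url("text_cmds-154"))
```

A well-formed URL does not guarantee that the repository exists, since not every project is published.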
launchd
Let’s repeat the previous steps but with launchd.
➜ ~ where launchd
/sbin/launchd
💡 sbin stands for “system binaries.” These are typically commands used for system administration tasks and not intended for regular users. In contrast, /bin contains essential user binaries.
➜ ~ what -s /sbin/launchd
/sbin/launchd:
PROGRAM:launchd PROJECT:libxpc-2462.141.1.701.3
Let’s find libxpc on GitHub:
Oops! Unfortunately, libxpc is closed source:
[…] which used to be open source but was closed in 10.10 following its integration into the libxpc project. J.Levin - *OS internals
Once I realized that @(#) strings can be defined by the code maintainers, I searched the cat source code for occurrences of @(#). Unfortunately, I did not find the project name constant there, only:
I wonder whether the build system adds the string to this specific section. I need to dig more to understand this. Maybe a topic for the next post?