
Linker Scripts

How to write a Linker Script.

Why Learn about Linkers?

Learning to write a linker script is vital when developing an operating system or hypervisor. For user-space development, the toolchain usually supplies default linker scripts; however, these are unsuitable when writing a kernel.

When writing a kernel, we must write our own linker script to link the bootloader and kernel object files together to produce a kernel image. Admittedly, writing a linker script is not a skill many possess; nevertheless, it is vital when developing a kernel or hypervisor.

The GNU linker (ld) and its linker script language are well documented, and the documentation includes more information than you probably need. I will do my best to distill it down to the most important parts. If you're interested in a comprehensive study, here are a few helpful resources.

Additional Resources

  • Linkers and Loaders (Levine): This book is old, but it is gold! Look no further to learn about linkers and loaders.

  • Advanced Compiler Design (Muchnick): This is the gold standard for learning about compilers and includes a lot of good information on linkers. You'll learn about the PLT (Procedure Linkage Table) and the GOT (Global Offset Table), which are crucial for understanding dynamic linking and reverse engineering.

  • Blog Series (Ian Lance Taylor): Ian is the developer of gold, a linker for ELF binaries that is included in GNU Binutils. He knows his stuff!

What Is a Linker?

Linkers are an important part of a development toolchain used by any low-level programmer. Other important components include the compiler and assembler.

In a future post, I'll discuss toolchains and cross-compilers, which are important when targeting an architecture different from the local computer. My computers have Intel, M2, and AMD/NVIDIA(GPU) architectures, and I will target a RISC-V architecture.

Most developers include GNU Binutils in their toolchain, which provides ld, the GNU linker. To understand linkers, you must first understand why they are needed. A compiler generates one object file for each source file, containing the code and data from that file. The object file, however, is incomplete: most source files reference functions and variables that are defined in other source files. They can also reference code from libraries, which must be brought into the current program.

The linker is responsible for combining these object files into a single object file or binary. It is also responsible for reorganizing memory so that the combined pieces fit together; this is done by combining similar sections. Finally, the linker must modify the addresses to allow the program to run under the new memory organization.

Object Files

Here is an example from a recent program that I wrote. After running make, the compiler produces four object files (main.o, diffiehellman.o, rsa.o, and utility.o), and the linker combines them into a single binary named crypto_pk.

Make
Object Files and Binary

Types of Linking

Linkers can link statically or dynamically. Static linking implies that references are fully resolved before runtime. Dynamic linking means that the location where the library will be loaded isn't known until runtime; instead, the references are resolved dynamically during runtime.

When the assembler runs, it doesn't know the addresses of external references, so it leaves a placeholder (typically zero) in the object file and records the reference for the linker. This is the incomplete information that I alluded to in the previous section. The linker is responsible for resolving these references, and it does so either statically or dynamically. In most cases, references to shared libraries are resolved dynamically at runtime: the linker builds a jump table, and the dynamic loader fills it in.

I will include additional notes that explore linkers and loaders in more depth.

Analyzing a Linker Script

On Linux, run ld --verbose to view the default linker script your system uses. I will briefly introduce the linker script, covering only the main parts. If you're interested in learning more, check out some of the resources listed above.

GNU Linker Script
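The exact output of ld --verbose differs between systems and Binutils versions. As a rough guide only, here is a heavily abridged sketch of the kind of script it prints on an x86-64 Linux machine. It is not a verbatim copy: the start address is an assumption, and most sections are left out.

```ld
/* Abridged, illustrative sketch in the style of a default x86-64 ELF script. */
OUTPUT_FORMAT("elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)
ENTRY(_start)                      /* first instruction to execute */

SECTIONS
{
  /* Assumed start address; real default scripts compute this differently. */
  . = 0x400000 + SIZEOF_HEADERS;

  .text :
  {
    /* Rarely executed code is grouped first, then hot code, then the rest. */
    *(.text.unlikely .text.*_unlikely .text.unlikely.*)
    *(.text.hot .text.hot.*)
    *(.text .text.*)
  }
  PROVIDE(etext = .);              /* end of code, defined only if referenced */

  .rodata : { *(.rodata .rodata.*) }
  .data   : { *(.data .data.*) }
  .bss    : { *(.bss .bss.*) *(COMMON) }
  PROVIDE(end = .);                /* end of the program image */
}
```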

GNU Linker Script Explanation

  • OUTPUT_FORMAT(): This command sets the output format of your executable. For a list of the formats your tools support, run objdump -i.

  • ENTRY(): ENTRY names the symbol, defined in your program's .text section, that marks the first byte of executable code: the entry point. Check out the disassembled example below, which shows the _start entry point. Note: The program, which only prints a message, was written in C, compiled, and then disassembled. The code at _start does a lot of extra work to prepare the program, such as initializing registers, getting command-line arguments, calling main(), and handling exit().

  • SECTIONS { }: This command, written as a braced block, defines the layout of the output file by describing how data is segmented in memory. It lets the developer control which input data is placed in each output section. To learn about ELF binaries and the sections they contain, see the elf(5) Linux manual page. If you are interested in reverse engineering, that page is worth bookmarking.

    • An output section is declared by its name, such as .text, followed by a colon and a braced list of the input sections to place in it. Output sections are laid out in the order they are listed. For example, .text is the section in ELF binaries that holds your program's executable code.

    • Input sections are selected by pattern, either from a specific object file or, with a wildcard, from every input file. For example, *(.text.unlikely .text.*_unlikely .text.unlikely.*) matches any section with one of those names in any input file. Notice the wildcard before the parenthesis; alternatively, we could name the object file, as in startup.o(.text.unlikely ...); however, the wildcard is more common because file names can change. Why are there so many section names? They allow the developer to organize the code by controlling where it is placed in memory. In the example above, sections matching the pattern hold code that is unlikely to be executed. Grouping that code together can, for example, improve cache locality, because hot code (code that is frequently executed) ends up grouped together as well.

    • PROVIDE(): defines a symbol that code can reference, but only if the symbol is referenced and not already defined by one of the linked object files.

  • MEMORY { }: The MEMORY command is not included in the example above; however, it is very important for kernel developers. The example below shows how a memory region is declared with access attributes. Sections defined in SECTIONS can be mapped to a specific region by adding the following after the section's closing brace: >RAM AT>ROM. The first part ( >RAM ) places the section's runtime address, the VMA (virtual memory address), in the RAM region, and the second ( AT>ROM ) places its LMA (load memory address) in the ROM, or read-only memory, region.

Code Examples
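To tie the commands above together, here is a minimal sketch of what a small kernel linker script might look like. Everything specific in it is an assumption made for illustration: the RISC-V target, the 0x80000000 load address, the .text.boot section name, and the __bss_start, __bss_end, and __kernel_end symbols are hypothetical and would need to match your own boot code and platform.

```ld
/* Minimal kernel linker script sketch; names and addresses are assumptions. */
OUTPUT_ARCH(riscv)
ENTRY(_start)                  /* _start must be defined by the boot code */

SECTIONS
{
  . = 0x80000000;              /* assumed address where the image is loaded */

  .text : {
    *(.text.boot)              /* hypothetical section holding _start, placed first */
    *(.text .text.*)           /* remaining code from all object files */
  }

  .rodata : { *(.rodata .rodata.*) }
  .data   : { *(.data .data.*) }

  .bss : {
    PROVIDE(__bss_start = .);  /* boot code can zero the range between these */
    *(.bss .bss.*)
    *(COMMON)
    PROVIDE(__bss_end = .);
  }

  PROVIDE(__kernel_end = .);   /* first free address after the image */
}
```

A script like this is typically passed to the linker with the -T option, for example ld -T kernel.ld followed by the object files, where kernel.ld is a file name I made up for this sketch.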

Memory Declarative
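As a sketch of how MEMORY regions and the >RAM AT>ROM suffix fit together, consider the fragment below. The region names, origins, lengths, and the __data_* symbols are assumptions chosen for illustration and do not describe any particular board.

```ld
/* MEMORY sketch; region names, origins, and lengths are hypothetical. */
MEMORY
{
  ROM (rx)  : ORIGIN = 0x00000000, LENGTH = 256K   /* read, execute */
  RAM (rwx) : ORIGIN = 0x20000000, LENGTH = 64K    /* read, write, execute */
}

SECTIONS
{
  /* Code both lives in and runs from ROM. */
  .text : { *(.text .text.*) } > ROM

  /* Initialized data runs from RAM (VMA) but is stored in ROM (LMA). */
  .data : {
    PROVIDE(__data_start = .);
    *(.data .data.*)
    PROVIDE(__data_end = .);
  } > RAM AT> ROM

  /* ROM address that startup code would copy .data from. */
  PROVIDE(__data_load = LOADADDR(.data));

  /* Zero-initialized data occupies RAM only and takes no space in the image. */
  .bss (NOLOAD) : { *(.bss .bss.*) *(COMMON) } > RAM
}
```

With a layout like this, the startup code is responsible for copying the bytes from __data_load into the range between __data_start and __data_end before any C code that uses initialized globals runs.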

Disassembled Example (Objdump Output)
