Analysing functions called through the Procedure Linkage Table
The focus of this post is on extracting the symbols of functions called through the Procedure Linkage Table (PLT) section of ELF binaries. As with many of my posts, I do not claim that the described approach is the best or most proper solution to the problem. Instead, I aim to share what worked for me: firstly, to document what I did, and secondly, in the hope that someone else may find it helpful. Although I focus on AArch64, the same approach should carry over to other architectures; only the instructions used in the PLT section, and therefore the sizes involved, will differ.
Let’s start with a simple example of a binary disassembled using objdump:
[...]
Disassembly of section .plt:
00000000000005d0 <.plt>:
5d0: a9bf7bf0 stp x16, x30, [sp, #-16]!
5d4: 90000090 adrp x16, 10000 <__FRAME_END__+0xf768>
5d8: f947d211 ldr x17, [x16, #4000]
5dc: 913e8210 add x16, x16, #0xfa0
5e0: d61f0220 br x17
5e4: d503201f nop
5e8: d503201f nop
5ec: d503201f nop
00000000000005f0 <__libc_start_main@plt>:
5f0: 90000090 adrp x16, 10000 <__FRAME_END__+0xf768>
5f4: f947d611 ldr x17, [x16, #4008]
5f8: 913ea210 add x16, x16, #0xfa8
5fc: d61f0220 br x17
[...]
0000000000000630 <printf@plt>:
630: 90000090 adrp x16, 10000 <__FRAME_END__+0xf768>
634: f947e611 ldr x17, [x16, #4040]
638: 913f2210 add x16, x16, #0xfc8
63c: d61f0220 br x17
Disassembly of section .text:
[...]
0000000000000754 <main>:
754: a9be7bfd stp x29, x30, [sp, #-32]!
758: 910003fd mov x29, sp
75c: 90000000 adrp x0, 0 <__abi_tag-0x278>
760: 911e8000 add x0, x0, #0x7a0
764: f9000fe0 str x0, [sp, #24]
768: f9400fe1 ldr x1, [sp, #24]
76c: 90000000 adrp x0, 0 <__abi_tag-0x278>
770: 911ea000 add x0, x0, #0x7a8
774: 97ffffaf bl 630 <printf@plt>
778: 52800000 mov w0, #0x0 // #0
77c: a8c27bfd ldp x29, x30, [sp], #32
780: d65f03c0 ret
[...]
What we are interested in here is how objdump identifies that the call to address 0x630 is, in fact, a call to the external function printf, made through the PLT. If the answer were simply to look the address up in the symbol table and retrieve the associated string, I probably would not be writing a blog post about it. While it is not overly complicated, it does involve a bit of ELF magic.
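The first step, getting from the bl instruction to the address 0x630, is mechanical, and objdump has already done it for us in the listing. For completeness, here is a minimal sketch of decoding an A64 BL instruction (opcode 0b100101 in the top six bits, followed by a signed 26-bit offset counted in 4-byte instructions). The helper name decode_bl_target is hypothetical and not part of objdump or pyelftools.

```python
def decode_bl_target(insn: int, pc: int) -> int:
    """Return the branch target of an AArch64 BL instruction located at pc."""
    # BL has 0b100101 in bits 31..26; everything below is the immediate
    if (insn >> 26) != 0b100101:
        raise ValueError(f"{insn:#010x} is not a BL instruction")
    imm26 = insn & 0x3FFFFFF
    # Sign-extend the 26-bit immediate
    if imm26 & (1 << 25):
        imm26 -= 1 << 26
    # The offset is in instructions (4 bytes each), relative to the BL itself
    return pc + imm26 * 4


# The call in main: 0x97ffffaf at address 0x774
print(hex(decode_bl_target(0x97FFFFAF, 0x774)))  # 0x630
```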
To achieve this, we need to examine three sections of the ELF file: .plt, .rela.plt and .dynsym. The .plt section is used to identify the specific entry targeted by the bl instruction. The .dynsym section holds the dynamic symbols, including those of the external functions called by the binary, while .rela.plt holds the relocation information for those calls. The idea is to use the position of the entry within .plt to index into .rela.plt, and then use the symbol index stored in that relocation entry to look up the corresponding symbol in .dynsym. But how exactly can we do this?
Entries in the .rela.plt section have the following structure (Elf64_Rela, as defined in elf.h), and they are laid out in the same order as the PLT entries that use them:
typedef struct
{
Elf64_Addr r_offset; /* Address */
Elf64_Xword r_info; /* Relocation type and symbol index */
Elf64_Sxword r_addend; /* Addend */
} Elf64_Rela;
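Dumping .rela.plt with objdump shows the raw 24-byte entries. Each group of four .word values holds r_offset followed by r_info, split into little-endian 32-bit halves, and the ... lines stand for the r_addend fields, which are all zero:
Disassembly of section .rela.plt: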
0000000000000540 <.rela.plt>:
540: 00010fa8 .word 0x00010fa8
544: 00000000 .word 0x00000000
548: 00000402 .word 0x00000402
54c: 00000003 .word 0x00000003
...
558: 00010fb0 .word 0x00010fb0
55c: 00000000 .word 0x00000000
560: 00000402 .word 0x00000402
564: 00000005 .word 0x00000005
...
570: 00010fb8 .word 0x00010fb8
574: 00000000 .word 0x00000000
578: 00000402 .word 0x00000402
57c: 00000006 .word 0x00000006
...
588: 00010fc0 .word 0x00010fc0
58c: 00000000 .word 0x00000000
590: 00000402 .word 0x00000402
594: 00000007 .word 0x00000007
...
5a0: 00010fc8 .word 0x00010fc8
5a4: 00000000 .word 0x00000000
5a8: 00000402 .word 0x00000402
5ac: 00000009 .word 0x00000009
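readelf -r decodes the same entries (its Type column truncates R_AARCH64_JUMP_SLOT to R_AARCH64_JUMP_SL):
Relocation section '.rela.plt' at offset 0x540 contains 5 entries: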
Offset Info Type Sym. Value Sym. Name + Addend
000000010fa8 000300000402 R_AARCH64_JUMP_SL 0000000000000000 __libc_start_main@GLIBC_2.34 + 0
000000010fb0 000500000402 R_AARCH64_JUMP_SL 0000000000000000 __cxa_finalize@GLIBC_2.17 + 0
000000010fb8 000600000402 R_AARCH64_JUMP_SL 0000000000000000 __gmon_start__ + 0
000000010fc0 000700000402 R_AARCH64_JUMP_SL 0000000000000000 abort@GLIBC_2.17 + 0
000000010fc8 000900000402 R_AARCH64_JUMP_SL 0000000000000000 printf@GLIBC_2.17 + 0
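To connect the raw dump with the readelf output, the sketch below rebuilds the first entry from its four .word values (plus the eight zero bytes of r_addend that objdump elided) and unpacks it as an Elf64_Rela. It assumes little-endian byte order, as on AArch64 Linux. The elf64_r_sym and elf64_r_type helpers mirror the ELF64_R_SYM and ELF64_R_TYPE macros from elf.h: the symbol index is the upper 32 bits of r_info and the relocation type is the lower 32 bits.

```python
import struct

# Elf64_Rela: r_offset (Elf64_Addr), r_info (Elf64_Xword), r_addend (Elf64_Sxword),
# assumed little-endian as on AArch64 Linux
ELF64_RELA = struct.Struct("<QQq")


def elf64_r_sym(info: int) -> int:
    # Equivalent of ELF64_R_SYM: index into .dynsym, upper 32 bits of r_info
    return info >> 32


def elf64_r_type(info: int) -> int:
    # Equivalent of ELF64_R_TYPE: relocation type, lower 32 bits of r_info
    return info & 0xFFFFFFFF


# Entry at 0x540: the four .word values from the dump, followed by the
# eight zero bytes of r_addend that objdump printed as "..."
raw = struct.pack("<4I", 0x00010FA8, 0x00000000, 0x00000402, 0x00000003) + bytes(8)

r_offset, r_info, r_addend = ELF64_RELA.unpack(raw)
print(hex(r_offset), hex(r_info), r_addend)            # 0x10fa8 0x300000402 0
print(elf64_r_sym(r_info), hex(elf64_r_type(r_info)))  # 3 0x402

# The Info column from readelf, decoded into .dynsym indices
infos = [0x000300000402, 0x000500000402, 0x000600000402,
         0x000700000402, 0x000900000402]
print([elf64_r_sym(i) for i in infos])  # [3, 5, 6, 7, 9]
```

The type 0x402 (1026) is R_AARCH64_JUMP_SLOT, and the symbol indices are what the lookup in .dynsym needs.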
But how do we determine which function is called, given that the PLT section is just a block of code? The key observation is that, on AArch64, the PLT consists of a short header followed by one stub per external function, and every stub is the same four-instruction sequence (adrp, ldr, add, br) with different immediates. Given the address targeted by the bl instruction, the index into the relocation section is calculated with the following formula:
Index = (Called Address - PLT Start Address - 0x20) / 16
The first two values are fairly self-explanatory. As for the “magic” numbers: 0x20 is the size of the PLT header at the start of the .plt section (the eight instructions, including the three nops, from 0x5d0 to 0x5ec in the listing above), and 16 (4 instructions × 4 bytes each) is the size of each stub used to call an external function. Once we have this index, the rest is straightforward: we use it to select the entry in .rela.plt, take the symbol index from that entry's r_info field, and use that index to look up the symbol in .dynsym.
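A minimal sketch of the formula, hardcoding the AArch64 constants discussed above and checking it against the two stubs visible in the listing. The names plt_index and rela_plt_names are illustrative, and the list of names is copied from the readelf output rather than read from .dynsym.

```python
PLT_HEADER_SIZE = 0x20  # PLT header: 8 instructions x 4 bytes (0x5d0..0x5ec)
PLT_ENTRY_SIZE = 16     # one stub: adrp, ldr, add, br = 4 instructions x 4 bytes


def plt_index(called_addr: int, plt_start: int) -> int:
    """Map the address of an AArch64 PLT stub to its index in .rela.plt."""
    offset = called_addr - plt_start - PLT_HEADER_SIZE
    # Reject addresses in the header or in the middle of a stub
    if offset < 0 or offset % PLT_ENTRY_SIZE:
        raise ValueError(f"{called_addr:#x} is not the start of a PLT stub")
    return offset // PLT_ENTRY_SIZE


# Symbol names of the .rela.plt entries, in order, from the readelf listing
rela_plt_names = ["__libc_start_main", "__cxa_finalize", "__gmon_start__",
                  "abort", "printf"]

PLT_START = 0x5D0  # address of <.plt> in the listing
for stub in (0x5F0, 0x630):
    idx = plt_index(stub, PLT_START)
    print(hex(stub), idx, rela_plt_names[idx])
# 0x5f0 0 __libc_start_main
# 0x630 4 printf
```

The alignment check is a design choice: it turns a call into the middle of a stub, or a wrong pair of constants for another architecture, into an error instead of a silently wrong symbol.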
Finally, we can implement this in Python using pyelftools. Below is a snippet of code sourced from the mambo-lift repository:
from elftools.elf.elffile import ELFFile

def get_plt_symbol_by_addr(filename, addr):
    with open(filename, "rb") as binary:
        elf = ELFFile(binary)
        dynsym = elf.get_section_by_name(".dynsym")
        rela_plt = elf.get_section_by_name(".rela.plt")
        plt = elf.get_section_by_name(".plt")
        # Skip the 0x20-byte AArch64 PLT header, then count 16-byte stubs
        index = (addr - plt["sh_addr"] - 0x20) // 16
        # pyelftools exposes the upper 32 bits of r_info as r_info_sym
        idx = rela_plt.get_relocation(index)["r_info_sym"]
        return dynsym.get_symbol(idx).name.strip()
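As an optional cross-check that does not rely on the 0x20 and 16 constants, note that each stub's adrp/ldr pair reads the GOT slot named by the r_offset of its relocation: printf@plt loads from 0x10000 + 4040 = 0x10fc8, which is exactly printf's r_offset in the readelf listing. The sketch below decodes only these two instruction forms (ADRP, and 64-bit LDR with an unsigned immediate offset). It is an alternative technique, not the approach used above or in mambo-lift, and all helper names are hypothetical.

```python
def decode_adrp(insn: int, pc: int) -> int:
    """Page address computed by an AArch64 ADRP instruction located at pc."""
    # ADRP: bit 31 set and bits 28..24 equal to 0b10000
    if (insn & 0x9F000000) != 0x90000000:
        raise ValueError(f"{insn:#010x} is not ADRP")
    immlo = (insn >> 29) & 0x3
    immhi = (insn >> 5) & 0x7FFFF
    imm = (immhi << 2) | immlo
    if imm & (1 << 20):  # sign-extend the 21-bit page count
        imm -= 1 << 21
    return (pc & ~0xFFF) + (imm << 12)


def decode_ldr_x_uimm(insn: int) -> int:
    """Byte offset of LDR Xt, [Xn, #imm] (64-bit, unsigned offset form)."""
    if (insn & 0xFFC00000) != 0xF9400000:
        raise ValueError(f"{insn:#010x} is not a 64-bit LDR (unsigned offset)")
    return ((insn >> 10) & 0xFFF) * 8  # imm12 is scaled by 8 for X registers


def plt_stub_got_slot(adrp_insn: int, ldr_insn: int, stub_addr: int) -> int:
    # ADRP is the first instruction of the stub, so its pc is the stub address
    return decode_adrp(adrp_insn, stub_addr) + decode_ldr_x_uimm(ldr_insn)


# printf@plt at 0x630: adrp x16 (0x90000090), ldr x17, [x16, #4040] (0xf947e611)
print(hex(plt_stub_got_slot(0x90000090, 0xF947E611, 0x630)))  # 0x10fc8
```

Matching the result against r_offset in .rela.plt would identify the relocation, and hence the symbol, without assuming the header and stub sizes.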
And that is it! ELF files can be quite challenging to work with at first, but once you understand how the different sections fit together, everything becomes much easier. Since the aim of this blog post is to address a specific ELF-related problem rather than to explain the format in general, I have linked some resources on how ELF binaries work below:
- k3170 Series on ELF Format and Runtime Internals
- Wikipedia
- ELF for the Arm® 64-bit Architecture (AArch64)
- Oracle Documentation