Building an ELF File
This is the next installment in my GSoC project! This week, I'm attempting to make an ELF file!
The most useful resource for me in working on this bit (besides the always helpful Linkers & Loaders by Levine) has been this hello world ELF tutorial, which has you generate an ELF executable file and then walks you through some of its components. If you want a more general overview of ELF files before diving into the nitty-gritty of that tutorial, I recommend this guide. It doesn't go as in-depth, but it is a bit easier to read than the tutorial and gives you enough information to follow along with the next bits without having to look through an ELF file yourself.
My initial approach was to use the .interp section to call perl6 to interpret arbitrary source code included in the .text section, rather than using the system's standard loader, ld-linux.so. This approach did not wind up working, but the method used to create the ELF header and program section headers will be exactly the same in future versions. Additionally, attempting to subvert the .interp section is what allowed me to find a different path forward.
Making the ELF Header
I hit quite a few roadblocks while trying to make this ELF file. For one, I didn't initially realize that the magic number field in the header holds more than the raw "magic number" ("\177ELF"). Because I had only filled in those four bytes, my computer was unable to read the file as an ELF file.
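To make the point about the magic number concrete, here is a minimal sketch of the full 16-byte identification array (e_ident in <elf.h>). It is written in Python purely for illustration and is not the code from my linker. The constant names mirror <elf.h>; the helper name build_e_ident and its default values (64-bit, little-endian, System V ABI) are assumptions you should check against your own machine.

```python
# Constants below mirror the names used in <elf.h>.
ELFMAG = b"\x7fELF"   # the raw four-byte magic number ("\177ELF")
ELFCLASS64 = 2        # 64-bit objects
ELFDATA2LSB = 1       # little-endian data encoding
EV_CURRENT = 1        # current ELF version
ELFOSABI_SYSV = 0     # shown as "UNIX - System V" by readelf
EI_NIDENT = 16        # total size of the e_ident array


def build_e_ident(ei_class=ELFCLASS64, ei_data=ELFDATA2LSB,
                  osabi=ELFOSABI_SYSV):
    """Return the 16-byte e_ident array (hypothetical helper)."""
    # Magic, class, data encoding, version, OS/ABI, ABI version (0).
    ident = ELFMAG + bytes([ei_class, ei_data, EV_CURRENT, osabi, 0])
    # The remaining bytes are padding and must be zero.
    return ident.ljust(EI_NIDENT, b"\x00")
```

The four magic bytes are only the first quarter of the array; the class, endianness, version, and OS/ABI bytes that follow are what I had originally left out.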
If you're following along at home in building an ELF file, I recommend that you open <elf.h>, pick the appropriate header for your machine (I used the 64-bit header), and work down the list of fields in that header, setting them one by one. If you hit a field where you aren't sure which option to select (for example, if you don't know the endianness of your machine), try writing a c_hello_world.c program, compiling it, and running the following command to view the header of the resulting executable: readelf -h c_hello_world
Running that command will return something looking a bit like this:
You can use the endianness (and the other system information) you find there in your homemade ELF header. Now, if you're trying to make a more generalizable ELF file builder, that system information needs to be changeable to match whatever system the file is going to be run on. But for now, this is a good enough starting place to see if a homemade ELF file will work. Once all the header information has been populated in the ELF header struct, write that struct to a file. Then, try using the same readelf command as before on your generated file to see if your system can read it. If it works, all your information is correct so far! If not, check back over the included info, particularly making sure that your magic number, OS/ABI, and machine type are set correctly.
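As a rough illustration of "populate the struct, then write it to a file", here is a sketch that packs the fields of the 64-bit header (Elf64_Ehdr in <elf.h>) in order. It is in Python for brevity and is not my actual linker code. The function names are hypothetical, and the hardcoded values (little-endian, x86-64, executable type) are assumptions that should be swapped for whatever readelf -h reports on your system.

```python
import struct

# Elf64_Ehdr layout, little-endian, no padding: e_ident[16], e_type,
# e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags, e_ehsize,
# e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx.
EHDR_FORMAT = "<16sHHIQQQIHHHHHH"

ET_EXEC = 2         # executable file
EM_X86_64 = 62      # AMD x86-64 (assumed target machine)
EV_CURRENT = 1      # current ELF version
PHDR_SIZE = 56      # sizeof(Elf64_Phdr)
SHDR_SIZE = 64      # sizeof(Elf64_Shdr)


def build_elf_header(entry, phoff, shoff, phnum, shnum, shstrndx):
    """Pack a 64-byte ELF header (hypothetical helper)."""
    # Magic, ELFCLASS64, ELFDATA2LSB, EV_CURRENT, System V ABI, padding.
    e_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    return struct.pack(
        EHDR_FORMAT, e_ident, ET_EXEC, EM_X86_64, EV_CURRENT,
        entry, phoff, shoff,
        0,                                 # e_flags
        struct.calcsize(EHDR_FORMAT),      # e_ehsize (64)
        PHDR_SIZE, phnum, SHDR_SIZE, shnum, shstrndx,
    )


def write_elf_header(path, header):
    """Write the packed header to a file so readelf -h can inspect it."""
    with open(path, "wb") as out:
        out.write(header)
```

Calling write_elf_header with the result of build_elf_header and then running readelf -h on the output file is a quick check that the magic number, OS/ABI, and machine type are being read the way you intended.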
Making the Program Section Headers
I felt really lost when I hit this point. I struggled a lot with figuring out which program sections were required and which I could leave out. I knew for certain that, for my approach, I needed an INTERP program section, but I had no idea whether sections like GNU_EH_FRAME and GNU_STACK were required or optional (it turns out they were optional for my use case). Another issue I ran into here was that I had conflated program headers and section headers in my head, which made it very confusing to see where I was supposed to put the hello_world code.
The section headers are optional, named components of the ELF file. The program headers describe the parts of the program which are executable and need to be loaded into memory at runtime. In this case, since we're making a simple executable and not actually engaging the loader (ld-linux.so), we have very few program headers. Instead, we're using three standard sections: the .text section, the .interp section, and the .shstrtab section. If you want to read more about the other sections which can be used in an ELF file, you can find a summary of them here.
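For comparison with the section headers, here is a hedged sketch of what a single program header entry for INTERP might look like when packed by hand. The layout follows Elf64_Phdr from <elf.h>; the helper name and the choice to set the physical address equal to the virtual address are assumptions made for illustration, not something taken from my linker.

```python
import struct

# Elf64_Phdr layout: p_type, p_flags, p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align.
PHDR_FORMAT = "<IIQQQQQQ"

PT_INTERP = 3   # program header type for the interpreter path
PF_R = 4        # segment is readable


def build_interp_phdr(interp_path, file_offset, vaddr):
    """Return (packed program header, path bytes) for an INTERP entry.

    Hypothetical helper: the path is stored as a NUL-terminated string,
    and the header records where in the file that string lives.
    """
    data = interp_path.encode("ascii") + b"\x00"
    phdr = struct.pack(
        PHDR_FORMAT, PT_INTERP, PF_R,
        file_offset, vaddr, vaddr,   # p_paddr assumed equal to p_vaddr
        len(data), len(data),        # p_filesz and p_memsz
        1,                           # p_align: byte alignment
    )
    return phdr, data
```

The program header only points at the path bytes; the bytes themselves are what the .interp section holds.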
The .text section contains the actual code to be executed. This should normally be machine code. I am massively cheating and putting the Perl 6 source code for "Hello world!" here instead. This is not a permanent solution; it is being done to see whether, when I try to execute the ELF, the system calls the MoarVM as the interpreter, as specified in the .interp section.
The .interp section tells the system what loader to use. By default, this is /lib64/ld-linux-x86-64.so.2. However, I'm cheating again here and telling it to call perl6 instead of that loader. As mentioned before, this did not actually work, for reasons I'll get into in the "Where Things Went Awry" section, but for now, assume that providing the location of perl6 here could work.
The .shstrtab section is where we store the names of the sections so the system knows what kind of sections it's dealing with.
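Here is a small sketch of how the contents of .shstrtab can be assembled, along with the name indexes that the section headers later refer to. It is in Python for illustration only, and the helper name build_shstrtab is hypothetical. The format itself (a leading NUL byte followed by NUL-terminated names) is the standard ELF string table layout.

```python
def build_shstrtab(names):
    """Return (table bytes, {name: index}) for a section name table.

    Hypothetical helper. Index 0 is reserved for the empty string, so
    the table always starts with a single NUL byte.
    """
    table = b"\x00"
    offsets = {}
    for name in names:
        # The index of a name is its byte offset into the table.
        offsets[name] = len(table)
        table += name.encode("ascii") + b"\x00"
    return table, offsets


table, offsets = build_shstrtab([".interp", ".text", ".shstrtab"])
# offsets == {".interp": 1, ".text": 9, ".shstrtab": 15}
```

Each section header then stores one of these offsets as its name, rather than the name itself.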
Here comes the tricky part: what in the world do you actually put in the section header for each of these? The name is easy enough: it's the index into the string table at which you can find the name of the section. The next two (the type and flags) are also pretty simple; you just look them up here. The section offset is the offset from the beginning of the file to where the section being described by the header is located. The size is the size of that section. But what about the virtual address at execution? By looking at the previously mentioned c_hello_world ELF as well as a couple of other samples, I realized that the virtual address and the section file offset are typically set to the same thing: the location within the file where the section being described by the header starts. For the three sections I used (.interp, .text, and .shstrtab), sh_link, sh_info, and sh_entsize can all be set to 0. Finally, I figured out sh_addralign by looking at functioning sample ELFs again.
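Putting those fields together, a section header can be sketched as follows. This follows the Elf64_Shdr layout from <elf.h> and is written in Python only for illustration; the helper name and the default of copying the file offset into the address field (mirroring what I saw in the sample ELFs) are assumptions, and you can pass an explicit address when a section should not follow that pattern.

```python
import struct

# Elf64_Shdr layout: sh_name, sh_type, sh_flags, sh_addr, sh_offset,
# sh_size, sh_link, sh_info, sh_addralign, sh_entsize.
SHDR_FORMAT = "<IIQQQQIIQQ"

SHT_PROGBITS = 1     # program-defined contents (.text, .interp)
SHT_STRTAB = 3       # string table (.shstrtab)
SHF_ALLOC = 0x2      # section occupies memory at run time
SHF_EXECINSTR = 0x4  # section contains executable instructions


def build_section_header(name_index, sh_type, flags, offset, size,
                         addr=None, addralign=1):
    """Pack one 64-byte section header (hypothetical helper)."""
    if addr is None:
        # Assumption: address set equal to the file offset, as observed
        # in the sample ELFs described above.
        addr = offset
    return struct.pack(
        SHDR_FORMAT, name_index, sh_type, flags, addr, offset, size,
        0, 0,          # sh_link and sh_info
        addralign,
        0,             # sh_entsize
    )
```

For example, a .text header whose name sits at index 9 in the string table could be built with build_section_header(9, SHT_PROGBITS, SHF_ALLOC | SHF_EXECINSTR, offset, size, addralign=16), where the offset, size, and alignment values come from your own file layout.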
Where Things Went Awry
It turns out that subverting the normal function of the .interp section and using it to call a different interpreter is not as easy as it looked. This is partially because the execution entry point specified by e_entry in the ELF header is intended to point to a short routine called "_start", which sets up the machine registers for program execution and jumps to the program's "main" function before handing over execution to the loader specified by .interp. Because I was working under the assumption that execution was passed to the specified interpreter at the location given by e_entry, I thought that I could use the perl6 executable to directly execute whatever was in the .text section.
When I looked for examples of cases where others had successfully used a different loader, I was unable to find an example of anyone using a loader other than ld-linux.so. Anyone I found who had altered the .interp section had done so to call an older version of ld-linux.so.
What I was able to find examples of was people changing the _start subroutine to call a different section of code. When I looked again at .NET Core, it seemed like this approach was much closer to what they were doing.
The Path Forward
It seems like the path of least resistance from this point is to switch gears and rewrite the _start subroutine typically included in the .text section so that it calls the perl6 executable to execute the hello world line, which will be written in an optional, additional section. This mimics the way .NET Core uses a manifest to list and execute files included in the system-dependent executable. Once I am able to directly use the call to perl6 to execute source code included in the ELF, I will move towards using MoarVM to interpret generated MBC included in the ELF. Thankfully, a lot of the work I have done so far can still be saved and used in the new approach. I'm feeling a bit up the creek without a paddle, but hopefully this new approach will work, or give me insight towards a technique that will.
If you're interested in looking over the code for my linker, I've put it up on GitHub here. If you clone it, you will find that you can read the ELF header and the sections using readelf -a hello_world, and that the sections and values in the section headers are properly populated. Sadly, because of the issue described with _start and the .interp section, attempting to execute the ELF will result in an error.