The Linker for Perl 6

As mentioned in my last post, I'm working on making a linker for Perl 6 user programs as part of the Google Summer of Code. I'll be making weekly progress updates here explaining what I've done. In this post, I'm going to cover what a linker is and why it's useful, how I'm approaching the project, and what progress I've made so far.

I'm learning a lot of this as I go, so if you see something that's confusing or incorrect, please let me know so I can fix it!

What does a linker do?

The TL;DR version is that a linker takes all the object files generated by compilation, resolves the calls to functions in libraries and other files, and combines them into a single executable file. Being able to generate this single executable file is extremely useful: once the program has been "translated" from whatever language it was written in into bytecode and arranged in an executable order, you can run that program over and over again without ever having to translate it again. It also makes it easy to hand the program to someone else, who can just run it.

To go a bit more in depth, a linker is responsible for two main things: symbol resolution and relocation. I'm also going to touch on what a loader does, as the linker and loader work hand in hand to produce and run the executable.

Symbol Resolution: Symbol resolution by the linker is what allows a programmer to make calls to external libraries. During the compilation stage, a symbol table is made which records what symbols (functions, variables, methods, etc.) are referenced in the code, whether each one is defined in the file that references it or externally, and where it is referenced. This table makes symbol resolution much easier. The linker goes through the symbol table and finds the actual address of each referenced symbol. The places in the program that use those symbols are then updated in the relocation step to point to the actual address. Symbol resolution can be trickier when dealing with shared libraries (.so files) because the actual address of a shared library isn't determined until runtime. Linkers have a way of dealing with this (essentially putting in a placeholder) that I will likely go more in depth on in later posts.
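To make the idea concrete, here is a minimal sketch of symbol resolution in Python. It is a toy model, not how a real linker stores things: the "object files" are plain dictionaries, and the file names, symbol names, offsets, and base addresses are all made up for illustration.

```python
# Toy model of symbol resolution. Each "object file" lists the symbols it
# defines (name -> offset inside that file) and the symbols it references.
# Every name and number below is hypothetical.

def resolve_symbols(object_files, base_addresses):
    """Build a global symbol table and check every reference against it."""
    global_table = {}
    for file_name, obj in object_files.items():
        for symbol, offset in obj["defines"].items():
            if symbol in global_table:
                raise ValueError(f"duplicate definition of {symbol!r}")
            # Final address = where the file is placed + offset within it.
            global_table[symbol] = base_addresses[file_name] + offset

    # Any reference with no definition anywhere is a link error.
    unresolved = sorted(
        symbol
        for obj in object_files.values()
        for symbol in obj["references"]
        if symbol not in global_table
    )
    if unresolved:
        raise ValueError(f"undefined symbols: {unresolved}")
    return global_table


objects = {
    "main.o": {"defines": {"main": 0}, "references": ["greet"]},
    "greet.o": {"defines": {"greet": 16}, "references": []},
}
bases = {"main.o": 0x1000, "greet.o": 0x2000}
symbols = resolve_symbols(objects, bases)
print({name: hex(addr) for name, addr in symbols.items()})
```

A real linker also has to handle weak symbols, archives, and the shared-library placeholders mentioned above, none of which this sketch attempts.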

Relocation: Compilers typically generate each object file as if its code will start at memory address 0. In practice, this can't actually happen because then all the programs would get loaded on top of each other and you would only be able to have one program in memory at a time. So, to deal with this without requiring each program to know exactly where it will wind up in memory, the linking step relocates the program to another address and fixes the address-referencing instructions accordingly.
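Here is a small Python sketch of what "fixing the address-referencing instructions" can look like. It assumes a made-up format in which every address is a 32-bit little-endian value and the list of offsets to patch (the relocation entries) is already known; real object formats have several relocation types and are more involved.

```python
import struct

def relocate(code, relocations, load_address):
    """Return a copy of `code` in which every 32-bit little-endian address
    slot listed in `relocations` has been shifted by `load_address`."""
    patched = bytearray(code)
    for offset in relocations:
        (value,) = struct.unpack_from("<I", patched, offset)
        struct.pack_into("<I", patched, offset, (value + load_address) & 0xFFFFFFFF)
    return bytes(patched)

# A hypothetical 8-byte "program": 4 filler bytes, then one address slot that
# holds 0x10, i.e. an address computed as if the program started at 0.
program = b"\x90\x90\x90\x90" + struct.pack("<I", 0x10)
moved = relocate(program, relocations=[4], load_address=0x400000)
print(hex(struct.unpack_from("<I", moved, 4)[0]))  # prints 0x400010
```

Only the slots named in the relocation list change; the rest of the bytes are copied through untouched.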

The Loader: The loader is responsible for copying the program from disk into memory and then beginning execution of the program.

In summary, the linker takes all object files produced by the compiler, identifies all function calls in those object files, updates the lines calling those functions with the appropriate address, and then updates all address-referencing instructions in the bytecode with the appropriate address. This results in a single executable file which can be run on the machine it was compiled on.

This is a very brief overview of linkers. If you're interested in learning about them in more depth, I strongly recommend John R. Levine's Linkers & Loaders and Ian Lance Taylor's blog. If you want to try your hand at making a very simple linker, Linkers & Loaders has a project, broken up into pieces, that walks you through making one. Levine's website for Linkers & Loaders has sample solutions, all implemented in Perl 5, for the first several parts of the project.

Overview of how I'm making the linker

My approach to the Perl 6 linker is similar to .NET Core's recent implementation of single-file executable binaries. I picked this approach since it should help lead to a solution that will work on Linux, Windows, and OSX. It works by creating a manifest of information which is written into the operating-system-dependent executable binary format. This system-dependent binary serves as a wrapper for the bytecode in .NET's PE-format assemblies, which is interpretable by .NET. The manifest is then linked to .NET’s runtime library and handed off to a .NET “main” function, which acts as the loader and executes the program.

Similarly, my implementation will create a system-dependent executable which wraps bytecode that is interpretable by MoarVM. The ELF .interp section, a part of the ELF executable file that names the program's interpreter, will be used to call a locally installed instance of MoarVM, which will function as the loader.
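For anyone curious what the interpreter entry looks like on disk, here is a hedged Python sketch that finds it in an existing ELF file. It only handles 64-bit little-endian ELF, and it goes through the PT_INTERP program header (the entry that points at the .interp data) rather than the section table. It is a reading aid, not part of my linker.

```python
import struct

PT_INTERP = 3  # program header type that names the program interpreter

def read_interp(data):
    """Return the interpreter path from a 64-bit little-endian ELF image,
    or None if the image has no PT_INTERP program header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    # ELF64 header fields: e_phoff at byte 32, e_phentsize/e_phnum at byte 54.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for index in range(e_phnum):
        base = e_phoff + index * e_phentsize
        # ELF64 program header: p_type, p_flags, p_offset, ..., p_filesz at +32.
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", data, base)
        (p_filesz,) = struct.unpack_from("<Q", data, base + 32)
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None
```

Calling read_interp on the bytes of a dynamically linked binary on an x86_64 Linux system should return the path of the system's dynamic loader; a statically linked binary has no such entry, so the function returns None.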

I'm starting by targeting Linux systems so that I can make use of the .interp section. Once I have the linker fully functioning on Linux, time permitting, I will move on to getting it working on Windows, using a strategy similar to .NET's to use the locally installed instance of MoarVM as the loader.

This month, I am focusing on making a proof of concept which takes the MoarVM bytecode (MBC) for a Hello World program and produces an ELF executable that outputs "Hello World!". Once I have this working, I'm going to expand to a slightly more complicated program that uses multiple files.

In the coming months, I will enable linking of static libraries, then the linking of shared system libraries, and then hopefully the linking of shared user libraries. I expect to run into some difficulties with shared user libraries that make use of Perl 6's ability to inline different programming languages, but we'll see when we get there!

Where I am now

Successes so far:

I've been able to generate and save an MBC file for say "Hello World!";. To do this, run this command:
perl6 --target=mbc --output=<file_name> -e 'say "Hello World!"'
If you want to be able to look at a slightly more human readable version of this file, you can run:
moar --dump <file_name>
I've been able to write a program that reads in the MBC file and correctly extracts the header and the other segments contained in it. If you're interested in the structure of MBC files and what sort of information they contain, you can read more about them here.
I've been able to extend the program to produce an ELF executable whose header is properly formatted and contains the information needed to be potentially executable on my system (Ubuntu 18.04.02, x86_64, little endian). It also takes the information from the MBC header, as well as the actual bytecode, and puts it in the ELF executable. The ELF executable I've made does not run yet, as I still need to add the section which tells it to use MoarVM as its loader.
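As a rough illustration of the first step in reading an MBC file, here is a Python sketch that checks the start of the file and pulls out the bytecode version. The layout it assumes (an 8-byte "MOARVM\r\n" magic string followed by a 32-bit little-endian version number) is my reading of the MoarVM bytecode documentation, so treat it as an assumption; the function name is made up, and this is not the code from my program.

```python
import struct

# Assumption: an MBC file starts with this 8-byte magic string, followed by
# a 32-bit little-endian version number.
MBC_MAGIC = b"MOARVM\r\n"

def read_mbc_version(data):
    """Validate the magic string and return the bytecode version number."""
    if len(data) < len(MBC_MAGIC) + 4:
        raise ValueError("file is too short to be MoarVM bytecode")
    if data[:len(MBC_MAGIC)] != MBC_MAGIC:
        raise ValueError("missing MOARVM magic string")
    (version,) = struct.unpack_from("<I", data, len(MBC_MAGIC))
    return version
```

With a file produced by the perl6 --target=mbc command above, this could be called as read_mbc_version(open(file_name, "rb").read()). The rest of the header, which describes where each segment lives, would be read the same way, with struct.unpack_from at the offsets given in the format documentation.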

Things I'm working on this week:

I'm working on adding the .interp section that will make the call to the locally installed MoarVM interpreter, so I can test whether the MBC wrapped inside the ELF executable is actually runnable in that form by MoarVM. Some extra work may be needed to make the wrapped MBC interpretable by MoarVM.
I'm going to go back and finish fixing the ELF header generation so that it will pull the architecture and endianness from the computer it's being compiled on.
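Here is one way that detection could look, sketched in Python rather than in the language of my actual program. The function name and the small architecture table are hypothetical, only two architectures are listed, and all the offset fields are left as zero placeholders, so the result is a correctly sized header, not a runnable executable.

```python
import platform
import struct
import sys

# e_machine values from the ELF specification for two 64-bit architectures.
E_MACHINE = {"x86_64": 62, "aarch64": 183}

def build_elf64_header(machine=None, byteorder=None):
    """Build a 64-byte ELF64 header, defaulting to the host's architecture
    and endianness. Entry point and table offsets are zero placeholders."""
    machine = machine or platform.machine()
    byteorder = byteorder or sys.byteorder
    if machine not in E_MACHINE:
        raise ValueError(f"unsupported architecture: {machine!r}")
    if byteorder not in ("little", "big"):
        raise ValueError(f"unknown byte order: {byteorder!r}")
    prefix = "<" if byteorder == "little" else ">"
    ei_data = 1 if byteorder == "little" else 2  # ELFDATA2LSB / ELFDATA2MSB
    # e_ident: magic, ELFCLASS64 (2), data encoding, EV_CURRENT (1), padding.
    e_ident = b"\x7fELF" + bytes([2, ei_data, 1, 0]) + bytes(8)
    fields = struct.pack(
        prefix + "HHIQQQIHHHHHH",
        2,                   # e_type: ET_EXEC
        E_MACHINE[machine],  # e_machine
        1,                   # e_version: EV_CURRENT
        0, 0, 0,             # e_entry, e_phoff, e_shoff (placeholders)
        0,                   # e_flags
        64,                  # e_ehsize: size of this header
        56, 0,               # e_phentsize, e_phnum
        64, 0, 0,            # e_shentsize, e_shnum, e_shstrndx
    )
    return e_ident + fields
```

Calling build_elf64_header() with no arguments uses platform.machine() and sys.byteorder, which is the "pull it from the computer it's being compiled on" behavior; passing explicit values makes the function easy to test and leaves the door open to cross-building later.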

More to come soon!

Comments

Tom Browder, June 6, 2019 at 4:12 AM
I'm supposed to be one of your mentors but haven't been around. I see that you have been doing work that I probably couldn't have helped with anyway! I am pleased with your progress and know the community appreciates it.
Unknown, June 10, 2019 at 2:47 PM
You may not get much feedback because we're Perl folk but it's great to read you're having fun. :)

I'm curious whether any of your decisions relate to whether the duo NQP/MoarVM can be used and packaged in the manner you propose for the trio Rakudo/NQP/MoarVM. Of course, from one perspective the whole point is that it turns a program into a standalone so why wouldn't one just use the trio? What could possibly be appealing about packaging just the duo?

Maybe I'm imagining things but I think of NQP as a new generation grammar powered Unicode aware PCRE. And this, at a time when the world really needs just such a thing. So I think of it as being quite plausible that NQP/MoarVM will see widespread adoption among OS distros and programming languages in coming years, much as PCRE was popular 2 decades ago.

For example, Devanagari is already the most widely used (in terms of number of languages) writing system in the world. But it is poorly supported by current languages, and to a degree regex engines, which don't generally understand or deal well with Devanagari "characters". This is so despite credible projections suggesting there will be more Indian devs than US devs within 5 years and that India will be more populous than China by 2030.

Anyhoo, perhaps it makes zero difference whether the duo or trio is packaged up first. I just know it makes me happy that you're enjoying doing this and to dream that a billion people might benefit over the next decade from the work you're doing this year. :)
Xliff, June 15, 2019 at 10:22 AM
Madeleine,

I have been hoping someone would tackle this issue for a long time, and am glad to see you doing it. I will be following your progress with great interest.

Good luck!
October 10, 2019 at 9:59 AM
Nice. great Article Thanks..
www.kaavannan-perl.blogspot.com, November 28, 2022 at 5:15 AM
Nice, great Article
