Summer in Review

Hello! My name is Madeleine Goebel and I have spent the past three months working on a GSoC project to enable the creation of single-file, self-contained executables for Perl 6. If you would like to look at the code for my contributions, you can check them out on the self-contained-executable branch of NQP. As the summer comes to an end, I thought it would be appropriate to summarize what I have accomplished so far, and what additional capabilities I would like to add in the future.

There were four major phases to this project: generating ELF files (May/June), handling bytecode directly (July), adding the --compile capabilities to NQP (July), and enabling the use of modules (August). In this post, I will start with the original premise of my project, go over the highlights from the four major phases of work, and then discuss what improvements would be nice to make in the future. I will also touch on my general experience in the GSoC and my experience working with the Perl 6 community.

Please feel free to skip around to the sections most relevant to you! While this is a (mostly) chronological accounting of the project, I've attempted to write everything in a modular way so that if only one section is relevant to the reason you're reading this post, you don't need to read too much beyond that section. At the bottom of each section, I've also included the links to the earlier blog entries most relevant to that section in case you would like more detail about that particular chunk.

Building a Linker for Perl 6

Back in April, I submitted a proposal to address the Perl Foundation's "Linking Perl6 Compiler and Programs" project idea. My stated deliverable was to "Add a --compile option to perl6 which will generate a foo executable PE or ELF formatted binary for a given foo.pl6 user program which will facilitate self-contained deployment and/or deployment dependent upon the system shared version of Perl 6 which is installed."

Because the world of linkers was fairly new to me, I did quite a bit of research up front to come up with an approach that wouldn't require me to entirely reinvent the wheel, and could allow me to reference existing projects that were attempting to accomplish the same thing. My two most useful resources while trying to initially create a plan of action were Linkers and Loaders by John Levine and .NET Core's recent implementation of creating single file executable binaries.

The basic concept of the original plan was to create an executable ELF file which contained the Perl 6 bytecode and any required dependencies. The ELF's .interp section would then be used to call the locally installed MoarVM interpreter to execute the bytecode, which I hoped would avoid needing to include any dependencies that wouldn't be known until runtime.
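To make the role of .interp concrete, here is a minimal sketch, in Python rather than the C used by the project, of how the interpreter path requested by an ELF file can be read back out of its program headers. It assumes a little-endian ELF64 image (the usual case on x86-64 Linux); the function name read_interp is made up for this illustration and is not part of the project.

```python
import struct

# ELF64 file header and program header layouts (little-endian, no padding).
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")  # 64 bytes
ELF64_PHDR = struct.Struct("<IIQQQQQQ")            # 56 bytes
PT_INTERP = 3  # program header type that names the requested interpreter


def read_interp(image: bytes):
    """Return the interpreter path requested by an ELF64 image, or None."""
    if len(image) < ELF64_HEADER.size:
        raise ValueError("too short to be an ELF64 file")
    fields = ELF64_HEADER.unpack_from(image, 0)
    ident = fields[0]
    if ident[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    for i in range(e_phnum):
        phdr = ELF64_PHDR.unpack_from(image, e_phoff + i * e_phentsize)
        p_type, p_offset, p_filesz = phdr[0], phdr[2], phdr[5]
        if p_type == PT_INTERP:
            # The path is stored as a NUL-terminated string.
            raw = image[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None
```

Calling read_interp on the bytes of a dynamically linked Linux binary should return the path of the system's dynamic loader, which is the slot the original plan hoped to fill with moar.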

Over the course of the summer, this plan evolved and changed as I ran into difficulties, but the basic concept of generating an ELF that contained MoarVM bytecode which could then be run using an already existing executable (perl6 or moar) remained the same.

Generating ELF Files

My very first step was to figure out a way to take an existing MoarVM bytecode file and turn it into an executable ELF file. My first attempt at this was much, much harder than it needed to be. I started by creating a simple C function that could read and print out the contents of a MoarVM bytecode (MBC) header. My thought was that in order to "convert" my MBC file into an ELF file, I needed to parse it and populate the appropriate parts of the ELF with information taken from the MBC file. I quickly realized that this was an excessively difficult approach, and that it made more sense to approach creating the ELF file from a different angle. While this small parsing function was eventually written out of the project, this experience gave me a starting place for messing around with binary files that proved useful throughout the rest of the summer.

My second attempt took a different approach that was heavily inspired by .NET Core's method of running their binaries on Linux systems. When the binary is generated, it is always written as a Portable Executable (PE), which is the format required to run on a Windows machine. However, if you are compiling the executable for Linux, .NET Core essentially encapsulates the PE within an ELF which, when executed, calls .NET Core's main function to execute the encapsulated PE binary. In my implementation of this technique, I took the MBC file and embedded it in my executable ELF's .text section, and was planning on using the ELF's .interp section to call a custom loader (moar) to execute the .text section.

Sadly, I hit another roadblock. It turned out that the execution entry point for the ELF header points to a short machine-dependent routine called "_start" which sets up the computer for program execution and then jumps to the "main" function of the program before handing over execution to the loader specified by .interp. This meant that my attempt to circumvent the normal flow of execution through an ELF file wasn't possible in the way I had hoped. My first thought after some research into how others had approached using a different default loader was to attempt to modify _start. Thankfully, a helpful conversation on the #moarvm channel helped me realize this would be a painful and unportable approach, and gave me the push in the right direction I needed to find a path forward.

For my next attempt, I stepped back from developing my own program to generate an ELF file encapsulating the MBC file, and used objcopy instead (see the first attempt section in this post for more details). From there, I made a small C program that would access the embedded MBC file, and would attempt to run it using moar. Sadly, it turned out that the MBC file I had been able to generate using perl6 --target=mbc --output=hello.bc hello_world.pl6 was not directly runnable by MoarVM. So, for the sake of forward progress to see if this objcopy approach could work, I switched to embedding Perl 6 source code into the ELF, and using perl6 to run it. And it worked!
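For readers who have not used objcopy this way, here is a small Python sketch of what such an embedding step can look like. The exact command used in the project is not given in this post, so the flags below are an assumption for x86-64 Linux, and both helper names are invented for this example. What is standard GNU objcopy behaviour is the symbol naming: for binary input it defines _binary_NAME_start, _binary_NAME_end and _binary_NAME_size, where NAME is the input file name with every non-alphanumeric character replaced by an underscore.

```python
import re


def objcopy_embed_command(payload: str, output: str) -> list:
    """Build an objcopy command that wraps a raw file in a relocatable ELF object.

    Assumed target: x86-64 Linux. Other platforms need different -O/-B values.
    """
    return [
        "objcopy",
        "-I", "binary",          # treat the input as an opaque blob
        "-O", "elf64-x86-64",    # emit a 64-bit ELF object
        "-B", "i386:x86-64",     # architecture recorded in the object
        payload,
        output,
    ]


def objcopy_symbols(payload: str) -> dict:
    """Return the symbol names objcopy derives from the input file name."""
    stem = re.sub(r"[^0-9A-Za-z]", "_", payload)
    return {kind: f"_binary_{stem}_{kind}" for kind in ("start", "end", "size")}
```

A C start-up program linked against the resulting object can then declare the start and end symbols as extern and read the embedded bytes between them.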

From there, I started work modifying my ELF file generation program to create a relocatable ELF similar to the one produced by objcopy. This was fairly straightforward, and I was able to reuse a lot of my progress from my earlier encapsulation approach attempt. If you would like to see details on how this was done, please check out the "Second Attempt" section of this post.

And with that, I had a way to embed Perl 6 source code into an ELF file, and run it! It was time to move onto the next steps, figuring out how to run MBC directly, and adding the --compile flag to perl6.

(Building an ELF file & Hello World!)

Running MBC directly

When I directly handed moar an MBC file, I was greeted with this exception:

Nine, brrt, and mornfall helped me realize that this is because perl6 (which is what usually calls MoarVM) does a good deal of necessary setup, including doing things like identifying the library search and repository paths to all needed components. I thought that the simplest path forward would be to isolate the parts of perl6 that did that initial start up, figure out a way to retrieve its MBC, and embed that at the beginning of my .rodata section in the ELF before the user program. However, after some digging with the help of a tool recommended by timotimo, and after some points made by patrickb, I realized that isolating and embedding the set-up stuff (which touched over 200 files) would be too big of a task, and would result in an unnecessarily large ELF, so I went back to the drawing board.

After a couple of false starts, I figured out a way to move forward. I started tracking down how Rakudo wound up with bytecode to hand to its backend, tracing from rakudo/src/main.nqp, to where command_line is defined, to command_eval, to command_eval's parent implementation, to eval. At this point, I determined that, to move forward, I needed to add a special flow of execution to eval to handle bytecode and to add a command line option that would tell perl6 when the file it was being handed was bytecode.

To add the command line option --bytecode/-b, I mimicked the changes made to NQP by this pull request. Then, I did some digging to find the function that NQP uses to evaluate bytecode, and used that function (nqp::loadbytecode, which is implemented here) to directly handle the passed in file.

At this point, perl6 was capable of running bytecode! Now, I needed my tool, p6_linker, to actually make use of that capability. Since --bytecode relied on perl6 being passed a file, I needed to modify my start-up script to pass perl6 a temporary file. After some bumbling around (which you can read about in the Altering the p6_linker section of this post), I made and pushed the necessary changes.
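The temporary-file step can be sketched roughly as follows. This is Python for illustration only; the real start-up program is written in C. The helper names and the .moarvm suffix are assumptions, and --bytecode is the option added on the project branch, not a flag of a stock perl6.

```python
import os
import tempfile


def stage_bytecode(payload: bytes) -> str:
    """Write embedded bytecode to a fresh temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".moarvm")  # suffix is an assumption
    with os.fdopen(fd, "wb") as handle:
        handle.write(payload)
    return path


def bytecode_command(path: str, perl6: str = "perl6") -> list:
    """Build the command line that hands the staged file to perl6."""
    # --bytecode only exists on the self-contained-executable branch.
    return [perl6, "--bytecode", path]
```

A caller would pass the result of bytecode_command to its process-spawning routine and remove the temporary file once perl6 exits.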

(And Now For Something Completely Different; Modifying Perl 6 Executable to Run Bytecode; Flags and Syscalls and Modules, Oh My!)

Adding the --compile capabilities

After a quick modification of NQP, once again mimicking the process used by this pull request, I was in business and ready to start adding functionality. Next up, I needed to get --compile to generate bytecode. Since perl6 already does this if you call perl6 --target=mbc --output=foo foo.pl6, I simply modified command_line to set those two flags appropriately.

Next up, I had to figure out how to get perl6 to call my linker when --compile was set. It took me a while to find the documentation on how to make an asynchronous system call using NQP, and sadly, the documentation didn't provide much in the way of explaining how to use spawnprocasync. Thankfully, its test case was extremely helpful in figuring out all the set up that needed to be done before calling the process. To reduce the mental load of using spawnprocasync, I extracted all that set up into a separate function, syscall, which could handle calling it for me.

At this point, I was able to add the calls that generated the ELF and the call that makes it executable.
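As a rough analogue of these two pieces, a small wrapper that runs an external command and a step that marks the result executable, here is a Python sketch. It is not the NQP code from the branch: run_step stands in for the syscall helper built on spawnprocasync, and both function names are invented for this example.

```python
import os
import stat
import subprocess


def run_step(argv: list) -> str:
    """Run one external command, fail loudly on a non-zero exit, return stdout."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def make_executable(path: str) -> int:
    """Add execute permission for user, group and others; return the new mode."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    new_mode = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    os.chmod(path, new_mode)
    return new_mode
```

Keeping process spawning behind one helper means the compile flow reads as a short list of steps: generate the bytecode, call the linker, mark the output executable.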

Finally, I enabled custom naming for executables.

In NQP, I changed my declaration of the --compile flag to require an input, changed the name of the file the MBC was outputted to, and changed my calls to the linker to use the provided name. In the linker, I changed the naming of the intermediate and final file to be determined based on the name of the MoarVM bytecode file it was passed.

To browse all the changes that made --compile functional, check out this commit.

Enabling the use of modules

At this point in the project, I had successfully created a linker that was able to take a user program and package it as a runnable ELF. All the end user needed to do to compile that program and then run it was the following:

However, this capability didn't extend to user programs that used modules. If a program made use of a module, attempting to run it would produce the following error:

This was because the path to the module was being incorrectly resolved, and perl6 was failing to locate it. At this point, I decided to narrow scope a bit and focus solely on a self-contained executable, since I believed it might be easier to locate modules I packaged up in the ELF with the user program, rather than trying to locate modules which may or may not exist on an end user's system.

To be able to move forward, I needed to do a couple of things. First, I needed to find all the necessary module files. Next, I needed to be able to pack and unpack those module files in and out of the ELF file. Then, I needed to figure out how to get perl6 to locate those unpacked modules.

It turned out that figuring out how to accomplish those three steps resulted in tons of digging through NQP and Rakudo, and quite a bit of struggling with C, since I was attempting to get it to do a task better suited for a scripting language.

Finding the modules

To figure out how to locate the necessary modules, I started by installing a module, Date::Names, on my system. Then, I searched, starting at the root of my perl6 directory, for anything that matched the strange hex strings that appeared in the output spat out when I attempted to compile a program that used Date::Names.

From there, I was able to find a number of files in the following locations:
  • [path_to_perl6_root]/share/perl6/site/dist
  • [path_to_perl6_root]/share/perl6/site/precomp
  • [path_to_perl6_root]/share/perl6/site/short
  • [path_to_perl6_root]/share/perl6/site/source.
For an in-depth explanation of these files and how modules are stored locally, check out this blog post. In summary, I had successfully located all the precompiled bytecode files that I needed to include in my ELF file.
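The search itself is easy to reproduce. The following Python sketch walks a directory tree and collects files whose names look like the hex strings mentioned above. It assumes those names are 40 hexadecimal digits (SHA-1 style), which is an assumption about the repository layout rather than something stated in this post, and find_hex_named_files is a made-up helper name.

```python
import os
import re

# Assumption: precompiled and dist files are named with 40 hex digits.
HEX_NAME = re.compile(r"[0-9A-Fa-f]{40}")


def find_hex_named_files(root: str) -> list:
    """Return sorted paths under root whose file names are 40 hex digits."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if HEX_NAME.fullmatch(name):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Pointing it at the share/perl6/site directory should list candidates under dist, precomp and the other subdirectories, if the naming assumption holds for your installation.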

At this point, I needed to figure out how to package those files into my ELF, and, with brrt, came up with a scheme to do so. I was going to concatenate all of the files together with a separator containing an index, and maybe some minimal information about the file, between each of them. This was a completely reasonable solution to arrive at, but one that caused me a great deal of frustration, as it needed to be accomplished by my ELF building program, which is written entirely in C. While basic file I/O is not particularly difficult in C, I ran into a lot of difficulties because I was handling binaries, not normal text, and those binaries contained null bytes, which C's string functions treat as terminators.
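One way to sidestep the null-byte problem is to prefix each file with its length instead of scanning for a separator. The Python sketch below shows that variant. It is not the scheme implemented in the project's C code, and the record layout (index, name length, payload length) is invented for this example.

```python
import struct

# Hypothetical record header: file index, name length, payload length.
RECORD = struct.Struct("<IHQ")


def pack_files(files: list) -> bytes:
    """Concatenate (name, data) pairs, each preceded by a fixed-size header."""
    out = bytearray()
    for index, (name, data) in enumerate(files):
        encoded = name.encode("utf-8")
        out += RECORD.pack(index, len(encoded), len(data))
        out += encoded
        out += data
    return bytes(out)


def unpack_files(blob: bytes) -> list:
    """Reverse pack_files; lengths come from headers, never from scanning."""
    files, offset = [], 0
    while offset < len(blob):
        _index, name_len, data_len = RECORD.unpack_from(blob, offset)
        offset += RECORD.size
        name = blob[offset:offset + name_len].decode("utf-8")
        offset += name_len
        files.append((name, blob[offset:offset + data_len]))
        offset += data_len
    return files
```

Because the reader always knows how many bytes to consume next, payloads may contain any byte value, including zero, without confusing the unpacker.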

Packing and unpacking modules

Eventually, after an attempt on my own, and a more successful attempt with a friend, I was able to come up with a way to pack the files into the ELF. However, attempting to unpack them in my start-up program into a temp directory made me realize that this was really not a task I should be doing in C. I started out by trying to use a call to awk similar to the one in this Stack Overflow question to parse what I had packed. But after some staring at the monstrosity I had created, I realized that, if I was going to be using an external tool for part of it, it made a lot more sense to go ahead and use tar to pack and unpack it. In my ELF building file, I made a call to tar and then wrote the contents of the tarball directly into the .rodata section, just like I had initially done with my user program. Then, in my start-up file, I extracted the tarball from the ELF file, and made a call to tar to unpack it.
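To show why handing the job to tar is so much simpler, here is the same pack and unpack round trip sketched with Python's standard tarfile module, working entirely in memory. The project itself shells out to the tar command from C, so this is an illustration of the idea and not the project's code; the function names are made up.

```python
import io
import tarfile


def pack_tar(files: dict) -> bytes:
    """Build an uncompressed tar archive in memory from a name -> bytes mapping."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def unpack_tar(blob: bytes) -> dict:
    """Read every regular file back out of an in-memory tar archive."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as archive:
        for member in archive.getmembers():
            if member.isfile():
                files[member.name] = archive.extractfile(member).read()
    return files
```

The bytes returned by pack_tar are what would be written into the ELF's data section, and unpack_tar plays the role of the start-up program's extraction step.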

Using the modules

At this point, I was still struggling to get perl6 to locate these module files. I did a lot of tracing through the source code to figure out how it typically identified and loaded the necessary module files (which you can read about here), but none of the changes I made seemed to help. One issue here was that I initially misunderstood the purpose of the precompile function, and thought that it was involved in the loading process as well, not just the locating process. As a result, I attempted to modify this function to use a different location (my temp directory). However, all this accomplished was some precompiled module files being written to different locations during their compilation when I was lucky, and frustrating errors when I wasn't. After realizing my mistake, I came up with a new scheme.

If you go down the rabbit hole of module loading, you'll see that, like the running of the user program, it boils down to calling nqp::loadbytecode on a precompiled bytecode file. The only slight difference is that before the module's bytecode is run, it has to skip the first couple of lines of the file before the MoarVM header starts. So, to get a minimal proof of concept working, I added another command line option, -bm, that allowed me to pass the bytecode for a module in when a user was attempting to use the --compile flag to create a program that used a simple, single file module. When present, this file was packaged up with the user program using tar, and written into the ELF. Then, it was unpacked and passed in as a second parameter to perl6, in addition to the user program's bytecode file, when my start-up file called it.
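The "skip the first lines" step can be pictured with a few lines of Python. The sketch assumes that MoarVM bytecode begins with the 8-byte magic MOARVM followed by a carriage return and a line feed, and that this sequence does not occur in the text lines that precede it. Rakudo's own loader may well locate the header differently, and strip_precomp_preamble is a name invented for this example.

```python
# Assumption: MoarVM bytecode starts with this 8-byte magic sequence.
MOARVM_MAGIC = b"MOARVM\r\n"


def strip_precomp_preamble(blob: bytes) -> bytes:
    """Drop any leading text lines and return the bytes from the MoarVM header on."""
    start = blob.find(MOARVM_MAGIC)
    if start < 0:
        raise ValueError("no MoarVM header found")
    return blob[start:]
```

A file that already starts with the magic is returned unchanged, so the same helper would work for both a user program's bytecode and a precompiled module file.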

And voila! I was able to run a user program that used a basic module!

It is far from a polished and pretty solution, but it demonstrates that not only is it possible to create executables for user programs that do not use modules, but it will also be possible to create executables for programs that do.

Summarizing my contributions

Over the course of this summer, I have added both the capability to make an executable out of any user program that does not use modules:

And the capability to make an executable out of a user program which uses a single file module:

What's left to do

Moving forward, there's still a lot of work that can be done on this project.

Obviously, the most crucial first step is adding the capability for user programs that use multiple modules, or modules made up of multiple files. This will hopefully be fairly straightforward, as --bm is already written to handle multiple files.

After that is done, there are a lot of improvements or features that would be nice to add. I've been keeping a running list of features or non-critical improvements that have been suggested over the course of the project. If any of these spark your interest, please feel free to make a pull request to the self-contained-executable branch of NQP!

Thank you all

I have greatly enjoyed becoming involved in the Perl 6 community. I never expected to find such a welcoming and accepting group of people. I couldn't have accomplished even half of what I managed this summer without the help of my mentor, brrt, and all the others (timotimo, jnthn, nine, mornfall, patzim, lizmat, Xliff, vrurg, cygx, ggoebel, and more) who contributed their time and expertise to helping me move forward. It's wonderful that programs like the Google Summer of Code exist to help students like me have the time to start contributing to open source communities. This summer has been an amazing experience, and I hope to remain involved in the Perl 6 community for many years to come.

