Flags and Syscalls and Modules, Oh My!

Hello! I am a CS student working on a GSoC proof of concept to modify perl6 to create executable binaries. This blog documents my journey and struggles. There are likely more correct and maintainable ways to do what I've done so far. If you know a better way, please let me know in the comments section.

I have a couple of updates this week! I've gotten --compile working and have made progress on packaging the code for modules inside the ELF (though I haven't quite coaxed things into using those packaged modules). Fair warning, this post is going to be a bit long.

If you want to hear about the changes I made to NQP, jump to Adding the --compile functionality. If you want to hear about the changes I made to p6_linker, jump down to Altering the p6_linker. If you want to hear about the progress towards module functionality, jump down to Module troubles. If you just want a brief summary of what I did, jump down to In summary. And as always, if you want to hear about all the highlights, go ahead and read full steam ahead :).

Performance Update!

kpberry very kindly did a speed comparison of running perl6 -e 'say "Hello World!";' (called Unlinked Perl), perl6 hello_world.pl6 (called Literal Perl), and ./hello_world (called Linked Perl). These were the results of running each one a hundred times:

This indicates that for the simple hello_world program, the precompiled version generated using the linker is approximately 1.5 times faster than using perl6 -e, which is an exciting result.

Overview of this week's main issues

As of last week, the way to generate an executable was to run the following commands:

perl6 --target=mbc --output=hello_world.moarvm hello_world.pl6
./elfmaker hello_world.moarvm
make linker

And to run that executable, you had to run the following command:

./hello <path_to_perl6_exe>

Clearly, there are a couple of problems with this flow. For one, it's too complicated, so nobody would use it if it stayed that way. I wanted a simple, one-line command that generates an executable without the user ever having to know anything about the p6_linker. Second, everything had hardcoded names at the time. It would be unfortunate if every single compiled perl6 program had to be named hello in order for it to work. Finally, it is kind of silly to make the location of the perl6 executable on the target system a parameter of the generated executable, as that makes it harder for an unskilled user to run.

Adding the --compile functionality

The solution to the first two problems this week was to add a --compile flag which would handle the three commands needed to generate the executable. This can be broken down into four major steps: adding the flag itself, generating bytecode as output, making syscalls to interact with the p6_linker, and allowing custom naming for the executable.

Adding a Flag

I briefly went over how to do this in the And Now For Something Completely Different post, but I think that this process warrants some clarification. There are two main ways to add a command line option to perl6: altering Compiler.nqp or altering main.nqp. Since both the --bytecode and --compile options have been added in Compiler.nqp, that is the method I will be explaining here:

Add the option to this line in Compiler.nqp.

If you would like to have a short name and a long name for it, separate the two versions of the option with a |. For example, when I added the --bytecode option, I wrote bytecode|b.
If you would like the flag to take an input, put "=s" after the name. For example, the --output flag is followed by "=s".
If you would like the flag to optionally take an input, put "=s?" after the name. --profile is an example of this.

Add the functionality of the option in the appropriate location in Compiler.HLL. For a more in-depth description of the flow of execution, please read the What does running perl6 do? section of this post.

To get things to work conditionally based on the presence/value of your option, you can use either if %adverbs<my_flag> or if nqp::defined(%adverbs<my_flag>) to determine if the flag was set.

For an example of how to add a command line option to main.nqp, check out this pull request.

Getting --compile to generate bytecode

This was probably the simplest part of this entire project. In the command_line function, I added a chunk of code that hijacks the values of target and output and assigns them to be mbc and the name of the executable.

Modifying --compile to call the linker

Not going to lie, this threw me for a bit of a loop. It took me a bit to find the documentation on how to make an asynchronous system call using NQP, and sadly, the documentation doesn't provide much in the way of explaining how to use spawnprocasync. Thankfully, its test case is extremely helpful (thank you, tbrowder!).

In the example, there is a good deal of setup that is done before actually calling the process. Then, after calling spawnprocasync, the developer must manipulate the queue that is passed as an argument to the function. All of this took enough brainpower to work through that I decided it made more sense to extract it into a separate function responsible for making the syscall. While this may result in unnecessary duplication of setup, it helped reduce the mental load of using spawnprocasync. It would be nice to eventually come back and improve my syscall function as well as relocate it to a more appropriate location. In the meantime, I am satisfied with its ability to perform the needed task.

I then added the call which generates the ELF, and the call that links it to the C script to make it executable. Finally, I added a call to remove the generated MBC file so the developer wouldn't wind up with extraneous, unnecessary files.

A word to the wise: If you're making changes to NQP, you will be much less frustrated if you double and triple check your code for errors before attempting to compile it. While simple NQP syntax errors will be caught fairly quickly when rebuilding it, semantic errors that break the Rakudo build may take until the very end of the Rakudo rebuild to appear, which can be very disheartening if it happens multiple times.

Modifying --compile to customize naming of executables

Up until this point, the name of the generated MBC file, the name of the intermediate ELF file generated by main.c, and the name of the generated executable were hardcoded to be hello_world.moarvm, hello_world, and hello respectively. This is very clearly not ideal. Thankfully, it's a pretty easy fix. In NQP, I changed my declaration of the compile flag to require an input, changed the name of the file the MBC was written to, and changed my calls to the linker to use the provided name. In the linker, I changed the names of the intermediate and final files so that they are derived from the name of the MoarVM bytecode file it was passed.
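To make the linker-side renaming concrete, here is a minimal sketch in C of deriving a base name from the bytecode file's path. It is not the actual p6_linker code: the helper name derive_base_name and the choice to strip the directory and the final extension are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not from p6_linker): copy `path` into `out`
 * without its directory part and without its final extension, e.g.
 * "build/hello_world.moarvm" -> "hello_world".
 * Returns 0 on success, -1 if `out` is too small. */
int derive_base_name(const char *path, char *out, size_t out_size)
{
    assert(path != NULL && out != NULL);

    /* Skip everything up to and including the last '/'. */
    const char *slash = strrchr(path, '/');
    const char *file = slash ? slash + 1 : path;

    /* Cut at the last '.', unless the name starts with it (a dotfile). */
    const char *dot = strrchr(file, '.');
    size_t len = (dot && dot != file) ? (size_t)(dot - file) : strlen(file);

    if (len + 1 > out_size)
        return -1;
    memcpy(out, file, len);
    out[len] = '\0';
    return 0;
}
```

Called with "hello_world.moarvm" and a large enough buffer, this leaves "hello_world" in the buffer, which could then serve as the stem for both the intermediate file and the final executable.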

Altering the p6_linker

This week, I needed to make two main changes to the p6_linker. First, I changed it to use the --bytecode flag I added last week to execute code. Then, I modified it to search for the location of perl6 on the target machine, rather than requiring it to be passed in as a parameter.

Using --bytecode

As of last week, the p6_linker expected the code it was handed to be runnable using the -e flag. Now that I had added the --bytecode/-b flag, I needed to swap out the perl6 -e call for a perl6 -b call. Up until this point, the p6_linker worked by taking the embedded code and copying it directly into the command line call to perl6 -e. However, --bytecode requires the code to be passed to it in a file.

So I started down the path of figuring out how to make a temporary file that I could write all the code embedded within the ELF to. After a brief conversation checking if there were any existing patterns of how this was done in MoarVM, I began googling.

My first approach involved using tmpfile, but sadly it was doomed from the start, as there was no way to retrieve the name of the created file, and that was crucial to be able to call perl6 -b <tempfile>. My next approach used tmpnam to generate the name for the file, which I then created and deleted myself. This approach, unfortunately, was also doomed. When compiling the program, I was greeted with a warning which stated "tmpnam is a potential security risk to your program".

After some googling, I discovered that the reason behind this warning was that it left the program vulnerable to a race condition, the security concerns of which can be read about here. In the process of figuring out why tmpnam wasn't an ideal option to solve my problem, I discovered its replacement, mkstemp. This function fills in a template string you provide (ending in XXXXXX) to produce a file name that does not already exist on the host machine, and it creates and opens that file in the same step, which is what closes the race window.
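As a rough illustration of the mkstemp approach, the sketch below writes a buffer to a freshly created temporary file and leaves the generated name in the template buffer, so it can later be handed to something like perl6 -b. The helper name write_temp_file and the template path used in the note after it are assumptions for this example, not taken from p6_linker.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes of `data` to a fresh temporary
 * file. `path_template` must be a writable buffer ending in "XXXXXX";
 * mkstemp() overwrites those characters with the generated name.
 * Returns 0 on success and -1 on failure. */
int write_temp_file(char *path_template, const void *data, size_t len)
{
    assert(path_template != NULL && data != NULL);

    /* mkstemp() creates the file exclusively and opens it in one step. */
    int fd = mkstemp(path_template);
    if (fd == -1) {
        perror("mkstemp");
        return -1;
    }

    /* write() may write fewer bytes than requested, so loop. */
    const char *p = data;
    size_t remaining = len;
    while (remaining > 0) {
        ssize_t n = write(fd, p, remaining);
        if (n <= 0) {
            close(fd);
            unlink(path_template);   /* do not leave a partial file behind */
            return -1;
        }
        p += n;
        remaining -= (size_t)n;
    }
    return close(fd);
}
```

The template has to live in a writable array (for example char path[] = "/tmp/p6_demo_XXXXXX";), because mkstemp modifies it in place; passing a string literal directly would be undefined behavior. The caller is responsible for unlinking the file once the program has run.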

After some muddling around with frustratingly small mistakes which led to bizarre errors (thank you for catching it, brrt), I was able to build an executable containing MoarVM bytecode that was able to run!

Finding perl6

The existing implementation of p6_linker required users to pass the location of the perl6 executable as an argument to the generated executable. This was not ideal, because it complicated the process of running the program for the end user. Fixing this issue was fairly simple. Instead of passing the perl6 location as an argument, I retrieved its location using "which" and used that retrieved perl6 to run the packaged program. This means that the following now works:
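On the C side, the lookup step could look something like the sketch below. It shells out to which through popen and keeps the first line of output. The helper name find_program is an assumption for this example, and the real p6_linker may do this differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: ask `which` where `program` lives and copy the
 * first line of its output (without the newline) into `out`.
 * Returns 0 if a path was found, -1 otherwise. `program` is pasted into
 * a shell command, so it must be a trusted, fixed name such as "perl6". */
int find_program(const char *program, char *out, size_t out_size)
{
    assert(program != NULL && out != NULL && out_size > 0);

    char command[256];
    int n = snprintf(command, sizeof command, "which %s 2>/dev/null", program);
    if (n < 0 || (size_t)n >= sizeof command)
        return -1;

    FILE *pipe = popen(command, "r");
    if (pipe == NULL)
        return -1;

    char *line = fgets(out, (int)out_size, pipe);
    int status = pclose(pipe);          /* non-zero when nothing was found */
    if (line == NULL || status != 0)
        return -1;

    out[strcspn(out, "\n")] = '\0';     /* strip the trailing newline */
    return out[0] != '\0' ? 0 : -1;
}
```

A call such as find_program("perl6", path, sizeof path) would then give the generated executable a path to run, or a clear failure if perl6 is not on the PATH. Note that this keeps the dependency on a shell and on which being installed on the target machine.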

Module Troubles

A comment by Xliff on my post last week led me to test whether the executables still worked if they used modules. I had hoped that in the straightforward version, which only packaged the one file and none of the modules, the ModuleLoader would work as expected, correctly locating the used modules on the target machine. Unfortunately, it was not meant to be. When attempting to run a simple program making use of the Date::Names module, I was greeted with the following error:

After some digging, I believe that the issue was that the path to the module is being incorrectly resolved. This started me down the path of figuring out what was going on, and how I could adjust the current functionality of ModuleLoader.nqp to work for this use case. I am still hoping to come up with a solution to this problem, but have decided to slightly alter the use case to focus on finding packaged modules, rather than existing modules on the target system. This will hopefully allow me to resolve the issue while still making progress towards the end goal of having a self contained executable that includes all the modules that the program calls for. I will have more updates on this next week.

Bonus! Packaging modules

After some conversation with brrt, it was decided that the simplest approach to packaging up the module code would be to add an index to the .text section of the ELF. Before every bytecode file copied into the ELF, a line with its number is inserted. For example, if I had a program which used two modules, there would be three files in total, so I would have the number 1 followed by the first file, the number 2 followed by the second file, and the number 3 followed by the third file. When they are unpacked, each file is placed into a different temporary file in the same temporary directory. If it proves necessary for later steps, I may change the index lines to also include a small header containing version information about the modules.
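Here is a small sketch in C of what such an indexed layout might look like. One thing the description above leaves open is how the unpacker knows where each file ends, so this sketch assumes the index line also carries the byte length ("<index> <length>"). That length field and the function names pack_entry and unpack_entry are assumptions of this example, not the format p6_linker actually uses.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Append one entry: an index line "<index> <length>\n", then the raw
 * bytes. The length field is an assumption added for this sketch. */
int pack_entry(FILE *out, unsigned index, const void *data, size_t len)
{
    assert(out != NULL && data != NULL);
    if (fprintf(out, "%u %zu\n", index, len) < 0)
        return -1;
    return fwrite(data, 1, len, out) == len ? 0 : -1;
}

/* Read the next entry. On success stores a malloc'd buffer in *data
 * (the caller frees it), sets *index and *len, and returns 0.
 * Returns -1 at end of input or on a malformed entry. */
int unpack_entry(FILE *in, unsigned *index, void **data, size_t *len)
{
    assert(in != NULL && index != NULL && data != NULL && len != NULL);

    char header[64];
    if (fgets(header, sizeof header, in) == NULL)
        return -1;
    if (sscanf(header, "%u %zu", index, len) != 2)
        return -1;

    void *buf = malloc(*len ? *len : 1);
    if (buf == NULL)
        return -1;
    if (fread(buf, 1, *len, in) != *len) {
        free(buf);
        return -1;
    }
    *data = buf;
    return 0;
}
```

An unpacker would call unpack_entry in a loop until it returns -1 and write each buffer to its own temporary file. Reading by length rather than scanning for the next index line matters because the bytecode is binary and may itself contain newlines and digits.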

Because this post is getting pretty long, I'm going to make another in the next few days going into more detail on how this was accomplished, hopefully with an update on the Module Troubles described above.

In summary...

This week I altered the executable building and running process from being this:

To being this:

Additionally, I started the module packaging process. I have been able to retrieve the code for the modules being used, and I am also currently able to embed multiple files into the ELF file and write each file to its own temporary file.

What's left to be done?

We are four short weeks away from the end of the GSoC, which means we are coming up on the end of things. In the next week, my primary goal is to get the executables to contain all the modules they require and to use the self-contained modules. I expect to hit a few road bumps with this, as the method used for locating the modules struggles with finding modules not packaged with the executable, and I haven't yet been able to ascertain why. I think some of my issues may be related to an outstanding issue which nine recently mentioned on IRC, but I'm not sure as of now.

The next step after getting modules functioning will be to begin getting everything ready to be pulled into the official repos for Rakudo and NQP. This is likely going to involve writing a lot of tests and fixing up some of the more hacky solutions I came up with along the way.

These are the two most crucial tasks I have ahead of me for the rest of the program that I will be focusing on. If there is extra time (or interested contributors :) ), I have a couple more tasks I would like to get done in the future to have a more versatile solution for building Perl 6 executables. I've ranked them in loose order of least to greatest difficulty (in my eyes):

Eliminate the bash dependency. Not all machines have bash, and it would be nice to reduce the number of things required to use the executable.
Eliminate the gcc dependency for the target computer. It is very possible that a person would want to deploy an executable on a system that doesn't have gcc installed.
Convert the existing code in p6_linker to NQP or Perl 6. Since it is a long term goal to have the compiler for Perl 6 written entirely in Perl 6, it would be nice to go ahead and move this chunk of code in that direction.
Make packaging of the modules optional. This may actually be solved by my efforts this coming week, but there are no guarantees. This would require either overloading an existing flag to designate whether you want a fully packaged executable including all modules, or an executable which only contains the original file. Then, the issues with determining the path to the modules on the host computer will have to be resolved.
Get it to work on Windows. This will likely be a pretty substantial undertaking. Everything in the main.c file will need to essentially be replicated using PE instead of ELF. Thankfully, while it is a large task, it's a fairly straightforward one.
Get it to work on OSX. Again, everything in the main.c file will need to be replicated, this time using Mach-O instead of ELF.
Alter --bytecode to use any IO::Handle (suggested via conversation with lizmat). This is currently a bit beyond my understanding of how perl6 works, so I'm not sure how complicated this would be to implement.
Eliminate unpacking packed bytecode to temporary files (suggested by timotimo and ggoebel). Use something like memfd_create to copy packaged files into memory and pass a file descriptor.
Explore additional command line options (suggested by cygx). Some suggestions for additional command line options related to this project have been listed here. Final decisions on these command line options would likely need to be made by someone more qualified than me, but it would definitely be nice to have an option for a fully self contained executable versus the option I have right now, which requires Rakudo to be installed on the target system.
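For anyone curious about the memfd_create idea from the list above, here is a minimal, Linux-only sketch of copying a buffer into an anonymous in-memory file and getting a file descriptor back. The helper name bytes_to_memfd is hypothetical, and how perl6 would actually consume the descriptor is left open here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes into an anonymous in-memory file
 * and return its descriptor, or -1 on failure. Linux-only; needs a libc
 * that exposes memfd_create (glibc 2.27 or later). */
int bytes_to_memfd(const char *label, const void *data, size_t len)
{
    assert(label != NULL && data != NULL);

    /* No MFD_CLOEXEC, so the descriptor would survive an exec(). */
    int fd = memfd_create(label, 0);
    if (fd == -1)
        return -1;

    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Rewind so that a reader starts at the first byte. */
    if (lseek(fd, 0, SEEK_SET) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

On Linux, such a descriptor can also be reached by path as /proc/self/fd/<fd>, which is one way a program that only accepts file names could be pointed at it. Nothing is written to disk, and the memory is released when the last descriptor is closed.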

Please, if one of these strikes your interest, feel free to give it a whirl! I will happily accept pull requests to the p6_linker and to my fork of NQP. If there's additional information that you feel you need to accomplish one of these, I'm always happy to help. I would happily accept help in pushing forward to making this a more well-rounded, finished project, beyond something that fulfills the very basic need.

Other Interesting Highlights

In other news, the Perl 6 community has made a jump back to the 90s and introduced their very own webring! Currently, there are only four blogs in the ring, but hopefully that will grow! I had never heard of a webring before timotimo started talking about it (though I think I may have stumbled across one about whales a few years back via the now defunct StumbleUpon). The conversation led to cygx starting up the webring! I think that it's a fun solution to the issue that it's sometimes hard to keep track of all the various blogs writing about Perl 6. If you'd like to add your own, follow the instructions here.

Thank you guys for sticking with me to the end of this one! This really was a week of yak shaving, but I think I'm substantially closer to a shippable solution.

Yak Shaving Cream