Modifying Perl 6 Executable to Run Bytecode
Exciting news! I've successfully modified perl6 to run bytecode! You can look through my changes on GitHub here. In this post, I'll briefly go over what happens when you run perl6, what I added to get the additional functionality, a few issues I ran into along the way, and what I'm doing next.
What does running perl6 do?
Let's imagine you are using MoarVM as your backend, Rakudo as your compiler, and have just written the program foo.pl6 and want to run it. You call perl6 foo.pl6. What happens next?
- main.nqp begins to execute. This is the very beginning of the initialization your program needs before it can run. An instance of the compiler is created and set up. The paths to the various Perl 6 and NQP libraries you may need are determined and bound to environment variables. Several command line options are added. Then the compiler is actually entered with a call to command_line.
- command_line begins to execute. This method looks at several of the options that you can set when running perl6, including --help, --target, --perl6-runtime, --libpath, and --execname, and does the configuration necessary for those special options before calling command_eval.
- command_eval begins to execute. This method determines whether you're trying to run code written directly on the command line with the -e flag, code supplied via stdin, or a program stored in a file. If you're using the -e flag, it calls eval. If you didn't name a file, it uses a short chain of logical operators to decide between interactive and evalfiles: if stdin is a TTY display (an interactive terminal), it calls interactive and starts the REPL; otherwise, it calls evalfiles to read the program from stdin. Finally, if you're running a program from a file, as we're assuming here, it calls evalfiles.
- evalfiles begins to execute. This method confirms that you are attempting to open a file, not a directory. Then, it reads in the contents of your file. Finally, it calls eval.
- eval begins to execute. If you have the --profile-compile flag set, it compiles the code with the compile stage profiled; otherwise, it simply compiles the code. It then takes the compiled code, called the comp_unit, and passes it to compunit_mainline.
- compunit_mainline executes and calls the compunitmainline op. (Note: I am referencing the MoarVM implementation of compunitmainline. If you are using the JVM or JS backend, a different implementation is used.)
- compunitmainline begins to execute. This is an op implemented by MoarVM. It begins by checking that the argument passed to it is, in fact, a comp_unit. Assuming it is, I believe this is where the bytecode is passed to MoarVM to be loaded and executed using MVM_load_bytecode.
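For readers who think in C, the three-way dispatch that command_eval performs can be modeled as a small function. This is a hypothetical sketch, not Rakudo's NQP source: the `entry_point` enum, `choose_entry_point`, and its parameters are illustrative names, and the stdin sub-decision between interactive and evalfiles is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the three branches described above. */
typedef enum {
    ENTRY_EVAL,      /* -e '...': compile and run the given string           */
    ENTRY_STDIN,     /* no program file: read from stdin (REPL or evalfiles) */
    ENTRY_EVALFILES  /* a program file such as foo.pl6 was named             */
} entry_point;

/* Illustrative model of command_eval's dispatch (not Rakudo's real code).
 * e_code:       the argument given to -e, or NULL if -e was not used.
 * files/nfiles: positional arguments left over after option parsing.
 * program_file: set to the file evalfiles would open, or NULL. */
static entry_point choose_entry_point(const char *e_code,
                                      const char *const *files, size_t nfiles,
                                      const char **program_file)
{
    assert(nfiles == 0 || files != NULL);
    *program_file = NULL;

    if (e_code != NULL)
        return ENTRY_EVAL;      /* command_eval -> eval */
    if (nfiles == 0)
        return ENTRY_STDIN;     /* command_eval -> interactive or evalfiles */

    *program_file = files[0];   /* command_eval -> evalfiles -> eval -> ... */
    return ENTRY_EVALFILES;
}
```

Running `perl6 foo.pl6` corresponds to calling `choose_entry_point(NULL, args, 1, &f)` with `args[0]` set to `"foo.pl6"`, which selects the evalfiles path that the rest of the list follows.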
This is the point at which I'll stop the description of what happens when perl6 is run since this sufficiently covers everything necessary to understand the changes I made. If you notice any inaccuracies, or if anything needs further clarification, please reach out to me and let me know!
Modifying Compiler.nqp to run bytecode
The change I was trying to make was pretty simple. Instead of handing perl6 a file containing source code, which has to be compiled before it can be loaded and executed, I wanted to hand it a file containing MoarVM bytecode (MBC). Rather than having perl6 read in the file, compile the code, and load it, I wanted to skip the compilation step. That meant focusing on the evalfiles path from the third step above, where command_eval calls evalfiles.
I went about this a bit backwards. Because I knew that in the existing flow evalfiles calls eval to do the compilation, I started by seeing how I could modify eval to skip that step and jump straight to calling compunit_mainline. I modified eval to check whether the bytecode flag was enabled. If it was, eval set the local variable output (which is later passed as the argument to compunit_mainline) to the code it had been handed by evalfiles. Otherwise, it compiled the code as usual, setting output to the result of the compile method.
While this may have been a decent approach, it didn't account for the difficulty of loading a binary file through evalfiles. When I tried running an MBC file with perl6 to see what it would do, I was greeted with a "Malformed UTF-8" error.
The error indicated that perl6 couldn't read in the MBC file because it expected the file to be UTF-8 encoded. This set me off figuring out how to make evalfiles load something that wasn't UTF-8 encoded. I discovered that the function it used to open the file let you set the expected encoding, and that for a binary file you should set the encoding to 0. However, even after making that change, I still got the same "Malformed UTF-8" error. It turns out that the slurp function, which reads in the file, couldn't read plain binary data. So I started digging for a function that could, and found the commit where evalfiles had been updated to care about the encoding of the file it reads in.
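The failure comes down to the difference between reading bytes and decoding text. A minimal C sketch, not perl6's actual I/O code: `read_file_bytes` reads a file in binary mode with no decoding, while `is_valid_utf8` performs the kind of well-formedness check a UTF-8 decoder applies, which arbitrary binary data such as bytecode will usually fail. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file as raw bytes ("rb": no decoding, no newline translation).
 * Returns a malloc'd buffer the caller must free and stores its size in *len,
 * or returns NULL on error. */
static unsigned char *read_file_bytes(const char *path, size_t *len)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return NULL;
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);
    if (size < 0 || fseek(fp, 0, SEEK_SET) != 0) { fclose(fp); return NULL; }

    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) { fclose(fp); return NULL; }
    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);
    if (got != (size_t)size) { free(buf); return NULL; }
    *len = got;
    return buf;
}

/* Return 1 if s[0..n) is well-formed UTF-8, else 0. Rejects invalid lead
 * bytes, missing continuation bytes, overlong forms, surrogates, and values
 * above U+10FFFF. */
static int is_valid_utf8(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        uint32_t cp;
        if (c < 0x80) { i++; continue; }             /* ASCII */
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return 0;                               /* e.g. 0xFF */
        if (len > n - i) return 0;                   /* truncated sequence */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0; /* not a continuation */
            cp = (cp << 6) | (uint32_t)(s[i + k] & 0x3F);
        }
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000)) return 0;    /* overlong */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) return 0;
        i += len;
    }
    return 1;
}
```

Reading the file with `read_file_bytes` succeeds for any content; it is only the decoding step, modeled by `is_valid_utf8`, that rejects binary input.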
At this point, I started to feel like maybe I was going down an unnecessary rabbit hole, trying to reinvent a wheel I didn't need. After all, perl6 is already capable of loading pre-compiled modules and using them. So I started looking for the code that loads those modules, which was very conveniently in the file ModuleLoader.c. It turns out that to load bytecode from a file, Rakudo calls the op loadbytecode (again, this is the MoarVM implementation). This op in turn calls the function MVM_load_bytecode, which reads the bytecode in from the file and then runs it. Since this did exactly what I needed, I modified evalfiles to use this function whenever the --bytecode flag was set and to exit when the function returned.
And there we have it! A working version of perl6 capable of running bytecode.
What's next?
Right now, I'm working on modifying my tool to use this new functionality. Because loadbytecode expects to read from a file, my current approach is to have the ELF take the MBC embedded in its .text section, write it to a temporary file, call perl6 to execute it, and delete the file. Ideally, I would eventually like to change the --bytecode flag to allow loading bytecode from anything that can map to an IO handle (see a brief discussion on this here), but for now, this is the simplest path I can think of toward building executables from users' perl6 programs.
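A minimal POSIX sketch of that extract, run, and delete sequence follows. It assumes the launcher already has the embedded MBC as an in-memory byte array (locating it inside the .text section is not shown) and that the modified perl6 is on the PATH and accepts `--bytecode` followed by the file path. Those assumptions, the function names, and the `/tmp/mbcXXXXXX` template are all illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write an in-memory bytecode blob to a new temporary file.
 * path_buf must hold a mkstemp() template ending in "XXXXXX"; on success it
 * holds the generated file name. Returns 0 on success, -1 on failure. */
static int write_temp_bytecode(const unsigned char *mbc, size_t len, char *path_buf)
{
    assert(mbc != NULL || len == 0);
    int fd = mkstemp(path_buf);
    if (fd < 0) return -1;
    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, mbc + off, len - off);
        if (n < 0) { close(fd); unlink(path_buf); return -1; }
        off += (size_t)n;
    }
    if (close(fd) != 0) { unlink(path_buf); return -1; }
    return 0;
}

/* Hypothetical launcher: extract the blob, run "perl6 --bytecode <file>",
 * then delete the file. Returns perl6's exit status, or -1 on failure. */
int run_embedded_bytecode(const unsigned char *mbc, size_t len)
{
    char path[] = "/tmp/mbcXXXXXX";
    if (write_temp_bytecode(mbc, len, path) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) { unlink(path); return -1; }
    if (pid == 0) {
        execlp("perl6", "perl6", "--bytecode", path, (char *)NULL);
        _exit(127);                     /* only reached if exec failed */
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    unlink(path);                       /* clean up whether or not it ran */
    if (waited < 0 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

Deleting the file in the parent after `waitpid` returns means the temporary copy is removed even if perl6 exits with an error.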
Updates to follow soon!
Great progress! Will this also work for code that loads modules?
Thanks!
Sorry for being a bit slow getting back to you. As of now, it doesn't work for code that loads modules, but hopefully it will shortly.