Trials and Tribulations with Modules: Part Two

Hello! I am a CS student working on a GSoC proof of concept to modify perl6 to create executable binaries. This blog documents my journey and struggles. There are likely more correct and maintainable ways to do what I've done so far. If you know a better way, or if you see an inaccuracy, please let me know in the comments section.

This post has been incoming for a while, and as a result, has grown to a massive size. To make it a bit easier to digest, and more useful for those of you reading this hoping to find information about a particular aspect of module loading, I have chopped this post up into multiple smaller posts, each focusing on one of those aspects.

In this particular post, I'm going to focus on how module loading works.

What's a module?

For a complete explanation of modules, please refer to the Perl 6 documentation. This tutorial is a short and sweet explanation of modules, and this one is a bit more in depth, so pick your poison :).

In brief summary, a module is a set of files in a namespace. So for example, if there was a module named Foo that contained the function bar(), I could use it from inside my program by first importing the module by writing "use Foo;" and then calling the function by writing "Foo::bar()".

There are three different keywords that you can use to indicate you want to load a module: use, need, and require. For my purposes, the distinctions between these three haven't been super significant, as all three indicate that a module is needed. If you would like clarification on the minor differences between them and how you should use them in your code, please refer to the docs. Additionally, if you'd like an explanation of how to make a module, tbrowder and ttjjss both have good posts on this topic.

How do we know a module needs to be loaded?

When you hand a program to perl6, it runs through a number of steps before eventually compiling and running the program. This section goes over how it gets to the point of parsing the program, and some significant steps after the program is evaluated.

The first five steps of this section may look familiar! That is because they are the same as the first five steps in this blog post on what running perl6 does. I've included them here for ease of reading. Please note that this is a bit of a simplification of how module loading happens, as there are a lot of steps along the way that have to do with exactly how the compiler parses the user input, which is a bit above my pay grade :)

1. main.nqp begins to execute. This is the very beginning of the initialization process necessary for your program to run. An instance of the compiler is created and set up. The paths to the various Perl 6 and NQP libraries that you may need are determined and bound to environment variables. Several command line options are added. Then, the compiler is actually entered with a call to command_line.
2. command_line begins to execute. This method looks at several of the options that you can set when running perl6, including --help, --target, --perl6-runtime, --libpath, and --execname, and does the configuration necessary for those special options before calling command_eval.
3. command_eval begins to execute. This method determines whether you're trying to run code written directly into the command line using the -e flag, whether you are trying to run code via stdin, or whether you are trying to run a program from a file. If you're using the -e flag, it calls eval. If you're using stdin, it runs a series of checks to determine whether it should call interactive or evalfiles. If stdin is a TTY (an interactive terminal), it calls interactive to start the REPL; otherwise, it treats stdin as the program and calls evalfiles. Finally, if you're running a program from a file, like we're assuming, it calls evalfiles.
4. evalfiles begins to execute. This method confirms that you are attempting to open a file, not a directory. Then, it reads in the contents of your file. Finally, it calls eval.
5. eval begins to execute. If you have the --profile-compile flag set, it compiles the code slightly differently than it would otherwise. Otherwise, it simply compiles the code by calling compile.
6. compile begins to execute. It compiles the code by looping through the different stages (start, parse, syntaxcheck*, ast, optimize*, mast*, and mbc*).
7. start begins to execute. This function appears to be a stub at the moment.
8. parse begins to execute. This calls $grammar.parse (thanks timotimo for helping me locate it).
9. $grammar.parse begins to execute. As timotimo pointed out, the interesting bit here is the invocation of the TOP grammar rule. At this point, I'm going to hand off explanation for a minute as I'm a bit out of my depth when it comes to exactly how the Perl 6 grammar works. To get an introduction to how Perl6::Grammar in particular works for parsing a simple program, please proceed to Shitov's post. To get an explanation of how a simple grammar implemented in Perl 6 works, please proceed to Lenz's.
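
As a rough mental model of the command_eval step above, here is a small Python sketch of the three-way dispatch. This is not Rakudo or NQP code: the function signature, the `use_repl` flag (which stands in for the whole series of stdin checks), and the `"<stdin>"` placeholder are all assumptions made up for illustration.

```python
def command_eval(args, adverbs, use_repl=False):
    """Toy model: return (next_method, what_it_is_given).

    args     -- positional command-line arguments (hypothetical shape)
    adverbs  -- parsed options as a dict, e.g. {"e": "say 42"}
    use_repl -- stand-in for the outcome of the stdin checks
    """
    if "e" in adverbs:
        # Code was written directly on the command line with -e.
        return ("eval", adverbs["e"])
    if not args:
        # No file was named, so the program has to come from stdin:
        # either the interactive prompt or a program read from stdin.
        if use_repl:
            return ("interactive", None)
        return ("evalfiles", "<stdin>")
    # The usual case in this post: a program stored in a file.
    return ("evalfiles", args[0])


print(command_eval(["foo.pl6"], {}))
print(command_eval([], {"e": "say 42"}))
```

Writing it out this way makes it easier to see that every route except the interactive one ends up in eval sooner or later, either directly or by way of evalfiles.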

At this point, we are fully in the land of compilers. The program will be parsed, piece by piece, and the keywords use, require, and need will identify that the token following those words is the name of a module that needs to be loaded. I'm not going to go into depth on how the compiler works, as that is outside the scope of this blog, but if you'd like to learn more, here are a couple of resources:

The Rakudo and NQP Internals course is a very useful resource developed by jnthn which goes over the outlines of how the Rakudo Perl 6 compiler works.
Let's Build a Compiler is a useful book that covers how compilers work in general, along with several different techniques for building them.
Jnthn's blog in general is very interesting and informative when trying to get an understanding of the bigger picture of Rakudo. One of the posts I found interesting is one of the early roadmap posts from 2010. Obviously, it's a bit out of date, but still very informative and interesting.

* These are added to the list of stages during set-up, and may be slightly different from the stages that occur on your machine depending on your configuration of perl6.
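
To picture what "looping through the stages" means, here is a toy Python sketch. It assumes that each stage takes the previous stage's output as its input and that the loop stops once it reaches the requested target; the stage functions are dummies that only record their own names, and none of the helper names come from the real compiler.

```python
def run_stages(source, stages, target=None):
    """Feed each stage's output into the next one, stopping after `target`."""
    result = source
    for name, stage in stages:
        result = stage(result)
        if name == target:
            break
    return result


def make_stage(name):
    # Dummy stage: appends its own name so the order of execution is visible.
    return lambda value: value + [name]


STAGE_NAMES = ["start", "parse", "syntaxcheck", "ast", "optimize", "mast", "mbc"]
STAGES = [(name, make_stage(name)) for name in STAGE_NAMES]

print(run_stages([], STAGES))                # runs every stage in order
print(run_stages([], STAGES, target="ast"))  # stops once "ast" has run
```

In this model, picking an earlier target just means leaving the loop sooner, and the result handed back is whatever the last executed stage produced.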

Then what?

So, now we're at the point where everything has been parsed and processed and we know that a module needs to be loaded. What happens from there?

I uncovered a lot of this trail through tracking down a bug mentioned by vrurg and Xliff and by hitting error messages like this while I was trying to get perl6 to look for the modules in a temp directory:

As always, if you see an inaccuracy or a misunderstanding on my part, please let me know!

Thanks to the wonderful magic that happened during compilation, we now know which modules need to be loaded. As far as I have traced it, the rest of the chain looks like this:

1. At some point, the flow of execution gets handed to need. This method is responsible for locating all unresolved dependencies and putting them into a newly generated comp_unit so they're loadable. As part of this process, the method try_load is called.
2. try_load begins to execute. This is the method responsible for attempting to locate and load the precompiled bytecode for unresolved dependencies. If the precompiled bytecode for the dependencies already exists, it calls load and carries on its way. If that bytecode does not exist, it calls precompile and then calls load.
3. precompile begins to execute. (Thanks nine for helping me find this.) This method performs the necessary precompilation steps depending on what --target is set to, and then produces the dependency string I was hunting down in the previous post. It also writes the bytecode and dependencies of the module source file. Then, it returns to try_load, and load is called.
4. load begins to execute. This is the method responsible for, you guessed it, loading the files into memory. First it verifies that the precompiled files it is handed are up to date, and then it calls load_dependencies.
5. load_dependencies begins to execute. Fun fact! This is where one of the errors ("could not find $dependency") that I ran into while trying to point perl6 at a temporary directory that contains precompiled module files is produced. This function loads any dependencies into memory that have not already been loaded. In order to do that, it must attempt to find the individual precompiled files. Thankfully, that information, returned by precompile, was passed on to it, so it is able to iteratively call load-handle-for-path on each of those known dependencies.
6. load-handle-for-path begins to execute. This function in turn calls load-precompilation-file.
7. load-precompilation-file begins to execute. Because the function has been handed a path, and not a file handle, the version of it that takes a path is the one that runs. This is where the process begins to differ from Xliff's bug, where it was handed a file handle. This version of load-precompilation-file then proceeds to call nqp::loadbytecode. This results in MVM_load_bytecode being called.
8. MVM_load_bytecode begins to execute, and loads the bytecode into memory.
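
The chain above can be sketched as a toy Python model. Everything here is hypothetical: the class, its in-memory sets, and the assumption that precompiling a module also precompiles its dependencies are simplifications for illustration, not the real Rakudo implementation. Only the method names and the wording of the error message are borrowed from the steps above.

```python
class ToyPrecompStore:
    """Hypothetical in-memory stand-in for the precompilation machinery."""

    def __init__(self, dependencies):
        self.dependencies = dependencies  # module name -> names it depends on
        self.precompiled = set()          # modules whose bytecode "exists"
        self.loaded = []                  # modules loaded into memory, in order

    def try_load(self, module):
        # Reuse existing bytecode if there is any, otherwise precompile first.
        if module not in self.precompiled:
            self.precompile(module)
        self.load(module)

    def precompile(self, module):
        # Assumption of this toy: compiling a module compiles its dependencies.
        for dep in self.dependencies.get(module, []):
            if dep not in self.precompiled:
                self.precompile(dep)
        self.precompiled.add(module)

    def load(self, module):
        self.load_dependencies(module)
        self.loaded.append(module)

    def load_dependencies(self, module):
        for dep in self.dependencies.get(module, []):
            if dep in self.loaded:
                continue  # already in memory
            if dep not in self.precompiled:
                raise LookupError(f"could not find {dep}")
            self.load(dep)


graph = {"Foo": ["Bar"], "Bar": []}

store = ToyPrecompStore(graph)
store.try_load("Foo")
print(store.loaded)

# Bytecode for Foo exists, but the bytecode for its dependency is missing.
broken = ToyPrecompStore(graph)
broken.precompiled.add("Foo")
try:
    broken.try_load("Foo")
except LookupError as error:
    print(error)
```

The second half imitates the situation I kept running into: the bytecode for the top-level module is present, the bytecode for one of its dependencies is not where the loader looks, and load_dependencies is the step that complains.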

At this point, it is possible to continue chasing down the rabbit hole from MVM_load_bytecode to MVM_cu_map_from_file to MVM_cu_from_bytes to MVM_bytecode_unpack to dissect_bytecode to find some more of the errors I encountered along the way, but for the vast majority of people, I don't think that's a rabbit hole worth devoting mental energy to, so I'm going to stop here. Please feel free to follow the links and read the code that connects these bits if you're tracking down an error that looks something like this:

Why does this matter?

Now that most of the path of how module loading works is identified, it's much easier to find the information for which modules are needed, rather than relying on having a file that contains the output generated by calling perl6 --target=mbc --output=foo.moarvm foo.pl6. So that's one victory! Simplifying the process of calling perl6 --compile as much as possible is definitely a goal.

It also gives us a better idea of how to get perl6 to use the modules packaged up in the executable generated by perl6 --compile=foo foo.pl6. We have two options at this point: we can either insert the expected files into the locations where perl6 expects them to be, or we can figure out how to redirect perl6 to look somewhere else for them. As I briefly mentioned earlier, I've spent a lot of time in the past month trying to accomplish the second option with little to no success (causing me to encounter a lot of the errors that helped me track down what was happening).

For the sake of continuing forward progress, I'm going to perform an experiment a bit later today on a friend's computer to see if the first option, inserting the expected files into the correct locations, is feasible. I need to check to see if the names of the files for the pre-compiled module files are the same on different computers (based on my findings, I think they should be), since I still haven't been able to fully figure out how those names are generated. If you'd like to read more about how the source code and pre-compiled bytecode for modules are stored, please refer to my earlier post.
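
One way to run that comparison is to collect the relative file names under the precompilation directory on each machine and diff the two sets. The Python sketch below shows the idea; the helper names are made up, and the two directories are temporary ones filled with dummy file names, since where the precompiled files actually live depends on the installation.

```python
import os
import tempfile


def relative_file_names(root):
    """Collect the path of every file under `root`, relative to `root`."""
    names = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            names.add(os.path.relpath(full_path, root))
    return names


def compare_trees(root_a, root_b):
    """Report which file names appear in only one tree and which in both."""
    a = relative_file_names(root_a)
    b = relative_file_names(root_b)
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "both": sorted(a & b),
    }


# Demo with two throwaway directories standing in for the two machines.
with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
    for root, names in ((dir_a, ["AAA", "BBB"]), (dir_b, ["AAA", "CCC"])):
        for name in names:
            open(os.path.join(root, name), "w").close()
    report = compare_trees(dir_a, dir_b)

print(report)
```

If the names really are stable across machines, the "only_a" and "only_b" lists should come back empty for the same set of installed modules.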

More to follow soon!
