Trials and Tribulations with Modules: Part One
Hello! I am a CS student working on a GSoC proof of concept to modify perl6 to create executable binaries. This blog documents my journey and struggles. There are likely more correct and maintainable ways to do what I've done so far. If you know a better way, or if you see an inaccuracy, please let me know in the comments section.
This post has been in the works for a while and, as a result, has grown to a massive size. To make it easier to digest, and more useful for those of you looking for information about a particular aspect of module loading, I have split it into several smaller posts, each focusing on one aspect.
In this particular post, I'm going to focus on how the dependencies required by your Perl 6 program are identified and how the dependency tree works.
Identifying the dependencies
An interesting feature I was accidentally triggering when running perl6 --compile=foo foo.pl6 gave me a bit of a head start in figuring out how the compiler was identifying which modules were dependencies of the given program. Let's say you have the following program, foo.pl6:
When you call perl6 --compile=foo foo.pl6 (or perl6 --target=mbc --output=foo.moarvm foo.pl6), you are greeted with the following output:
The TL;DR explanation is that this is the list of all of the program's dependencies and their locations on your local system. This information is not typically meant for humans. It is a generated string containing the location information that precompile's caller needs in order to load the modules into memory. Because we have requested bytecode output by setting --target=mbc, and none of the executable code will be loaded into memory, this list is returned to us instead.
If you want a more thorough explanation of the exact structure of the precomp module code, read on. If you're primarily interested in why this matters for module loading as it relates to the linker project I'm working on, skip down to the section on why this information is useful.
Outline of Module Files
Let's go take a look at the definition of the module. If you want to follow along, use zef to install Date::Names. I picked this module arbitrarily because it came with example code showing how to use it and is currently functional.
If you go to /share/perl6/site/dist, you will find a number of files with frighteningly long names. Inside each of these files is a description of a different module which you have installed on your system. The one that describes Date::Names looks like this:
You'll notice that the "provides" section lists a number of subcomponents of the module and their file names on your system. You may also notice that several of the file names in this list match the frightening beginnings of the lines in the dependency output that we produced earlier by calling perl6 --compile.
The Root Module File (Date::Names)
Now let's take a look at that first file, the one identified as plain old Date::Names. You can find it in /share/perl6/site/precomp/<the_only_other_file_in_there>/<first_two_digits_of_file_name>/<file_name>
You'll notice that before the MoarVM header starts, there are several lines. The first two lines are still a bit of a mystery to me (if you know what they mean, please comment below and let me know!). The first number doesn't appear to correspond to any file or the contents of any file I can find. The second number appears in a file in /share/perl6/site/short/*/. The only contents of that file are the version of the module, the GitHub user who produced it, the file name of Date::Names, and that second number.
Every number in the precompiled Date::Names file after those initial two mystery numbers corresponds to a dependency of Date::Names.
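To poke at this layout yourself, the text lines that come before the bytecode can be separated from the bytecode itself. The following is a minimal Python sketch, not anything Rakudo ships. It assumes the bytecode portion starts with the byte marker MOARVM followed by a carriage return and newline, and that the header lines before it are newline-separated. The function names are hypothetical, and the "mystery" label simply mirrors the two unexplained numbers described above.

```python
from typing import Dict, List, Tuple

# Assumption: the MoarVM bytecode section begins with this 8-byte marker.
MOARVM_MAGIC = b"MOARVM\r\n"


def split_precomp(data: bytes) -> Tuple[List[str], bytes]:
    """Split a precompiled file's bytes into (header lines, bytecode)."""
    start = data.find(MOARVM_MAGIC)
    if start == -1:
        raise ValueError("no MoarVM marker found in this data")
    # Everything before the marker is treated as text, one entry per line.
    header_text = data[:start].decode("utf-8", errors="replace")
    lines = [line for line in header_text.split("\n") if line.strip()]
    return lines, data[start:]


def describe_header(lines: List[str]) -> Dict[str, List[str]]:
    """Label the header the way the post does: two unexplained lines
    first, then one line per dependency."""
    return {"mystery": lines[:2], "dependencies": lines[2:]}


if __name__ == "__main__":
    # Synthetic stand-in for a precomp file; the values are made up.
    fake = b"1111\n2222\n3333 depA\n4444 depB\n" + MOARVM_MAGIC + b"\x00\x01"
    header_lines, bytecode = split_precomp(fake)
    print(describe_header(header_lines))
    print(len(bytecode), "bytes of bytecode")
```

To try it on a real file, read the file in binary mode and pass the bytes to split_precomp. If the marker assumption is wrong for your Rakudo version, the function raises an error rather than guessing.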
Subcomponents (Date::Names::de)
Let's go and take a look at one of those dependency files, say, Date::Names::de.
The first number present in the file is the same as one of the numbers present in the dependency string from Date::Names. I believe this may be for verification purposes, but I'm not entirely sure. (If you have any more insight on this, please let me know!)
The second number can be found in a /share/perl6/site/short/*/ file similar to the one for Date::Names.
Every single one of the Date::Names subcomponents has a similar header before beginning the module.
Summary
Now that we know the basic file structure, let's backtrack to looking at that output we initially got.
The beginning of each outputted line gives us the name of a file in /share/perl6/site/precomp/<the_only_file_there>/.
This file contains:
- A bit of information at the beginning of it to verify that it is the file you are looking for
- (Optional) A list of files it is dependent on (for example, what we saw in Date::Names)
- The MoarVM bytecode for the source file it corresponds to. (That source code can be found in /share/perl6/site/sources/ and will have the same file name as the precompiled bytecode file)
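The path conventions described in this post can be written down as two small helpers. This is a Python sketch under assumptions: the site root and the name of the single directory under precomp are passed in as parameters because they vary per installation, the two-character subdirectory is taken from the start of the file name, and the function names are hypothetical.

```python
from pathlib import PurePosixPath


def precomp_path(site_root: str, compiler_dir: str, file_name: str) -> PurePosixPath:
    """Where the precompiled bytecode for file_name is expected to live.

    compiler_dir is the single directory found under precomp/; the
    subdirectory is the first two characters of the file name.
    """
    return PurePosixPath(site_root) / "precomp" / compiler_dir / file_name[:2] / file_name


def source_path(site_root: str, file_name: str) -> PurePosixPath:
    """Where the matching source file is expected to live (same file name)."""
    return PurePosixPath(site_root) / "sources" / file_name


if __name__ == "__main__":
    # Placeholder values, not real identifiers from an installation.
    root = "/share/perl6/site"
    print(precomp_path(root, "COMPILER_DIR", "ABCDEF0123"))
    print(source_path(root, "ABCDEF0123"))
```

Nothing here touches the file system, so the helpers only build candidate paths. Checking that the files actually exist is left to the caller.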
It is also worth noting the order of that initial output. Date::Names::de is listed first, followed by the other Date::Names subcomponents in the order that they appear as dependencies within the Date::Names file, and Date::Names itself is listed last. This suggests that when use Date::Names was hit during compilation, the compiler entered that file, loaded each of its dependencies, and only then loaded Date::Names itself. It also means that I can, for the time being, rely on that list as a complete list of all dependencies for the program, nested or otherwise.
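The ordering described above, dependencies first and the module that needs them last, is what a depth-first, post-order walk of the dependency tree produces. Here is a small Python sketch of that idea. It is an illustration of the observed ordering, not the compiler's actual code, and Date::Names::xx is a made-up placeholder for the other subcomponents.

```python
from typing import Dict, List


def load_order(root: str, deps: Dict[str, List[str]]) -> List[str]:
    """Return modules in post-order: each module appears after all of
    the modules it depends on."""
    order: List[str] = []
    seen = set()

    def visit(name: str) -> None:
        if name in seen:
            return  # already handled, so shared dependencies appear once
        seen.add(name)
        for dep in deps.get(name, []):
            visit(dep)
        order.append(name)  # emitted only after its dependencies

    visit(root)
    return order


if __name__ == "__main__":
    # Hypothetical dependency table; "Date::Names::xx" is a placeholder.
    deps = {"Date::Names": ["Date::Names::de", "Date::Names::xx"]}
    for module in load_order("Date::Names", deps):
        print(module)
```

The seen set keeps a module that is required from two places from being listed twice, which is one reason a flat list like this can stand in for the whole tree.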
Why is all this information useful?
Having this output and understanding its meaning has a couple of important effects. I now have a way to identify all the external dependencies of the user program in question. It also means that I know where to retrieve all of those dependencies from. Finally, it means that I know a bit about the internal structure of those retrieved files, so that if they need to be altered in the future to make them locatable and/or runnable by the ./foo executable, I will have an idea of what sorts of changes to make.
I'm going to be posting another entry on how it's identified that a module needs to be loaded, and the actual process by which that module is loaded.