Project - Stage 1: Create a Basic GCC Pass

In Stage 1 of this project, I create a basic GCC pass and add it to the GCC compiler.

What is a pass?

Based on my understanding, a pass is one stage in the compiler's pipeline: it walks the compiler's intermediate representation of the program and either analyzes it or transforms it. Many passes are optimizations that make the generated code more efficient before the compiler emits the final binary.

However, before diving into more complex passes that optimize the code, we need to first understand how to create a basic pass. To start, I will create a simple pass that performs the following operations:

  1. Iterates through the code being compiled
  2. Prints the name of every function being compiled
  3. Prints a count of the number of basic blocks in each function
  4. Prints a count of the number of GIMPLE statements in each function.

Before Creating a pass

Because we are experimenting, we should work with a custom GCC build rather than the system's GCC, so that a mistake cannot break the system compiler. If you do not have a custom GCC build, refer to Lab 4, which I completed earlier.

Creating a pass

First, we need to navigate to the directory where the passes are stored. The passes are located in the gcc/ subdirectory of the GCC source tree. My source tree is in ~/gcc, so my path is:
cd ~/gcc/gcc
I created a C++ file called pass_project.cc which is my custom pass.
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tree.h"
#include "tree-pass.h"
#include "cgraph.h"
#include "function.h"
#include "basic-block.h"
#include "gimple.h"
#include "gimple-iterator.h"
#include "cfg.h"

namespace{

const pass_data pass_data_project = {
        GIMPLE_PASS,    /* type */
        "pass_project", /* name */
        OPTGROUP_NONE,  /* optinfo_flags */
        TV_NONE,        /* tv_id */
        PROP_cfg,       /* properties_required */
        0,              /* properties_provided */
        0,              /* properties_destroyed */
        0,              /* todo_flags_start */
        0,              /* todo_flags_finish */
};

// Pass class
class pass_project : public gimple_opt_pass {
public:
        // Constructor
        pass_project(gcc::context* ctxt) : gimple_opt_pass(pass_data_project, ctxt){};

        unsigned int execute(function* func) override{
                // GCC calls execute() once per function, so we only count
                // the blocks and statements of func here
                int bb_cnt = 0, gimple_stmt_cnt = 0;
                basic_block bb;

                // Iterate basic blocks of the function
                FOR_EACH_BB_FN(bb, func){
                        bb_cnt++;

                        // Iterate GIMPLE statements in the basic block
                        for(gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)){
                                gimple_stmt_cnt++;
                        }
                }

                if(dump_file){
                        fprintf(dump_file, "=== Function Name '%s' ===\n"
                                        "=== Number of Basic Blocks: %d ===\n"
                                        "=== Number of GIMPLE statements: %d ===\n\n", function_name(func), bb_cnt, gimple_stmt_cnt);
                        fprintf(dump_file, "\n\n### End diagnostics, start regular dump of current gimple ###\n\n\n");
                }
                return 0;
        }
};

}

//Custom pass creation function
gimple_opt_pass* make_pass_project(gcc::context* ctxt){
        return new pass_project(ctxt);
}
This pass has three parts.
1. The GIMPLE pass data structure
const pass_data pass_data_project = {
        GIMPLE_PASS,    /* type */
        "pass_project", /* name */
        OPTGROUP_NONE,  /* optinfo_flags */
        TV_NONE,        /* tv_id */
        PROP_cfg,       /* properties_required */
        0,              /* properties_provided */
        0,              /* properties_destroyed */
        0,              /* todo_flags_start */
        0,              /* todo_flags_finish */
};
2. Pass body
// Pass class
class pass_project : public gimple_opt_pass {
public:
        // Constructor
        pass_project(gcc::context* ctxt) : gimple_opt_pass(pass_data_project, ctxt){};

        unsigned int execute(function* func) override{
                // GCC calls execute() once per function, so we only count
                // the blocks and statements of func here
                int bb_cnt = 0, gimple_stmt_cnt = 0;
                basic_block bb;

                // Iterate basic blocks of the function
                FOR_EACH_BB_FN(bb, func){
                        bb_cnt++;

                        // Iterate GIMPLE statements in the basic block
                        for(gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)){
                                gimple_stmt_cnt++;
                        }
                }

                if(dump_file){
                        fprintf(dump_file, "=== Function Name '%s' ===\n"
                                        "=== Number of Basic Blocks: %d ===\n"
                                        "=== Number of GIMPLE statements: %d ===\n\n", function_name(func), bb_cnt, gimple_stmt_cnt);
                        fprintf(dump_file, "\n\n### End diagnostics, start regular dump of current gimple ###\n\n\n");
                }
                return 0;
        }
};
3. Pass Creation Function
//Custom pass creation function
gimple_opt_pass* make_pass_project(gcc::context* ctxt){
        return new pass_project(ctxt);
}

Inserting the custom pass into the GCC building sequence

Since the GCC build will not automatically recognize new passes, we need to manually insert the pass into the build sequence. This requires modifying three files: passes.def, Makefile.in, and tree-pass.h. All of these files are located in the ~/gcc/gcc subdirectory.

passes.def

passes.def is the file read by the pass manager: it lists the passes and defines the order in which GCC runs them during compilation. We need to insert our pass into the appropriate sequence. For instance, I added my pass at the end of the all_lowering_passes sequence using the following line:
NEXT_PASS (pass_project);


Makefile.in

Makefile.in is a template used by the configure script to generate the final Makefile. For example, when we configure the build using the command:
~/gcc/configure --prefix=$HOME/gcc-test-001
It generates the final Makefile in the build folder, which lets us build GCC with commands like make or make install. We modify Makefile.in by adding the object file of our pass. For example, since my pass file is pass_project.cc, I need to add its corresponding object file, pass_project.o, to the OBJS list.
pass_project.o \
Makefile.in has so many lines. How can I locate the OBJS list faster? Try searching in vim!
Open the Makefile.in with
vi Makefile.in
Press / to enter search mode and type the words we want to find.
/OBJS =
Press n to go to the next match and Shift + N to go to the previous match until you find the OBJS list.

I inserted my pass at the end of the list.
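As a non-interactive alternative to the vim search, grep can report the line number directly. The sketch below is only an illustration: it runs against a tiny stand-in file instead of the real Makefile.in, the object names other than pass_project.o are placeholders, and it assumes the list begins with a line that starts with "OBJS = ", as the vim search above suggests.

```shell
# Stand-in for ~/gcc/gcc/Makefile.in so the sketch runs anywhere.
# Its contents are illustrative, not copied from GCC.
makefile_in=$(mktemp)
cat > "$makefile_in" <<'EOF'
ALL_HOST_OBJS = $(OBJS)
OBJS = \
    first.o \
    pass_project.o
EOF

# -n prints the line number of each match; ^ anchors the pattern to the
# start of the line, so other variables that merely mention OBJS are skipped.
objs_line=$(grep -n '^OBJS = ' "$makefile_in" | cut -d: -f1)
echo "OBJS list starts at line $objs_line"

# -c counts matching lines: a quick check that the new object file is listed.
pass_listed=$(grep -c 'pass_project\.o' "$makefile_in")
echo "lines mentioning pass_project.o: $pass_listed"

rm -f "$makefile_in"
```

Against the real file, the same two grep commands would take Makefile.in as the file argument in place of the temporary file.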

tree-pass.h

tree-pass.h acts as a registry header: it declares the creation function of every pass so that the GCC pass manager can recognize and work with them. We declare our pass in this header to make it visible and available to the pass manager. The declarations are listed after the definition of struct register_pass_info. Since our pass is a GIMPLE pass, I inserted it at the end of the GIMPLE passes, just before the IPA passes. The declaration follows the same format as the others:
extern gimple_opt_pass *make_pass_project (gcc::context *ctxt);

Rebuild GCC

Alright! Everything is ready. It is time to rebuild GCC and test the result. Remember that we changed Makefile.in, so we need to generate a new Makefile that includes our custom pass. My first attempt was to remove the Makefile in the build folder and regenerate it by running the configure script again.
rm Makefile
~/gcc/configure --prefix=$HOME/gcc-test-001
Then, we run the following command to rebuild the GCC:
time make -j$(nproc) |& tee build.log
This command is the same as the one we used when initially building GCC. It records the build time and saves the full build log to a file named build.log.
The build finished in just 12m32s, which seemed quick!
It turned out that the build had stopped early after reporting two errors.

Let us dive into the build.log to see what caused this.
less build.log
The errors are:
make[1]: *** [Makefile:14721: configure-c++tools] Error 1
make[1]: *** [Makefile:4029: configure-fixincludes] Error 1
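Rather than paging through the whole log, lines like these can be pulled out with grep, since make marks its fatal messages with "***". The sketch below is a minimal illustration that runs on a stand-in log holding the two error lines quoted above plus one made-up line of ordinary output; against a real build, the file argument would be build.log.

```shell
# Stand-in for build.log so the sketch runs anywhere.
# The first line is a made-up example of ordinary build output.
log=$(mktemp)
cat > "$log" <<'EOF'
checking for a working compiler... yes
make[1]: *** [Makefile:14721: configure-c++tools] Error 1
make[1]: *** [Makefile:4029: configure-fixincludes] Error 1
EOF

# Each \* matches a literal asterisk; -n shows where each error sits in the log.
grep -n '\*\*\*' "$log"

# -c counts the matching lines instead of printing them.
error_count=$(grep -c '\*\*\*' "$log")
echo "make reported $error_count error line(s)"

rm -f "$log"
```

With a parallel build, the failing step is often far from the end of the log, so searching for the marker tends to be quicker than scrolling.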
make reported two failures, both in configuration steps. After some research, I found that these errors likely occurred because I modified important build system files. When changes are made to key build system files, it is safest to clean the build and rebuild from scratch to avoid conflicts. To fix this, I deleted the gcc-build-001 folder and restarted the entire build process from the beginning.
mkdir ~/gcc-build-001
cd ~/gcc-build-001
~/gcc/configure --prefix=$HOME/gcc-test-001

time make -j 24 |& tee build.log
The fresh build took 47m13s to complete and did not report any errors.

Then, we install the build with:
make install

Testing result

To test the result, we need to make sure we use our custom-built GCC instead of the system GCC. We switch compilers by putting the bin directory of our custom build at the front of the PATH variable.
PATH=$HOME/gcc-test-001/bin:$PATH
Check which GCC we are using by:
which gcc
When the shell is using our custom GCC build, the path printed ends in gcc-test-001/bin/gcc.
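The same check can be scripted. The sketch below is an illustration only: it uses a temporary directory with a dummy gcc script as a hypothetical stand-in for the install prefix, so that it runs without a real GCC build. With the real build, prefix would be $HOME/gcc-test-001.

```shell
# Hypothetical stand-in for the install prefix, with a dummy gcc inside.
prefix=$(mktemp -d)
mkdir -p "$prefix/bin"
printf '#!/bin/sh\necho "stand-in gcc"\n' > "$prefix/bin/gcc"
chmod +x "$prefix/bin/gcc"

# Prepending puts this directory ahead of the system directories,
# so the shell finds its gcc first.
PATH="$prefix/bin:$PATH"

# command -v prints the path the shell would run for the given name.
resolved=$(command -v gcc)
case "$resolved" in
    "$prefix"/bin/gcc) echo "using custom build: $resolved" ;;
    *)                 echo "still using: $resolved" ;;
esac

rm -rf "$prefix"
```

A PATH change made this way only lasts for the current shell session, so it has to be repeated in every new terminal.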

Next, we create a simple C file or C++ file with some functions.
For example, I created a C program file with the following code:
#include <stdio.h>

char* getName();

int main(){
        printf("Hello, I am %s\n", getName());
        return 0;
}

char* getName(){
        return "Wing Ho Chau";
}
It is a simple program that prints out my name: main() passes the result of the user-defined function getName() to printf(). The file defines two functions, main() and getName(), and calls the library function printf().

Since our pass writes its output to a dump file, we need to generate the dump file during compilation to check the results. To do this, we use specific flags when compiling the code. We can output all dump files by using the -fdump-tree-all flag, or we can generate the dump file for a specific pass using the -fdump-tree-your_pass_name flag. For example:
gcc -Wall -g -fdump-tree-pass_project hello.c -o hello
Then, we will see a dump file named hello.c.019t.pass_project appear.
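The number in the dump file name reflects the position of the pass in the pipeline, so it can differ between GCC versions and after passes are added. A minimal sketch of locating the file with a glob instead of a hard-coded number is shown below. It creates an empty stand-in file in a temporary directory so that it runs without a compiler; with a real compilation, the glob would be used in the directory where gcc was run.

```shell
# Stand-in working directory containing an empty placeholder dump file.
workdir=$(mktemp -d)
touch "$workdir/hello.c.019t.pass_project"

# The * stands for whatever number GCC assigned to the pass.
found=""
for dump in "$workdir"/hello.c.*t.pass_project; do
    # If nothing matches, the pattern is left unexpanded, so skip it.
    [ -e "$dump" ] || continue
    found=$(basename "$dump")
    echo "dump file: $found"
done

rm -rf "$workdir"
```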

To read the dump file, we use:
cat hello.c.019t.pass_project

We can see that the function names, the number of basic blocks, and the number of GIMPLE statements are printed out as expected. This confirms that we have successfully created our custom pass and integrated it into the GCC build.

Reflection

I found Project Stage 1 quite challenging because there wasn't enough documentation, and I was unsure about which header files to include in my pass. There are so many header files, and it was difficult to know which ones are necessary for my pass to work. To figure it out, I decided to look at other passes to see which headers they included. It became a trial-and-error process where I tried to build GCC, encountered errors, and then reviewed the errors to understand which headers were missing.

Although this process was frustrating at times, I learned a lot. I now understand that a pass is a stage of the compiler that analyzes or transforms the code being compiled, and I know how to write one and wire it into the build. I also learned that after changing build system files such as Makefile.in, it is safest to rebuild GCC from scratch in a clean build directory. This experience has helped me become more familiar with how GCC works and how to troubleshoot issues during the build process.
