Gccpy – Call for help!

Hey guys, so its been a while I’ve got a lot of stuff done and many core things working in Gccpy now more links on how to get up and running with it soon.

I am asking for help from you or from your friend.. anyone, all you need is:

all minezz

all minezz

  • 5 minutes
  • text editor
  • basic knowledge in python

I am trying to build up a test suite for this compiler and i would love to have it all built from scratch for this project rather than taking something from CPython where i think licenses will clash and i want to avoid anything like that. So what does this all mean?

I need people to write up their favourite python snippets, but there are requirements: what you can do is use anything EXCEPT any imports. Imports are something which will be handled soon, and isn’t that important  yet for the core language implementation.

So I have set-up this mailing list and i would love it if as many of you would join: http://crules.org/cgi-bin/mailman/listinfo/gccpy

So join the mailing list or simply post a message if you prefer and we will try to remember to cc you with in. One thing i would ask is only submit a small test case if you are ok with this being pretty much un-licensed because GNU projects require copy approval for you to submit code, but we can negate this by going though me just submit this to me under a http://sam.zoy.org/wtfpl/ (Whatever you want to do license) . This just saves any pain of licensing, you will of course still be attributed! Leave your name and an email within the test-case file, since really that’s all we care about as hackers anyway we don’t care about licensing so long as we have some attribution. To give you a flavour of how simple these test cases can be lets look at one i made earlier:

  1. # DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
  2. #  Version 2, December 2004
  3. #
  4. # Copyright (C) 2004 Sam Hocevar
  5. #
  6. # Everyone is permitted to copy and distribute verbatim or modified
  7. # copies of this license document, and changing it is allowed as long
  8. # as the name is changed.
  9. #
  10. # DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
  11. # TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
  12. #
  13. # 0. You just DO WHAT THE FUCK YOU WANT TO.
  14. #
  15.  
  16. # @Author redbrain – redbrain@gcc.gnu.org
  17. # @Date 13/8/10
  18. # @Expected Result: <5>
  19. #   -Tested against Python Version <2.6.5>
  20.  
  21. def foo ( x , y ):
  22.     return x+y
  23.  
  24. print foo( 2,3 )

Things for testing the expression syntax handling or calling functions, parameter passing like keywords parameters and positional parameters, class’s anything! Even if it is as small as one expression and a print for the result it is more than enough for a test case! The reason I want as many people involved is so i can see what features/trends matter most important to python users and then I can focus on making them extra awesome. And if it only takes you 5 minutes to write a small piece of python code and send an email while you get your name within the Gcc sources the more the better:).

When you create a test case simply send an email to the mailing list with the subject “Test Case <name>”. You can submit as many as you like the more the better  and the funnier the better…. * looks at Jezra :) ! @yamatt i am reusing that python while loop you sent me ages ago to freak out my processor! ;) Thanks so much every one, watch this space in the next week for links on how to use and see your test case running in Gccpy, so you can compile your python code to an executable!

So remember the mailing list is over here. http://crules.org/cgi-bin/mailman/listinfo/gccpy

Compiling a Compiler – GCC

From all the work i have been doing on compilers recently i have started to realise a lot of people won’t know how to compile up GCC or even how that works. Since how are you meant to compile a compiler written in a language that itself implements. Well the technique is called bootstrapping. But this is something people try and think its more complicated that it usually is. To understand how it works we have to think how was GCC built; well its as simple as it NEED’s a C compiler :-) . But how did that first c-compiler get there in the first place, well it goes back years of people made the first c compilers in Assembler and then it bootstrapped up!

Within GCC when you start the build we bootstrap 3 times, which means we build it 3 times but why you ask? The generated code for the first compiler run, is by the host compile so the compiler itself cannot be trusted, then the 2nd is done with the newly built compiler and the 3rd for good measure to guarantee we’ve compiled with the new compiler and GCC itself should be running off its own code. Then you can look at something like the GHC the Glasgow Haskell compiler which is written in its own language, but the thing they don’t say is that it is written in a some-what simplified dialect of Haskell and bootstraps off a small amount of C code if you don’t already have a Haskell implementation already present. Which is similar to the bootstrapping languages of GCC which use C90 to maintain language portability.

Then you can look at something like what if someone writes a new operating system how do you port a compiler and libc over to the new system. Well this is done by a mixture of cross complication and complicated kernel work which can take quite some time to get going correctly.

But I digress as usual now onto a little tutorial how do we compile GCC well yes it uses the ./configure, make, make install build system. But it doesn’t quite work like this so we do it slightly different.

First to run experimental I prefer to use git:

# to get my gccpy work use this git repo it merges with gcc master at least weekly into master

$ git clone git://crules.org:gcc-dev.git

# or run the gcc git mirror:

$ git clone git://gcc.gnu.org/git/gcc.git

Now compiling time:

$ aptitude install libgmp3-dev libmpfr-dev

# GCC now requires libmpc which you may need to compile and install from:

# http://www.multiprecision.org/index.php?prog=mpc

$ cd gcc

$ mkdir build

$ cd build

Useful configure options

$ ../configure

–enable-gold | use the GOLD linker

–disable-bootstrap | useful for gcc developers not recommended for users or for your first build

–enable-languages=c,c++,python,gccgo …

….

–with-gmp-lib=/usr/local/lib

–with-mpfr-lib=/usr/local/include

-with-mpc/mpfr …. the same optiosn to point to different library locations.

Now to build:

$ make

$ make install

And you should be good to go. We have to do our build in a separate directory due to the size of GCC some components don’t like being built directly.

Reference Counted Garbage Collector!

Garbage collection is a subject we all wish we knew more about, many software developers think they understand the concept fully and think building it should then be a trivial task. Building a ref-counted garbage collector, was much much more painful than i ever expected in crules. I am going to focus on how this work in an interpreted language and maybe a few months on how it will work in GccPy an Ahead of time implementation of Python on GCC.

So what is reference counting, take for instance the expression in a dynamic language:

x = 2 + 3 + 4 + 5;

So what happens in this expression in an interpreter? You can tell ‘x’ is a variable which will store the result of the expression ‘2 + 3 + 4 + 5′. Well to understand this simple example you first have to look at how this expression is evaluated, for instance these values ‘2,3….’ are still literals at this parse stage. What that means is they are numbers but dont paticularly have any meaning within the language yet, for instance python and crules need to ‘fold’ this literals into objects @see

`extern crl_symbol_obj * crl_rr_literal_fold( crl_symbol_obj * lit, crl_context_table * context )` in rr_bin_eval.c

In this case it looks at the literal object and then looks up for the apropriate object definition to create the Integer Object so the language knows how to do addition subtraction and some other member functions etc.

So ok lets look at this at a more semantic level what will happen is:

t1 = 2;
t2 = 3;
t3 = 4;
t4 = 5;

These address’s ‘t#’ are not actually accessible within the language its just a way of showing these are folded and we have addresses to them. Then we will have:

t5 = t1 + t2;
t6 = t5 + t3;
t7 = t6 + t4;

And now t7 should have the result of: 14 . And now finally:

x = t7;

So ok now you see how this is evaluated within at least a semantic context, what is ref counting?! So you can notice to get that simple result of 14 we had to create quite a few tmp objects as ‘t#’ But how does the runtime, tell which are temporary and which are still needed. Its as simple as adding in a ’signed long refcount’ to your Intermediate Representation.

In crules the notion of a context is the key to understanding how it all works so for each of these folded objects we create them with:

`Crl_Symbol_Init_Ctx( retval, context );` which is a pre-processor macro to initialize the crl_symbol_obj * retval, within this context. So what that is need for is when the garbage collector is invoked we need to be able to sweep over something to look for dead references in other words any objects with reference count ‘<= 0′ i say <= 0, and not simply zero because of how context poping and pushing can work and you can have a ref count of -2 in some cases but that’s besides the point.

So when this object is created within a context the garbage collection is able to sweep over a stack of context’s and look decide what needs deleted, so lets
look at the ref counting itself now.

Semantic | Refcount
——————-
t1       | 1
t2       | 1
t3       | 1
t4       | 1
t5       | 0
t6       | 0
t7       | 0

This shows the resulted ref counts after this is run:

t1 = 2;
t2 = 3;
t3 = 4;
t4 = 5;

Now lets look at what happens for:

t5 = t1 + t2

Semantic | Refcount
——————-
t1       | 2
t2       | 2
t3       | 1
t4       | 1
t5       | 1
t6       | 0
t7       | 0

So this is where it gets a little more tricky once that t5 has been evaluated t1 and t2 are de-referenced again.

Semantic | Refcount
——————-
t1       | 1
t2       | 1
t3       | 1
t4       | 1
t5       | 1
t6       | 0
t7       | 0

So the same idea happens for t6 and t7, and then we have the resultant table of:

Semantic | Refcount
——————-
t1       | 1
t2       | 1
t3       | 1
t4       | 1
t5       | 1
t6       | 1
t7       | 1

But now we know by common sense t1->t6 are all gabage its only t7 which is of any interest since this is what the variable ‘x’ now points to. But this is kind of down to how you implement it how you dec-rement those ref counts when i run in crules they are handled after each use.

Take a peek at rr_bin_eval.c and rr_math.c and rr_runtime.c in crl_rr_evaluate_expression( … )

So ok we should now have the table of:

Semantic | Refcount
——————-
t1       | 0
t2       | 0
t3       | 0
t4       | 0
t5       | 0
t6       | 0
t7       | 1

The key thing to remember is we dont have ref counts on the variable ‘x’ its what x points to! This is the whole key idea of how a dynamic type system works.

Now in crules the garbage collector is invoked when we have executed a full procedure, so in this case if i was to feed in directly this expression into the interpreter the garbage collector would be invoked if this was part of a function or procedure it would be executed once we hit the end of the block.

So now what comes in the idea of Mark-Sweep, this is probably what you understand of a mark sweep algorithm it sweeps over the stack and heap looking for dead refernces. So ok that makes sense but in the end its horrible vague and if you are to do that its an incredibly linear way of doing things. In essence crules has a mark sweep algorithm. But what really happens is i sweep and mark then free.

So to invoke the garbage collector in crules we call:

`extern void crl_garbage_invoke_sweep( crl_context_table * const );`

So what happens in gg_garbage.c, loops through each context in the context_table stack and looks through the symbol_stack and checks for objects of <= 0 ref count and marks them as garbage and sets the vector entry to NULL so it cant be collected again and result in a double free etc. Then once that is once it automaticaly invokes the garbage collector and runs over a vector of marked objects and free’s them one by one this could be speed up with pthreads but i havent got around to doing that.

This does not mean if your writing a builting object type or module as a dynamic lib .so, you have to invoke the garbage collection yourself you dont and if your writing crules code you dont have to do anything its all automatic except for the real intrinsic bits and pieces to the runtime within the interpreter.

Ok this is a good overview of ref counting but there is more lets take the example:

x = 2;
y = x;
z = y;

So as before we fold the literal ‘2′:

t1 = 2;
x  = t1;            t1 – ref count 1
y  = x = t1;        t1 – ref count 2
z  = y = t1;        t2 – ref count 3

Remember what we check is what the accessors point to, we never have a ref count on x or y or z its the value t1. So we have 3 objects simply pointing to the same object, We cannot free t1 at all! or we will surely break something! ;)

We can only safely delete this if each of these accessors went out of scope in the current context example if we stopped the session. we can decrement. But to understand what i mean more deeply we have to look at how functions and argument passing works and how the context stack works.

So i will finish this in a 2nd part tutorial soon :) in a review of how crules internals work. I hope some of you are still awake!