Crules and Intermediate Representation

If you follow my blog you may have noticed i have been quiet and thats because i’ve been working on http://crules.org Its the new homepage for my project Crules, its got some good content on it but nothing really on the language yet. Though today i spend a very long time on the HACKING article And i explain a lot on how Intermediate Representation’s work and 3-address code.

So lets just take an extract since i am lazy: So in compilers or interpreters how do we represent:

  1. (( x + y ) -- ( ( x + y ) * ( x -- y ) ) ) + ( ( x + y ) * ( x -- y ) )

Lets see it in a diagram first then in IR code:

Expression

Expression

So what is this? Well early on when i was getting into compiler construction when doing semantic analysis you will need to find a way of representing these semantics for your language, and you will find generally that there are 3 things you need to know: DESTINATION OPERAND A and OPERAND B. This is what’s called 3-address code. This is what sets compilers apart and can make or break them! Its what gives implementations flexibility and the ability to ‘control the flow‘ of execution! Any good compiler book will give you more detailed discussion on this topic its my main interest in compilers since this is what gives an implementation its logic. So lets give it a syntax we can write down to illustrate how mine works: it just so happens a LISP ‘ish syntax can show this very well!

  1. ( ( IDENTIFIER ( TYPE ) ) => ( OPERAND A ) => ( OPERAND B ) );

Since the parser is precedence aware it will construct the tree as shown in a symbol language which looks like this if it was output ( outputting the IR like this is a future TODO ); It may look a little like GIMPLE but this IR I have as I will demonstrate more examples can encompass much much more.. ( this is a 3-address code by the way )

  1. ( ( NIL ( OP_ADD ) )
  2.   => ( ( ( NIL ( OP_SUBTRACT ) )
  3.     => ( ( ( NIL ( OP_ADD ) )
  4.       => ( ( ( NIL ( SYMBOL_ACCESS ) )
  5.         => ( ‘x’ )
  6.         => ( NIL )
  7.       ); )
  8.       => ( ( ( NIL ( SYMBOL_ACCESS ) )
  9.         => ( ‘y’ )
  10.         => ( NIL )
  11.       ); )
  12.     ) ; )
  13.     => ( ( ( NIL ( OP_MULTIPLY ) )
  14.       => ( ( ( NIL ( OP_ADD ) )
  15.         => ( ( ( NIL ( SYMBOL_ACCESS ) )
  16.           => ( ‘x’ )
  17.           => ( NIL )
  18.         ); )
  19.         => ( ( ( NIL ( SYMBOL_ACCESS ) )
  20.           => ( ‘y’ )
  21.           => ( NIL )
  22.         ); )
  23.       ); )
  24.       => ( ( ( NIL ( OP_SUBTRACT ) )
  25.         => ( ( ( NIL ( SYMBOL_ACCESS ) )
  26.           => ( ‘x’ )
  27.           => ( NIL )
  28.         ); )
  29.         => ( ( ( NIL ( SYMBOL_ACCESS ) )
  30.           => ( ‘y’ )
  31.           => ( NIL )
  32.         ); )
  33.       ); )
  34.     ); )
  35.   ); )
  36.   => ( ( ( NIL ( OP_MULTIPLY ) )
  37.     => ( ( ( NIL ( OP_ADD ) )
  38.       => ( ( ( NIL ( SYMBOL_ACCESS ) )
  39.         => ( ‘x’ )
  40.         => ( NIL )
  41.       ); )
  42.       => ( ( ( NIL ( SYMBOL_ACCESS ) )
  43.         => ( ‘y’ )
  44.         => ( NIL )
  45.       ); )
  46.     ); )
  47.     => ( ( ( NIL ( OP_SUBTRACT ) )
  48.       => ( ( ( NIL ( SYMBOL_ACCESS ) )
  49.         => ( ‘x’ )
  50.         => ( NIL )
  51.       ); )
  52.       => ( ( ( NIL ( SYMBOL_ACCESS ) )
  53.         => ( ‘y’ )
  54.         => ( NIL )
  55.       ); )
  56.     ); )
  57.   ); )
  58. );
busy!

busy!

Though one big note its is ok for me to leave the representation like this since in my interpreter i just evaluate this at run time but how does this affect things like code-generators like proper Compilers? Well its quite simple in a sane compiler like GCC ( although most compilers will follow this idiom i am sure) introduce another layer for IR called RTL ( Register Transfer Language ). The wikipedia article lacks and most compiler books overlook this section very much since, most books are written by the academics and generally stay away from the back-end of compilers since that is the actual hard part since a code-gen is very specific to the instruction set your targeting and register allocation algorithms are very difficult and is a huge research project.  Its said in computer science the most difficult things to build software wise is a good JIT or any Code-Generator for that matter.  Anyways back to the problem lets take a smaller expression for sleep’s sake the expression:

  1. unsigned int retval = ( x * 2 ) / ( 7 + y );

Will generate something like:

  1.  
  2. t1 = x * 2;
  3. t2 = 7 + y;
  4. t3 = t1 / t2;
  5. retval = t3;
  6.  

Anyways you’ll notice in my the IR keywords such as SYMBOL_ACCESS or SYMBOL_ITEM or OP_ADD etc, these are what’s called OP_Codes and are generally represented by some symbolic hex values since its cheap to check against for the implementation to figure out what to do with symbol ‘x’; So ok thats a quick and dirty intro to Intermediate Representation’s!

I think this lolcat will sum up your feelings at the end of reading this blog post:

And i started getting out my old grunge cd’s which include my Silverchair collection and this is on of my favourite bands of all time and this is a pretty good song!

Leave a Reply

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>