Parsing the Doom Wad File

So it was a fine Christmas: I was at home with family, armed only with two seasons of The X-Files and my laptop. Add to that the fact that I hadn't uploaded my SSH key to my laptop, so hacking on Crules was out the window. That got me thinking about the main goals for my Crules scripting language, one of which (the main one) is that it should be embeddable into existing C/C++ applications, like Lua or Python, or even C# (Mono) if you're using its recently added reflection API!

So why not make a game engine with Crules as its scripting language, so I can see what the API _SHOULD_ look like for the client? I chose Doom since I am terrible with graphics: I can use the doom.wad and concentrate on programming. In the past I have written two separate 2D game engines. One was a Java 2D Monkey Island clone I did for university, and it was terrible, since Java is a terrible platform in my opinion. The second was a port of it over to C++/SDL, and the code was terrible ;) so don't bother asking me about it.

Anyway, the long and short of it is that I am writing a Doom engine from scratch. My original idea was to work with an existing game engine like Crystal Space or Doomsday Engine. Crystal Space put me off since it already uses Python, and I don't really feel like trawling through lots of code; forking it would be more work than it's worth. Doomsday Engine is the cream of the crop of Doom ports at the moment; all the others are practically the same in essence, using the same ancient code and hacks from throughout the 90s, and the technology has changed somewhat since then. Though Doomsday Engine is pretty amazing and works well, it seems to need a massive cleanup. I don't mean to be nasty about the developers: they have done a fantastic job, and they have made the code much cleaner than any other port about! Plus the lead developer has been really nice to me ;)

Anyway, games have always been a great passion of mine, and really Doom can't be that hard to re-implement from scratch. And it isn't: I already have a basic game engine working using OpenGL and SDL{ _ttf, _mixer }, with audio, an FPS counter, input, bla bla. For now I am just concentrating on telling the wad sprite formats apart etc., while revising for exams and working on Crules and many other bits and pieces. So in the end I'll just stick in an extract of the log of my game engine when it starts parsing the doom.wad, and explain a little about how to parse the wad!

debug: main.c:28 -> Trying to start crldoom wad: data/doom.wad!
debug: doom.c:147 -> parsing wad file <data/doom.wad>!
debug: doom.c:80 -> is a wad file :: <IWAD>!
log: doom.c:155 -> wad file data/doom.wad has length 12408292!
debug: doom.c:27 -> debug seeking…
debug: doom.c:30 -> reading….
debug: doom.c:34 -> parsing…
log: doom.c:49 -> header:: ident -> 1145132873, n_lumps -> 2306, infotableofs -> 12371396!
log: doom.c:62 -> directory:: filepos -> 12, size -> 10752, name -> 1497451600, ident -> PLAYPAL!
log: doom.c:62 -> directory:: filepos -> 10764, size -> 8704, name -> 1330401091, ident -> COLORMAP!
log: doom.c:62 -> directory:: filepos -> 19468, size -> 4000, name -> 1329876549, ident -> ENDOOM!
log: doom.c:62 -> directory:: filepos -> 23468, size -> 6854, name -> 1330464068, ident -> DEMO1!
log: doom.c:62 -> directory:: filepos -> 30324, size -> 9402, name -> 1330464068, ident -> DEMO2!
log: doom.c:62 -> directory:: filepos -> 39728, size -> 15466, name -> 1330464068, ident -> DEMO3!
log: doom.c:62 -> directory:: filepos -> 55196, size -> 3286, name -> 1330464068, ident -> DEMO4!
log: doom.c:62 -> directory:: filepos -> 58484, size -> 0, name -> 827142469, ident -> E1M1!
log: doom.c:62 -> directory:: filepos -> 58484, size -> 1430, name -> 1313425492, ident -> THINGS!
log: doom.c:62 -> directory:: filepos -> 59916, size -> 6804, name -> 1162758476, ident -> LINEDEFS!
log: doom.c:62 -> directory:: filepos -> 66720, size -> 19980, name -> 1162103123, ident -> SIDEDEFS!
log: doom.c:62 -> directory:: filepos -> 86700, size -> 1880, name -> 1414677846, ident -> VERTEXES!
log: doom.c:62 -> directory:: filepos -> 88580, size -> 8964, name -> 1397179731, ident -> SEGS!
log: doom.c:62 -> directory:: filepos -> 97544, size -> 956, name -> 1128616787, ident -> SSECTORS!
log: doom.c:62 -> directory:: filepos -> 98500, size -> 6664, name -> 1162104654, ident -> NODES!
log: doom.c:62 -> directory:: filepos -> 105164, size -> 2288, name -> 1413694803, ident -> SECTORS!
log: doom.c:62 -> directory:: filepos -> 107452, size -> 968, name -> 1162495314, ident -> REJECT!
log: doom.c:62 -> directory:: filepos -> 108420, size -> 6948, name -> 1129270338, ident -> BLOCKMAP!

So this is just some of the output of the code I have written. It parses out what are called the 'directories' in the file, and I also have functions to parse out what are called the 'lumps' from these directories; the lumps are the actual data. I'll explain in more detail when I talk about how to parse this, and how to convert the music lumps to proper MIDI so you can play them in SDL_mixer, or even in Totem if you're in GNOME.

So what do we need? A doom.wad file. If you go through your old boxes I am sure you'll find yourself a copy of Doom, or if you can't find it you can find it on torrent sites, but you didn't read that here ;) Thing is, I have bought the game like 3 or 4 times, I am sure, hehe. Let's get started.

In the Doom wad there is what's called the 'HEADER', which is 12 bytes long and contains 3 * 4-byte fields holding the data we care about. So let's make some code (I just wrote this in like 10 minutes to illustrate the idea; it's OK, it does the job!):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  #define WAD_HEADER_LENGTH     12
  #define WAD_DIRECTORY_LENGTH  16

  /* read a 4-byte little-endian integer, whatever the host byte order */
  unsigned long parse_int( const unsigned char *p )
  {
    return ( (unsigned long)   p[0]
             | (unsigned long) p[1] << 8
             | (unsigned long) p[2] << 16
             | (unsigned long) p[3] << 24
             ) ;
  }

  int main( int argc, char *argv[] )
  {
    const char* wad_file= "/home/redbrain/workspace/doom-dev/crldoom/data/doom.wad";
    FILE* wad_fd;
    if( !(wad_fd= fopen(wad_file, "rb")) )
      {
        fprintf(stderr, "error opening <%s>!\n", wad_file);
        return EXIT_FAILURE;
      }

    unsigned char header_buffer[ WAD_HEADER_LENGTH ];
    if( fread( header_buffer, WAD_HEADER_LENGTH, 1, wad_fd ) != 1 )
      {
        fprintf(stderr, "error reading header of <%s>!\n", wad_file);
        fclose( wad_fd );
        return EXIT_FAILURE;
      }

    /* copy the ident into its own buffer: writing the terminator into
       header_buffer[4] would clobber the first byte of the lump count */
    char wad_ident[ 5 ];
    memcpy( wad_ident, header_buffer, 4 );
    wad_ident[ 4 ]= '\0';
    if( strncmp( wad_ident, "IWAD", 4 ) )
      {
        fprintf(stderr, "invalid wad header type <%s>!\n", wad_ident );
        fclose( wad_fd );
        return EXIT_FAILURE;
      }
    else
      {
        printf("doom wad is a <%s>!\n", wad_ident );
      }

    unsigned long wad_length= 0;
    fseek( wad_fd, 0, SEEK_END );
    wad_length= ftell( wad_fd );

    unsigned long directory_offset= parse_int( header_buffer+8 );
    unsigned long number_lumps= parse_int( header_buffer+4 );

    printf("wad directory offset <%lu> with <%lu> lumps!\n",
     directory_offset, number_lumps );

    /* walk the 16-byte directory entries until the end of the file */
    unsigned long t_ofs= directory_offset; unsigned long lump_count= 0;
    while( t_ofs <= ( wad_length - WAD_DIRECTORY_LENGTH ) )
      {
        fseek( wad_fd, t_ofs, SEEK_SET );
        unsigned char directory_buffer[ WAD_DIRECTORY_LENGTH ];
        if( fread( directory_buffer, WAD_DIRECTORY_LENGTH, 1, wad_fd ) != 1 )
          break;

        unsigned long filepos= parse_int( directory_buffer );
        unsigned long size= parse_int( directory_buffer+4 );

        /* the ident is 8 bytes, zero padded, with no terminator
           when all 8 bytes are used, so copy it and add our own */
        char directory_ident[ 9 ];
        memcpy( directory_ident, directory_buffer+8, 8 );
        directory_ident[ 8 ]= '\0';

        printf("directory name <%s> at offset <%lu> with size <%lu>!\n",
         directory_ident, filepos, size );

        t_ofs += WAD_DIRECTORY_LENGTH; lump_count++;
      }

    printf("directorys parsed <%lu> total lumps <%lu>!\n",
     lump_count, number_lumps );
    fclose( wad_fd );

    return 0;
  }

So let's compile and test this code (remember to change the path to your doom wad):

gcc doom_wad_test.c

./a.out

So right, what does any of this code mean? I just chucked it up there; I wrote it very quickly from scratch. As I stated earlier, there is a wad header 12 bytes in length, which I refer to as 'WAD_HEADER_LENGTH', and which contains 3 * 4-byte fields. In the code we fopen the wad_file and use fread to read the 12 bytes into a 12-byte 'unsigned char[]' buffer. To read out the integers in an endian-neutral way (they are stored little-endian in the wad) I found a nice parse_int function in some of the wad documentation and in the old Doom source code, which is GPL, so it was like.... yoink... It's pretty easy to understand if you know your representations well enough, so I won't go into detail.
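For anyone who does want the detail, here is a minimal sketch of parse_int run by hand on the four ident bytes. The function show_ident_as_int is a made-up helper for illustration, not something from crldoom. The bytes 'I' 'W' 'A' 'D' are 0x49 0x57 0x41 0x44, and stacking them least significant byte first gives 0x44415749, which is the same 1145132873 that the log extract above prints for the header ident.

```c
#include <assert.h>
#include <stdio.h>

/* Same idea as parse_int in the listing: build the value from single bytes,
   least significant byte first, so the host byte order never matters. */
unsigned long parse_int( const unsigned char *p )
{
  assert( p != NULL );
  return ( (unsigned long)   p[0]
           | (unsigned long) p[1] << 8
           | (unsigned long) p[2] << 16
           | (unsigned long) p[3] << 24 );
}

/* Hypothetical helper (not part of crldoom): show the ident bytes as an integer. */
void show_ident_as_int( void )
{
  const unsigned char ident[4]= { 'I', 'W', 'A', 'D' };  /* 0x49 0x57 0x41 0x44 */
  /* 0x49 | 0x57<<8 | 0x41<<16 | 0x44<<24 = 0x44415749 */
  printf( "ident as int -> %lu\n", parse_int( ident ) );  /* prints 1145132873 */
}
```

Because the value is assembled with shifts rather than by casting the buffer to an integer pointer, the same code should behave the same on big-endian and little-endian machines.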

The first field is the string IWAD or PWAD. IWAD is what you will mostly find; it's the main game data wad. The PWAD is a patch wad, used for add-on levels and modifications, and it has the same structure. Anyway, you parse out the first field like this:

  char wad_ident[ 5 ];
  memcpy( wad_ident, header_buffer, 4 );
  wad_ident[ 4 ]= '\0';

Remember to add the null terminator '\0' so we don't get a buffer overflow! The string is always exactly 4 bytes long, and byte 4 of the header already belongs to the next field, so the terminator goes into a separate 5-byte copy rather than into header_buffer itself. The next integer is the number of lumps in the wad; the lumps are the actual binary data like music and graphics. The third integer is the 'directory offset', the position in the file where the directory starts. The directory holds one entry per lump, recording where the lump is, its size and its ident. Each entry is 16 bytes long ('WAD_DIRECTORY_LENGTH'), and the entries run until the end of the wad file! So note I find the length of the file and do:

  while( t_ofs <= ( wad_length - WAD_DIRECTORY_LENGTH ) )
    {
    }
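The three header fields can also be collected into a struct and checked against the file length before walking the directory. This is only a sketch under my own naming: struct wad_header, parse_header and directory_fits are hypothetical and not taken from crldoom. With the numbers from the log extract (2306 lumps, directory at 12371396, file length 12408292) the check is 12371396 + 2306 * 16 = 12408292, so the directory ends exactly at the end of the file, which is why looping until the end of the file works for this wad.

```c
#include <assert.h>
#include <string.h>

#define WAD_HEADER_LENGTH     12
#define WAD_DIRECTORY_LENGTH  16

/* Hypothetical struct for illustration; the names are not from crldoom. */
struct wad_header {
  char          ident[5];    /* "IWAD" or "PWAD" plus our own terminator */
  unsigned long n_lumps;     /* number of directory entries */
  unsigned long dir_offset;  /* file offset of the first directory entry */
};

/* Little-endian 4-byte read, as in the main listing. */
unsigned long parse_int( const unsigned char *p )
{
  return ( (unsigned long)   p[0]
           | (unsigned long) p[1] << 8
           | (unsigned long) p[2] << 16
           | (unsigned long) p[3] << 24 );
}

/* Decode the 12 raw header bytes. Returns 0 on success, -1 on an unknown ident. */
int parse_header( const unsigned char *raw, struct wad_header *out )
{
  assert( raw != NULL && out != NULL );
  memcpy( out->ident, raw, 4 );
  out->ident[4]= '\0';
  if( strcmp( out->ident, "IWAD" ) != 0 && strcmp( out->ident, "PWAD" ) != 0 )
    return -1;
  out->n_lumps=    parse_int( raw+4 );
  out->dir_offset= parse_int( raw+8 );
  return 0;
}

/* Returns 1 if all n_lumps directory entries fit inside the file, else 0. */
int directory_fits( const struct wad_header *h, unsigned long wad_length )
{
  if( h->dir_offset > wad_length )
    return 0;
  return h->n_lumps <= ( wad_length - h->dir_offset ) / WAD_DIRECTORY_LENGTH;
}
```

Once directory_fits passes, it would also be possible to loop exactly n_lumps times instead of looping until the end of the file; that is a design choice, and the original loop is kept as it is here.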

So how do we parse out these directory entries? We move to the directory offset and then read 16 bytes into a buffer, which is the space of 4 integers this time. It still contains only 3 things, though: two 4-byte integers and an 8-byte ident string, padded with zero bytes when the name is shorter than 8 characters.

  unsigned long filepos= parse_int( directory_buffer );
  unsigned long size= parse_int( directory_buffer+4 );
  char directory_ident[ 9 ];
  memcpy( directory_ident, directory_buffer+8, 8 );
  directory_ident[ 8 ]= '\0';
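As a rough sketch, the same three reads can be wrapped into one function that fills a struct from a raw 16-byte entry. The names struct wad_dirent and parse_dirent are my own assumptions for illustration, not crldoom's. The 9-byte name field matters for idents like COLORMAP or LINEDEFS, which use all 8 bytes and so carry no terminator of their own in the file.

```c
#include <assert.h>
#include <string.h>

#define WAD_DIRECTORY_LENGTH  16
#define WAD_NAME_LENGTH       8

/* Hypothetical struct for illustration; the names are not from crldoom. */
struct wad_dirent {
  unsigned long filepos;                    /* where the lump starts in the file */
  unsigned long size;                       /* lump length in bytes */
  char          name[ WAD_NAME_LENGTH+1 ];  /* ident plus our own terminator */
};

/* Little-endian 4-byte read, as in the main listing. */
unsigned long parse_int( const unsigned char *p )
{
  return ( (unsigned long)   p[0]
           | (unsigned long) p[1] << 8
           | (unsigned long) p[2] << 16
           | (unsigned long) p[3] << 24 );
}

/* Decode one raw 16-byte directory entry. */
void parse_dirent( const unsigned char *raw, struct wad_dirent *out )
{
  assert( raw != NULL && out != NULL );
  out->filepos= parse_int( raw );
  out->size=    parse_int( raw+4 );
  /* copy exactly 8 bytes, then terminate: never strdup or strlen the raw buffer */
  memcpy( out->name, raw+8, WAD_NAME_LENGTH );
  out->name[ WAD_NAME_LENGTH ]= '\0';
}
```

Fed the first entry from the log extract (filepos 12, size 10752, ident PLAYPAL), this should give back the same three values.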

Now we know the file offset of the lump, the size of the lump and the ident of the lump, and we make sure the ident is null-terminated again to avoid a buffer overflow (an ident that uses all 8 bytes has no terminator of its own). Then we can parse out the lumps: simply fseek and fread :) Done. It's so simple that it seems like there should be more to it, yet when you think about it, it is very much common sense; there really aren't that many ways to pack data into a file like this! If you're interested, I'll be talking soon about how to convert the D_E1M* lumps, which are music lumps, to compatible MIDI tracks, and finally about how BSP works for the levels and maybe how to draw them in C/SDL/OpenGL. If you ask, I prefer C over C++. Yeah, when it comes to building this game engine there have been times I thought, woo, having an object would be really nice here, but then I realise, well no, because that's what the scripting engine is going to do... :) It's mainly because I don't like C++ syntax; it feels messy to me, and C is just nice and simple :)
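The fseek and fread step could look something like the sketch below. read_lump is a hypothetical helper I am naming for illustration, not a function from crldoom, and it assumes the filepos and size came from a directory entry. Note that marker entries such as E1M1 in the log extract have size 0, so the sketch treats an empty lump as valid rather than as an error.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from crldoom): read `size` bytes starting at
   `filepos`. Returns a malloc'd buffer the caller must free, or NULL on
   a seek, allocation or short-read failure. */
unsigned char *read_lump( FILE *wad_fd, unsigned long filepos, unsigned long size )
{
  assert( wad_fd != NULL );
  if( fseek( wad_fd, (long) filepos, SEEK_SET ) != 0 )
    return NULL;

  /* malloc(0) may legally return NULL, so always ask for at least one byte */
  unsigned char *data= malloc( size ? size : 1 );
  if( !data )
    return NULL;

  /* a short read means the directory entry points past the end of the file */
  if( size && fread( data, 1, size, wad_fd ) != size )
    {
      free( data );
      return NULL;
    }
  return data;
}
```

A caller would do something like read_lump( wad_fd, filepos, size ) with the values parsed from a directory entry, use the bytes, and then free() the result.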

Finally, I just want to wrap up by saying John Carmack has to be one of my biggest heroes in computer science, along with Miguel de Icaza, Linus Torvalds and various GCC hackers like Ian Lance Taylor. But John Carmack in particular, since he made Doom! He also attended the University of Missouri–Kansas City for two semesters before withdrawing to work as a freelance programmer, which I think is brilliant, since university does not teach students to be serious programmers or technologists (well, very few do anyway). I think it's great he had the balls to do that, and so did Miguel de Icaza, and they have both done some of the most amazing things in computer science!

Anyway, if you're interested in Crules or CrlDoom see: http://crules.org and the code for crldoom is over at http://code.redbrain.co.uk/cgit.cgi/crldoom

One project I love is Classic Doom 3: it is Doom re-implemented on top of the Doom 3 engine, what more do you want!!!

3 comments to Parsing the Doom Wad File

  • So, I was updating my computer to Ubuntu 10.04, and ended up cobbing a bunch of old configuration files together into my new home directory, in an effort to take the best of the previous installations and preserve it in this new, clean install.

    One of the side effects is that Firefox opened to this blog post, probably from when I was reading it in January.

    Just thought I should mention how frickin’ cool a post it is. Long live DOOM! :D
