Thursday, 13 January 2011

First post! (or: refactoring hell)

So today, I had a quick look at how difficult it would be to get the pref files referencing monsters by name rather than by number, as part of the general effort to remove indexes from the game and make the world friendlier for any budding tileset designers.

The basic sketch should be: make prefs.c output the right file format, convert all the graphics edit files, add a function to get monster race given a name, rename duplicate monsters, and done. Shouldn't take more than an hour.

First step: edit prefs.c to output R: lines with monster names on.

All looks good... oh no it doesn't. I notice lines like:
R:E , the Dark Elf:17:104
That's rubbish, I cry. What's happening here? Well, there's a code module in Angband (to the extent that Angband has code modules) called xchars, which takes strings like "E['o]l the Dark Elf" and turns them either "Eol the Dark Elf" or "Eól the Dark Elf" depending on your platform can support and maybe other factors. In this case, it seems to have failed to do anything useful at all and just be outputting blank spaces.

Why? I do a bit of digging and it turns out that xchars is used when reading anything from any file, so that the monster's name when it's read in from the edit file in the form E['o]l is saved in memory as Eól, in a now-defunct character encoding known as Latin-1.

I assume this has been done to prevent having to make any substantial code changes while gaining a fancy new display feature. A bit like every other feature for the past two decades, then. It would make a lot more sense if the game kept strings in the form they were read in as (with the weird [![?] stuff) and the display code was changed so that some display characters could take up more than one byte in memory.

That would have been the long-sighted, intelligent thing to do that didn't involve just adding another low-level hack that permeates the code. For example, if that had happened, we could make UTF-8 the default encoding of the game with very little difficulty other than wrapping our heads around the Unicode spec and implementing it in C. As it happens, now we have to do that and rewrite the mess that is xchars.


So, in finest Angband tradition, it turns out there's a quick hack around this problem, which is to call xstr_trans(buf, ASCII). That converts a string, in-place, to a new encoding... which presumably may result in a string that is longer than the original string, and since xstr_trans() doesn't take a size for 'buf', could cause overflow and memory bugs. Yay! (There's actually a bare strcpy() in there, I'm not sure how I missed that.)

Looking at the function and the comments above it, to be safe, we need a buffer 1024 bytes in length. OK. We can deal with this if we're strong.

First step, try two: Add xstr_trans() call.

Oh, now the output looks like this:
R:Eol, the Dark Elf:149:214
So we've converted from Latin-1 to ASCII, but not to the 'canonical form' stored in text files. In fact, there is no code to translate from Latin-1 or whatever weird pseudo-native format the game wants to use internally back into the canonical platform-agnostic form that it should have in data files.


So now there are two choices:
  1. Add a function that can convert back, temporarily working around the hack and the other hack with a less smelly hack, all in the name of the greater good
  2. Fix xchars to happen on display, not on processing.
This is why refactoring Angband takes years. Everything is so entangled that you lose interest before you get to the bottom of the tangle. Or, you ignore the tangle and build stuff on top of it and hope it works.

Also, this lovely comment in x-char.h:
* x-char.c function declarations are located in externs.h.
* x-char-c tables are loctaed in tables.c
What's the point, really?

1 comment:

  1. What we really need, then, is someone to replace xchar with Unicode. I've done my bit (ie introduce some code so unbearable that people with finer coding sensibilities have to fix it :)

    A couple of blog comments, too:
    1. This should be advertised to the Angband community. I stumbled on it through narcissism.
    2. The links are invisible in my browser (Firefox on OSX).