Coding games in assembly ("machine code" isn't quite accurate, but it gives an idea) is viewed in many different lights. On one hand, there are people who will view it as useless and outdated and point you to the nearest compiler, or rather game development system, that they've heard of. In other words, they'll tell you to use Unity, Game Maker, Construct 2, or whatever other buzzwords they've heard of lately.
However, you may be constrained by certain limits. Yes, I can make a game for Windows which fits on a floppy disk and has music and sound and stuff (if you strip away the non-Windows stuff from Parsnip Theory, I suspect it will actually fit, although you may need to remove the server), but it sure won't use Unity, and it sure won't use Game Maker either (barring maybe Game Maker 4.2a if I decide to take the piss).
Either way, it's more fun writing it in 80186 assembly. Pure 8086/8088 assembly is a bit limited for my tastes, and the oldest x86 CPU I have is 80186-compatible, so the 80186 is actually a really good target if you're taking the x86 route.
Or you may be doing it for the challenge. Which brings us to the next group: there are people who are genuinely shocked that it's actually possible to write software in assembly machine code techno whizz-kid language, and they will revere you as a god. The upside is that you get praise. The downside is that they'll refuse to listen to you when you try to explain it.
It's not as hard as people think it is.
It's just something you have to try.
But it is a little bit harder than what you're used to. Yet if you program it well, get a bit of practice, and follow these principles, you'll be very surprised at how quickly things come together.
It will also make you a better programmer.
... well, it'll teach you how to comment properly, at the very least.
Use a modern assembler and a modern emulator on a modern computer.
That way, you won't be wasting time rebooting, burning media, trying to remember what address "draw_box" is, trying to remember the format of the ModR/M byte, or any of that crap.
Of course, this part is pretty easy. Your whole toolchain is, at the time of writing, smaller than any Unity game ever produced... barring maybe your text editor, shell, terminal emulator, IDE, and all that crap - but let's just assume you have those already. Either way, it'll be smaller than Titanfall. That game has no right to be 15GB. Heck, it's even bigger than the Unity IDE itself!
But enough ranting about how everything is complete and utter bloatware. Let's get on with the other tips.
Comment your code properly.
Make a comment at every logical step ("Calculate video pointer", "Draw pixels", "Return" for example). You will completely and utterly lose your sanity otherwise.
Also, put a comment at the top of every subroutine to point out the input registers/flags, output registers/flags, and which registers get trashed. Always assume that calling a subroutine trashes the flags; that way, you never have to note it.
If you've done this before, use your old code as much as necessary.
Another way to word this is "I really, really do not want to have to waste another 24 hours getting scrolling working properly on a Sega game console". If you have a good framework, keep it.
Know a very high-level language and use that to assist with tables and calculations and whatnot.
Python is my weapon of choice, but as a Vim user, I also tend to use some of its nicer features to generate at least some of the tables.
Still, sometimes you need something like a sine table, and that's a job for a real programming language.
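Here's a minimal sketch of that kind of table generator in Python. It assumes a 256-entry table of signed bytes written out as WLA-DX-style .db lines with $-prefixed hex; the table size, the amplitude of 127, and the label name sine_table are arbitrary choices, not requirements.

```python
import math


def sine_table(entries=256, amplitude=127):
    """One full sine period, scaled to signed integers in [-amplitude, amplitude]."""
    return [round(amplitude * math.sin(2 * math.pi * i / entries))
            for i in range(entries)]


def to_db_lines(values, per_line=16):
    """Format values as WLA-DX-style .db directives, per_line values per line.

    Negative values are written as their two's-complement byte (v & 0xFF),
    so -1 becomes $FF.
    """
    lines = []
    for start in range(0, len(values), per_line):
        chunk = values[start:start + per_line]
        lines.append(".db " + ", ".join(f"${v & 0xFF:02X}" for v in chunk))
    return lines


if __name__ == "__main__":
    # Paste the output into your source, or have your build redirect it to an include file.
    print("sine_table:")
    for line in to_db_lines(sine_table()):
        print("    " + line)
```

With 256 entries, a one-byte angle indexes the table directly and wraps around for free.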
Define named constants instead of using magic numbers everywhere.
You're using a modern assembler. If you need to change something like, for example, the maximum number of objects you can have (you probably will), or the address where something is stored (assuming you aren't letting your assembler handle the addresses, or your assembler sucks), it will be far easier if you have a constant you can change.
Don't prematurely optimise your code.
1MHz is a lot faster than you think.
Either way, good code is more important than fast code. You can make it go faster later.
If you do need to optimise, the general rule is to only optimise the innermost loop. Oh, and comments will help you make sense of it later.
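To put a number on "1MHz is a lot faster than you think": at 60 frames per second, a 1MHz CPU still gets over 16,000 cycles every frame. The Python sketch below does that arithmetic; 60 fps and 32 objects are arbitrary example figures, and real hardware loses part of the budget to interrupts, video access and the like, so treat the result as an upper bound.

```python
def cycle_budget(clock_hz, frames_per_second, objects):
    """Return (cycles per frame, cycles per object per frame), rounded down.

    Ignores time lost to interrupts, wait states, DMA and so on.
    """
    per_frame = clock_hz // frames_per_second
    return per_frame, per_frame // objects


if __name__ == "__main__":
    frame, per_object = cycle_budget(1_000_000, 60, 32)
    print(f"{frame} cycles per frame, {per_object} per object")
```

That works out to 16,666 cycles per frame, or about 520 per object with 32 of them, which goes a long way if the per-object logic is simple.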
Make lots of things into subroutines.
Not only does this make your code easier to manage, it also tends to make it smaller, too! Are you doing something twice in two different places? See if you can rewrite it as a subroutine.
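The size argument is simple arithmetic. This Python sketch assumes a 3-byte call and a 1-byte return, which is what Z80 "call nn"/"ret" and 16-bit x86 near "call"/"ret" take; other CPUs differ, so adjust the two constants for yours.

```python
# Assumed encoding sizes: 3-byte call, 1-byte return (e.g. Z80 or 16-bit x86 near calls).
CALL_BYTES = 3
RET_BYTES = 1


def inline_bytes(body_bytes, uses):
    """Size when the same code is pasted in at every place it's used."""
    return body_bytes * uses


def subroutine_bytes(body_bytes, uses):
    """Size when the code exists once (plus a return) and every use site calls it."""
    return body_bytes + RET_BYTES + CALL_BYTES * uses


if __name__ == "__main__":
    for body in (3, 7, 10):
        print(f"{body:2}-byte body used twice: inline {inline_bytes(body, 2)}, "
              f"subroutine {subroutine_bytes(body, 2)}")
```

A 10-byte sequence used twice drops from 20 bytes to 17, and the saving grows with every extra call site; a tiny 3-byte sequence is better left inline. The price is time: on the Z80, a call costs 17 T-states and a ret 10, which is one reason the innermost loop is where inlining comes back.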
Make structs, arrays, and function pointers.
Those of you who don't realise that C doesn't always have a ++ or a # after it may not know what I mean, so I'll explain them.
A struct (short for "structure") is, well, a structure. It's basically a definition for a block of memory which has certain values in certain places. I'll give you an example in C:
struct obj {
    short x, y;
    signed char vx, vy;  /* velocities can be negative, and plain char may be unsigned */
    char sprite_slot, sprite_tile, sprite_count;
    char flags;
    char timer_stun;
    char timer_shot;
    void (*f_tick)(struct obj *);
};
And a similar example with the WLA-DX assembler:
.struct obj
    x            dw
    y            dw
    vx           db
    vy           db
    sprite_slot  db
    sprite_tile  db
    sprite_count db
    flags        db
    timer_stun   db
    timer_shot   db
    f_tick       dw
.endst
An array (and if you haven't heard of this, you do not know how to code well, so put your "but I know C#" shades down and listen, because this is fundamental) is a usually-fixed-size list of... something. It can be a list of numbers. But it can also be a list of structs.
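Put structs and arrays together and finding a field is just arithmetic: each field of the obj struct becomes a byte offset, and element i of an array of structs starts at base + i * size. The Python sketch below models that for the WLA-DX struct above, packing the fields back to back (a C compiler, by contrast, may insert padding for alignment). The base address 0xC000 in the usage note is only an example.

```python
# Field sizes for the WLA-DX directives used in the obj struct: db = 1 byte, dw = 2 bytes.
SIZES = {"db": 1, "dw": 2}

OBJ_FIELDS = [
    ("x", "dw"), ("y", "dw"),
    ("vx", "db"), ("vy", "db"),
    ("sprite_slot", "db"), ("sprite_tile", "db"), ("sprite_count", "db"),
    ("flags", "db"),
    ("timer_stun", "db"), ("timer_shot", "db"),
    ("f_tick", "dw"),
]


def struct_layout(fields):
    """Return ({field: byte offset}, total size), with no padding between fields."""
    offsets, pos = {}, 0
    for name, kind in fields:
        offsets[name] = pos
        pos += SIZES[kind]
    return offsets, pos


def field_address(base, index, field, fields=OBJ_FIELDS):
    """Address of array[index].field for an array of structs starting at base."""
    offsets, size = struct_layout(fields)
    return base + index * size + offsets[field]


if __name__ == "__main__":
    offsets, size = struct_layout(OBJ_FIELDS)
    print(f"sizeof(obj) = {size}")
    for name, offset in offsets.items():
        print(f"  {name:13} +{offset}")
```

For example, field_address(0xC000, 3, "flags") gives 0xC033: 3 * 14 bytes to reach the fourth object, plus 9 to reach its flags.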
See that f_tick thing? If that confuses you, it's what's called a function pointer. Now, if you're going to be programming in assembly, chances are you already know what a pointer is. So, of course, this is a pointer which points to a bit of code. Yep.
Here, have some code. This one's for 68000, sorry, because the Z80 equivalent is a bit ugly... and the x86 equivalent is even worse.
(Mind you, the 68000 version is pretty ugly, too.)
; Input: A0 = pointer to object
tick_object:
    ; Push registers to stack
    movem.l d0-d7/a0-a6, -(a7)
    ; Call function
    movea.l OBJ_f_tick(a0), a1
    jsr (a1)
    ; Pop registers off stack and return
    movem.l (a7)+, d0-d7/a0-a6
    rts
... actually, that's pretty ugly. Here's the 16-bit x86 version.
; Input: SS:BP = pointer to object ([bp + ...] addresses through SS by default)
tick_object:
    ; Push registers to stack
    pusha
    ; Call function (indirect near call through the object's f_tick field)
    call [bp + obj.f_tick]
    ; Pop registers off stack and return
    popa
    ret
...ok, that was actually much less ugly than I thought it would be.
Yeah. That's one of the rare cases where x86 looks nicer than 68000.
Anyway, using an array of structs which have function pointers in them will make your life much, much easier.
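In a high-level language the same pattern looks like this: a minimal Python sketch of an array of objects, each holding a reference to its own tick routine, plus a main loop that just calls whatever each object points at. The Obj fields are a cut-down version of the struct above, tick_bullet and tick_pickup are hypothetical behaviours, and treating f_tick = None as "nothing here" is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Obj:
    x: int = 0
    y: int = 0
    vx: int = 0
    vy: int = 0
    # The "function pointer": which routine runs this object every frame.
    f_tick: Optional[Callable[["Obj"], None]] = None


def tick_bullet(o: Obj) -> None:
    """Hypothetical behaviour: fly in a straight line."""
    o.x += o.vx
    o.y += o.vy


def tick_pickup(o: Obj) -> None:
    """Hypothetical behaviour: drift down one pixel per frame."""
    o.y += 1


def tick_all(objects: list) -> None:
    """Run every live object's tick routine; f_tick=None means skip this slot."""
    for o in objects:
        if o.f_tick is not None:
            o.f_tick(o)
```

Adding a new kind of object then means writing one new tick routine; tick_all never has to change.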
Speaking of arrays...
Don't write malloc(). Just use arrays, and a subroutine to look for a free slot.
I've done this plenty of times.
Of course, you have to find some way to denote if a slot is unallocated, but once you've done that, it'll save you a lot of pain!
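A minimal Python sketch of that approach, assuming a null f_tick marks a slot as unallocated (one possible convention, not the only one). MAX_OBJECTS, spawn and despawn are hypothetical names, and in assembly the -1 "pool is full" result would more likely be signalled with a flag.

```python
MAX_OBJECTS = 8  # hypothetical pool size, fixed at build time like an array in RAM


def make_pool(size=MAX_OBJECTS):
    """Reserve every slot up front. A slot whose f_tick is None is free."""
    return [{"x": 0, "y": 0, "f_tick": None} for _ in range(size)]


def find_free_slot(pool):
    """Linear scan for the first free slot; returns its index, or -1 if full."""
    for i, slot in enumerate(pool):
        if slot["f_tick"] is None:
            return i
    return -1


def spawn(pool, f_tick, x=0, y=0):
    """Claim a free slot and initialise it; returns the index, or -1 if full."""
    i = find_free_slot(pool)
    if i >= 0:
        pool[i].update(x=x, y=y, f_tick=f_tick)
    return i


def despawn(pool, i):
    """Freeing a slot is just marking it unallocated again."""
    pool[i]["f_tick"] = None
```

When the pool is full, spawning simply fails (a bullet doesn't appear, say), and there's no fragmentation to worry about.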
Using instructions which write immediate values to RAM makes it much easier.
I've done this several times before. Here are the relevant instructions for a few CPUs:
- x86: mov word [address], value (you can use byte, word, dword, or qword, although some aren't available in certain CPU modes)
- 68000: move.w #value, address (.b, .w, and .l can be used; also, did I mention this syntax is backwards? Because it's backwards.)
- Z80: ld (ix + offset), value (OK, there are limits to doing this: it only writes a single byte, offset is a signed byte, and IX is a 16-bit pointer to somewhere. But if you're modifying a structure, this helps a lot.)
For some CPUs, however, you'll need at least two instructions: one to load the value into a register, and one to store it in RAM. The 6502, MIPS, and ARM are all like this.
If you have local labels, use them.
NASM (x86) has the .label syntax. I'll give an example here:
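; Input:
;   ES:DI = pointer to video memory for top-left corner
;   CX = vertical length of line
;   AL = color of the line (of course it's actually spelt "colour" but by convention we have to spell it wrong)
; Output: none. Trashes: nothing (DI and CX are restored).
; Assumes 320-byte rows (e.g. VGA mode 13h) and the direction flag clear.
;
draw_vline:
    push di
    push cx
.lp_y:
    stosb           ; plot one pixel; this also advances DI by 1
    add di, 319     ; so adding 319 moves down exactly one 320-byte row
    loop .lp_y
    pop cx
    pop di
    ret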
In this case, the .lp_y label is actually draw_vline.lp_y, but you only ever have to type .lp_y, and every other subroutine can have its own .lp_y without a clash. That really saves you pain.
The WLA-DX multi-CPU assembler (highly recommended!) has the + and - series of labels. These are amazing, especially when you're doing stuff like this (Z80 given as an example):
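    ld de, SCREEN_WIDTH_BYTES - WIDTH_BYTES  ; DE = bytes from the end of one row to the start of the next
    ld c, HEIGHT                             ; C = rows left to copy
--: ld b, WIDTH_BYTES                        ; B = bytes left in this row
-:  ld a, (ix+0)                             ; copy one byte from (IX) to (HL)
    ld (hl), a
    inc ix
    inc hl
    djnz -                                   ; next byte of this row
    add hl, de                               ; move HL down to the next row
    dec c
    jp nz, --                                ; next row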
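The lookup behind these labels is simple enough to model in a few lines. The Python sketch below is a simplified model written for illustration, not WLA-DX's actual implementation: "-" and "--" are separate names, a backward reference finds the nearest matching label at or above it, and a forward reference ("+", "++") finds the nearest matching label below it. Line indices count the copy loop above from 0.

```python
def resolve(labels, line, ref):
    """Find the line an anonymous label reference such as '-' or '++' points to.

    labels[i] is the anonymous label defined on line i ('-', '--', '+', ...)
    or None. Backward references pick the nearest matching label at or
    before `line` (a label on the same line comes before its instruction);
    forward references pick the nearest matching label after `line`.
    """
    if ref.startswith("-"):
        search = range(line, -1, -1)
    else:
        search = range(line + 1, len(labels))
    for i in search:
        if labels[i] == ref:
            return i
    raise ValueError(f"no {ref!r} label in range of line {line}")


# The copy loop above: "--" is defined on line 2 and "-" on line 3.
copy_loop = [None, None, "--", "-", None, None, None, None, None, None, None]
```

So djnz - (line 7) lands on the - label at line 3, while jp nz, -- (line 10) skips past it to the -- label at line 2.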
By the way, when I say "don't prematurely optimise"... I kinda did a little. The improvement is minimal, but simple enough that we can get away with it. See that jp opcode? A conditional jp takes 10 cycles whether or not the branch is taken, while a conditional jr takes 12 cycles when taken but only 7 when it isn't (and is a byte shorter). That makes jp the faster choice whenever the branch is taken more than 60% of the time, and this one is taken after every row except the last, so I'm using jp here.
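The trade-off comes down to one line of algebra: if the branch is taken a fraction p of the time, jr averages 7 + 5p cycles against jp's flat 10, so jp wins once p is above 3/5. The Python sketch below checks that and totals the branch cost for the row loop; the 16-row height in the usage note is an arbitrary example.

```python
from fractions import Fraction

# Z80 T-states for conditional branches.
JP_CC = 10           # jp cc, nn: same cost taken or not
JR_CC_TAKEN = 12     # jr cc, e when the branch is taken
JR_CC_NOT_TAKEN = 7  # jr cc, e when it falls through


def break_even():
    """Fraction of the time the branch must be taken for jp to beat jr."""
    return Fraction(JP_CC - JR_CC_NOT_TAKEN, JR_CC_TAKEN - JR_CC_NOT_TAKEN)


def row_branch_cost(height):
    """Total T-states spent on the row loop's branch over `height` rows.

    The branch is taken after every row except the last.
    """
    jp_total = JP_CC * height
    jr_total = JR_CC_TAKEN * (height - 1) + JR_CC_NOT_TAKEN
    return jp_total, jr_total


if __name__ == "__main__":
    print("break-even:", break_even())
    print("16 rows (jp, jr):", row_branch_cost(16))
```

For 16 rows that's 160 cycles with jp against 187 with jr; only a single-row copy would favour jr.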
Which leads us to another tip.
Knowing your cycle counts and other timings can be helpful for when you do need to optimise.
The cycle times for Z80 are kinda hard to memorise, but the main points are:
- 4 cycles for M1 access (first op OR the byte after a DD/FD, ED, or CB)
- 3 cycles for regular memory access
- 4 cycles for I/O access
- 5 cycles for (IX+dd), (IY+dd) or JR/DJNZ PC+dd calculation (the calculation for JR only takes place if the condition succeeds)
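As a sanity check, these rules add up to the documented timings for common instructions. The Python sketch below just sums them for a few examples; the results (7, 19, 10, 12/7 and 11 T-states) match the usual Z80 tables. Instructions that spend extra internal cycles, such as inc hl at 6 T-states, need more than these rules.

```python
# Rules of thumb from the list above, in T-states (Z80 clock cycles).
M1 = 4    # opcode fetch, including prefix bytes such as DD
MEM = 3   # ordinary memory read or write
IO = 4    # I/O port access
DISP = 5  # adding an 8-bit displacement: IX+d, IY+d, or PC+e for jr/djnz

timings = {
    "ld a, (hl)": M1 + MEM,                      # fetch 7E, read (HL)
    "ld a, (ix+d)": M1 + M1 + MEM + DISP + MEM,  # fetch DD and 7E, read d, add, read
    "jp cc, nn": M1 + MEM + MEM,                 # fetch, read both bytes of nn
    "jr cc, e (taken)": M1 + MEM + DISP,         # fetch, read e, add e to PC
    "jr cc, e (not taken)": M1 + MEM,            # no PC+e calculation
    "out (n), a": M1 + MEM + IO,                 # fetch, read n, write the port
}

if __name__ == "__main__":
    for instr, t in timings.items():
        print(f"{instr:22} {t:2} T-states")
```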
The cycle times for 6502 are easier, but there are still some snags. Main points:
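- 2 cycles for the opcode byte plus the byte after it (the 6502 always reads the next byte, even for one-byte instructions)
- 1 cycle for each further memory access
- 1 cycle to calculate a 16-bit carry if need be (for example, when an indexed address crosses a page boundary)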
Often there are other cycles used, though, so be wary!
But this isn't a tutorial on how to calculate cycle timings. Look elsewhere if you want the timings for your CPU. As for x86 cycle timings... I don't even try.
Anyway, there are a lot of things that could be talked about, but that should be enough to get you somewhere.