I’m currently studying OS development with MIT course 6.828 mostly for fun. I highly recommend it for everyone wanting to have a better understanding of how computer works. It will teach you how an operating system works like xv6 and you will also build your own OS. Check it out here: https://pdos.csail.mit.edu/6.828/2016/schedule.html
After you complete the first lab, you will better understand how hardware initialization is done by firmware BIOS, then how bootloader is loaded, and in turn the operating system.
Earlier IBM PCs couldn’t address more than 1MB of memory, and only the first 640k chunk was actually RAM. The 640k-1MB range was reserved for special use, like mapping devices like VGA display, 16-bit PCI devices, and BIOS ROM.
The layout is more or less as follow:
1 2 3 4 5 6 7 8 9 10 11 12 |
|
It gets more complex for 32-bit physical address space. It’s also amazing how the hardware engineers left holes in the physical address space of the machine for backward compatibility.
When computer is turned on, BIOS is mapped at that 64k region reserved to it and control is transferred over to it. I want to better understand how QEMU emulates that. GDB and strace will be my friends in this journey. The former will be used to step through BIOS instructions, and the latter to see what QEMU is doing internally, like memory maps and files opened.
Using strace first
one of the most interesting things in strace output is the following:
1 2 3 4 5 6 |
|
It’s very interesting that file storing BIOS is 256k in size, even though area reserved for it is 64k. I suppose that only some part of it is actually mapped. There are also some interesting files such as /usr/share/qemu/kvmvapic.bin and /usr/share/qemu/vgabios-stdvga.bin which are opened and read. They all will be used to emulate the layout figure shown above.
QEMU also reports the following through ‘info roms’:
1 2 3 4 5 |
|
Actual dissection of BIOS using GDB
GDB can remotely connect to QEMU instance which waits for it, and that’s what I do here. Let’s get started…
QEMU starts executing at physical address 0x000FF000 and in real mode, which is at the very top of the area reserved for BIOS. CS and IP registers are 0xF000 and 0xFFF0, respectively. CS and IP were both used for addressing memory because registers were only 16 bits, thus the segmented addressing mode which multiples segment by 16 and adds offset. So PC is able to address up to 1MB of memory. The problem is that different values for segment and offset may refer to same physical memory which may easily lead to bugs. That was later addressed in protected mode which had bigger registers and MMU at its disposal.
We have a very basic idea of what BIOS will accomplish which is memory check, device initialization, and lately find a bootable sector to load into a predefined address (0x7C00) for which it will transfer control to. The purpose of this article is to go into the very specific details, because we already know the higher-level overview.
Dissecting…
I’ll only show the most important instructions because this article would get long and boring otherwise. On top of each instruction, there will be a comment to explain it and also possibly share something interesting.
http://web.archive.org/web/20040304063834/http://members.iweb.net.au:80/~pstorr/pcbook/book2/ioassign.htm is used as a reference for I/O addresses. I’m happy wayback machine allows me to access that great website again.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 |
|
I’m really tired now, and the article will also be boring if I keep going. I think it’s better to stop at this point. As we know, the BIOS will also keep initializing things and afterwards will load boot sector into 0x7c00 and start executing it. I hope you had lots of fun reading this article. My main idea is to have people interested in operating system engineering. I’ll also try to keep posting as I go through the OS course offered by MIT.
Bye for now!