Raphael S. Carvalho's blog

my space on the internet for random thoughts

Dissecting the ROM BIOS Shipped With QEMU for Fun

I’m currently studying OS development with MIT course 6.828 mostly for fun. I highly recommend it for everyone wanting to have a better understanding of how computer works. It will teach you how an operating system works like xv6 and you will also build your own OS. Check it out here: https://pdos.csail.mit.edu/6.828/2016/schedule.html

After you complete the first lab, you will better understand how hardware initialization is done by firmware BIOS, then how bootloader is loaded, and in turn the operating system.

Earlier IBM PCs couldn’t address more than 1MB of memory, and only the first 640k chunk was actually RAM. The 640k-1MB range was reserved for special use, like mapping devices like VGA display, 16-bit PCI devices, and BIOS ROM.

The layout is more or less as follow:

1
2
3
4
5
6
7
8
9
10
11
12
+------------------+  <- 0x00100000 (1MB)
|     BIOS ROM     |
+------------------+  <- 0x000F0000 (960KB)
|  16-bit devices, |
|  expansion ROMs  |
+------------------+  <- 0x000C0000 (768KB)
|   VGA Display    |
+------------------+  <- 0x000A0000 (640KB)
|                  |
|    Low Memory    |
|                  |
+------------------+  <- 0x00000000

It gets more complex for 32-bit physical address space. It’s also amazing how the hardware engineers left holes in the physical address space of the machine for backward compatibility.

When computer is turned on, BIOS is mapped at that 64k region reserved to it and control is transferred over to it. I want to better understand how QEMU emulates that. GDB and strace will be my friends in this journey. The former will be used to step through BIOS instructions, and the latter to see what QEMU is doing internally, like memory maps and files opened.

Using strace first

one of the most interesting things in strace output is the following:

1
2
3
4
5
6
access("/usr/share/qemu/bios-256k.bin", R_OK) = 0
open("/usr/share/qemu/bios-256k.bin", O_RDONLY) = 13
lseek(13, 0, SEEK_END)                  = 262144
lseek(13, 0, SEEK_SET)                  = 0
read(13, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 262144) = 262144
close(13)                               = 0

It’s very interesting that file storing BIOS is 256k in size, even though area reserved for it is 64k. I suppose that only some part of it is actually mapped. There are also some interesting files such as /usr/share/qemu/kvmvapic.bin and /usr/share/qemu/vgabios-stdvga.bin which are opened and read. They all will be used to emulate the layout figure shown above.

QEMU also reports the following through ‘info roms’:

1
2
3
4
5
fw=genroms/kvmvapic.bin size=0x002400 name="kvmvapic.bin"
addr=00000000fffc0000 size=0x040000 mem=rom name="bios-256k.bin"
/rom@etc/acpi/tables size=0x200000 name="etc/acpi/tables"
/rom@etc/table-loader size=0x001000 name="etc/table-loader"
/rom@etc/acpi/rsdp size=0x000024 name="etc/acpi/rsdp"

Actual dissection of BIOS using GDB

GDB can remotely connect to QEMU instance which waits for it, and that’s what I do here. Let’s get started…

QEMU starts executing at physical address 0x000FF000 and in real mode, which is at the very top of the area reserved for BIOS. CS and IP registers are 0xF000 and 0xFFF0, respectively. CS and IP were both used for addressing memory because registers were only 16 bits, thus the segmented addressing mode which multiples segment by 16 and adds offset. So PC is able to address up to 1MB of memory. The problem is that different values for segment and offset may refer to same physical memory which may easily lead to bugs. That was later addressed in protected mode which had bigger registers and MMU at its disposal.

We have a very basic idea of what BIOS will accomplish which is memory check, device initialization, and lately find a bootable sector to load into a predefined address (0x7C00) for which it will transfer control to. The purpose of this article is to go into the very specific details, because we already know the higher-level overview.

Dissecting…

I’ll only show the most important instructions because this article would get long and boring otherwise. On top of each instruction, there will be a comment to explain it and also possibly share something interesting.

http://web.archive.org/web/20040304063834/http://members.iweb.net.au:80/~pstorr/pcbook/book2/ioassign.htm is used as a reference for I/O addresses. I’m happy wayback machine allows me to access that great website again.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
# PC starts at 0x0FF000 and jumps to physical address 0xFE05B. This way BIOS
# assumes control after power up or system restart:
[f000:fff0]    0xffff0:   ljmp   $0xf000,$0xe05b

# Cleans stack segment register
[f000:e066]    0xfe066:   xor    %dx,%dx
[f000:e068]    0xfe068:   mov    %dx,%ss
# Sets up stack point to 0x7000, meaning stack address is actually 0x7000
[f000:e06a]    0xfe06a:   mov    $0x7000,%esp

# 0xf3513 is probably an address which is moved to EDX, let's see what BIOS
# want to achieve with that...
[f000:e070]    0xfe070:   mov    $0xf3513,%edx

# Interrupt flag is reset
[f000:d15d]    0xfd15d:   cli
# So is direction flag:
[f000:d15e]    0xfd15e:   cld 

# Select register 0xF from CMOS with NMI disabled. The lower order 7 bits are
# used to select register, and the most significant bit determines whether to
# disable NMI. Register 0xF is CMOS Shutdown Status.
[f000:d15f]    0xfd15f:   mov    $0x8f,%eax
[f000:d165]    0xfd165:   out    %al,$0x70
# Now we will get CMOS Shutdown Status 
[f000:d167]    0xfd167:   in     $0x71,%al

# A20 control via System Control Port A. Used to control memory 1MB barrier.
# Bit 1 (rw): 0: disable A20, 1: enable A20
[f000:d169]    0xfd169:   in     $0x92,%al
[f000:d16b]    0xfd16b:   or     $0x2,%al
[f000:d16d]    0xfd16d:   out    %al,$0x92

# Load Global/Interrupt Descriptor Table Register
# Sorry I'm tired. I won't go into details of what exactly is stored in tables
[f000:d16f]    0xfd16f:   lidtw  %cs:0x6af8
[f000:d175]    0xfd175:   lgdtw  %cs:0x6ab8

# Wow, it looks like protected mode was set up by BIOS. Did I get that right?
[f000:d17b]    0xfd17b:   mov    %cr0,%eax
[f000:d17e]    0xfd17e:   or     $0x1,%eax
[f000:d182]    0xfd182:   mov    %eax,%cr0

# Looks like so. GDT descriptor is now used for addressing memory.
[f000:d185]    0xfd185:   ljmpl  $0x8,$0xfd18d
The target architecture is assumed to be i386
# And registers for data will use its own GDT descriptor:
=> 0xfd18d:    mov    $0x10,%eax
=> 0xfd192:    mov    %eax,%ds
=> 0xfd194:    mov    %eax,%es
=> 0xfd196:    mov    %eax,%ss
=> 0xfd198:    mov    %eax,%fs
=> 0xfd19a:    mov    %eax,%gs

# Now an indirect jump for value we moved to EDX in the beginning
=> 0xfd19e:    jmp    *%edx
# Setting up stack frame according to x86 calling convention
=> 0xf3513:    push   %ebx
=> 0xf3514:    sub    $0x20,%esp

# Pushing two values as arguments for a function
# Let use variables for them: A=0xf5b6c and B=0xf430b
=> 0xf3517:    push   $0xf5b6c
=> 0xf351c:    push   $0xf430b
=> 0xf3521:    call   0xf096f
# Loading address of A into ecx, and value of B into edx
=> 0xf096f:    lea    0x8(%esp),%ecx
=> 0xf0973:    mov    0x4(%esp),%edx
=> 0xf0977:    mov    $0xf5b68,%eax
# If I understood it correctly, it will now initialize PCI devices...

# and it goes on until bootloader takes over...

I’m really tired now, and the article will also be boring if I keep going. I think it’s better to stop at this point. As we know, the BIOS will also keep initializing things and afterwards will load boot sector into 0x7c00 and start executing it. I hope you had lots of fun reading this article. My main idea is to have people interested in operating system engineering. I’ll also try to keep posting as I go through the OS course offered by MIT.

Bye for now!

Programming and Classical Music

There’s some magic about classical music that helps me as a programmer. Well, I’m definitely not the best programmer out there, so when I find it hard to come up with a creative solution, I listen to classical music over and over again. I leave my gratitude here for Chopin, Mozart, Dvorak, Paganini, Vivaldi, Beethoven and so many others composers. I’m so grateful for all those pieces you guys have left behind. When I’m going through a creativity crisis, I listen to their songs. I also like to listen to them while working. It can be said that we all need to be in a good mood to produce quality work. That’s another thing that classical music plays an important role in my life. Whatever my mood is, I listen to classical music because it impacts my mind in a good manner. It’s almost like an infinite potion of inspiration that you can drink from.

People have different tastes, so I was thinking if your favorite genre can have the same effect on your life. Classical rock, instrumental rock, etc, could work too. But I don’t think shitty, noisy pop music will do it. I actually don’t think I can find the same level of magic in Beethoven symphonies, Vivaldi’s four seasons, Paganini’s caprices, etc, in some other genre. The complexity of the melody, the depth, the harmony in those musics are outstanding. I’m having a really hard time here to describe it, i.e. how classical music impacts almost every single aspect of my life. It helps me boosting my creativity and also keeping focused on the most important things, it makes me happy when I’m sad, and also the other way around. Yes, the latter is true, have you ever tried listening to Chopin and Beethoven’s sonatas?

If you aren’t into classical music, I leave here my invitation for you to try it out. I will help you get started by leaving some links below to some musics from the composers I like the most. These are just the tip of the iceberg though.

I’ll start with a magical performance by the Brazilian musician Arthur Moreira Lima which is considered one of the best Chopin interpreters: IMAGE ALT TEXT

Now Beethoven’s moonlight sonata played by the brilliant piano player Valentina Lisitsa: IMAGE ALT TEXT

I really hope you guys will give classical music a chance. I didn’t listen to it when I was a teenager. Now I love it, it’s my favorite genre by far. Anyone out there that share the same feelings? Not necessarily due to classical music, but whichever genre you like the most.

C++ Framework for Writing Servers Which Are as Fast as Usain Bolt

Prerequisite: C++

What if I tell you that there’s a framework out there that allows you to write servers that are as fast as the fastest man alive? Let me introduce you to Seastar. It’s a framework written in modern C++ that provides its user with a set of tools to extract the most from modern hardware.

Look at the chart below which compares Seastar memcached vs stock memcached: alt text

And it’s not only throughput. It has also proven to improve P99 latency considerably. Other example of applications that benefit from Seastar are ScyllaDB, a distributed database compatible with Apache Cassandra, and Pedis, a drop-in replacement for Redis.

What makes Seastar so fast?

Let’s go through some of its features and I expect that you will understand its power by the end of the list. Here we go:

1. One of its most interesting features is the shared-nothing architecture. What exactly is that? Basically, there will be only one thread for each core (or shard[1] in Seastar terminology) available to the application. And it also means that each thread will have its own set of resources (files, memory, etc) that will not be shared. By doing that, we eliminate the need to use locks because resources are no longer shared by threads. Communication between threads must be done explicitly through message passing, or else, we would need some locking mechanism.

[1]: In seastar, a shard is a thread assigned to a CPU core that acts like an isolated machine.

A common multi-threaded application usually looks like the left picture because you have threads accessing shared resources, whereas a Seastar application will look like the right picture because of its shared-nothing design, look:

images

2. Each Seastar thread will have its own scheduler for small asynchronous tasks (usually stored in std::function), which is called The Reactor. It’s important to keep in mind that all tasks should be asynchronous. That’s because if a thread blocks (waiting for I/O, for example), the reactor will not be able to run other tasks waiting to run.

Let me throw at you another example. What happens if a syscall is called which involves blocking the calling thread until the requested resource (for example, sys_read) is satisfied? The CPU would sit idle while waiting for I/O. From the perspective of a server, it would mean not handling any requests. So when you’re writing code for Seastar, you must make sure that you only use asynchronous API either provided by Linux or Seastar. All I/O in Seastar is done through asynchronous mechanisms provided by Linux such as aio.

Seastar API is very well documented, and it can be found here: http://docs.seastar-project.org/master/index.html

3. Because of the issue mentioned above, Seastar needs some mechanism to make it easier the task of writing fully-asynchronous code. And that’s done using the concept of futures and promises.

Let me explain how future is used in Seastar with code. Let’s say that you want to call a function to sleep for 1 second. Usually, you would only call a sleep function with 1 as parameter, and then the calling thread will sleep for 1 second and continue from when it left off. In Seastar, you shouldn’t block the current thread due to performance reasons, but you can block the task (also known as fiber). So you’d need to call a Seastar function which promises to wake up the task after 1 second.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
#include "core/app-template.hh"
#include "core/sleep.hh"
#include <iostream>

int main(int argc, char** argv) {
    app_template app;
    app.run(argc, argv, [] {
        std::cout << "Sleeping... " << std::flush;
        using namespace std::chrono_literals;
        return sleep(1s).then([] {
            std::cout << "Done.\n";
        });
    });
}

All functions that need to wait for something (like data from disk or network, time, etc) return a future type. Those futures can be consumed using future::then(), which takes a function that will run when the time has come. In the example above, sleep() returns a future type, which will be waited using future::then(). So the program above should print ‘Done.’ after 1 second. The chain of functions (also known as continuations) can grow as you want. So after the print, the program could sleep again for N seconds, write to a file, send a command over the network, whatever you want, all asynchronously! Please take a look here to know more about futures and promises in Seastar.

The list is getting long, so I’ll tell you quickly what else Seastar provides:

  • DPDK support
  • userspace TCP/IP stack
  • userspace I/O scheduler for fairness among components in the same thread
  • and much more!

Getting started

First of all, you should fork the project, which can be found here. Take a look at the README file for instructions on how to install deps and compile the project.

If you’re interested in knowing more about Seastar, I’d advise you to read this tutorial written by Nadav Har'El and Avi Kivity. If you’re willing to delve into some Seastar apps, please go to: https://github.com/scylladb/seastar/tree/master/apps

If you need help, send an e-mail to the project’s mailing list: seastar-dev@googlegroups.com

That’s it…

Seastar is very complex and it’s very hard to cover many of its aspects in a single article. At least, I hope I succeeded to help people understand what Seastar is about and how to get started. I intend to write more articles about it in the future covering more real-world examples. For example, how to write a simple server.

Thank you for your time and stay tuned!

Handling Game States

Prerequisite: basic C++ and game programming knowledge

When you start any game, you expect to see a loading screen, followed by the main menu which has a button that allows you to play the game. When you start playing the game, it’s also expected that you’ll be able to go back to main menu and possibly pause and resume the game. All these different stages of the game are known as game states.

Handling game states is a very difficult task, especially to newbies to game programming like myself. Today, I was looking for an efficient way to switch back and forth between all available states of my simple game.

The simplest way to do it would be using a switch statement, as follow:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
enum game_states {
    MENU,
    GAME,
    ...
}

void game::update() {
    switch(_current_game_state) {
        case MENU:
            // show game options
            // update current state to PLAY if user clicks on play button.
            ...
            break;
        case PLAY:
            ...
    }
}

That’s indeed very simple, but it would be a nightmare to maintain this code when the number of states increases considerably. It turned out that a Finite State Machine (FSM) is exactly what I was looking for. It’s said that FSM is an abstract machine that can be in exactly one of a finite number of states at any given time.

To implement a state machine that handle different types of game states, I took advantage of polymorphism. Each game state will derive from an abstract class called game_state; follow its definition:

1
2
3
4
5
6
class game_state {
public:
    virtual void on_enter() = 0;
    virtual void on_exit() = 0;
    virtual void update() = 0;
};

The first two methods will be used for loading and cleaning the game state, respectively. game_state::update() will be used for a given state to react to user’s input and possibly switch to another state. For example, when an user clicks on play button, the state machine will switch from menu to play state.

Now our state machine will be able to work with all different types of game states in the same way. To make the transition between stages more efficient, the machine will work in the same way a stack does. For example, a new state will be pushed to the back of container storing the game states. And more importantly, the active state is the one that was last pushed to the container. That’s how my implementation of game state machine turned out:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
class game_state_machine {
    std::vector<std::unique_ptr<game_state>> _game_states;
public:
    void push(std::unique_ptr<game_state> state) {
        state->on_enter();
        _game_states.push_back(std::move(state));
    }

    void pop() {
        if (!_game_states.empty()) {
            _game_states.back()->on_exit();
            _game_states.pop_back();
        }
    }

    void update() {
        if(!_game_states.empty()) {
            _game_states.back()->update();
        }
    }
};

Note that game_state_machine::update() will only call update on behalf of the active state, and that’s essential for the machine to work as expected.

I showed the implementation of abstract class for game state, but it’s also important to understand how an actual game state could be implemented by deriving from it. Check it out:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
class menu : public game_state {
public:
    void on_enter() override {
        // load menu sprites
        ...
    }
    void on_exit() override {
        // clean up goes here
        ...
    }

    void update() override {
        // if user clicked on play button, switch to play state.
        if ( ... ) {
            ...
            game_state_machine.push(std::make_unique<play>());
        }
    }
};

Very easy, no?! If we’re in play state, and want to go back to menu, all we need to do is to call game_state_machine::pop().

This was the most efficient way I found to handle game states in my own game. If you know a better way to do it, please leave a comment!

PS: the comment section only shows up when you click on the blog post’s title.

See you,

Ready to Have Fun With Bits?!

Hi everyone. I’m really glad that I finally launched this blog. Fun With Bits.

Special thanks goes to Peteris Krumins who shared some invaluable insights about blogging.

TL;DR: Bla bla bla about who I am. In this blog, I’ll talk about software development techniques, low-level hacks, game development, kernel, among other interesting topics I feel it’s worth sharing with my readers.

I don’t expect people to know me prior to reading my blog, so I’m going to introduce myself.

I think that my father (unfortunately, he’s no longer around us) is the main reason behind my interest in computers. When I was about 10 years old, he taught me how to use commands to interact with MS DOS. I remember the most important commands were to edit texts, execute programs, and navigate through the file system. I don’t know how many times I messed up with the operating system, making impossible to boot it. I had a lot of fun doing it though. After that, I was able to create my own websites using HTML and CSS. I didn’t even know what programming actually was at that time. I was just curious and wanted to do interesting things with my computer.

I actually started with programming after I got interested in creating my own alternative server for a MMORPG called Tibia. Of course, I didn’t create the server, but the experience made me learn how to host a server, manage my website and consequently a database which stored players' data, and also how to write scripts using LUA programming language for custom systems, like quests and spells.

At that time, I wasn’t even thinking of computer science as a career. I was just having fun with bits. Pun intended :-) But I changed my mind and joined a local university for a computer science degree. It helped me a lot because I wanted to be the best programmer among my peers. I worked really hard. A friend of mine a.k.a. kov introduced me to the open source world. He told me about the job opportunities I could get if I started helping relevant open source projects out there. IIRC, Linux kernel was the first open source project I contributed to. The change was to slightly improve the PID allocation code. I will not lie. It was extremely hard to find something to do, but the experience taught me a lot. I was really obsessed with kernel. I often found myself whispering the word ‘kernel’ while showing or walking down the streets. I wanted to better understand how computers worked, from boot to application initialization.

As I was gaining more experience, I contributed to other projects until I got my first job opportunity at an israeli startup called Cloudius Systems. It was working on creation of a project called OSv. In OSv, I worked with file systems. Mostly on improving ZFS support. I’m working for the same company, but now on creation of a distributed database called ScyllaDB.

I think I’m considerably better than myself of 5 years ago, but there’s a long way to go until I can say I’m really good with computers. By the time being, I’ll keep repeating this ZEN poem to myself: “To follow the path, look to the master, follow the master, walk with the master, see through the master, become the master.”

I’ll do my best to share interesting stuff with all of you.

Stay tuned!