Welcome to Barricane.

We hope you will find what you are looking for.

Event Sourcing with MySQL

Event Sourcing has some advantages when building systems - and especially when debugging them.
See Martin Fowler's page if you are unfamiliar with the concept.

To demonstrate an implementation with MySQL, We will model a simple 'users' class, which has an email address, a password hash, the time of last password change, and the time of last login.

This would traditionally be modelled in MySQL as:

CREATE TABLE `traditional_users` (
    `id` INT(10) NOT NULL AUTO_INCREMENT,
    `email` VARCHAR(50) NOT NULL,
    `passwd_hash` VARCHAR(50) NOT NULL,
    `passwd_changed` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    `last_login` TIMESTAMP NULL DEFAULT NULL,
    PRIMARY KEY (`id`),
    INDEX `email` (`email`)
)

Each time that a user changed their password, or logged in, the old state would be lost.

Other approaches to keeping historic state include log files and audit tables.

To store the same information in an Event Sourcing system, we need tables to store events, and queries or views to serve up the current state.

The events will be 'user_registered', 'user_password_changed' and 'user_logged_in'.

The schemas for these tables are as follows:

CREATE TABLE `user_registered` (
    `timestamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    `user_id` INT(10) NOT NULL AUTO_INCREMENT,
    `email` VARCHAR(50) NOT NULL,
    `password_hash` VARCHAR(50) NOT NULL,
    PRIMARY KEY (`user_id`)
)

CREATE TABLE `user_password_changed` (
    `pk` INT(10) NOT NULL AUTO_INCREMENT,
    `timestamp` TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
    `user_id` INT(10) NULL DEFAULT NULL,
    `new_password_hash` VARCHAR(50) NULL DEFAULT NULL,
    PRIMARY KEY (`pk`),
    INDEX `Index 2` (`user_id`, `pk`),
    CONSTRAINT `FK_user_password_changed_user_registered` 
          FOREIGN KEY (`user_id`) 
          REFERENCES `user_registered` (`user_id`)
)

CREATE TABLE `user_logged_in` (
    `pk` INT(10) NOT NULL AUTO_INCREMENT,
    `timestamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    `user_id` INT(10) NOT NULL,
    PRIMARY KEY (`pk`),
    INDEX `Index 2` (`user_id`, `pk`),
    CONSTRAINT `FK_user_logged_in_user_registered` 
          FOREIGN KEY (`user_id`) 
          REFERENCES `user_registered` (`user_id`)
)

Next we need to be able to find the last event, for a particular user_id, efficiently.

The following form of query does that (for user_logged_in):

SELECT *
FROM user_logged_in u0, 
( SELECT MAX(pk) pk 
  FROM user_logged_in    
  WHERE user_id = ?
) u1 
WHERE u0.pk = u1.pk;

First we'll form a query which uses user_registered as its base, with the password_hash value being superseded by the latest 'user_password_changed' new_password_hash (if any).

SELECT timestamp, user_id, email,
COALESCE(( SELECT new_password_hash FROM user_password_changed u0, 
           ( SELECT MAX(pk) pk 
             FROM user_password_changed     
             WHERE user_id = ?
           ) u1 
           WHERE u0.pk = u1.pk
         ), password_hash) password_hash,
FROM user_registered ur
WHERE ur.user_id = ?

Now we can just add in the latest values for the password_changed and logged_in fields:

SELECT timestamp, user_id, email,
COALESCE(( SELECT new_password_hash FROM user_password_changed u0, 
           ( SELECT MAX(pk) pk 
             FROM user_password_changed     
             WHERE user_id = ?
           ) u1 
           WHERE u0.pk = u1.pk
         ), password_hash) password_hash,
( SELECT timestamp FROM user_password_changed u0, 
  ( SELECT MAX(pk) pk 
    FROM user_password_changed     
    WHERE user_id = ?
  ) u1 
  WHERE u0.pk = u1.pk
) password_changed, 
( SELECT timestamp FROM user_logged_in u0, 
  ( SELECT MAX(pk) pk 
    FROM user_logged_in     
    WHERE user_id = ?
  ) u1 
  WHERE u0.pk = u1.pk
) last_logged_in
FROM user_registered ur
WHERE ur.user_id = ?

Isn't this query the spawn of satan?

No.

If you run 'explain' on it you'll see that three foreign field selects have a note 'Select tables optimized away'.

This means that MySQL is not having the look at the tables in order to find out which record contains the latest event for the user_id (for the fields user_password_changed and user_logged_in).

If is still more expensive than a query on the traditional_user table, but it's not that evil.

Posted on 17 May 2013.

Composibility of obtaining a value.

In programming languages, there are multiple ways of obtaining a value. Some are nearly instant, some take time, or forever. This is a short post on the problems of integrating time (and the outside world) into computer programs.

I have used C to illustrate some of these.

Group 1, from memory, directly or indirectly

int a = C;                 // assignment from a constant 
int a = b;                 // assignment from a variable (on the stack)
int a = *p;                // assignment from a dereferenced pointer
int a = structure.element; // structures, and so on
int a = access(args...)    // assignment from memory via a calculated reference
                           // think of properties from Python, etc. too

Group 2, calculation

int a = calc(args...)      // CPU intensive calculation of a pure function that hogs one or more cores

Group 3a, async IO

function callback(int a) {...}
asyncread(args..., callback); // think NodeJS too

Group 3b, blocking IO

int a = *p;                // if p is mmapped adn will cause a page fault
int ret = blockingread(&a, args...); // causes thread to yield to kernel

Group 3p, polling IO*

int ret = pollingread(&a, args...); // does not block, but may not yeild a new value

(* I am unsure of the validity of including G3p, as it does not yield the sought type a, only a Maybe a - to use Haskell terminology).

Note that many of these methods of obtaining a value look identical, even though they are in diferent groups. This leads to API confusion in many languages. New languages should make things that behave differently look different.

The differences between groups are important, as you can always wrap a method from a more primitive group as a more sophisticated way, but never the reverse.

Program flow constructs, such as if and for apply only to those values outside of G3a.

These groups impact directly on composability of code.

    +-----+-----+-----+-----+-----+-----+
    |     | G1  | G2  | G3a | G3b | G3p |
    +-----+-----+-----+-----+-----+-----+
    | G1  | G1  | G2  | G3a | G3b | G3p |
    | G2  | G2  | G2  | G3a | G3b | G3p |
    | G3a | G3a | G3a | G3a |  X  | G3a |
    | G3b | G3b | G3b |  X  | G3b | G3b |
    | G3p | G3p | G3p | G3a | G3b | G3p |
    +-----+-----+-----+-----+-----+-----+

(X marks composition as impossible.)

Composing blocking and asynchrous operations within one thread yields a situation where no progress on asynchronous composed operations can be made while any IO is blocking in the same thread.

Conclusions

  1. New languages need to think about these issues carefully. I want to write code that is "Group Generic" - i.e. my function will work not just on values, but on asynchronously obtained values too, without any change to my code. (This will take people many times cleverer than I to sort out.)

  2. For the time being, give up on blocking and use libuv.

Help

I need some better terms to categorise the sources of values. Should I consider a page fault cause by accessing mmapped memory differently from a page fault caused by a overloaded machine? I have assumed for simplicity that a machine should not be overloaded and that page faults do not happen on accessing runtime structures in memory.

Further ramblings

There is also the difference between a list of values, and an (asynchronous/blocking?) stream of values to consider. I should be able to write code which is "Group Generic" over both, though the stream implementation may need to be richer in some way.

e.g. For a 'sum' function, intermediate sums of a stream may be required, when the values have paused for some time. (Although that could also be valid for huge lists; think of updating a progress bar.)

P.S.

I am aware of streams in Haskell and Functional Reactive Programming though not much of practical use seems to have made it out yet.

Posted on 30 January 2013.

Optimising MySQL InnoDB

I was having trouble with a MySQL server, pegged at 180% CPU and falling behind on writes by more than 10 minutes each hour.

The fix was to increase innodb_buffer_pool_size from the default 8MB to 2G.

I just needed to add the line:

[mysqld]
innodb_buffer_pool_size = 2G

to /etc/mysql/my.cnf

This reduced the CPU usage from 180% to around 10%.

Posted on 12 November 2012.

Reboot on Kernel Panic

Last week, a server crashed with a kernel panic. It stayed crashed all night.

This seems to be the default behaviour wuth Ubuntu server 12.04.

I did some research. Running the following command will make a machine reboot 15 seconds after a kernel panic.

chris@svr3:~$ echo "kernel.panic = 15" | sudo tee /etc/sysctl.d/10-reboot-on-panic.conf

I've now done this for all my servers.

Kernel panics shouldn't happen. But if they do, I want a speedy reboot, not a screen full of CPU registers.

Posted on 20 July 2012.

Eclipse/CDT/autotools - No rule to make target 'all'. Stop.

When you create an autotools Helloword project in Eclipse (on Ubuntu, at least) it is not ready to build. In order to for Eclipse to be able to build the project, you need to perform the following steps:

$ aclocal

This generates the file aclocal.m4 and adds it to the current directory.

Next, run autoconf:

$ autoconf

After running autoconf you will find the configure script in the current directory. It's important to run aclocal first because automake relies on the contents on configure.in and aclocal.m4.

There are a few files that the GNU standard says must be present in the top-level directory, and if not found Automake will report an error. Enter the following command to create these files:

$ touch AUTHORS NEWS README ChangeLog

Now we can run automake to create Makefile.in. The --add-missing argument copies some boilerplate files from your Automake installation into the current directory.

$ automake --add-missing

$ ./configure

$ ./make

The 'make' command should have built your project for you.

Posted on 12 June 2012.