mysql-logo3 So, the other week, I migrated all my sites over to a new server. This was accomplished with minimum fuss, using lots of rsync magic and juggling of DNS ttls.

The part of the migration I imagined would be the most complicated, moving several tens of MySQL databases running on the old server to the new, turned out to be pretty straightforward, and essentially completed with one command. I thought others might find it handy to know how…

First, I brought down Apache on both servers, so I could be sure that nobody was going to try and write to the database. This may not really be required, since mysqldump can handle dumping live databases consistently, for MyISAM at least, but in never hurts to be paranoid.

Next, I needed to move all the databases, together with their access permissions and users (so the full MySQL schema) to the other server. One way to do that is to copy the whole /var/lib/mysql directory over. However, I had a lot of cruft in there (old bin-logs etc), plus there were a number of articles suggesting that the straight binary copy had a number of issues, especially for mixed storage engine environments. So, I opted for the mysqldump method.

Traditionally, this takes a lot of SCPing. Here’s how to do it with one command, using the magic of Unix pipes:

mysqldump -u root -pPASSWORD --all-databases | ssh USER@NEW.HOST.COM 'cat - | mysql -u root -pPASSWORD'

Boom. This ran surprisingly quickly for me, and you can of course just as easily use this method to transfer a single database.

Three gotchas:

  • If the link dies, you need to start again, so don’t do this over a flakey connection, and I suggest you run the command in a screen if the first server isn’t localhost.
  • You need to restart the database server on the target machine for the new users and privileges to come into effect.
  • On Debian, you may see an error along the lines of:

    Got error: 1045: Access denied for user ‘debian-sys-maint’@’localhost’ (using password: YES) when trying to connect

    Fix this by executing the command:

    GRANT ALL PRIVILEGES ON *.* TO 'debian-sys-maint'@'localhost' IDENTIFIED BY 'THEPASSWORD' WITH GRANT OPTION;

    Where THEPASSWORD is the password found in /etc/mysql/debian.cnf

One final note: this will only work if both servers are the same major version number. I was moving between two Debian 6.0 installs, so YMMV.

Happy Easter!

If you’re anything like me, you often find yourself needing to add create and modified timestamps to data stored in a MySQL database. Typically, I’d do this programmatically from within the application without giving it a second thought, wrapping up time(); in an SQL statement and firing it over.

Recently, while doing some development work for one of my clients, I was given the requirement to add Created and Modified timestamps to a whole bunch of existing data tables. These tables were referenced by hundreds of different MySQL queries, and to complicate matters further, we were in the middle of migrating the code over to a new database library.

I didn’t much fancy rewriting all of the SQL queries, potentially twice, just to add a couple of timestamps, so modifying code wasn’t much of an option. Thankfully, a couple of MySQL’s internal features came to the rescue.

Adding a ModifiedTime

Adding a modified timestamp to a table is the most straight forward. All your have to do is create the field of type TIMESTAMP, and by default, MySQL will automatically update the field when the row is modified.

There are a couple of things to be aware of:

  • While you can have multiple TIMESTAMP fields in a row, only one of these can be automatically updated with the current time on update.
  • If your UPDATE query contains a value for your ModifiedTime field, this value will be used.

So, to add your modified timestamp field to an existing table, all you need is:

ALTER TABLE my_table ADD ModifiedTime TIMESTAMP;

Adding a CreatedTime

Adding a CreateTime value is a little more involved.

On the latest versions of MySQL it is apparently possible to create a DateTime field with a default value of CURRENT_TIMESTAMP. This wasn’t an option for me as I was having to support a somewhat older version, besides, even on the newer versions of MySQL it is not possible to have more than one field using CURRENT_TIMESTAMP, which of course we are in order to get ModifiedTime working.

So, in order to get a created timestamp, firstly we must add a DATETIME field to the table.

ALTER TABLE my_table ADD CreatedTime datetime NOT NULL;

Note, that this must be created as NOT NULL in order for the next part to work (this is because setting NOT NULL forces an automatic all zeros default).

Next, we must create a trigger, which will automatically be fired when we insert a value into our table and set the created timestamp.

DELIMITER //
DROP TRIGGER IF EXISTS `my_table_insert_trigger`//
CREATE TRIGGER `my_table_insert_trigger`
BEFORE INSERT ON `my_table`
FOR EACH ROW
BEGIN
IF NEW.CreatedTime = '0000-00-00 00:00:00' THEN
SET NEW.CreatedTime = NOW();
END IF;
END;//
DELIMITER ;

Now, when you insert a value into the table, this trigger will fire and, if you’ve not provided a CreatedTime field in your insert query, it will be set to the current time stamp.

Conclusion

Since all the queries in the application code specified the columns they were updating on insert, I was able to use this method to add created and modified time stamp fields to all the existing object tables in the database, without needing to modify any of the existing application code.

This simplified my life greatly, but also suggests to me that this method might be somewhat more efficient than the methods I’d previously used.

By default, the standard LAMP (Linux Apache Mysql Php/Perl/Python) stack doesn’t come particularly well optimised for handling more than a trivial amount of load. For most people this isn’t a problem, either they’re running on a large enough server or their traffic is at a level that they never hit against the limits.

Anyway, I’ve hit against these limits on a number of occasions now, and while there are many good articles out there on the subject, I thought I’d write down my notes. For my own sake as much as anything else…

Apache

Apache’s default configuration on most Linux distributions is not the most helpful, and you’re goal here is to do everything possible to avoid the server having to hit the swap and start thrashing.

  • MaxClients – The important one. If this is too high, apache will merrily spawn new servers to handle new requests, which is great until the server runs out of memory and dies. Rule of thumb:

    MaxClients = (Memory - other running stuff) / average size of apache process.

    If you’re serving dynamic PHP pages or pull a lot of data from databases etc the amount of memory a process takes up can quickly balloon to a very large value – sometimes as much as 15-20mb in size. Over time all running Apache processes will be the size of your largest script.

  • MaxRequestsPerChild – Setting this to a non-zero value will cause these large spawned processes to eventually die and free their memory. Generally this is a good thing, but set the value fairly high, say a few thousand.
  • KeepAliveTimeout – By default, apache keeps connections open for 15 seconds waiting for subsequent connections from the same client. This can cause processes to sit around, eating up memory and resources which could be used for incoming requests.
  • KeepAlive – If your average number of requests from different IP addresses is greater than the value of MaxClients (as it is in most typical thundering herd slashdottings), strongly consider turning this off.

Caching

  • SquidSquid Reverse Proxy sits on your server and caches requests, turning expensive dynamic pages into simple static ones, meaning that at periods of high load, requests never need to touch apache. Configuration seems complex at first, but all that is really required is to run apache on a different port (say 8080), run squid on port 80 and configure apache as a caching peer, e.g.


    http_port 80 accel defaultsite=www.mysite.com vhost
    cache_peer 127.0.0.1 parent 81 0 no-query originserver login=PASS name=myAccel

    One gotcha I found is that you have to name domains you’ll accept proxying for, otherwise you’ll get a bunch of Access Denied errors, meaning that in a vhost environment with multiple domains this can be a bit fiddly.

    A workaround is to specify an ACL with the toplevel domains specified, e.g.

    acl our_sites dstdomain .uk .com .net .org

    http_access allow our_sites
    cache_peer_access myAccel allow our_sites

  • PHP code cache – Opcode caching can boost performance by caching compiled PHP. There are a number out there, but I use xcache, purely because it was easily apt-gettable.

PHP

It goes without saying that you’d probably want to make your website code as optimal as possible, but don’t spend too much energy over this – there are lower hanging fruit, and as a rule of thumb memory and CPU is cheap when compared to developer resources.

That said, PHP is full of happy little gotchas, so…

  • Chunk output – If your script makes use of output buffering (which Elgg does, and a number of other frameworks do too), be sure that when you finally echo the buffer you do it in chunks.

    Turns out (and this bit us on the bum when building Elgg) there is a bug/feature/interaction between Apache and PHP (some internal buffer that gets burst or something) which can add multiple seconds onto a page delivery if you attempt to output large blocks of data all at once.

  • Avoid calling array_merge in a loop – When profiling Elgg some time ago I discovered that array_merge was (and I believe still is) horrifically expensive. The function does a lot of validation which in most cases isn’t necessary and calling it in a loop is ruinous. Consider using the “+” operator instead.
  • ProfileProfile your code using x-debug, find out where the bottlenecks are, you’d be surprised what is expensive and what isn’t (see the previous point).

Non-exclusive list, hope it helps!