Recently, I moved the Elgg Multisite project over to Github.

The master branch is targeted at the 1.8 branch of Elgg; however, a number of you have asked that a 1.7.x branch be maintained to support the continuation of the 1.7 branch of Elgg.

So, as requested, I’ve created a 1.7 branch of Elgg Multisite from the original archive so that the 1.7 and 1.8 branches can be maintained in parallel. I’ve also cherry-picked a few minor mods over from the 1.8 branch.

Go play!

» Visit the project branch on Github…

By default, the standard LAMP (Linux, Apache, MySQL, PHP/Perl/Python) stack doesn’t come particularly well optimised for handling more than a trivial amount of load. For most people this isn’t a problem: either they’re running on a large enough server, or their traffic never gets near the limits.

Anyway, I’ve hit these limits on a number of occasions now, and while there are many good articles out there on the subject, I thought I’d write down my notes. For my own sake as much as anything else…

Apache

Apache’s default configuration on most Linux distributions is not the most helpful, and your goal here is to do everything possible to stop the server having to hit swap and start thrashing.

  • MaxClients – The important one. If this is set too high, Apache will merrily spawn new child processes to handle incoming requests, which is great until the server runs out of memory and dies. Rule of thumb:

    MaxClients = (Memory - other running stuff) / average size of apache process.

    If you’re serving dynamic PHP pages or pulling a lot of data from databases, the amount of memory a process takes up can quickly balloon – sometimes to as much as 15–20MB. Over time, every running Apache process will tend towards the size of your largest script.

  • MaxRequestsPerChild – Setting this to a non-zero value will cause these large spawned processes to eventually die and free their memory. Generally this is a good thing, but set the value fairly high, say a few thousand.
  • KeepAliveTimeout – By default, Apache keeps connections open for 15 seconds waiting for subsequent requests from the same client. This can leave processes sitting around, eating up memory and resources that could be used for incoming requests, so consider lowering it to a couple of seconds.
  • KeepAlive – If most of your traffic is single requests from a large number of different IP addresses (as it is in a typical thundering-herd slashdotting), each kept-alive connection ties up a process doing nothing useful, so strongly consider turning this off. There’s an example configuration pulling these settings together just after this list.
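
To make the numbers concrete, here’s a sketch of what those settings might look like for the prefork MPM on a small box. The figures are purely illustrative assumptions rather than recommendations: roughly 2GB of RAM, about 512MB set aside for MySQL and the rest of the system, and an average Apache process of around 20MB, giving MaxClients = (2048 - 512) / 20 ≈ 75. Measure your own processes and plug the real numbers into the formula above.

    # Illustrative prefork settings only
    <IfModule mpm_prefork_module>
        StartServers             5
        MinSpareServers          5
        MaxSpareServers         10
        # (2048MB - 512MB) / ~20MB per process
        MaxClients              75
        # Recycle children so leaked memory is eventually freed
        MaxRequestsPerChild   3000
    </IfModule>

    # Don't hold idle workers open for the default 15 seconds
    # (or turn KeepAlive Off entirely under a thundering herd)
    KeepAlive On
    KeepAliveTimeout 3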

Caching

  • Squid – A Squid reverse proxy sits on your server in front of Apache and caches requests, turning expensive dynamic pages into simple static ones, meaning that during periods of high load most requests never need to touch Apache at all. Configuration seems complex at first, but all that is really required is to run Apache on a different port (say 8080), run Squid on port 80 and configure Apache as a cache peer, e.g.


    http_port 80 accel defaultsite=www.mysite.com vhost
    cache_peer 127.0.0.1 parent 8080 0 no-query originserver login=PASS name=myAccel

    One gotcha I found is that you have to name the domains you’ll accept proxying for; otherwise you’ll get a bunch of Access Denied errors, which in a vhost environment with multiple domains can be a bit fiddly.

    A workaround is to specify an ACL listing the top-level domains you serve, e.g.

    acl our_sites dstdomain .uk .com .net .org

    http_access allow our_sites
    cache_peer_access myAccel allow our_sites

  • PHP code cache – Opcode caching can boost performance by caching compiled PHP bytecode. There are a number of opcode caches out there, but I use XCache, purely because it was easily apt-gettable.

PHP

It goes without saying that you’d probably want to make your website code as efficient as possible, but don’t spend too much energy on this – there is lower-hanging fruit, and as a rule of thumb memory and CPU are cheap compared to developer time.

That said, PHP is full of happy little gotchas, so…

  • Chunk output – If your script makes use of output buffering (which Elgg does, and a number of other frameworks do too), be sure that when you finally echo the buffer you do it in chunks (there’s a sketch after this list).

    It turns out (and this bit us on the bum when building Elgg) that there is a bug/feature/interaction between Apache and PHP (some internal buffer getting burst or similar) which can add multiple seconds to page delivery if you attempt to output a large block of data all at once.

  • Avoid calling array_merge in a loop – When profiling Elgg some time ago I discovered that array_merge was (and I believe still is) horrifically expensive. The function does a lot of validation which in most cases isn’t necessary, and calling it in a loop is ruinous. Consider using the “+” operator instead – the semantics aren’t identical (on a key clash it keeps the left-hand value, and it doesn’t renumber numeric keys), but where that’s acceptable it is far cheaper. There’s a second sketch after this list.
  • Profile – Profile your code using Xdebug and find out where the bottlenecks are; you’d be surprised what is expensive and what isn’t (see the previous point).
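
Here’s a minimal sketch of the chunked-output idea. The helper name and the 4096-byte chunk size are my own illustration, not Elgg’s actual implementation:

    <?php
    // Echo a large buffer in fixed-size chunks instead of one giant echo.
    function echo_in_chunks($output, $chunk_size = 4096) {
        $length = strlen($output);
        for ($offset = 0; $offset < $length; $offset += $chunk_size) {
            echo substr($output, $offset, $chunk_size);
        }
    }

    ob_start();
    // ... render the page into the buffer ...
    $page = ob_get_clean();
    // Instead of: echo $page;
    echo_in_chunks($page);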
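
And a quick sketch of the array_merge point. The $batches variable here is a hypothetical stand-in for whatever you’re combining in a loop:

    <?php
    $batches = array(array('a' => 1), array('b' => 2), array('c' => 3));

    // Expensive: array_merge() copies and re-validates everything on
    // every single iteration.
    $merged = array();
    foreach ($batches as $batch) {
        $merged = array_merge($merged, $batch);
    }

    // Much cheaper: the + operator. Mind the different semantics though:
    // on a key clash it keeps the left-hand value and it never renumbers
    // numeric keys, so it only works as a drop-in where keys don't collide.
    $combined = array();
    foreach ($batches as $batch) {
        $combined += $batch;
    }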

Non-exclusive list, hope it helps!

Those of you who know me from my work on Elgg or from building the platform powering Latakoo are possibly not aware that I also write application software. Not as much as I’d like; the lure of web services and the long tail is seductive, but while web services solve some problems they create others. In many ways, traditional application software is simpler; for one thing, it’s a heck of a lot easier to charge money for.

Anyway, recently I wrote an application which lets NPPL pilots manage their logbooks and keep track of the various currency requirements. The NPPL Pilot Logbook is written in Java and runs on Windows, OS X and Linux. I went from idea to product in a couple of weeks; here are some notes on what I did, in no particular order, which may be of use to some of you…

Setting the release date

Something I try to do with any of my projects is set a very short time frame for getting something out of the door, because enthusiasm inevitably starts to wane (often remarkably quickly). Once that sets in, the chances of the product ever getting released drop rapidly, until the project either gets canned or gets sucked into months or years of internal chrome polishing, which is startup death (and generally comes down to fear of putting your assumptions to the acid test of the market).

Anyway, many others have written about this far more eloquently than I can, but as a rule of thumb I set the following milestones for a project wherever I can:

  • End of day 1: First runnable (it doesn’t have to do much, it may crash all the time, but you have something that runs).
  • End of week 1: First public version.

Of course, you’re not done after week one. You’ll get feedback, good and bad; you need to roll this into the next version and release that. Rinse, repeat. But crucially you’ve moved your project from possibility to reality, and it’s only in the latter that your project can be called a success.

Sure, you’re going to get users complaining that feature x doesn’t work or that the software crashes, and maybe you’ll see a few negative blog posts. But all these people are using your software, and taking their feedback on board is a very effective way of turning them into loyal customers.

But remember, nothing built by humans is ever perfect and the opposite of love is not hate, it’s indifference.

Language and platform choice

My target audience uses a variety of different computers, often without an internet connection (hence my decision to build this as a desktop application in the first place). So I was aiming to build something as multi-platform as possible. My development environment is entirely Linux, with a Windows XP VM I keep kicking about primarily so I can sync my iPhone with iTunes, so I also wanted something I could build natively.

I considered C/C++, but while all multi-platform development ends up being write once, debug everywhere, the traditional C/C++ route seemed the most painful of the lot, certainly when it comes to decoding the Win32 API or trying to link against Wine libraries.

So the choice was between Mono/.NET and Java. I picked Java, for no reason other than it seemed to present fewer hurdles and to have the more mature tool chain. That said, I’m watching Mono with keen interest.

Writing the code

I wrote some code; I leave this as an exercise for the reader. Under the hood, the NPPL Logbook is not a terribly complicated piece of software: the UI is Swing, put together with NetBeans and its UI extensions.

A couple of very specific points:

  • Home directory: there seems to be no reliable way of getting the user’s home directory in a cross-platform way, and you can’t rely on the user.home system property being set. So writing configuration files can be interesting, especially if the user chooses to install into Program Files under Windows.
  • Give the OS X dock a nice title (by default a Swing application shows up under its main class name).

Deployment

I admit that this is still a little bit of a work in progress for me. Windows has by far the most mature deployment tool chain, with many good free installer construction kits.

On the Mac, things work from running the .jar archive directly, with OS X intelligently extracting the downloaded .zip to a seemingly sensible location. Linux is still a little fragmented, although I do remember .deb archives being pretty straightforward to build; since the Linux demographic is highly technical, I figured a .jar archive and a Linux launcher would work for them.

The Website

My requirements for the website were that it should look (reasonably) professional given my limited time and graphic design skills, and that it should not become a time sink (as these things often are). So I got an appropriate domain (nppllogbook.com) and deployed the latest version of WordPress with a businesslike free theme grabbed from wordpress.org.

It took a couple of days, off and on, to tweak the theme, install plugins, do basic SEO, write pages and get the site to a state where people could actually download my software (and as a by-product I open sourced a handy tool).

Next comes…

Having a way to get paid

I’ve left this until last, but it is obviously not last in priority. From the beginning I had a simple business model: build something that solves a problem, then sell it to people who have that problem. To do this you need to be able to accept money (obviously), and if you don’t have an easy way for your customers to give you money, you have problems.

For a non-US company, this actually proved somewhat problematic. The new best-of-breed payment systems that world+dog seem to be using these days (Braintree et al.) are pretty much US-only, unless you want to slap some cash down up front to get a traditional merchant account with a bank.

This basically leaves you with Google Checkout or *gulp* PayPal. I tossed a coin and plumped for PayPal, which isn’t too painful providing you avoid their tight integration. I wrote another tool to generate registration codes, and use PayPal’s Instant Payment Notification (IPN) API to trigger an endpoint which emails a registration code to the customer; a rough sketch of the idea follows.
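
For what it’s worth, the endpoint itself is conceptually very simple. Here is a rough sketch of a PayPal IPN listener rather than my actual code: the generate_registration_code() and send_registration_email() helpers are hypothetical stand-ins, and you should check PayPal’s current IPN documentation for the exact postback URL and field names before relying on it.

    <?php
    // PayPal POSTs the transaction details to this script (the notify_url on
    // the payment button). We post the whole message back with
    // cmd=_notify-validate and PayPal answers VERIFIED or INVALID.
    $postback = 'cmd=_notify-validate';
    foreach ($_POST as $key => $value) {
        $postback .= '&' . urlencode($key) . '=' . urlencode($value);
    }

    $ch = curl_init('https://www.paypal.com/cgi-bin/webscr');
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postback);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $response = curl_exec($ch);
    curl_close($ch);

    if ($response === 'VERIFIED' && $_POST['payment_status'] === 'Completed') {
        // Hypothetical helpers: mint a code and email it to the buyer.
        $code = generate_registration_code($_POST['txn_id']);
        send_registration_email($_POST['payer_email'], $code);
    }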

The headache I get every time I have to implement a payment system was mercifully small.

What next?

So, the NPPL Logbook is out there, you can download it and even pay for it (and if you’re a private pilot, I would encourage you to do both!).

Right now I’m spending time promoting it, as well as processing feedback to feed into the next version and working on other product ideas. That’s another take-home: you’re never done, and that’s a good thing!

Anyway, I hope this rambling stream of consciousness is useful!