It has been a few weeks since I finally received my Raspberry Pi, but up until today I have been too busy to play with it.

This changed today when I finally installed a boot image on a 4GB SD card, wired up the tiny little circuit board to the TV and connected the power. I was very gratified when my TV sprang into life and I was greeted with a booting Linux system!

I had a little play before I had to be getting back to work, and first impressions were very positive. I opted for the recommended Debian-based image, a distribution I am very familiar with. Network and USB functioned straight out of the box, and I was even able to install a few packages.

I’m looking forward to tinkering with it some more!

tl;dr: Most NDAs are harmful to my business and, most importantly, bad for my clients. Please don’t take offence if I don’t sign yours.

Like many others who work in freelance software development I am frequently asked to sign NDAs. As with employment contracts, these are seen by many as a formality and the expectation is that they are signed without reading and without question (which is of course a very bad idea). My refusal to sign NDAs has raised more than a few eyebrows over the course of my career, but in almost every case the client accepted my reasons once explained.

Over on his blog, John Larson goes into far more detail than I do here about the reasons why he won’t sign an NDA (well worth a read, btw). Suffice it to say, my not signing your NDA is not personal, it’s not me being confrontational or standing up for some abstract principle of software freedom, and it’s certainly not about me wanting to run off with your idea!

Simply, it’s about wanting to preserve the ability to help present and future clients and, in short, maintain the ability to operate a business. Let me explain…

Everything is connected

You have a fantastic idea burning in your brain that you need my help and advice to build. Gosh, I’m flattered! However, I absolutely guarantee that no matter how unique and original the idea or service is, I will be able to name at least half a dozen other services that do, at least in part, something similar. There is a very good chance (especially if your venture is in the realm of web technology and social software) that I have built or worked on something very similar in the past, which may well be why you’re talking to me in the first place!

From a technology point of view, I give a cast iron guarantee that even if the concept is so radical as to have nothing even remotely similar already out there, it will employ techniques and technologies found in wide use. Technology is a field where techniques, ideas and concepts are remixed constantly and overlap widely. As John points out in his essay, the impossibility of tracing the exact time and place a given idea or concept originated leads to a grey area over what one can or can’t use in the future, and can lead to costly litigation.

As an aside, historically, ideas haven’t counted for much in the grand scheme of things; it is execution that matters. The idea of a social network existed long before Facebook, search engines existed long before Google, and Microsoft had a tablet PC long before Apple. In each case, it was the winner’s execution of the idea that won the day.

That does not mean that your idea is a bad one, or that it is not worth doing, just that the protection of it through legal means by limiting my ability to best help my clients (including you) is probably not where you want to invest your energies. You had the idea, so there is every possibility that someone could come up with a similar one in isolation, but that doesn’t mean they’ll implement it well, if at all.

Experience is my business

An agreement which places restrictions on using techniques developed and experience gained on previous projects directly harms my ability to build on my expertise and to inform future design decisions. This directly limits my ability to operate a business.

Too often, the NDA that I am asked to sign (often before discussing a project in detail) restricts the use of anything learnt while working on the project. This is typical for the boilerplate NDA the majority of people seem to use. Even if such mental compartmentalisation were humanly possible, it would mean, at best, reinventing the wheel for each client. Since this is of course impossible, signing such an agreement is knowingly disingenuous.

Put it this way; as my client you are paying for my expertise and experience. How comfortable would it make you feel if you thought that a previous client’s NDA might prevent me from avoiding costly mistakes, advising you on what did or didn’t work in the past, or building something to the best of my ability?

NDAs I can sign

NDAs do have their place, and I have signed them in the past. However, those that I have signed have always covered very specific enumerated and tangible items of declared confidential information.

So for example, if the project requires me to have access to the inner workings of a certain piece of pre-existing patented technology, I would probably sign. Or, if you needed to release to me a client list or a database containing confidential information. Not a problem.

This and much more, I believe, is covered by the principle of client confidentiality; like a doctor, I’m not going to discuss the details of a client’s project with the wider world, unless they have given me permission to do so. If the client wishes to have some extra formality in this regard then I am generally happy to provide it.

However, if the NDA places restrictions on my ability to use my experience to help other clients, I almost certainly won’t be able to sign. Again, like a doctor, I need to be able to use experience to diagnose symptoms and treat them with techniques I’ve used before. My business and my clients can’t afford to have that ability restricted.

Image “Ssh! It’s a secret!” by RobCottingham used under the Creative Commons Licence.

By default, the standard LAMP (Linux, Apache, MySQL, PHP/Perl/Python) stack doesn’t come particularly well optimised for handling more than a trivial amount of load. For most people this isn’t a problem: either they’re running on a large enough server or their traffic is at a level where they never hit the limits.

Anyway, I’ve hit against these limits on a number of occasions now, and while there are many good articles out there on the subject, I thought I’d write down my notes. For my own sake as much as anything else…

Apache

Apache’s default configuration on most Linux distributions is not the most helpful, and your goal here is to do everything possible to avoid the server having to hit the swap and start thrashing.

  • MaxClients – The important one. If this is too high, apache will merrily spawn new servers to handle new requests, which is great until the server runs out of memory and dies. Rule of thumb:

    MaxClients = (Memory - other running stuff) / average size of apache process.

    If you’re serving dynamic PHP pages or pulling a lot of data from databases etc, the amount of memory a process takes up can quickly balloon to a very large value – sometimes as much as 15-20MB in size. Over time all running Apache processes will be the size of your largest script.

  • MaxRequestsPerChild – Setting this to a non-zero value will cause these large spawned processes to eventually die and free their memory. Generally this is a good thing, but set the value fairly high, say a few thousand.
  • KeepAliveTimeout – By default, apache keeps connections open for 15 seconds waiting for subsequent requests from the same client. This can cause processes to sit around, eating up memory and resources which could be used for incoming requests, so consider lowering it.
  • KeepAlive – If your average number of requests from different IP addresses is greater than the value of MaxClients (as it is in most typical thundering herd slashdottings), strongly consider turning this off.
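
As a rough sketch, the settings above might end up looking something like the fragment below. Every number here is an assumption chosen purely for illustration (a server with 2048MB of RAM, about 512MB taken by MySQL and the OS, and apache processes averaging 20MB), so measure your own server before copying anything. On Apache 2.4 the first two directives are called MaxRequestWorkers and MaxConnectionsPerChild, although the old names are still accepted.

```apache
# Illustrative values only, based on assumed figures:
#   MaxClients = (2048 - 512) / 20 = 76 (rounded down)
<IfModule mpm_prefork_module>
    MaxClients           76
    # Recycle each child after a few thousand requests to free its memory.
    MaxRequestsPerChild  3000
</IfModule>

# Keep connections open only briefly; switch KeepAlive to Off
# if you are being swamped by many different clients at once.
KeepAlive         On
KeepAliveTimeout  2
```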

Caching

  • Squid – Squid Reverse Proxy sits on your server and caches requests, turning expensive dynamic pages into simple static ones, meaning that at periods of high load, requests never need to touch apache. Configuration seems complex at first, but all that is really required is to run apache on a different port (say 8080), run squid on port 80 and configure apache as a caching peer, e.g.


    http_port 80 accel defaultsite=www.mysite.com vhost
    cache_peer 127.0.0.1 parent 8080 0 no-query originserver login=PASS name=myAccel

    One gotcha I found is that you have to name domains you’ll accept proxying for, otherwise you’ll get a bunch of Access Denied errors, meaning that in a vhost environment with multiple domains this can be a bit fiddly.

    A workaround is to specify an ACL listing the top-level domains, e.g.

    acl our_sites dstdomain .uk .com .net .org

    http_access allow our_sites
    cache_peer_access myAccel allow our_sites

  • PHP code cache – Opcode caching can boost performance by caching compiled PHP. There are a number out there, but I use xcache, purely because it was easily apt-gettable.
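
For the Squid setup described above, the apache side of the change is small. The fragment below is only a sketch: it assumes apache is moved to port 8080 on the loopback interface, and the DocumentRoot path is a made-up placeholder. On Debian the Listen line normally lives in ports.conf.

```apache
# Listen only on the loopback interface so that outside traffic
# has to come in through Squid on port 80.
Listen 127.0.0.1:8080

# Needed on Apache 2.2 only; Apache 2.4 no longer uses this directive.
NameVirtualHost *:8080

<VirtualHost *:8080>
    ServerName www.mysite.com
    # Hypothetical path, replace with your own site root.
    DocumentRoot /var/www/mysite
</VirtualHost>
```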

PHP

It goes without saying that you’d probably want to make your website code as optimal as possible, but don’t spend too much energy on this – there is lower-hanging fruit, and as a rule of thumb memory and CPU are cheap when compared to developer resources.

That said, PHP is full of happy little gotchas, so…

  • Chunk output – If your script makes use of output buffering (which Elgg does, and a number of other frameworks do too), be sure that when you finally echo the buffer you do it in chunks.

    Turns out (and this bit us on the bum when building Elgg) there is a bug/feature/interaction between Apache and PHP (some internal buffer that gets burst or something) which can add multiple seconds onto a page delivery if you attempt to output large blocks of data all at once.

  • Avoid calling array_merge in a loop – When profiling Elgg some time ago I discovered that array_merge was (and I believe still is) horrifically expensive. The function does a lot of validation which in most cases isn’t necessary and calling it in a loop is ruinous. Consider using the “+” operator instead, bearing in mind that it keeps the existing value when a key appears in both arrays rather than overwriting or appending.
  • Profile – Profile your code using Xdebug and find out where the bottlenecks are; you’d be surprised what is expensive and what isn’t (see the previous point).
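
To make the first two points a little more concrete, here is a minimal sketch. The echo_in_chunks() helper, its 8192 byte default chunk size and the sample data are all my own inventions for illustration (this is not Elgg's code), and the type declarations assume PHP 7 or later. The second half shows how array_merge and “+” differ when the same key turns up twice, which is worth checking before swapping one for the other.

```php
<?php
// Hypothetical helper (not taken from Elgg): echo a large string in
// fixed-size pieces rather than in one call.
function echo_in_chunks(string $buffer, int $chunkSize = 8192): void
{
    $chunkSize = max(1, $chunkSize); // guard against a zero or negative size
    $length = strlen($buffer);
    for ($offset = 0; $offset < $length; $offset += $chunkSize) {
        echo substr($buffer, $offset, $chunkSize);
    }
}

// Capture the page with output buffering, then send it in chunks.
ob_start();
echo str_repeat('<p>Example content</p>', 1000); // stand-in for page rendering
$page = ob_get_clean();
echo_in_chunks($page);

// array_merge() versus "+" when combining arrays in a loop.
$parts = [['a' => 1], ['b' => 2], ['a' => 3]];

$merged = [];
foreach ($parts as $part) {
    $merged = array_merge($merged, $part); // on duplicate keys the later value wins
}

$united = [];
foreach ($parts as $part) {
    $united += $part; // on duplicate keys the earlier value is kept
}

// $merged is ['a' => 3, 'b' => 2]; $united is ['a' => 1, 'b' => 2].
```

With numerically indexed arrays the difference is bigger still: “+” will not append, it only fills in indexes the left-hand array doesn't already have, so it is a drop-in replacement only when your keys are distinct.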

Non-exclusive list, hope it helps!