Over on his blog, my good friend and colleague Ben has written a thoughtful post about bugtrackers. He is essentially complaining that none of those currently available is good for both developers and end users.

Broadly speaking, I agree with him. The two main players – Bugzilla and Trac – both fall short: Bugzilla’s interface has notable usability issues, and Trac’s is not much better.

In both cases, however, the core functionality of what a bugtracker actually does – a prioritised and editable todo list – works perfectly.

The problem is the interface.

How do we create one that is useful both to developers (who need quite detailed settings) and to end users (who need a simple interface and, in many cases, a certain amount of hand-holding in order to file a report which is useful to the developer)?

Thinking back to my own use of Bugzilla and Trac, the answer is that we don’t.

Let me explain: I have used both Bugzilla and Trac in anger on large projects for many years, but I have hardly ever used the default interface – currently I use the excellent Mylyn (née Mylar) for Eclipse. For me a bugtracker is a central todo list accessible from anywhere; combined with a central svn repo, it lets me carry on working anywhere there is a computer and an internet connection… invaluable if you spend any amount of time travelling.

It seems to me that a good approach would be to make the bugtracker entirely API driven (more so than it is now, where the API is in many cases a later bolt-on). That way it would be possible to provide a variety of expert interfaces for developers and a simplified interface for end users, rather than having one interface try to do it all.
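
To make that concrete, here is a minimal sketch of the sort of thing I mean – every name in it is hypothetical, and the flat-file storage is purely for illustration. The point is simply that one programmatic entry point can serve an Eclipse plugin, a command-line tool and a hand-holding web wizard alike:

    <?php
    // api.php - a hypothetical single entry point to the tracker engine.
    // All names are invented for this sketch; storage is a flat JSON file
    // purely for illustration - a real engine would sit behind the same calls.

    define('ISSUE_STORE', dirname(__FILE__) . '/issues.json');

    function tracker_create_issue($title, $body, $tags)
    {
        $issues = file_exists(ISSUE_STORE)
            ? (array) json_decode(file_get_contents(ISSUE_STORE), true)
            : array();

        $id = count($issues) + 1;
        $issues[] = array('id' => $id, 'title' => $title,
                          'body' => $body, 'tags' => $tags);

        file_put_contents(ISSUE_STORE, json_encode($issues));

        return $id;
    }

    // Whatever the client - expert tool or simplified wizard - it POSTs
    // the same fields and gets machine-readable JSON back.
    header('Content-Type: application/json');

    $id = tracker_create_issue(
        isset($_POST['title']) ? $_POST['title'] : '',
        isset($_POST['body'])  ? $_POST['body']  : '',
        isset($_POST['tags'])  ? explode(',', $_POST['tags']) : array()
    );

    echo json_encode(array('id' => $id));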

This simplified end-user interface should hold people’s hands and ask specific, targeted questions to encourage non-programmers to provide reports which will be useful to developers.

Tagging (and tag clustering) could then be a useful technique for grouping issues together – making it easy to find related issues and to spot duplicates.
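
As a rough sketch (the data structures here are invented for the example), even a simple tag index is enough to surface candidate duplicates the moment a new report is filed:

    <?php
    // Sketch only: group issue ids under each tag, so that a new report's
    // tags point straight at possible duplicates.

    function issues_by_tag($issues)
    {
        $index = array();
        foreach ($issues as $issue) {
            foreach ($issue['tags'] as $tag) {
                $index[strtolower($tag)][] = $issue['id'];
            }
        }
        return $index;
    }

    $issues = array(
        array('id' => 1, 'tags' => array('login', 'session')),
        array('id' => 2, 'tags' => array('Login', 'ie7')),
    );

    $index = issues_by_tag($issues);

    // Both reports are tagged "login", so issue 2 is flagged as a
    // possible duplicate of issue 1.
    print_r($index['login']); // Array ( [0] => 1 [1] => 2 )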

Building on some social technology to establish relationships between issues, to comment on them, and to attach files and other media could also be useful.

If the underlying engine is the same, this shouldn’t involve too much duplicated work, but it will allow for tighter integration with the tools and workflow people actually use.

I have spent the last couple of hours grappling with this problem, and having finally got to the bottom of it I thought I’d share my solution.

OK, so the problem was that a PHP script which prepared a download (in this case a .zip) from Elgg’s file store was working fine in Firefox but producing a corrupt archive in IE.

On examining the headers being sent and received I was able to establish that there were two main issues going on:

  1. The zip file was being compressed by mod_deflate; Internet Explorer was handling this incorrectly, and so saving a file which was actually a gzipped .zip. This is a known issue, and is why Elgg’s .htaccess file only compresses text and JavaScript.
  2. The directives which only permit compression for text MIME types were being ignored.

The reason, obvious with hindsight but not at the time, was this:

The file was being served by a script, and the script modifies the MIME type via header(). However, Apache was determining whether to compress the file or not based on the initial MIME type of the script – which of course was text/html!
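
For illustration, the script looked something like this (the paths and filenames are placeholders, not Elgg’s actual code):

    <?php
    // download.php - representative of the script in question.
    // On disk this is a .php file, so Apache's first guess at the MIME
    // type is text/html; the header() calls below re-type the response
    // as a zip, but by then mod_deflate has already decided, on the
    // basis of text/html, to compress the output.

    $archive = '/path/to/filestore/export.zip'; // placeholder path

    header('Content-Type: application/zip');
    header('Content-Disposition: attachment; filename="export.zip"');
    header('Content-Length: ' . filesize($archive));

    readfile($archive);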

Once I figured that out, it was fairly simple to solve. I added the following lines to the mod_deflate settings in the .htaccess file.

    # Don't gzip responses for Elgg action URLs, and don't add a
    # Vary: Accept-Encoding header for them either
    SetEnvIfNoCase Request_URI action\/* no-gzip dont-vary
    SetEnvIfNoCase Request_URI actions\/* no-gzip dont-vary

These lines turn off gzip compression for all actions, while leaving compression running for all other files. This solution is better than turning compression off altogether, but it is not ideal: for one thing, if you attempt a scripted download from anywhere but an Elgg action (which really you shouldn’t be doing), you will need to modify .htaccess yourself.

Any better solutions welcome!

“I am creating an Elgg network and I am expecting 10 million users, can it cope?”

This question gets asked in one form or another almost every week, but I believe this is the wrong question to be asking. Perhaps the more pertinent scalability question is:

“I am creating an Elgg network, how do I attract 10 million users?”

This is by far the harder question to answer, and it must be solved before you can seriously address the comparatively simple task of hardware and software scalability.

Attracting users can be accomplished in many ways, but it is mainly a matter of marketing – and, of course, of having a killer idea. As Ben discussed in his presentation at the recent Elgg Conference, this idea must be as useful to user 1 as to user 10 million.

The idea has to be solid from day one, so forget all the trendy “long tail” and “wisdom of crowds” buzzwords!

Once you manage to solve this most tricky of problems, you can begin to look at the infrastructure. So, can Elgg handle 10 million users out of the box?

Put simply, no script in the world can handle this level of usage straight away without some modification and a serious investment in both time and money. You will not be able to unpack Elgg on a cheap shared host and have it handle 10 million users.

This is not an issue with Elgg’s design (which actually lends itself to many scalability techniques), but simple realism. Elgg has had substantial work done on scalability and optimisation – reducing queries, caching, etc. – and currently performs very well page for page against competitors like Ning and BuddyPress.

Asking how many users an Elgg install can support is also pointless, because the answer is always going to be “it depends”. How many users Elgg can support depends on your hardware, your host (shared or dedicated), your database server, how your users behave, and how many of them are active at any given time.

So what should you pay attention to?

Elgg itself is fairly well optimised, and will improve over time. If you are dealing with millions of users you will want to look at your server infrastructure – database server, bandwidth, memory, caching at every level. After this you can look at customised code to squeeze out the last few percentage points of performance.
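
To give one hedged example of what “caching at every level” can mean in practice – the table, key name and TTL below are all invented for the illustration – even a simple memcache layer in front of a heavy query goes a long way:

    <?php
    // Illustration only: a MySQL connection is assumed to be open already.
    // The pattern - check the cache, fall back to the database on a miss,
    // store the result for next time - is what matters.

    function get_popular_users()
    {
        $cache = new Memcache();
        $cache->connect('127.0.0.1', 11211);

        $users = $cache->get('popular_users');

        if ($users === false) {
            // Cache miss: run the expensive query once...
            $result = mysql_query("SELECT username FROM users
                                   ORDER BY last_action DESC LIMIT 10");

            $users = array();
            while ($row = mysql_fetch_assoc($result)) {
                $users[] = $row['username'];
            }

            // ...and keep it for five minutes, so the next several
            // thousand page views never touch the database at all.
            $cache->set('popular_users', $users, 0, 300);
        }

        return $users;
    }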

If you are serious about handling high load, there is no avoiding the need to spend some time and money on your infrastructure. But these are good problems to have, because they mean that you have a successful network!

So, in conclusion, my answer to the scalability question is “Don’t worry about it until you have to worry about it!” Get your users in first. Make a killer service that is useful from day one, and then worry about how you will handle millions of concurrent users.

Scalability is a largely solved problem… building a successful service isn’t, and is the thing you should be concerned with.