“I am creating an Elgg network and I am expecting 10 million users, can it cope?”

This question gets asked in one form or another almost every week, but I believe this is the wrong question to be asking. Perhaps the more pertinent scalability question is:

“I am creating an Elgg network, how do I attract 10 million users?”

This is by far the hardest question to answer and it must be solved before you can seriously address the comparatively simple task of hardware and software scalability.

Attracting users can be accomplished in many ways, but it is mainly a matter of marketing and, of course, having a killer idea. As Ben discussed in his presentation at the recent Elgg Conference, this idea must be as useful to user 1 as it is to user 10 million.

The idea has to be solid from day one, so forget all the trendy “long tail” and “wisdom of crowds” buzzwords!

Once you manage to solve this most tricky of problems, you can begin to look at the infrastructure. So, can Elgg handle 10 million users out of the box?

Simply put, no script in the world can handle this level of usage straight away without some modification and a serious investment of both time and money. You will not be able to unpack Elgg on a cheap shared host and have it handle 10 million users.

This is not an issue with Elgg’s design (which actually lends itself to many scalability techniques), but simple realism. Elgg has had substantial work done on scalability and optimisation – reducing queries, caching etc – and currently performs very well page for page against competitors like Ning and Buddypress.

Asking how many users an Elgg install can support is also a pointless question, because the answer is always going to be “it depends”. How many users Elgg can support depends on your hardware, your host (shared or dedicated), your database server, how your users behave and how many of them are active at any given time.

So what should you pay attention to?

Elgg itself is fairly optimal, and will improve over time. If you are dealing with millions of users you will want to look at your server infrastructure – database server, bandwidth, memory, caching at every level. After this you can look at customised code to squeeze out the last percentage points of performance.

If you are serious about handling high load there is no avoiding the need to spend some time and money investing in your infrastructure. But, these are good problems to have, because it means that you have a successful network!

So in conclusion, my answer to the scalability question is “Don’t worry about it until you have to worry about it!” Get your users in first. Make a killer service that is useful from day one, and then worry about how you will handle millions of concurrent users.

Scalability is a largely solved problem… building a successful service isn’t, and that is the thing you should be concerned with.

25 thoughts on “Elgg scalability”

  1. Hi Marcus,
    I agree with you. I saw a major difference between Elgg 1.2 and 1.1.
    You’ve done a lot of work to improve performance and scalability in core: garbage collection, logs cleared regularly and, most breathtaking of all, memcache usage in core classes like the entity class. I think we can only use these on a dedicated host rather than a shared host.

    thanks for your work again

  2. Hello,

    I think that the core architecture has a far greater impact than the hardware in most cases. In other words, if you want to double or quadruple your performance, then hardware will do it. If you are looking for several orders of magnitude then you need to look at your code and data structures.

    Hardware changes can improve your performance in a linear fashion only, and are greatly limited by whatever bottlenecks you may have. If you are CPU bound, then you can indeed scale by adding processors and (likely) servers, but even then only up to a certain point. Hard-disk performance bottlenecks, on the other hand, can be much harder to solve with hardware (within most budgets).

    I’m very new to Elgg, but from what I’ve read and seen, most if not all entity data is stored in a single table. Each of these entities can be linked to other entities in the same table, which means multi-joins on one or more large database tables – a design that generally does not scale very well.

    Whether my understanding of Elgg’s architecture is correct or not, the point is that a system’s performance can degrade exponentially as certain quantities grow. In such scenarios, no matter how much hardware you throw at that problem, you won’t fix it.

    ———-

    Getting back to elgg, the proof is in the pudding: who has scaled the platform to (say) a million users, right now? What hardware did they need to use to make this happen? What tweaks were necessary? Even if you cannot know whether you will need to scale that high, it would be nice to know what you are getting into when committing yourself to Elgg… 🙂

  3. Hi,

    Marcus, I agree to some extent, but C Marcel has a very valid point: having all the data in a single table seems to present a bottleneck. I have had experience with similar problems, and database table locking becomes very painful for users interacting with the system.

    I guess the question myself and others in the community are wondering is “Have any benchmarks been completed on Elgg?” I completely understand that a lot “depends” as you say on content, hardware, network, etc., but industry whitepapers on performance typically define all of this upfront with the benchmark test cases (thinktimes etc.). Are there any plans to do a benchmark and publish the results? I think this would help set people at ease that the solution they are investing so much time in will grow with them without having to worry about it when the time comes.

    All of this said, I’m relatively new to Elgg and have to say I’m very impressed with it so far. I’m playing with the 1.5 version and love it. You all have done a great job with it. 🙂

    -Troy

  4. Currently we don’t have any benchmark data. We have done quite a bit of code and performance profiling, but that’s not quite the same thing.

    A number of people in the community have expressed an interest in performing some load benchmarks but I have not heard anything further.

    That said, the rest of the Curverider team and I are working on some quite large Elgg installs. Anything we learn will of course be published and fed back into the community and core code.

    To answer the specifics: Yes, the schema design is a compromise, but I believe it is a good one – and for that matter I don’t believe it is a big one at that.

    The profiling we have done on the engine has indicated that the database queries (even on quite highly loaded sites) were not the bottleneck.

    In the code itself a lot of time was spent doing file IO to discover views (addressed in the upcoming Elgg 1.5 release).

    A typical page is generated in ~ 0.5 seconds. Apache then transmits this to the user which takes a variable amount of time depending on connection speed.

    There was a PHP related issue whereby echo took a very long time to echo large blocks of data but again we have worked around this in 1.5.

    However, by far the main bottleneck was retrieving images, CSS and JS after the main page had been generated.

    Much of this can and has been addressed by some fairly aggressive caching.

    Caching (in the form of memcache) can also be used to drastically reduce the load on the database in high load environments. Right now it is looking very much like any modifications we will need to make to the schema are going to be relatively minor if we need to do them at all.

  5. Hi Marcus. You wrote that elgg uses echo for data output. Why didn’t you use buffer output, as buffer output is usually faster?

  6. Hi Marcus,

    I have to install Elgg on a site with 50k users and we expect to have 500k in 2 years’ time. We need to provide an estimate of the amount of hardware required. I know that no real benchmarks have been done and it all depends on the use etc., but do you know how much memory each logged-in user consumes? How many pages can Elgg serve per second per CPU? Any data would be very useful. I’ve seen that the Elgg website has nearly 20k users. What kind of hardware are you using to serve it? How many servers, Apache, MySQL in a cluster etc.?

    thanks !

  7. @pasta Largely that is a “how long is a piece of string” question 🙂

    You are right that no benchmarks have really been done (and I would be interested in hearing your experiences), but to start you off:

    Research done by Stan from Brighton Uni indicates that on a typical to high usage site you can expect to have ~5% of users active at any one time (this is typical usage, obviously you’ll get spikes which you should probably be able to handle).

    Memory wise during an individual page impression elgg generates some caching and temporary variables – so you can guess at about memory = site object + site query + user object + user query + dataset of page.

    CPU wise, all indications from the profiling we have done are that script execution is a negligible portion of the overall page load – on the order of ~0.02 sec depending on machine.

    In terms of hardware, our larger installs are still fairly modest in hardware terms (which is a good thing!), and if you were going to splash out on anything I’d splash out on memory (which gives you memcache options).
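
As a rough worked example of the estimate above, here is a back-of-the-envelope sketch in Python. The ~5% active fraction and the ~0.5 second page time are the figures quoted in this thread; the pages-per-minute and memory-per-request values are made-up placeholders, not measurements of Elgg, and the function name is purely illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class LoadEstimate:
    active_users: int           # users active at the same time
    requests_per_second: float  # page requests arriving per second
    busy_workers: int           # requests in flight at once
    memory_mb: float            # memory held by those in-flight requests


def estimate_load(registered_users: int,
                  active_fraction: float = 0.05,        # ~5%, quoted in the thread
                  pages_per_user_per_min: float = 2.0,  # ASSUMED placeholder
                  page_time_s: float = 0.5,             # ~0.5 s, quoted in the thread
                  mb_per_request: float = 32.0          # ASSUMED placeholder
                  ) -> LoadEstimate:
    """Turn a registered-user count into a rough concurrency estimate."""
    active = round(registered_users * active_fraction)
    rps = active * pages_per_user_per_min / 60.0
    # Little's law: requests in flight = arrival rate * time per request.
    busy = math.ceil(rps * page_time_s)
    return LoadEstimate(active, rps, busy, busy * mb_per_request)


if __name__ == "__main__":
    for users in (50_000, 500_000):
        print(users, estimate_load(users))
```

With these placeholder inputs, 50,000 registered users works out to 2,500 active users and about 42 requests in flight at once; 500,000 gives 25,000 active users and about 417. The point is the shape of the calculation, so swap in numbers measured on your own site before sizing anything.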

  8. A bit strange that you are not able to answer the question about scalability.

    Comments like the one above by C Marcel scare me: “Whether my understanding of Elgg’s architecture is correct or not, the point is that a system’s performance can degrade exponentially as certain quantities grow. In such scenarios, no matter how much hardware you throw at that problem, you won’t fix it”.

    Exponents always rule and … exponents always scare.

    I was thinking yesterday: OK, I don’t know much about this, but it looks nice. Let’s go for it.

    But what if Elgg does not scale? Do you have any model of the scaling (even the simplest would do)? What is your idea behind scaling?

    To be more specific, how many servers (dedicated) are needed per user (or per 5% user mentioned above)? And how does it scale? Exponential use? A benchmark is not needed to make such an estimate.

    (And, the site we are designing will have millions of users in the next few years (it is OK if you laugh). 2010 it will be 20 000. And, of course I worry, even if the failure probability for the site is large.)

    Best regards,
    Hakan Olin

    PS And I think your Elgg is fantastic. Really …

    Exponentials are always scary, but so far all growth on Elgg systems appears to be fairly linear.

    Pre 1.5 there was an issue with scaling where plugins and views were concerned; however, this has been fixed – this may be what a couple of people have had concerns with in the past.

    By way of example – Community.elgg.org is an active site with quite a few users. This site is running on vanilla elgg on a very modest server without any special tweaks. It isn’t even running memcache as it ran fast enough without it.

    With memcache enabled hits on the database drop dramatically as well, leaving most of the work done by the apache process.

    My advice is still to walk before you can run. Get a good site going and then scale up. All scalability problems are solvable ones, and if it is necessary to perform modifications to the code then these can be accomplished when it is economical to do so (cost of development < new hardware for equiv performance gain).

    As you scale the database grows in size of course, so one could look at splitting reads and writes (support already built in). There are also sharding techniques to try but this would require some schema modification.
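
The read/write split mentioned above can be sketched as follows. This is an illustrative Python sketch of the general technique only, not Elgg's own implementation or configuration; the class name, the verb list and the connection labels are all hypothetical.

```python
import itertools


class ReadWriteRouter:
    """Send writes to a single primary and spread reads across replicas."""

    # Statements that change data or schema must go to the primary.
    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE",
                   "CREATE", "ALTER", "DROP", "TRUNCATE"}

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._replicas = itertools.cycle(replicas or [primary])

    def connection_for(self, sql: str):
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb in self.WRITE_VERBS:
            return self.primary
        # Round-robin over the replicas for everything else.
        return next(self._replicas)


if __name__ == "__main__":
    router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
    for query in ("SELECT * FROM entities",
                  "UPDATE entities SET enabled = 'no'",
                  "SELECT 1"):
        print(query, "->", router.connection_for(query))
```

A real deployment also has to account for replication lag: a read issued straight after a write may need to go to the primary to see the new data.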

  10. A lot of you have been talking about having “reasonable” hardware. Is that defined for Elgg? For example, what are the Elgg community site’s hardware specs? Is that common knowledge? It would really help if some of you folks explained your real world hosting / clustering setups.

  11. Hi,

    I was wondering if I can connect Elgg to another type of DB, maybe Oracle or even a NoSQL DB. How can I achieve this, even if it needs heavy customization? I would really appreciate it if you could give me a head start.

    Also, do you think it is better to do it on 1.7 or on 1.8?

  12. The core should be “relatively” straight forward… you will need to create a compatible schema then modify the core get_* functions etc as well as the database initialisation code.

    The real problem comes with third party plugins which have their own database queries … we strongly recommended that people didn’t do this, but there is nothing we can do to stop them, and I know that many did.

  13. I’m trying to decide which platform to use for a new social network. It worries me that there’s no clear answer here about scaling Elgg onto multiple servers and how that can be achieved. If you are lucky enough to hit multiple millions of users, you need to be confident that a platform CAN scale. It is not a case of crossing that bridge IF it happens. I wouldn’t buy a 1.1 litre car engine hoping it’ll pull a big caravan. The easiest way to scale one server is to use a server that will take lots of memory, multiple (multi core) processors, a caching disk controller, SAS drives, optimised caching etc. At some point, though, this may not be enough. OK, use some front end caching proxy to take some load. But you may still need to scale to a multi server environment. So that’s the key point of my question, and maybe many other people’s: can you scale Elgg onto multiple backend Elgg servers / SQL databases, is there a method or guide for this, does Elgg cater for this out of the box? etc…

  14. (late to the conversation!) Elgg is a pretty vanilla PHP/MySQL app, so it should scale with the same abilities/issues most other apps have. The big differences:

    1. Elgg truly is a PHP framework and so its plugins vary enormously in capability and, hence, resource consumption. Which plugins are enabled and how users interact with them will have a huge impact on performance.
    2. Almost all users see individualized content. The view system also is completely dynamic. Both facts mean there’s very little opportunity for caching and any site that scales will have to figure out how much dynamism they are willing to give up to cache more content.

  15. The last comment was made around a year and a half ago, and the second-to-last one 3.5 years ago.

    So were all issues related to scalability resolved ? What happened to those, who were expecting 50k or 200k users ? Did any of them get their milestones? If so, then, they can share their experiences.

    That will be helpful for all.

    Let’s move ahead. I got the same question here, what you all had. Expecting 1 million users, And 10k concurrent users 🙂

    I do agree with what “C Marcel” says: if a system is consuming too many resources at the database level, especially by using too many joins, hardware scaling can’t do enough.

    Having said that, if database queries are optimized enough then hardware scaling is the next option, and you should definitely have a look at Amazon services like Auto Scaling, Elastic Load Balancer, ElastiCache etc.

    I have looked at the architecture of Elgg and also glanced at Elgg’s DB. And so far, I am getting positive vibes.

    If all goes well then I am planning to do some load tests with x number of concurrent users and will share my thoughts here.

  16. Everyone’s scalability issues are different, and you _will_ need to do some modifications to vanilla elgg to handle large numbers of concurrent users.

    As well as standard web stuff (server side cache, opcode cache etc), I’ve had a lot of success building specific caches for expensive pages – e.g. construction of the river. Simple, but effective.

    Finally, one client decided to go a different direction, using a nosql backend which fit better into elgg’s object style data model than MySQL. It was a fair amount of work, but the core data functions were replaced and functioned pretty much as a drop in replacement.

    Operating in a load balanced environment, the biggest problem is the data directory, which would have to reside on a clustered filesystem or a S3 mount in order to be shared over your spooled up instances.
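
To make the “specific caches for expensive pages” idea above concrete, here is a minimal Python sketch of a time-limited cache wrapped around an expensive build step. It illustrates the general technique rather than the code used on those sites; `TTLCache`, `get_or_build` and the `river:42` key are hypothetical names, and in a load balanced setup the in-process dict would be replaced by a shared store such as memcache.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Keep the result of an expensive build for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_build(self, key: str, build: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive work
        value = build()      # missing or expired: rebuild once
        self._store[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop an entry early, e.g. when the underlying data changes."""
        self._store.pop(key, None)


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=30)

    def build_river() -> str:
        # Stand-in for the expensive queries and rendering.
        return "<ul><li>activity item</li></ul>"

    html = cache.get_or_build("river:42", build_river)
    print(html)
```

The trade-off is staleness: users may see a page up to `ttl_seconds` old unless the entry is invalidated when new content arrives.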

  17. You mentioned the data directory as being a problem in a load balanced environment. What is actually stored in the data directory? We have decided to store “content” in a separate storage service so that is not a problem for us. However, it appears that ELGG also uses the data directory for caching, is this true? We are using ELGG strictly as an engine and not as a web site with “pages” that need to be rendered and potentially cached. I guess my question is, what is actually stored in the data directory and would it come into play at all in a scenario where we are using ELGG for web services only.

  18. Elgg caches some things to the data directory; mostly just a compiled list of views on the system and a compiled list of language translations. This isn’t a big deal if this is per-node, since it’ll be generated once each time.

    The biggest deal is file uploads and profile pictures, which as you say, you’ve already resolved.

    So, in answer to your question, it’s basically just some one time run system level caching which is generated by booting the system. This should be fairly straight forward to disable/store somewhere else, or a simple approach would be to hardwire dataroot to /tmp, which should be fairly safe for transient caches.
