web | Marcus Povey

The Domain Name System – which much of the internet is built on – is a system of servers which turn friendly names humans understand (foo.com) into IP addresses which computers understand (111.222.333.444).

It is hierarchical and to a large extent centralised. You will be the master of *.foo.com, but you have to buy foo.com off the .com registrar.

These top level domain registrars, if not owned by national governments, are at least strongly influenced and increasingly regulated by them.

This of course makes these registrars a tempting target for oppressive governments like China, UK and the USA, and for insane laws like SOPA and the Digital Economy Act which seek to control information, and shut down sites which say things the government doesn’t like.

Replacing this system with a less centralised model is therefore a high priority for anyone wanting to ensure the protection of the free internet.

Turning text into numbers isn’t the real problem

It may not be an entirely new observation here; the problem of turning a bit of text into a set of numbers is, from a user’s perspective, not what they’re after. They want to view facebook, or a photo album on flickr.

So finding relevant information is what we’re really trying to solve, and the entire DNS system is really just a factor of search not being good enough when the system was designed.

Consider…

Virtually all modern browsers have auto complete search as you type query bars.
Browsers like Chrome only have a search bar
My mum types domain names, or partial domain names, or something like the domain name (depending on recollection) into Google

For most cases, using the web has become synonymous with search.

Baked in search

So, what if search was baked in? Could this be done, and what would the web look like if it was?

What you’re really asking when you visit Facebook, or Amazon or any other site is “find me this thing called xxxx on the web”.

Similarly when a browser tries to load an image, what it’s really saying is “load me this resource called yyyy which is hosted on web server xxxx on the web”, which is really a specialisation of the previous query.

You’d need to have searches done in some sort of peer to peer way, and distributed using an open protocol, since you’d not want to have to search the entire web every time you looked for something. Neither would you want to maintain a local copy of the Entire World.

It’d probably eat a lot of bandwidth, and until computers and networks get fast enough, you’d probably still have to rely on having large search entities (google etc) do most of the donkey work, so this may not be something we can really do right now.

But consider, most of us now have computers in our pockets with more processing power than existed on the entire planet a few decades ago; at the beginning of the last century the speed of a communication network was limited by how fast a manual operator could open and close a circuit relay.

What will future networks (and personally I don’t think we’re that far off) be capable of? Discuss.

Amazon, who are most widely known for their online shop, has another project called Amazon Web Services (AWS) offering website authors a collection of very powerful hosting tools as well as hybrid cloud services.

This is not a sideline for Amazon – a recent conversation I had with an Amazon guy I met at a Barcamp revealed that in terms of revenue AWS exceeds the store – and it dramatically lowers the cost of entry for web authors. Using AWS it is possible to compete with the big boys without the need to mortgage your house for the data center and it solves many storage and scalability problems, what’s more – its pay as you go.

To be clear then, Amazon is now a hosting company, and its rather curious as to why people (Amazon included) are not making a big deal of this.

Anywho, I’ve recently had the opportunity to work on a few projects which use AWS extensively, and since I did run into a few gotcha’s I’d thought I’d jot some notes down here, more as an aide-mémoire as anything else.

Getting an AWS account

Before you begin, you’re going to need an account. So, head over here and create one.

Creating a server

You are going to want to create a server to run your code on. The service you need here is Amazon Elastic Compute Cloud (EC2), so sign up for EC2 from the AWS Management Console.

You are going to have to select an EC2 Image to start from, and there are a number of them available. An Image (called AMIs) is a snapshot of your server, its disks and any software you want to run. Once configured you can start one or more Instances of it, which are your running servers. If you want more control over your servers, consider getting unmanaged vps hosting.

Once you’ve configured your server, you can create your own image which you can share or use to boot other instances. This whole process is made infinitely easier by using an EBS based image. EBS is a virtual disk, these can be added like normal disks to your server – sizes 1GB to 1TB. EBS backed images store data to these disks and so configuration changes are preserved, additionally new server images can be cloned with the click of a button.

You can add extra disks to your server, but they must be in the same availability zone (datacenter) as your EC2 image and currently can’t be shared between EC2 instances.

So:

Save your sanity and use an EBS backed image – if you’re looking for a Debian or Ubuntu based one, I recommend the ones created buy Alestic. You can search for community managed images from within the management console, be sure to select EBS backed! For reference, the AMI I often use is identified as ami-209e7349.
Grab an Elastic IP and assign it to your image, point your domain at this.
Your goal is to build an AMI consisting of a ready to go install of your web project, so install the necessary software.
Note however, that you want to build your AMI so that you can run multiple instances of it in order to handle scalability. Since EBS disks can’t be shared you may have to change your architecture slightly:
- Don’t store user data on the server – user icons etc – these can’t be shared across instances. Store these instead on Amazon S3 and reference them directly using URL where possible. Note, temporary files like caches are probably ok.
- If you use a database, use MySQL, but only install the client libraries on your image – you won’t be using a local server (see below)!

Once you’ve configured your server create an AMI out of it, this can then be booted multiple times.

Using a database

The database service you will most likely want to use is called Amazon Relational Database Service (RDS) which can function as a drop in replacement for MySQL.

Sign up for it using the management tool, create a database (making note of the passwords and user names you assign) and allow access from your running server instance(s) via the security settings.

The database is running on its own machine, so make a note of the host name in the properties – you will need to pass this in your connect string.

Conclusion

AWS is a powerful tool, if you use it right. There is a lot more complexity to cover of course, but this should serve as a start.

I’ll cover the more advanced scalability stuff later when I actually get to use them in anguish (and get the time to write about it!)

Marcus Povey

Time, Space, and Plexiglas

Tag Archives: web

DNS is a symptom of broken search #sopa

Turning text into numbers isn’t the real problem

Baked in search

Tips for building web services on Amazon AWS