Just thought I'd record the horizontal scaling options I've come across so far.
- Connected and stateless – Requests in this scenario are synchronous (connected), meaning each request is processed immediately and the response served back; the connection is kept open until then. Servers are stateless. By stateless, I mean that sessions are externalized to a data store (probably an in-memory cache). In this scenario, we have two options to achieve horizontal scaling.
- Place servers behind a load balancer. The load balancer distributes requests across the servers depending on their health and load. As the servers are stateless, requests from the same session can be served by any server behind the load balancer (a small sketch of this follows after these options). Depending on the expected load, you can attach any number of servers to the load balancer and scale out. On AWS, you have the auto scaling option to add EC2 instances depending on the traffic.
- Delegate the request routing to DNS. DNS can route requests to the servers in round-robin fashion. This approach has an obvious disadvantage with respect to the TTL attached to DNS entries: even if a low TTL is set, resolvers en route could cache the entries and make this approach ineffective.
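Here's a minimal sketch of the "stateless server, externalized session" idea, assuming a shared Redis cache (the host name, TTL and `handle_request` entry point are placeholders I made up). The point is that no session data lives in server memory, so the load balancer is free to send each request from the same session to a different server.

```python
import json
import redis

SESSION_STORE_HOST = "sessions.internal"  # assumed shared cache endpoint
SESSION_TTL_SECONDS = 1800                # assumed 30-minute idle timeout

store = redis.Redis(host=SESSION_STORE_HOST, port=6379, decode_responses=True)

def load_session(session_id):
    """Fetch the session from the shared cache; any server can do this."""
    raw = store.get(f"session:{session_id}")
    return json.loads(raw) if raw else {}

def save_session(session_id, session):
    """Write the session back with a sliding expiry."""
    store.setex(f"session:{session_id}", SESSION_TTL_SECONDS, json.dumps(session))

def handle_request(session_id, item_id):
    """Hypothetical handler: add an item to the user's cart, statelessly."""
    session = load_session(session_id)
    session.setdefault("cart", []).append(item_id)
    save_session(session_id, session)
    return {"cart_size": len(session["cart"])}
```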
- Connected and stateful – This case is similar to the above, but the servers are stateful: they hold the session information. You would have to enable stickiness in the load balancer so that all requests from the same session go to the same server (see the sketch below). If the server fails, the session could be lost.
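As a sketch of what enabling stickiness looks like on AWS (assuming an existing Application Load Balancer target group; the ARN and duration are placeholders), load-balancer-generated cookie stickiness pins each session to one target:

```python
import boto3

# Placeholder ARN for an existing ALB target group
TARGET_GROUP_ARN = "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/app/abc123"

elbv2 = boto3.client("elbv2")
elbv2.modify_target_group_attributes(
    TargetGroupArn=TARGET_GROUP_ARN,
    Attributes=[
        {"Key": "stickiness.enabled", "Value": "true"},
        {"Key": "stickiness.type", "Value": "lb_cookie"},          # LB-generated cookie
        {"Key": "stickiness.lb_cookie.duration_seconds", "Value": "3600"},
    ],
)
```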
- Disconnected architecture – This architecture suits cases where the result of processing is not immediately required. It leverages a queuing system. The front-end system, which faces the end users, places a message on the queue and returns a response to the user right away. A fleet of systems behind the queue keeps polling it and does the necessary processing; you can attach any number of servers behind the queue. The heavy lifting is done by these servers and the front-end system stays lightweight (sketched below). Order submission on e-commerce websites is a good example of this.
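A minimal sketch of this queue-backed pattern using SQS, assuming the queue already exists (the queue URL and `process_order` are placeholders). The front end only enqueues and acknowledges; any number of workers can run the poll loop in parallel.

```python
import json
import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"  # placeholder
sqs = boto3.client("sqs")

def submit_order(order):
    """Front-end side: enqueue the order and acknowledge immediately."""
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(order))
    return {"status": "accepted"}

def worker_loop():
    """Back-end side: long-poll the queue and process messages."""
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            process_order(json.loads(msg["Body"]))
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])

def process_order(order):
    """Placeholder for the actual heavy lifting."""
    print("processing", order)
```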
- Distributed data processing – This is most suited for offline job processing. The map-reduce architecture followed in the Hadoop ecosystem is a good example: the job is split into chunks and distributed across a number of servers, and the results from the servers are then aggregated and combined (a toy sketch follows).
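A toy map-reduce word count, just to illustrate the split / map / aggregate flow that frameworks like Hadoop run across many machines. Here `multiprocessing.Pool` stands in for the fleet of servers, and the reduce step merges the per-chunk results.

```python
from collections import Counter
from multiprocessing import Pool

def map_chunk(lines):
    """Map phase: count words within one chunk of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(partials):
    """Reduce phase: merge the per-chunk counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    lines = ["to be or not to be", "that is the question"] * 1000
    chunks = [lines[i::4] for i in range(4)]           # split the job into 4 chunks
    with Pool(4) as pool:
        partials = pool.map(map_chunk, chunks)         # fan out to worker processes
        print(reduce_counts(partials).most_common(3))  # aggregate and combine
```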