Friday 23 May 2008

Deployment Predictability

My colleague Uri raised an interesting point in his post. I completely agree with Uri, and would like to give an example.

I was involved in a project for a mobile operator in the UK during the second half of last year. We used GigaSpaces to build a scale-out SOA activation platform for a new mobile device launch. The GigaSpaces platform replaced an existing system.

The original system was built using JBoss as a backend server. The new device launch was predicted to bring a huge increase in the number of activation requests the system had to handle. While the system worked fine on JBoss, there was no way these guys could predict how many JBoss instances they would need to run in order to cope with the anticipated load. They started benchmarking and performance testing to figure out where the system's limits were, but soon found that the process was leading them nowhere. This was mainly because the JBoss instances behaved inconsistently and hit the scalability ceiling at just a few (very few…) nodes. As more instances were added, the overhead of synchronising the JBoss cluster grew rapidly, so, as Amdahl's Law suggests, the throughput gained from each extra instance shrank and varied with the cluster size and the load on the other nodes, which kills predictability altogether.
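To make the diminishing returns concrete, here is a minimal sketch of Amdahl's Law in Python. The 10% serial fraction is an assumed figure chosen purely for illustration; it was not measured on the customer's cluster, and the function names are my own.

```python
def amdahl_speedup(nodes: int, serial_fraction: float) -> float:
    """Speedup over a single node when `serial_fraction` of the work
    (e.g. cluster synchronisation) cannot be spread across nodes."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)


def marginal_gain(nodes: int, serial_fraction: float) -> float:
    """Extra speedup contributed by the last node added (nodes >= 2)."""
    return amdahl_speedup(nodes, serial_fraction) - amdahl_speedup(
        nodes - 1, serial_fraction
    )


if __name__ == "__main__":
    SERIAL_FRACTION = 0.10  # assumed value, for illustration only
    for n in (2, 4, 8, 16):
        print(n, round(amdahl_speedup(n, SERIAL_FRACTION), 2),
              round(marginal_gain(n, SERIAL_FRACTION), 2))
```

Each extra node buys less than the one before it, and the total speedup can never exceed 1 / serial_fraction, no matter how many nodes are added.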

JBoss is just an example in this case. This is not a JBoss-specific flaw, but rather a consequence of the tier-based approach, which imposes a limited architecture.

They then came to us to resolve the predictability challenge.

We did an exercise to figure out what the deployment would look like using GigaSpaces, and came up with a linear formula for the hardware and number of instances needed to support a given load. More than that, they knew that if the business predictions turned out to be too conservative, supporting the extra load would simply mean deploying more spaces... On top of that, their back-office systems did not support high availability (HA) and would fall over if load increased suddenly, so GigaSpaces also provided HA and throttling for the backend servers. During one overnight test the database failed for about 4 hours, yet the system stayed fully functional and kept completing users' requests; the completed requests simply waited for the database to be brought back up so that archiving could finish. The customer was truly impressed!
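The actual formula from that exercise is not reproduced here, but a linear sizing rule of this kind can be sketched roughly as follows. The function name and the throughput figures are hypothetical; the only assumption carried over from the project is that each instance adds a fixed amount of capacity.

```python
import math


def instances_needed(target_rps: float, per_instance_rps: float) -> int:
    """Number of instances required for `target_rps` requests per second,
    assuming each instance contributes a constant `per_instance_rps`
    (i.e. linear scaling)."""
    if target_rps < 0:
        raise ValueError("target_rps must not be negative")
    if per_instance_rps <= 0:
        raise ValueError("per_instance_rps must be positive")
    return math.ceil(target_rps / per_instance_rps)


if __name__ == "__main__":
    # Hypothetical numbers: 250 requests/s per instance, measured once.
    for load in (1000, 2000, 4000):
        print(load, instances_needed(load, 250))
```

Because the relationship is linear, doubling the predicted load simply doubles the instance count, which is what makes the deployment plannable in advance.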

Needless to say, the launch went flawlessly, and there were no issues whatsoever with the GigaSpaces-based system.

So yes, Uri makes a good point. GigaSpaces' customers can predict, and properly plan ahead for, the deployment needed to support their business.

