I have many geek friends who are willing to help me keep up with new trends in Information Technology (IT). One idea that has caught my attention for some time is how to manage large, mission-critical applications with commodity hardware, often referred to as Intel Boxes.
Normally, when we think about mission-critical data centres, reliability and scalability take precedence over cost. What comes to mind are high-end servers and mainframe computers that cost oodles of money.
With everybody (industry, government, healthcare et al.) looking for more and more data to crunch, conventional approaches to data centre management soon reach their limits and costs explode. That is when smart geeks put together solutions that handle these volumes with arrays of commodity hardware.
The basic idea of software design in this paradigm is that any box in the array should be allowed to fail without affecting serviceability.
When a box fails, the technician just walks across, pulls out the faulty piece and replaces it with a similar one. With commodity hardware, it is not too costly to keep enough spares. The important thing is to use standard hardware and standard system software consistently across the array.
My friend Shuvam (one of the best techies in town) recently made a presentation on this at our office. The key learnings from his presentation were:
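To make the "any box may fail" idea concrete, here is a minimal Python sketch, assuming a simple key-value workload. Every record is written to two boxes, reads fall back to the surviving copy when one box is down, and a replacement box is refilled from the other replicas, much like the technician swapping in a spare. The class names, the CRC32-based placement and the two-replica setting are illustrative assumptions, not a description of any particular product.

```python
import zlib


class Node:
    """A hypothetical commodity box holding part of a key-value store."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True

    def get(self, key):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


class ReplicatedStore:
    """Keeps `replicas` copies of every record so that any one box may fail."""

    def __init__(self, nodes, replicas=2):
        self.nodes = nodes
        self.replicas = replicas

    def owners_of(self, key):
        # Deterministic placement: hash the key, then take consecutive boxes.
        start = zlib.crc32(key.encode()) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.replicas)]

    def put(self, key, value):
        for node in self.owners_of(key):
            if node.alive:
                node.data[key] = value

    def get(self, key):
        for node in self.owners_of(key):
            try:
                return node.get(key)
            except ConnectionError:
                continue  # this box is down; try the next replica
        raise KeyError(key)

    def replace(self, index, spare):
        """Swap the box at `index` for a spare and re-copy its records."""
        self.nodes[index] = spare
        for node in self.nodes:
            if node is spare or not node.alive:
                continue
            for key, value in node.data.items():
                if spare in self.owners_of(key):
                    spare.data[key] = value


nodes = [Node(f"box{i}") for i in range(4)]
store = ReplicatedStore(nodes, replicas=2)
store.put("acct-42", 1000)

primary = store.owners_of("acct-42")[0]
primary.alive = False                     # a box fails...
print(store.get("acct-42"))               # 1000, served by the surviving replica

store.replace(nodes.index(primary), Node("spare0"))  # ...and is swapped out
print(store.owners_of("acct-42")[0].data)  # {'acct-42': 1000}
```

Because the same standard `Node` can take any slot, a spare needs no special configuration: it simply inherits the records its position is responsible for.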
When you are confronted with a problem that requires crunching data on millions of records (like insurance premium and interest calculations, MIS report generation, or searches through large data sets), or with a transaction system that has a very high rate of transactions on unrelated records (like trading systems, settlement systems, core banking systems, or issuing Unique IDs to Indian residents), the solution is to:
(i) set up a cluster of cheap servers with local disks and high-speed connectivity (Gigabit or faster);
(ii) slice the data horizontally;
(iii) distribute the slices among the nodes;
(iv) send the query out to all nodes, so that each node can compute and send back its answer;
(v) aggregate the results coming back from the nodes; and
(vi) use redo logging for fault tolerance.
Divide and Conquer!
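Steps (ii) to (v) amount to horizontal sharding plus a scatter-gather query. The Python sketch below is a toy, single-process illustration under stated assumptions: threads stand in for the cluster's nodes, records are placed on nodes by a CRC32 hash of a hypothetical `account` key, and the query (total and count of balances at or above a threshold) is made up for illustration. Redo logging (step vi) is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_NODES = 4  # hypothetical cluster size


def node_for(key):
    """Horizontal slicing: each record goes to one node by hashing its key."""
    return zlib.crc32(key.encode()) % NUM_NODES


def distribute(records):
    """Steps (ii)-(iii): slice the rows and hand each slice to a node."""
    shards = [[] for _ in range(NUM_NODES)]
    for rec in records:
        shards[node_for(rec["account"])].append(rec)
    return shards


def local_query(shard, min_balance):
    """Step (iv): a node scans only its own rows and returns a partial answer."""
    matching = [rec["balance"] for rec in shard if rec["balance"] >= min_balance]
    return sum(matching), len(matching)


def scatter_gather(shards, min_balance):
    """Steps (iv)-(v): query every node in parallel, then combine the partials."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: local_query(s, min_balance), shards))
    return sum(p[0] for p in partials), sum(p[1] for p in partials)


records = [{"account": f"A{i}", "balance": i * 100} for i in range(1, 101)]
shards = distribute(records)
print(scatter_gather(shards, min_balance=5000))  # (382500, 51)
```

Because each node returns a small partial result (a sum and a count) rather than raw rows, the final aggregation stays cheap however many records each node holds; an average, for instance, can be derived from the combined sum and count.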
In layman’s terms, it means that we learn to use lots of average people to carry out complex, large-scale processes. The Mumbai Dabbawalas showed this in human processes long before Google demonstrated it in IT processes. Theirs is a brilliant implementation with ‘common hardware’: none of them is individually trained in logistics management or operations research, yet collectively they run a highly scalable and reliable delivery service with better than six sigma levels of quality.
What is the conceptual framework, common to both the Google data centre and the Mumbai Dabbawalas, that is worth learning?
Process design that allows any component to fail; i.e., the processes should be as person-independent as possible, and it should be possible to scale by adding more components without adding complexity.
The key ingredients one needs to take care of are:
i) Ensure that the process flow is really smooth, with no scope for inventory pile-ups
ii) Break down the job into clear, self-contained modules that can be performed without waiting for instructions from the top
iii) Develop crystal-clear ‘standard operating procedures’ (SOPs)
iv) Make sure that each member joining the team has the SOP embedded in his brain stem and not just uploaded to his brain
v) Have enough redundancies
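Ingredients ii) and v) can be sketched in code as a pull-based work queue, assuming tasks that are self-contained (like computing one policy’s premium). Idle workers fetch the next task themselves rather than waiting for a coordinator to assign it, and a task whose attempt fails is put back on the queue for another worker. The `run_pool` function, the retry limit and the simulated crash are hypothetical illustrations, not a prescribed design.

```python
import queue
import threading


def run_pool(tasks, process, num_workers=4, max_attempts=3):
    """Hypothetical pull-based worker pool with retry on failure."""
    q = queue.Queue()
    for task in tasks:
        q.put((task, 1))
    results, failed = {}, []
    lock = threading.Lock()

    def worker():
        while True:
            item = q.get()          # pull work; nobody assigns it
            if item is None:        # shutdown signal
                q.task_done()
                return
            task, attempt = item
            try:
                value = process(task)
                with lock:
                    results[task] = value
            except Exception:
                if attempt < max_attempts:
                    q.put((task, attempt + 1))  # redundancy: another worker retries
                else:
                    with lock:
                        failed.append(task)
            finally:
                q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    q.join()                 # wait until every task, including retries, is done
    for _ in threads:
        q.put(None)
    for t in threads:
        t.join()
    return results, failed


# Simulated crash: the first attempt on policies 3 and 7 fails.
flaky = {3, 7}
seen = set()
seen_lock = threading.Lock()


def compute_premium(policy_id):
    with seen_lock:
        first_try = policy_id in flaky and policy_id not in seen
        seen.add(policy_id)
    if first_try:
        raise RuntimeError(f"box crashed while processing {policy_id}")
    return policy_id * 10


results, failed = run_pool(range(10), compute_premium)
print(len(results), failed)  # 10 []
```

Re-queuing the failed task before calling `task_done()` keeps `Queue.join()` from returning while a retry is still pending.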
This same idea is one of the core components of what Toyota has popularised as the LEAN principle in manufacturing.
One of the big concerns we often hear from managers is that they are unable to deliver and/or scale because they can’t afford high-quality people (the IBM mainframes and Sun servers of the workforce) for doing mundane things, as smart people are costly and difficult to retain. The discussion above shows how you can scale with Intel Boxes and Intel People.
You don’t need such high-end hardware (humanware) in abundance to build ‘Teams that Scale’; what you need is a leader with a vision and a committed team of evangelists (Jesus Christ and his twelve disciples) who can build and motivate troops willing to build on their faith.
Great post, Koshy!
Hi, as mentioned to you personally, I would have liked to read more of your thoughts/views/assessment on Intel People. Hope to see an exclusive post on that some day when you feel like writing about it.