Tuesday, July 2, 2013

Hadoop Hindsight #1 Start Small


I thought we would start a series on some lessons we've learned.  I learned many of these topics the hard way, so I hope they will be helpful for those a few steps behind in the journey.  YMMV, but I wish this ideology had been firmly ensconced when we started.
Identify a business problem that Hadoop is uniquely suited for.
Just because you found this cool new hammer doesn't mean everything is a nail.  Find challenges that your existing tech can't answer easily.  One of our first projects involved moving 300 gigs of EDI transaction files.  A business unit was having BAs grep for customer strings across 26,000 files to find 4 or 5 files, then FTP'ing those to their desktops for manual parsing and review.  They might spend a few HOURS doing this for each request.  It was a natural and simple use of Hadoop.  We learned a lot about design patterns, scheduling, and data cleanup.
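To make the shape of that job concrete, here is a minimal sketch in plain Python of the map and reduce steps behind "find the handful of files that mention this customer."  This is not our production code and it doesn't use any Hadoop API; the function names, file names, and customer ID are all made up for illustration.

```python
from collections import defaultdict


def map_lines(filename, lines, customer_id):
    """Map step: emit (filename, 1) for every line that mentions the customer."""
    for line in lines:
        if customer_id in line:
            yield filename, 1


def reduce_counts(pairs):
    """Reduce step: sum the hits per file."""
    totals = defaultdict(int)
    for filename, count in pairs:
        totals[filename] += count
    return dict(totals)


def find_customer_files(files, customer_id):
    """Run the map step over every file, then reduce to hit counts per file."""
    pairs = (
        pair
        for filename, lines in files.items()
        for pair in map_lines(filename, lines, customer_id)
    )
    return reduce_counts(pairs)


# Hypothetical sample data standing in for the real transaction files.
sample_files = {
    "a.txt": ["order CUST-123 received", "order CUST-123 shipped"],
    "b.txt": ["order CUST-999 received"],
    "c.txt": ["invoice for CUST-123"],
}

matches = find_customer_files(sample_files, "CUST-123")
print(sorted(matches))  # names of the files worth pulling for review
```

On a real cluster the map step runs in parallel across the files, which is where the hours of grepping go away.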
Solve this one business challenge well.
Notice I didn't say nail it perfectly.  There are many aspects of Big Data that will challenge the way you've looked at things over the last 20 years.  The solution should be good, but not necessarily perfect.  Accepting this gives you time to establish a PM strategy and basic design patterns.
Put together a small team that has worked well together in the past.
This is critical to your success! Please, please, please take note!  Inter-team communication is the foundation upon which your Hadoop practice will grow.  In The Mythical Man-Month my man Frederick Brooks said:
To avoid disaster, all the teams working on a project should remain in contact with each other in as many ways as possible...
Ideally a team should consist of the following:
1 Salesman (aka VPs)
1 Agile-trained PM
1 Architect
2 Former DBAs
1-3 skilled Java developers
1 Cluster Admin
Obviously this is very simplified and some roles can overlap.  My point is you should have no more than 10 people starting out!
Support your solution.
This very same team should also live through at least 3 months of supporting the solution they've created.  Valuable insight is gained once you have to fix a few production problems.  Let the solution mature in production a bit to understand support considerations. This gives you time to adjust your design patterns. Trust me, you'll want time to reflect on your work and correct flaws.
Smash your solution and rebuild (Optional - If time permits)
Good luck getting the time, but if you're serious about a sustainable Enterprise Hadoop solution, this deserves real consideration.
Go forth and multiply.
By this time your patterns and procedures should form the DNA of your new Hadoop cell. Your team should naturally develop into the evangelists and leaders who drive the mitosis of each new project, carrying the newly replicated chromosomes with them.  As your project cells divide and multiply, you'll be able to take on more formidable challenges.
That's all I have to say about that.