Wednesday, May 22, 2013

Why most Enterprise Hadoop jobs will not require hardcore Java skills in 3-5 years.

OK.  So the controversial heading hopefully piqued your interest, and if you're a hardcore Java developer - just hear me out.  It's nothing personal.
In late 1979, RSI's Oracle Version 2 ran on Digital's VAX minicomputers (32-bit AND virtual memory!). To be proficient with the first commercial RDBMS, you had to possess mad Macro-11 or PL-11 (the higher-level variant) skills just to make many of the functions we take for granted now actually work. Many basic tools that DBAs and developers use today simply didn't exist.  You had to roll your own.  Even the data dictionary was a new concept and often in flux.
Hello World, Macro-11 style:
        .TITLE  HELLO WORLD
        .MCALL  .TTYOUT,.EXIT
HELLO:: MOV     #MSG,R1 ;STARTING ADDRESS OF STRING
1$:     MOVB    (R1)+,R0 ;FETCH NEXT CHARACTER
        BEQ     DONE    ;IF ZERO, EXIT LOOP
        .TTYOUT         ;OTHERWISE PRINT IT
        BR      1$      ;REPEAT LOOP
DONE:   .EXIT

MSG:    .ASCIZ /Hello, world!/
        .END    HELLO
Don't forget the RT-11 commands to assemble, link, and run!
.MACRO HELLO
ERRORS DETECTED:  0

.LINK HELLO

.R HELLO
Hello, world!
.
It was an immature but revolutionary way to store and recall information. Bell Labs saw the business benefits of the Oracle RDBMS and thus much hype and exuberance flowed in the land:
"They could take this data out of the database in interesting ways, make it available to nontechnical people, but then look at the data in the database in completely ad hoc ways." - Ed Oates
During these early days you would need a room full of advanced computer science academics just to keep the system functioning - at each and every business.  There were no safety nets, and everyone had their own perspective on how to do a multi-join query WITH an aggregate function (and on the 4th day RBO was created, and it was good).  Read consistency was still 5 years away!  As time went on, the best brains from the IT collective pioneered the standards and best practices that we all use today.  As the tech matured, the need for low-level Macro-11 developers diminished; they were replaced by a more mature product that would appeal to large non-tech companies.  Once the patterns were established, the need for highly skilled programmers just to keep the data store functioning went away.  Interestingly, the data and the patterns of its flow remained.  That is why enterprises have DBAs, not developers, maintaining modern relational databases.
Inevitably, there are times when advances dictate new low-level programming skills on a large scale.  When RSI released Version 3 in C, there was high demand for developers who could read and speak the prose of Mr. Ritchie.  This was necessary for recompiling and testing a consistent code base across everything from minis and mainframes to PCs.  While C was quite portable, there was much work to be done in the storage subsystems.  Again, as the need for low-level tech skills went away, the data remained.
When we look at the new world of Hadoop, we must understand that this type of tech revolution has occurred before.  Right now there is much work afoot to solve the primitive questions.  This undoubtedly requires a new breed of low-level Java developers... for a while.  We see the results of these efforts in tools like Pig, Hive, Impala, and Stinger, glued together via HCat.  Once the dust settles, I wouldn't stake my professional future on mastering low-level MapReduce, but rather focus on mastering the higher-level tools - the contrast below shows why.  This will give enterprises quicker access to business insight.  As Hadoop's primitive issues are solved into standards and patterns over the next 3-5 years, the need for Java developers will diminish substantially.  Just look at how many PL-11 or C++ programmers your enterprise has on its DBA teams; the low-level tech comes and goes, but the data remains.
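To make the contrast concrete, here is a sketch of the canonical word-count job written directly against the low-level Java MapReduce API (this assumes the Hadoop 2.x org.apache.hadoop.mapreduce classes; class names here are just the usual textbook example):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Classic word count: the mapper emits (word, 1) pairs,
// the reducer sums the counts for each word.
public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split each input line into tokens and emit (token, 1).
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum all the 1s emitted for this word.
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] = HDFS input directory, args[1] = HDFS output directory.
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);  // pre-aggregate on the map side
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

That's roughly sixty lines of Java to compile, package, and submit - for a word count.  In Hive the same answer is approximately one SELECT ... GROUP BY over an exploded word column, and Pig is similarly terse.  That gap is exactly why I expect the higher-level tools to win inside the enterprise once the patterns settle.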