Why HDFS

Posted on: Sun 05 August 2012

Charles Zedlewski makes an interesting analogy between HDFS™, the Hadoop Distributed File System, and the Linux operating system. Stealing the punchline:

It’s rare when you get to see history repeat itself so completely as it is with HDFS. Today HDFS may not be the best filesystem for content addressable storage or nearline archive. But then 15 years ago who would have thought Linux would find its way into laptops, routers, mobile phones and airport kiosks?

Linux drew us the map. The smart money is already following it.

We’ll see how it plays out, but given how entrenched HDFS is, he might be right. HDFS’s open source nature and, maybe more importantly, its community mean that just about any good distributed file system idea can be quickly embraced and extended.

There is probably one area where HDFS will have to be radically updated or face displacement: real-time streaming datasets don’t fit the HDFS model particularly well. That doesn’t mean someone smart can’t come along and fix it up.

Zedlewski also leans heavily on the nice support for Map/Reduce processing that HDFS provides. Map/Reduce is clearly successful, but other processing demands, such as real-time streaming, may eventually lead to programming models that fit less well with HDFS.

But I’m of a mind that Zedlewski is mostly right, and that HDFS is a nice solid foundation to build on going forward.