Hadoop's run into enterprise cloud
Hadoop is causing a stir with its free software for enterprise IT; it's already creating a buzz with Cloudera, Yahoo! and EMC. Joe Weinman, the head of communications, media and entertainment at HP, also stops by to discuss the economics of public cloud and the value of mid-sized providers in this week's episode of Cloud Cover TV.
We discuss:
- Hadoop Summit in Santa Clara
- What Hadoop’s free software can do for you and your enterprise IT
- Cloudera, Yahoo! and EMC’s Hadoop products
- Joe Weinman, head of communications, media and entertainment at HP, discusses the economics of public cloud computing
- Is there more room in the market for other players besides the big public cloud companies?
- The reasons to disperse the service architecture of a public cloud
- The importance of response time in the cloud and page design
- The cloud's role in dynamic real-time data
- Network providers’ interest in offloading backbones
- Large cloud providers are trying to start building more instances in more zones
- The benefits of mid-sized cloud providers
Read the full transcript from this video below:
Jo Maitland: Hi, welcome to Cloud Cover TV, our weekly show on all the juiciest news in the Cloud computing market. I'm Jo Maitland here in San Francisco, and this week my guest on the show is Joe Weinman. Joe's the head honcho for communications, media, and entertainment at HP. He's also well known, or most widely known, for his papers on Cloud economics. He's going to be up in a bit. First of all, though, I want to talk to you guys about Hadoop.
It's the Hadoop Summit here in Santa Clara, and Hadoop is starting to get some traction with Enterprise IT. What is it? Essentially Hadoop is an open-source software framework for developers that want to write applications that can rapidly process vast amounts of data in parallel on clusters of compute nodes. Why is that interesting? Well, as data volumes grow and grow and grow, you need to figure out a way to extract the value from all of that data.
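As a rough illustration of the MapReduce programming model that Hadoop implements, the sketch below walks through the map, shuffle and reduce steps of a word count in plain Python. It runs in a single process and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example. On a real cluster the framework would spread the map and reduce calls across many compute nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Single-process stand-in for a job that a cluster would run in parallel."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big clusters"]))
```

Each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, which is what lets this kind of work be split across many machines.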
Today, what are your options if you're not using Hadoop? Your options are expensive, proprietary, giant systems from the likes of HP, Microsoft, Teradata, IBM, Oracle, the usual suspects all in this bracket. Those technologies are well established. And then more recently along has come Hadoop, which is free, it's extensible, and it has actually a growing community of innovative young companies around it, building commercial products. So Cloudera is one of them.
In fact, EMC has a project called Project Greenplum, which is worth checking out. I'm not sure if those tools are out in the wild yet, but that's interesting. And then actually Yahoo, which runs the largest production set of Hadoop machines--it's 22 clusters, something like several hundred petabytes of data--they've actually spun that project out now to a commercial company called Hortonworks. So Yahoo obviously sees that there's a business behind here.
How do you get started if you're in the Enterprise, you're in the IT department, you're working with a lot of data? Check out Hadoop. You can actually look at it on and use it on Amazon. Amazon has a service called Elastic MapReduce, so for pretty much no start up costs, you can check out how to use Hadoop there. Or you can obviously run it internally--it's free open source software--on whatever machines you have available. Enjoy, have fun with that, and we'll be reporting from the Hadoop Summit this week, so stay tuned to the site for more news.
Earlier this week I talked with Joe Weinman on a different topic. Joe is focused on the economics of Cloud computing, and specifically whether there is more room in the market for other players besides the big public Cloud companies. Here's what he had to say about that.
Joe Weinman: Well, I'd like to say that I'm an easygoing guy, but I guess I'm a contrarian deep down or something. So I see a lot of what people say is going to happen, and I have questions about some of it. So I'll give you an example. One of the things that I looked at, which dates back to my original 10 Laws of Cloudonomics piece, has to do with the sort of diminishing returns that are associated with building out Clouds for latency reduction. When people think of the Cloud, they don't think that much about the behind-the-scenes architecture as long as it works. But the fact is that there are lots of great reasons to disperse the service architecture of a public Cloud.
You see it with a variety of availability zones. You see it in content delivery networks. But the basic idea is that as the world gets more and more interactive, you just can't afford the time to go from the user device all the way back to the Cloud service and then return. And people look at that and they say, "Oh well, global round-trip latency, just for the network, is 160 milliseconds. Who's going to notice that?"
But the fact is if you actually stop and look at how long it takes for Web pages to load, there's a lot of back and forth requests and responses. And so whatever the response time is, you have to multiply by whatever it may be: 5, 10, 15, 20, 50. So certainly a lot of the engineering that goes into Web page design has to do with things like making the objects lighter, trying to do caching or predictive caching, or content delivery, where possible.
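To put numbers on that multiplication, here is a minimal sketch using the 160-millisecond round trip and the multipliers mentioned above. It assumes every request waits for the previous response (no parallel connections, no caching) and ignores server processing time, so it is an upper-bound illustration rather than a measurement. The function name `network_wait_ms` is hypothetical.

```python
def network_wait_ms(round_trip_ms, sequential_round_trips):
    """Total time spent waiting on the network when round trips happen one after another."""
    return round_trip_ms * sequential_round_trips


if __name__ == "__main__":
    GLOBAL_ROUND_TRIP_MS = 160  # figure quoted in the interview
    for trips in (5, 10, 15, 20, 50):
        seconds = network_wait_ms(GLOBAL_ROUND_TRIP_MS, trips) / 1000
        print(f"{trips} round trips: {seconds:.1f} s of network wait")
```

Under these assumptions, 5 round trips already cost 0.8 seconds and 50 cost 8 seconds, which is why a delay that sounds negligible for one request becomes noticeable for a whole page.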
But to the extent that the Cloud is largely about dynamic real-time data, what you want to do is have any user anywhere in the world get accurate, up-to-date data and deliver that via a rich experience. And that ultimate rich experience may be video. And what we've seen so far with HD is just the very beginning. HD is going to look like black and white within the next few years, because they're already working on quad HD and ultra HD, which is 16 times the resolution. Add in 3D, add in deeper color depth . . .
Jo Maitland: Which is fabulous, but the network is not even close to supporting that yet.
Joe Weinman: Exactly, and so what you're finding is even network providers are interested in offloading their backbones by getting more resources close to the edge. So basically what it says is that you've got this nasty trade-off that you're trying to make. On the one hand you'd like to consolidate as many users, workloads, applications, etc. in a single location, because then you benefit from statistical multiplexing effects that basically say that even if individual loads vary substantially, the more and more that you aggregate, they end up being relatively smooth, and in the limit they're perfectly flat.
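The smoothing effect described here can be sketched with a small simulation. It assumes the individual workloads are independent and uniformly distributed between 0 and 1, which is an illustrative choice and not something stated in the interview; correlated workloads would smooth far less. The function name `aggregate_cv` and its parameters are hypothetical. It reports the coefficient of variation (standard deviation divided by mean) of the combined load, which is one common way to express how "bumpy" demand is relative to its size.

```python
import random
import statistics


def aggregate_cv(num_workloads, num_samples=2000, seed=0):
    """Coefficient of variation of the summed load of independent workloads.

    Assumption: each workload's demand is an independent uniform(0, 1) draw.
    """
    rng = random.Random(seed)
    totals = [
        sum(rng.uniform(0.0, 1.0) for _ in range(num_workloads))
        for _ in range(num_samples)
    ]
    return statistics.pstdev(totals) / statistics.mean(totals)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(f"{n} workloads: relative variability {aggregate_cv(n):.3f}")
```

For independent loads the relative variability falls roughly with the square root of the number of workloads, so aggregating 100 of them leaves about a tenth of the relative variation of a single one. That is the sense in which the combined load approaches flat as more is consolidated.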
But on the other hand if you consolidate, then you have the response time issues to deal with, so by that logic you want to move everything out to the edge to the extent possible. So what that says is that there's really this balancing act, where the large Cloud providers are trying to basically start building more and more instances in more and more zones. Response time and performance is just one issue.
Of course, you've got to look at country compliance regulations and real estate costs and power and lots of other different balancing effects. But in any event if you start with that notion, what it tells you is that there are benefits to the mid-sized Cloud providers, that not everything is going to go to one or three or five largest providers.
Jo Maitland: Google, AWS, right.
Joe Weinman: Not that they're not going to continue to do extremely well, and thank the engineering behind that for making them just such contributors to our industry. But the fact is that I think that there are roles to be played for the mid-sized providers.
Jo Maitland: Thanks, Joe, for being on the show. That was fun. This has been Cloud Cover TV. Tune in next week for more insider news on the cloud computing market.