Is the cloud a good spot for big data?
That’s a controversial question, and the answer changes depending on who you ask.
Last week I attended the HP Big Data Conference in Boston, where both an HP customer and an HP executive told me that big data isn’t a good fit for the public cloud.
CB Bohn is a senior database engineer at Etsy, and a user of HP’s Vertica database. The online marketplace uses the public cloud for some workloads, but its primary functions are run out of a co-location center, Bohn said. It doesn’t make sense for the company to lift and shift its Postgres, Vertica SQL and Hadoop workloads into the public cloud, he said. It would be a massive undertaking for the company to port all the data associated with those programs into the cloud. Then, once it’s transferred to the cloud, the company would have to pay ongoing costs to store it there. Meanwhile, the company already has a co-lo facility set up and the in-house expertise to manage the infrastructure required to run those programs. The cloud just isn’t a good fit for Etsy’s big data, Bohn says.
Chris Selland, VP of Business Development at HP’s Big Data software division, says most of the company’s customers aren’t using the cloud in a substantial way with big data. Perhaps that’s because HP’s big data cloud, named Helion, isn’t quite as mature as, say, Amazon Web Services or Microsoft Azure. But still, Selland said there are both technical challenges (like data portability and data latency) and non-technical reasons, such as company executives being more comfortable with the data not being in the cloud.
Bohn isn’t totally against the cloud, though. For quick, large processing jobs the cloud is great. “Spiky” workloads that need fast access to large amounts of compute resources are ideal for the cloud. But if an organization has a constant need for compute and storage resources, it can be more efficient to buy commodity hardware and run it in house.
Public cloud vendors like Amazon Web Services make the opposite argument. I asked Amazon.com CTO Werner Vogels about private clouds recently, and he argued that businesses should not waste time building out data center infrastructure when AWS can supply it to them. Bohn counters that, over the long term, it’s cheaper to buy the equipment than to rent it.
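The buy-versus-rent argument comes down to simple arithmetic, and a rough sketch makes the trade-off concrete. The function below is only an illustration: its name, its parameters and every dollar figure are hypothetical and do not come from Etsy, HP or AWS. It assumes a one-time hardware purchase, a flat monthly cost to run that hardware, and a flat monthly cloud bill, and it ignores things like hardware refresh cycles, staffing changes and data transfer fees.

```python
import math


def break_even_months(hardware_cost, monthly_ops_cost, monthly_cloud_cost):
    """Return the first whole month at which owning costs no more than renting.

    All inputs are hypothetical, illustrative figures:
      hardware_cost      -- one-time cost of buying the equipment
      monthly_ops_cost   -- monthly cost of running it (co-lo space, power, staff)
      monthly_cloud_cost -- monthly bill for an equivalent cloud deployment

    Returns None when the cloud is no more expensive per month than running
    your own gear, because the purchase is then never paid back.
    """
    monthly_saving = monthly_cloud_cost - monthly_ops_cost
    if monthly_saving <= 0:
        return None
    return math.ceil(hardware_cost / monthly_saving)


# Made-up numbers: $120,000 of hardware, $3,000 a month to run it,
# versus $8,000 a month in the cloud.
print(break_even_months(120_000, 3_000, 8_000))  # 24
```

With those made-up numbers the hardware pays for itself after two years of constant use, which is the kind of steady workload Bohn describes. A spiky workload that only needs the capacity a few days a month would push the monthly cloud bill down and the break-even point out, possibly to never.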
Even as the public cloud has matured, there is clearly still a debate about which workloads it is good for and which it is not.
The real answer to this question is that it depends on the business. For startups that were born in the cloud and have all their data there, it will make sense to do their data processing in the cloud too. For companies that have big data center footprints or co-location infrastructure already set up, there may not be a reason to lift and shift to the cloud. Each business will have its own specific use cases, some of which may be good for the cloud and others of which may not be.