Companies tend to underestimate the importance of Big Data security. What advice would you give to organizations about to welcome Big Data into their cloud storage environments?
Good security practices apply regardless of data volume, velocity and variety. The most important consideration is to weigh the benefits of retaining the data against the financial and reputational damage your business could suffer in the event of a breach. In the past, some cloud providers haven't fully appreciated that a low-probability or low-impact security breach can still lead to a disastrous outcome. In short, do you really need to keep the data?
In some cases, keeping data is not only useful for generating business value, it is also mandatory from a regulatory standpoint. For example, SEC Rule 17a-4 requires certain financial records to be retained for up to six years. Other regulations, such as PCI-DSS and HIPAA, have stringent rules around how personal, sensitive information should be stored and accessed. So it's important to consider which regulations apply to your data and how your approach to security meets their compliance requirements.
In summary, my advice is first to establish what data you need to store and why you need to store it. Second, determine which compliance regulations the data is subject to, and how procedures and systems need to change to meet their requirements. I can't recommend highly enough calling in third-party experts who understand the security and governance regulations in your domain to review the approach taken. When retaining some types of data, a third-party assessment is mandatory.
What are the main challenges of handling petabyte-scale volumes of data?
Big Data presents some unique challenges when it comes to ensuring that the data supply chain remains secure from producer to consumer. The security framework must be able to operate effectively when faced with the three Vs of Big Data: volume, velocity and variety. For Big Data subject to the sorts of regulations discussed earlier, the security challenges can be daunting. The problem can boil down to securing access to a single record buried amongst trillions and trillions of others.
Unfortunately, many of the tools and technologies for managing data at scale do not provide the fine-grained security needed to protect sensitive data. Part of the problem is that Big Data platforms, such as Hadoop, have treated enterprise security as an afterthought. Hadoop grew out of web businesses for which perimeter security and simple file-based access permissions were good enough. In the face of tough data retention regulations that can apply at the individual record level, these approaches are insufficient. Right now, core Hadoop alone does not provide a full security and governance solution, and those wishing to secure large volumes of data on Hadoop must look to third parties to achieve this.
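To make the gap between file-based permissions and record-level control concrete, here is a minimal Python sketch. It is an illustration only, not how Hadoop or any particular product implements security; the `Record` type, the sensitivity labels and both read functions are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    customer_id: str
    payload: str
    label: str  # hypothetical sensitivity label, e.g. "public" or "pci"


def file_level_read(records, user_can_read_file):
    """File-based permissions: the user sees every record or none."""
    return list(records) if user_can_read_file else []


def record_level_read(records, user_labels):
    """Record-level control: each record is checked against the user's labels."""
    return [r for r in records if r.label in user_labels]


# Hypothetical data set mixing regulated and unregulated records in one file.
records = [
    Record("c1", "newsletter preference", "public"),
    Record("c2", "card number ending 4242", "pci"),
    Record("c3", "support ticket", "public"),
]

# An analyst cleared only for "public" data.
analyst_labels = {"public"}
visible = record_level_read(records, analyst_labels)
```

With file-level permissions the analyst either reads all three records, including the regulated one, or is locked out of the file entirely. The record-level filter returns only the two unregulated records, which is the kind of granularity that record-level retention and access rules call for.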