Data leakage prevention: From target to integration

Data Loss Prevention (DLP) projects have a specific feature: besides improving the company's data infrastructure and choosing a technical solution, you must define the project's goal and the resources it requires. The client company should also expect to take part in reorganizing how data flows through its systems.

The protected objects
Protected objects include commercial and government secrets, personal and banking data, and any other data covered by compliance requirements. Preparing a list of them is the first necessary step. DLP systems can protect four kinds of objects: a document, an important quote from a document, a form (template), or a linguistic sample of document content. If you can protect only one of these types, you will have to convert your "List" into that type. Below we consider the most frequently used objects.

The document
Static documents are protected in two ways: with a protected format or with statistical methods.

Protected format: a register of all protected documents is created, and each document gets a "protected" label and a protected (encoded) format, so that an infringer cannot move data from a protected document into an unprotected one.
Statistical methods: a "fingerprint" (a checksum) is computed from the document and then used to detect similarity between the protected documents and their copies.
All documents under protection are moved into one folder, and the system marks them as confidential. If a document is not in this folder, the DLP system never receives an instruction to protect it. Someone must be responsible for removing documents and samples from the confidentiality database once they are no longer confidential.
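A minimal sketch of the folder-based register combined with the checksum approach follows. The folder layout, the function names (`build_register`, `unregister`, `is_protected_copy`), and the choice of SHA-256 are assumptions for illustration, not any particular DLP product's API. A plain checksum recognizes only byte-identical copies: changing a single character in the copy changes the digest.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def checksum(data: bytes) -> str:
    """Exact fingerprint of a document: the SHA-256 hex digest of its bytes."""
    return hashlib.sha256(data).hexdigest()


def build_register(protected_dir: str) -> dict[str, str]:
    """Map fingerprint -> file name for every file in the protected folder.

    Files outside this folder are never registered, mirroring the rule that
    the DLP system protects only what was placed in the folder.
    (Two files with identical bytes share one fingerprint entry.)
    """
    register: dict[str, str] = {}
    for path in sorted(Path(protected_dir).iterdir()):
        if path.is_file():
            register[checksum(path.read_bytes())] = path.name
    return register


def unregister(register: dict[str, str], name: str) -> None:
    """Drop a document that is no longer confidential (the responsible person's job)."""
    for digest in [d for d, doc in register.items() if doc == name]:
        del register[digest]


def is_protected_copy(register: dict[str, str], outgoing: bytes) -> str | None:
    """Return the protected original's name if the outgoing data is an exact copy."""
    return register.get(checksum(outgoing))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "contract.txt").write_bytes(b"secret terms")
        register = build_register(folder)
        print(is_protected_copy(register, b"secret terms"))   # contract.txt
        print(is_protected_copy(register, b"secret terms!"))  # None
```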

An important quote
Important quotes from a document, that is, the parts that contain confidential information, should also be protected. Some DLP systems do not distinguish important quotes from unimportant ones and treat every part of the document as confidential. Such a system raises too many false alarms.

The simplest approach is to forbid quoting documents altogether. This is achieved either with a format that does not allow text to be copied from it, or with special agents on workstations that either block copying parts of the document or give any document containing quotes from the original the same confidential attribute as the original.
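The two agent behaviours can be modelled in a small hypothetical sketch. `Document`, `WorkstationAgent`, its `paste()` method, and the `"block"`/`"inherit"` modes are invented names; a real agent would hook the operating system's clipboard and file events rather than expose a method like this.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    text: str = ""
    confidential: bool = False


@dataclass
class WorkstationAgent:
    """Hypothetical endpoint agent that intercepts copy/paste between documents.

    mode="block":   refuse to copy any part of a confidential document.
    mode="inherit": allow the copy, but mark the target confidential too.
    """
    mode: str = "inherit"
    log: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.mode not in ("block", "inherit"):
            raise ValueError(f"unknown mode: {self.mode}")

    def paste(self, source: Document, start: int, end: int, target: Document) -> bool:
        """Copy source.text[start:end] into target; return False if blocked."""
        quote = source.text[start:end]
        if source.confidential:
            if self.mode == "block":
                self.log.append(f"blocked copy from {source.name}")
                return False
            # The quoting document inherits the original's confidential attribute.
            target.confidential = True
            self.log.append(f"{target.name} marked confidential (quote from {source.name})")
        target.text += quote
        return True
```

The "inherit" mode lets people keep working with quotes while the label travels with them; the "block" mode is stricter but interferes more with everyday editing.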

Statistical solutions are not attached to the document. Instead, they monitor quotes leaving the data system: they take fingerprints of outgoing documents and compare them with those of the originals. To catch quotes, fingerprints must be taken not only of the whole document but also of its parts, which increases the size of each fingerprint considerably (to about 10% of the original). On the other hand, statistical solutions do not require installing agents or using protected formats.
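To catch quotes rather than whole copies, the fingerprint has to cover parts of the document. One common way to do this, shown here as an assumption rather than as the method any specific product uses, is word shingling: hash every run of k consecutive words, then count how many of an outgoing text's shingle hashes also occur in the original's set. The values `k=5` and `min_shared=3` are arbitrary illustration parameters.

```python
from __future__ import annotations

import hashlib
import re


def shingles(text: str, k: int = 5) -> set[int]:
    """Hash every run of k consecutive words (a "shingle") to a 64-bit integer."""
    words = re.findall(r"\w+", text.lower())
    hashes: set[int] = set()
    for i in range(len(words) - k + 1):  # empty range if the text has < k words
        chunk = " ".join(words[i:i + k]).encode("utf-8")
        digest = hashlib.blake2b(chunk, digest_size=8).digest()
        hashes.add(int.from_bytes(digest, "big"))
    return hashes


def shared_shingles(original_fp: set[int], outgoing_text: str, k: int = 5) -> int:
    """Count shingles of the outgoing text that also occur in the original's fingerprint."""
    return len(original_fp & shingles(outgoing_text, k))


def quote_detected(original_fp: set[int], outgoing_text: str,
                   k: int = 5, min_shared: int = 3) -> bool:
    """Flag the outgoing text if it shares at least min_shared shingles with the original."""
    return shared_shingles(original_fp, outgoing_text, k) >= min_shared
```

Because the set holds one hash per word position, the fingerprint grows with the document, which is the size cost mentioned above. Keeping only hashes divisible by some modulus shrinks it, at the price of missing shorter quotes.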

Form or template
Often protection focuses on certain kinds of data rather than on documents: credit card data or personal data (phone numbers, addresses, etc.). In this case the technologies described above are useless, because we cannot know all the data values beforehand. Current DLP systems therefore recognize sensitive data by its structure, or template, and the system must be properly set up to respond to the required templates.
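Template-based detection can be sketched with regular expressions plus a validity check. The patterns are assumptions for illustration: the card pattern accepts 13 to 19 digits with optional single spaces or dashes and then applies the Luhn checksum to discard random digit runs, while the phone pattern covers only a North American style such as `(555) 123-4567`. A real deployment needs patterns for its own locales and formats.

```python
from __future__ import annotations

import re

# Assumed templates; adjust to the formats actually used in your organization.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, subtract 9 if > 9."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched text) pairs for every template hit in the text."""
    hits: list[tuple[str, str]] = []
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_ok(digits):  # drop digit runs that cannot be card numbers
            hits.append(("card", m.group()))
    for m in PHONE_RE.finditer(text):
        hits.append(("phone", m.group()))
    return hits
```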

Content protection
Linguistic text analysis modules require minimal intrusion into the data storage system and the ways it is accessed. Each outgoing document is indexed and assigned to one or more categories. If the text contains confidential data, the system recognizes it regardless of whether the document was in the protected register.
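As a greatly simplified stand-in for a linguistic module, the sketch below indexes the words of an outgoing text and scores it against weighted keyword dictionaries, one per category. The category names, terms, weights, and threshold are hypothetical; production linguistic engines also handle morphology, synonyms, and phrases, which this sketch ignores.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical category dictionaries: term -> weight.
CATEGORIES: dict[str, dict[str, float]] = {
    "finance": {"revenue": 1.0, "loss": 1.0, "forecast": 0.5, "dividend": 1.0},
    "hr": {"salary": 1.0, "dismissal": 1.0, "bonus": 0.5},
}


def index_terms(text: str) -> Counter:
    """Build a simple term index: lowercase word -> number of occurrences."""
    return Counter(re.findall(r"\w+", text.lower()))


def categorize(text: str,
               categories: dict[str, dict[str, float]] = CATEGORIES,
               threshold: float = 1.5) -> list[str]:
    """Return the sorted names of all categories whose weighted score reaches the threshold."""
    terms = index_terms(text)
    scores = {
        name: sum(weight * terms[term] for term, weight in vocab.items())
        for name, vocab in categories.items()
    }
    return sorted(name for name, score in scores.items() if score >= threshold)
```

Note that the check works on the text itself, so it fires even for a document that was never placed in the protected register.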

The drawback is that the linguistic database is usually built manually, with the help of a professional linguist. The advantage is that the database can be started from a few keywords and refined based on false alarms, and that the system can improve itself over time. Only linguistic solutions let a company start with nothing more than a "List of confidential data".
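Starting from a few keywords and learning from false alarms could look like the following hypothetical sketch. The update rule (lower the weights of the keywords behind a false alarm, raise them after a confirmed incident) and every name in it are assumptions chosen for illustration, not a description of how any specific system improves itself.

```python
from __future__ import annotations

import re


class KeywordModel:
    """Hypothetical keyword model seeded from a short list of confidential terms."""

    def __init__(self, seed_keywords: list[str], threshold: float = 1.0, step: float = 0.25):
        self.weights = {k.lower(): 1.0 for k in seed_keywords}
        self.threshold = threshold
        self.step = step

    def matched(self, text: str) -> list[str]:
        """Keywords from the model that occur in the text."""
        words = set(re.findall(r"\w+", text.lower()))
        return sorted(w for w in self.weights if w in words)

    def score(self, text: str) -> float:
        return sum(self.weights[w] for w in self.matched(text))

    def is_alarm(self, text: str) -> bool:
        return self.score(text) >= self.threshold

    def feedback(self, text: str, false_alarm: bool) -> None:
        """Analyst verdict: shrink weights behind a false alarm, grow them behind a real one."""
        delta = -self.step if false_alarm else self.step
        for w in self.matched(text):
            self.weights[w] = max(0.0, self.weights[w] + delta)

    def add_keyword(self, word: str, weight: float = 1.0) -> None:
        """Extend the dictionary when an analyst spots a new confidential term."""
        self.weights[word.lower()] = weight
```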

Before beginning a DLP project, estimate your resources and the documents needed as samples, and decide who will add new documents to the protected group, who will remove obsolete ones, and how. Only then select the protection technology and the DLP system that implements it. If the chosen technology does not fit these decisions, the system will not satisfy the client, because the client will not be able to use it the way it was designed to be used.
