Big Data Application and Privacy Protection, Can't Have Both?
2021.02.25
Many countries have begun to implement national big data strategies. These mainly include vigorously
promoting innovation and development in the big data technology industry, building a digital economy
with data as a key element, using big data to modernize national governance, using big data to protect
and improve people's livelihoods, and effectively protecting national data security. Guided by these
strategies, big data applications in many countries have made notable progress, but they have also raised
many privacy protection issues, and privacy protection has become a key indicator of whether big data is
developing healthily and stably.
Big data application and privacy protection must go hand in hand for big data to develop healthily. This
article puts forward some personal opinions on how big data should be collected, used, and stored.
Let's first talk about the data life cycle. All data goes through five stages: data production, data
identity (signing the data), data storage, data usage, and finally archiving or destruction once the data
is no longer used. PKI (Public Key Infrastructure) technology is a reliable foundation for securing big
data across these stages, because it addresses the four main data security concerns, known as PAIN:
(1) confidentiality of the data (Privacy); (2) authenticity of the identities of the data producer and
consumer (Authentication); (3) data integrity (Integrity); and (4) non-repudiation of data production and
data usage (Non-repudiation). The key applications of PKI are digital signatures and encryption based on
digital certificates. As shown in the figure below, let's see how PKI technology protects big data
security.
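To make the five stages concrete, here is a minimal Python sketch that models the life cycle as a forward-only sequence and records, as we read the article, which PKI control guards each stage and which PAIN concerns it targets. The `Stage` names and the `CONTROLS` table are illustrative assumptions, not part of any product or standard.

```python
from enum import Enum


class Stage(Enum):
    """The five data life-cycle stages described above, in order."""
    PRODUCTION = 1
    IDENTITY = 2
    STORAGE = 3
    USAGE = 4
    ARCHIVE_OR_DESTROY = 5


# Our reading of which PKI control the article assigns to each stage and
# which PAIN concerns that control addresses (illustrative, not normative).
CONTROLS = {
    Stage.PRODUCTION: ("no control yet: raw data", set()),
    Stage.IDENTITY: ("producer signature + timestamp",
                     {"Authentication", "Integrity", "Non-repudiation"}),
    Stage.STORAGE: ("HTTPS upload + encryption to authorized readers",
                    {"Privacy", "Integrity"}),
    Stage.USAGE: ("HTTPS download + consumer signature + timestamp",
                  {"Privacy", "Authentication", "Non-repudiation"}),
    Stage.ARCHIVE_OR_DESTROY: ("archive signature + timestamp, or key destruction",
                               {"Integrity", "Privacy"}),
}


def advance(current: Stage) -> Stage:
    """Move data to the next stage; the life cycle only runs forward."""
    if current is Stage.ARCHIVE_OR_DESTROY:
        raise ValueError("data has reached the end of its life cycle")
    return Stage(current.value + 1)


def covered_concerns() -> set:
    """Union of the PAIN concerns addressed across all stages."""
    covered = set()
    for _control, concerns in CONTROLS.values():
        covered |= concerns
    return covered


# Walk one piece of data through every stage and print the control applied.
stage = Stage.PRODUCTION
while True:
    print(stage.name, "->", CONTROLS[stage][0])
    if stage is Stage.ARCHIVE_OR_DESTROY:
        break
    stage = advance(stage)
```

Taken together, the stages cover all four PAIN concerns, which is the article's argument for applying PKI across the whole life cycle rather than at a single point.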
-
Data Production
The data producer can be a person or an object (such as a data collection terminal) that generates
the data.
-
Data Identity
After the data is produced, the data producer should use their identity certificate to digitally
sign the data, proving the producer's true identity. A trusted timestamp is added alongside the
signature to prove the non-repudiation of the production behavior and the trusted production time.
The signature also proves ownership of the data and establishes its identity, so that each piece of
data has a trusted identity that can be used for copyright protection, traceability, and legality
validation before processing, preventing illegal data from entering the database and polluting big
data. This step is very important: it protects data at the source and ensures data quality. Yet in
current practice, this step is largely missing.
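The signing step can be sketched in Python as below. This is a minimal illustration under loud assumptions: it uses textbook RSA over two publicly known Mersenne primes instead of real certificates, and a hypothetical `TSA` key stands in for a trusted timestamp authority (real deployments use X.509 certificates, padded signatures such as RSA-PSS, and RFC 3161 timestamps). `make_identity` and `accept_for_ingest` are hypothetical names; the second is the kind of legality check that could run before data enters the database.

```python
import hashlib
import json
import time


def toy_keypair(p_exp: int, q_exp: int, e: int = 65537):
    """Textbook-RSA key (n, e, d) from the Mersenne primes 2**p_exp - 1 and 2**q_exp - 1.
    Those primes are public, so this key is an INSECURE toy for illustration only."""
    p, q = 2**p_exp - 1, 2**q_exp - 1
    return p * q, e, pow(e, -1, (p - 1) * (q - 1))


def digest(data: bytes) -> int:
    """SHA-256 of the data as an integer (smaller than both toy moduli)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def sign(data: bytes, key) -> int:
    n, _e, d = key
    return pow(digest(data), d, n)  # unpadded RSA; real systems use RSA-PSS or PKCS#1 v1.5


def verify(data: bytes, sig: int, pub) -> bool:
    n, e = pub
    return pow(sig, e, n) == digest(data)


PRODUCER = toy_keypair(521, 607)  # hypothetical data producer's key
TSA = toy_keypair(127, 1279)      # hypothetical timestamp authority's key


def make_identity(data: bytes) -> dict:
    """Producer signs the data; the TSA then signs (data hash, producer signature, time)."""
    sig = sign(data, PRODUCER)
    token = json.dumps({"sha256": hashlib.sha256(data).hexdigest(),
                        "sig": sig, "time": int(time.time())},
                       sort_keys=True).encode()
    return {"sig": sig, "ts_token": token, "ts_sig": sign(token, TSA)}


def accept_for_ingest(data: bytes, ident: dict) -> bool:
    """Legality check before data enters the database: reject unsigned or altered data."""
    token = json.loads(ident["ts_token"])
    return (verify(data, ident["sig"], PRODUCER[:2])
            and verify(ident["ts_token"], ident["ts_sig"], TSA[:2])
            and token["sha256"] == hashlib.sha256(data).hexdigest()
            and token["sig"] == ident["sig"])


record = b"terminal-42,temp=21.5"
ident = make_identity(record)
print(accept_for_ingest(record, ident))                    # True
print(accept_for_ingest(b"terminal-42,temp=99.9", ident))  # False
```

Because the timestamp token covers both the data hash and the producer's signature, neither the data nor its claimed production time can be changed later without the check failing.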
-
Data Storage
Once the data has an identity, it needs to be submitted to a cloud server for storage. The
submission must be encrypted in transit using HTTPS to prevent the data from being stolen or
tampered with during transmission. This measure is very important, but in reality a large amount of
collected data is still submitted to cloud servers in cleartext, which urgently needs to be
improved. If the data is confidential, it can also be encrypted with the public key of each person
authorized to read it before it is submitted to the cloud server, so that only authorized readers
can decrypt it and the confidential data cannot be misused.
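Encrypting confidential data to an authorized reader's public key can be sketched as follows. Assumptions: textbook RSA on toy keys built from public Mersenne primes, so it is insecure; unpadded RSA is also deterministic and malleable. Real systems wrap a random symmetric key with a padded scheme such as RSA-OAEP and encrypt the data itself with AES. `READER`, `encrypt_for_reader` and `decrypt_as_reader` are hypothetical names.

```python
def toy_keypair(p_exp: int, q_exp: int, e: int = 65537):
    """Textbook-RSA key (n, e, d) from public Mersenne primes: INSECURE, illustration only."""
    p, q = 2**p_exp - 1, 2**q_exp - 1
    return p * q, e, pow(e, -1, (p - 1) * (q - 1))


READER = toy_keypair(521, 607)  # hypothetical authorized reader's key pair


def encrypt_for_reader(plaintext: bytes, pub) -> int:
    """Encrypt with the reader's PUBLIC key (textbook RSA: c = m^e mod n)."""
    n, e = pub
    m = int.from_bytes(b"\x01" + plaintext, "big")  # 0x01 prefix keeps leading zero bytes
    if m >= n:
        raise ValueError("too long for this toy key; real systems use hybrid encryption")
    return pow(m, e, n)


def decrypt_as_reader(ciphertext: int, key) -> bytes:
    """Only the holder of the PRIVATE exponent d can recover the plaintext."""
    n, _e, d = key
    m = pow(ciphertext, d, n)
    return m.to_bytes((m.bit_length() + 7) // 8, "big")[1:]  # strip the 0x01 prefix


c = encrypt_for_reader(b"salary=12345", READER[:2])  # done before upload
print(decrypt_as_reader(c, READER))                   # b'salary=12345'
```

The cloud server only ever stores `c`; without the reader's private exponent it cannot recover the plaintext.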
-
Data Usage
Data consumers must use HTTPS to browse or download data; otherwise there is no guarantee that the
data will not be stolen or tampered with during distribution. If the data is encrypted, the data
consumer decrypts it with the private key corresponding to their encryption certificate. Most
importantly, not only must data use be controlled by user permissions, but the user must also
digitally sign to prove their legal identity and attach a timestamp signature to prove the time of
use. The digital signature and timestamp not only confirm the user's identity but also ensure the
non-repudiation and trusted time of the data usage for later auditing and traceability.
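Here is a minimal sketch of permission-controlled usage plus a signed usage record, assuming a hypothetical permission table (`PERMISSIONS`), a dictionary of public keys standing in for users' certificates (`PUBLIC_KEYS`), and toy textbook-RSA keys. The local clock is only a placeholder for a trusted timestamp token. The audit verifies each record against the user's public key, so a user cannot deny a recorded use and an edited record is detected.

```python
import hashlib
import json
import time


def toy_keypair(p_exp: int, q_exp: int, e: int = 65537):
    """Textbook-RSA key (n, e, d) from public Mersenne primes: INSECURE, illustration only."""
    p, q = 2**p_exp - 1, 2**q_exp - 1
    return p * q, e, pow(e, -1, (p - 1) * (q - 1))


def digest(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def sign(data: bytes, key) -> int:
    n, _e, d = key
    return pow(digest(data), d, n)


def verify(data: bytes, sig: int, pub) -> bool:
    n, e = pub
    return pow(sig, e, n) == digest(data)


ALICE = toy_keypair(521, 607)           # hypothetical data consumer
PUBLIC_KEYS = {"alice": ALICE[:2]}      # stands in for users' certificates
PERMISSIONS = {"dataset-7": {"alice"}}  # hypothetical: data id -> authorized users
AUDIT_LOG = []


def use_data(user: str, key, data_id: str, action: str) -> None:
    """Check the user's right, then append a usage record signed by that user."""
    if user not in PERMISSIONS.get(data_id, set()):
        raise PermissionError(f"{user} is not authorized for {data_id}")
    record = json.dumps({"user": user, "data": data_id, "action": action,
                         "time": int(time.time())},  # placeholder for a TSA token
                        sort_keys=True).encode()
    AUDIT_LOG.append((record, sign(record, key)))


def audit() -> bool:
    """Later audit: every record must verify against its user's public key."""
    return all(verify(rec, sig, PUBLIC_KEYS[json.loads(rec)["user"]])
               for rec, sig in AUDIT_LOG)


use_data("alice", ALICE, "dataset-7", "download")
print(audit())  # True
```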
-
Data Archiving and Data Destruction
This stage is the end of the data life cycle. Data can be archived and sealed with a digital
signature from an archiving certificate plus a timestamp. This ensures that the data can no longer
be tampered with, supports future auditing and traceability if required, and the timestamp signature
proves the trusted archiving time. If you decide to destroy the data, you can physically delete it
on the server. However, to completely destroy data that has already been delivered to users, the
encryption certificate can be revoked and its private key destroyed, so that the data can no longer
be decrypted or used. This kind of destruction requires that the data was encrypted with the
certificate before being delivered to users.
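Certificate-based destruction can be sketched as crypto-shredding: once every copy of the private key is gone, ciphertext that was already delivered becomes unreadable. This sketch assumes the private key is held only by one hypothetical key holder (`EncryptionCertificate`); if a user kept their own copy of the key, revocation alone would not stop decryption. The keys are toy textbook-RSA keys, for illustration only.

```python
class EncryptionCertificate:
    """Hypothetical holder of a user's encryption key pair (textbook RSA over
    public Mersenne primes, so INSECURE and for illustration only)."""

    def __init__(self, p_exp: int = 521, q_exp: int = 607, e: int = 65537):
        p, q = 2**p_exp - 1, 2**q_exp - 1
        self.n, self.e = p * q, e
        self._d = pow(e, -1, (p - 1) * (q - 1))  # private exponent
        self.revoked = False

    def encrypt(self, m: int) -> int:
        if self.revoked:
            raise PermissionError("certificate revoked")
        return pow(m, self.e, self.n)

    def decrypt(self, c: int) -> int:
        if self._d is None:
            raise PermissionError("private key destroyed; data is unrecoverable")
        return pow(c, self._d, self.n)

    def revoke_and_destroy(self) -> None:
        # Revocation alone only marks the certificate invalid; discarding the
        # private key is what makes every ciphertext under it unreadable.
        # (Python cannot securely wipe memory; a real system would use an HSM.)
        self.revoked = True
        self._d = None


cert = EncryptionCertificate()
delivered = cert.encrypt(123456789)  # ciphertext already delivered to a user
print(cert.decrypt(delivered))       # 123456789
cert.revoke_and_destroy()
try:
    cert.decrypt(delivered)
except PermissionError as err:
    print(err)                       # private key destroyed; data is unrecoverable
```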
As the PKI data protection measures above show, if digital signatures and encryption are applied
throughout the data life cycle, the data is protected at every stage of that cycle. Big data
applications and privacy protection can both be achieved!
To help everyone understand these concepts, let me illustrate with email data. Email is the second
largest data source on the Internet, with as many as 269 billion emails sent per day. These emails
contain a large amount of private personal information and business secrets, and even highly
confidential information such as financial payment and credit card details. Yet most of them are
sent without end-to-end encryption and stored in cleartext on mail servers, which is very insecure.
So, following the five stages of the big data life cycle described above, let me explain how
MeSign Technology applies PKI technology to protect email data security and email privacy.
-
Data Production:
The user generates the data by writing an email; the owner of the email account is the data
producer.
-
Data Identity:
Using the digital signature function of the MeSign App, the user can add a digital signature to
each email to prove the identity of the email (and thus of the data) and ensure that it cannot be
tampered with undetected. If the email is tampered with, the MeSign App or Outlook will warn the
recipient that the digital signature is invalid and the data has been altered. At the same time,
the MeSign email timestamp service, which is attached automatically, proves the trusted production
time of the data.
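Recipient-side tamper detection can be sketched as follows. This is not MeSign's implementation: it is a simplified, hypothetical stand-in for S/MIME signing that signs only the From, To, Subject and body with toy textbook-RSA keys, whereas S/MIME signs the whole canonical MIME entity with a certificate-bound key.

```python
import hashlib
from email.message import EmailMessage


def toy_keypair(p_exp: int, q_exp: int, e: int = 65537):
    """Textbook-RSA key (n, e, d) from public Mersenne primes: INSECURE, illustration only."""
    p, q = 2**p_exp - 1, 2**q_exp - 1
    return p * q, e, pow(e, -1, (p - 1) * (q - 1))


def make_email(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def signed_digest(msg: EmailMessage) -> int:
    """SHA-256 over a simplified view of the message (headers + text body)."""
    parts = [msg["From"], msg["To"], msg["Subject"], msg.get_content()]
    return int.from_bytes(hashlib.sha256("\n".join(parts).encode()).digest(), "big")


def sign_email(msg: EmailMessage, key) -> int:
    n, _e, d = key
    return pow(signed_digest(msg), d, n)


def check_signature(msg: EmailMessage, sig: int, pub) -> str:
    n, e = pub
    if pow(sig, e, n) == signed_digest(msg):
        return "Signature valid"
    return "Signature INVALID: the message has been altered"


SENDER = toy_keypair(521, 607)  # hypothetical sender key

original = make_email("alice@example.com", "bob@example.com", "Invoice", "Pay 100 to account 123")
sig = sign_email(original, SENDER)
tampered = make_email("alice@example.com", "bob@example.com", "Invoice", "Pay 100 to account 999")
print(check_signature(original, sig, SENDER[:2]))  # Signature valid
print(check_signature(tampered, sig, SENDER[:2]))  # Signature INVALID: the message has been altered
```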
-
Data Storage:
Emails sent from the MeSign App are digitally signed, encrypted, and timestamped by default, and the
mail server uses TLS (the same transport encryption used by HTTPS) so that emails travel to it in
encrypted form. Because the email itself is encrypted with a certificate, it is stored as ciphertext
on the mail server, which keeps confidential information secure in storage. This matters especially
for users of cloud email services: the confidential content of their emails cannot be stolen,
tampered with, or leaked.
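Why the mail server only ever holds ciphertext can be sketched with hybrid encryption, the general pattern used for encrypted email: a fresh session key encrypts the body, and the recipient's public key encrypts (wraps) the session key. Assumptions: `keystream_xor` is a toy SHA-256 counter-mode stream cipher standing in for AES, the RSA keys are insecure toys, and `MAIL_SERVER`, `send_encrypted` and `read_encrypted` are hypothetical names.

```python
import hashlib
import secrets


def toy_keypair(p_exp: int, q_exp: int, e: int = 65537):
    """Textbook-RSA key (n, e, d) from public Mersenne primes: INSECURE, illustration only."""
    p, q = 2**p_exp - 1, 2**q_exp - 1
    return p * q, e, pow(e, -1, (p - 1) * (q - 1))


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || block offset).
    Stands in for AES here; it is NOT authenticated and not a vetted cipher."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)


RECIPIENT = toy_keypair(521, 607)  # hypothetical recipient certificate key
MAIL_SERVER = {}                   # what a cloud mail server would store


def send_encrypted(msg_id: str, body: bytes, recipient_pub) -> None:
    """A fresh session key encrypts the body; the recipient's public key wraps it."""
    n, e = recipient_pub
    session_key = secrets.token_bytes(32)
    MAIL_SERVER[msg_id] = {
        "wrapped_key": pow(int.from_bytes(session_key, "big"), e, n),
        "ciphertext": keystream_xor(session_key, body),
    }


def read_encrypted(msg_id: str, recipient_key) -> bytes:
    """The recipient unwraps the session key with the private key, then decrypts."""
    n, _e, d = recipient_key
    stored = MAIL_SERVER[msg_id]
    session_key = pow(stored["wrapped_key"], d, n).to_bytes(32, "big")
    return keystream_xor(session_key, stored["ciphertext"])


send_encrypted("m1", b"Card: 4111 1111 1111 1111", RECIPIENT[:2])
print(read_encrypted("m1", RECIPIENT))  # b'Card: 4111 1111 1111 1111'
```

The design choice is the usual one: public-key operations are slow and size-limited, so they protect only a short session key, while a fast symmetric cipher protects the body of any length.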
-
Data Usage:
After the recipient receives the email, the MeSign App automatically decrypts it, validates the
digital signature, and displays the sender's identity, which effectively prevents email fraud.
Since the recipient also holds a digital certificate that establishes the data consumer's real
identity, only the holder of that certificate's private key can decrypt the email, so its use is
tied to the data consumer and cannot easily be repudiated.
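How a client can tie a signature to a real identity can be sketched with a minimal certificate check. Assumptions: a JSON blob signed by a hypothetical `CA` key stands in for an X.509 certificate, and all keys are insecure textbook-RSA toys. The recipient accepts the sender only if the certificate was issued by the trusted CA, its subject matches the From address, and the message signature verifies with the certificate's key.

```python
import hashlib
import json


def toy_keypair(p_exp: int, q_exp: int, e: int = 65537):
    """Textbook-RSA key (n, e, d) from public Mersenne primes: INSECURE, illustration only."""
    p, q = 2**p_exp - 1, 2**q_exp - 1
    return p * q, e, pow(e, -1, (p - 1) * (q - 1))


def digest(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def sign(data: bytes, key) -> int:
    n, _e, d = key
    return pow(digest(data), d, n)


def verify(data: bytes, sig: int, pub) -> bool:
    n, e = pub
    return pow(sig, e, n) == digest(data)


CA = toy_keypair(127, 1279)      # hypothetical trusted certificate authority
ALICE = toy_keypair(521, 607)    # hypothetical legitimate sender
MALLORY = toy_keypair(89, 2203)  # hypothetical attacker with a self-made certificate


def issue_cert(email_addr: str, pub, issuer_key) -> dict:
    """Stand-in for an X.509 certificate: subject + public key, signed by the issuer."""
    body = json.dumps({"subject": email_addr, "n": pub[0], "e": pub[1]},
                      sort_keys=True).encode()
    return {"body": body, "issuer_sig": sign(body, issuer_key)}


def identify_sender(from_addr: str, message: bytes, msg_sig: int, cert: dict) -> str:
    """Checks a client could run before showing who sent the message."""
    if not verify(cert["body"], cert["issuer_sig"], CA[:2]):
        return "untrusted certificate"
    info = json.loads(cert["body"])
    if info["subject"] != from_addr:
        return "From address does not match certificate"
    if not verify(message, msg_sig, (info["n"], info["e"])):
        return "invalid message signature"
    return f"verified sender: {info['subject']}"


msg = b"Please wire the funds"
alice_cert = issue_cert("alice@example.com", ALICE[:2], CA)
sig = sign(msg, ALICE)
print(identify_sender("alice@example.com", msg, sig, alice_cert))  # verified sender: alice@example.com
print(identify_sender("ceo@example.com", msg, sig, alice_cert))    # From address does not match certificate
```

A spoofed email that claims a different From address, or that carries a certificate the attacker signed for themselves, fails one of the three checks.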
-
Data Archiving and Data Destruction:
If the email itself has been digitally signed and encrypted, it can be automatically archived
without signing it again. If you want to destroy the data, simply delete the email completely.
As you can see from the above example, the MeSign App implements automatic email encryption, digital
signatures, and timestamping, which effectively helps users use email data securely and protect their
privacy. I believe that all data, especially data collected by data collection terminals, should be
protected with PKI technology throughout its entire life cycle: its identity should be established at
the source, it should be encrypted in transit to the cloud and stored encrypted in the cloud, and it
should also be encrypted when exchanged and used. Effective measures should also be taken to truly
destroy data that should be destroyed, and digital signatures and timestamps should be used to seal
archived data. Only in this way can big data applications and privacy protection both be achieved.
END