ISACA Journal
Volume 1, 2014

Features 

Meeting Security and Compliance Requirements Efficiently With Tokenization 

Stefan Beissel, Ph.D., CISA, CISSP 

The processing of sensitive data requires compliance with standards and laws that place high demands on data security. Companies that process sensitive data do not always need the actual data content in every processing step; sometimes only the unique identification of the data is required. Tokenization replaces sensitive data with unique strings that cannot be converted back to the original data by an algorithm. Systems that use these strings no longer need to handle sensitive data. Therefore, the scope of systems that must meet compliance and audit requirements can be reduced via tokenization.

Basics of Tokenization

Tokens are surrogate strings used to uniquely identify a piece of data; they contain no information beyond the token itself. Because sensitive data are replaced with tokens, the number of systems that work with sensitive data is reduced and, with it, the risk of compromise. Systems that process only tokens do not have to meet security requirements as stringent as those for systems that process sensitive data. As a result, the scope of systems that must comply with standards and laws is reduced.

Examples of compliance requirements are the Payment Card Industry Data Security Standard (PCI DSS) and the numerous national laws for the protection of personal data, e.g., Germany's Federal Data Protection Act (BDSG). PCI DSS was released by the PCI Security Standards Council (PCI SSC), a panel of five credit card companies. It aims to improve the security of cardholder data and includes requirements for data security and related audit methods. PCI DSS applies whenever cardholder data or authentication data are stored, processed or transmitted. In particular, the primary account number (PAN) is the defining factor in the applicability of PCI DSS requirements. Tokenization can be used to replace PANs and, thus, restrict the applicability of PCI DSS.

To prevent the compromise of systems that contain personal data, all personal data can be replaced by tokens. This approach is ideal for data processing operations that rely on the unambiguous identification of data rather than on the actual data content, e.g., data mining.

Generating Tokens

Before a token is generated, a fundamental decision has to be made about whether the token will be used once or several times. With single-use tokens, a new token is created for each data value, for example, from a sequential number. With multiuse tokens, the same token is created each time the same data value occurs. In the latter case, the same token appears several times in the processing systems and allows cumulative evaluations. The frequency of use must be considered in the generation technique: while encryption and hashing automatically create the same token for the same data value, token generation with numbers needs an additional mechanism that checks whether a token has already been created for the data value and provides that token for reuse.
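For illustration, this reuse check for multiuse tokens can be sketched in Python. The sketch assumes an in-memory dictionary and random token strings; a real tokenization system would persist the mapping in a hardened data store (see Assignment of Tokens).

import secrets

# Illustrative sketch only: in-memory stores standing in for the tokenization
# system's database. Names and structure are assumptions for illustration.
token_to_value = {}   # token -> original data value
value_to_token = {}   # original data value -> token (needed for multiuse tokens)

def generate_token(value, multiuse=True):
    if multiuse and value in value_to_token:
        # Reuse check: the same data value receives the same token again.
        return value_to_token[value]
    token = secrets.token_hex(16)  # random 128-bit string, no mathematical link to the value
    token_to_value[token] = value
    if multiuse:
        value_to_token[value] = token
    return token

print(generate_token("4000300020001000") == generate_token("4000300020001000"))  # True (multiuse)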

Encryption techniques use algorithms to transform data into a form called ciphertext, so that the data bear no similarity to their original form of representation, called plaintext, but can be converted back to their original state with a key.1, 2 Because tokens generated as ciphertext can be converted back to their original state, encryption techniques are less suitable for generating tokens. According to the PCI SSC,3 encryption is a way to generate tokens, but this does not mean that the sensitive data are completely protected against decryption to cleartext. Therefore, encrypted data should not be processed in uncertain environments and should not be taken out of the PCI DSS environment that encompasses all protected systems.

Hashing is a technique originally used to ensure the integrity of data. When data are transmitted, hashing can verify that the data have not been tampered with or corrupted during transmission.4, 5 Applying a hash function to data creates a digital fingerprint (hash value or message digest) that is as unique as possible. Therefore, hash values can be used as tokens. Depending on the algorithm used, there is a risk of collisions,6 in which case the uniqueness of the token is no longer ensured. The best-known hashing algorithms are MD5 and SHA-1. MD5 was developed by Ron Rivest in 1991 and produces a hash value with a size of 128 bits; it is now generally considered insecure as a result of collisions. SHA-1, a revision of SHA, was released by the US National Institute of Standards and Technology (NIST) in 1994 and produces hash values with a size of 160 bits. Extensions with larger hashes are SHA-2, released in 2001, and SHA-3, selected in 2012.
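The digest sizes named above can be illustrated with Python's hashlib module; the comparison is shown for illustration only, and MD5 and SHA-1 should not be chosen for new systems.

import hashlib

data = b"4000300020001000"
print(len(hashlib.md5(data).digest()) * 8)     # 128-bit digest (MD5)
print(len(hashlib.sha1(data).digest()) * 8)    # 160-bit digest (SHA-1)
print(len(hashlib.sha256(data).digest()) * 8)  # 256-bit digest (SHA-2 family)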

Other techniques for generating tokens are the use of a serial number or of a random number produced by a pseudorandom number generator.7 In principle, any string may be used as a token as long as it identifies the data value uniquely, allows almost no collisions and cannot be converted back to its original state by an algorithm.

Tokens can be generated not only for individual data values, but also for data sets that consist of a combination of two or more data values. Prior to generating the token, a further data value may be attached to the primary data value, such as a salt. A salt is a string that is appended to an existing string before encryption or hashing.8
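A salted hash over a combined data set can be sketched as follows. The field separator, the salt length and the choice of SHA-256 are illustrative assumptions, and the salt must be retained by the tokenization system if the same data set is to map to the same token later.

import hashlib
import secrets

salt = secrets.token_bytes(16)  # stored by the tokenization system

def token_for_dataset(values, salt):
    # Combine the data values, append the salt and hash the result.
    combined = "|".join(values).encode("utf-8") + salt
    return hashlib.sha256(combined).hexdigest()

print(token_for_dataset(["4000300020001000", "2016-12"], salt))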

Assignment of Tokens

Since tokens are unique, each token can be associated with its original data. This mapping is performed by a tokenization system. Because the mapping cannot be reproduced with mathematical algorithms alone, the tokenization system must maintain mapping data. Where sensitive data must be present only in certain process steps and not continuously, tokens can be used in the remaining steps and, when necessary, are resolved to the original data values by the tokenization system.
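The mapping step can be pictured as a simple lookup. The sketch below assumes an in-memory dictionary with illustrative entries and omits the authentication, logging and encryption a real tokenization system would require.

# token -> sensitive value; the entries shown are illustrative assumptions.
vault = {"7c9e6679a4e311ee": "4000300020001000"}

def detokenize(token):
    # The original value can be recovered only through the tokenization system;
    # it cannot be derived from the token itself by an algorithm.
    value = vault.get(token)
    if value is None:
        raise KeyError("unknown token")
    return value

print(detokenize("7c9e6679a4e311ee"))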

The tokenization system must be set up in a secure network environment (see figure 1). When tokenization is in use, all systems that contain tokens, but no sensitive data, can be removed from the secure network environment. The secure network environment then contains only systems with increased security requirements, for example, as specified by PCI DSS.

Tokenization systems must, therefore, be well protected. They hold the mapping data that allow a token to be resolved to the original data, and their compromise can affect all token-processing systems. In addition to the tokenization requirements of the PCI SSC (see the Regulation and Sampling section of this article), strong cryptography is needed for the encryption of the sensitive data.9 Examples of acceptable encryption algorithms are AES (128 bits and higher), TDES (minimum double-length keys), RSA (1024 bits and higher), ECC (160 bits and higher) and ElGamal (1024 bits and higher).
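As a sketch of strong cryptography inside the tokenization system, the stored sensitive values could be encrypted with AES-256 in GCM mode. This example assumes the third-party Python package cryptography; key management (e.g., a hardware security module and key rotation) is deliberately left out.

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # third-party package "cryptography"

key = AESGCM.generate_key(bit_length=256)  # in practice provided by a key manager or HSM
aesgcm = AESGCM(key)

nonce = os.urandom(12)  # must be unique per encryption operation
ciphertext = aesgcm.encrypt(nonce, b"4000300020001000", None)
plaintext = aesgcm.decrypt(nonce, ciphertext, None)  # returns the original PAN bytes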

Auditing a Tokenization System

To assure that a tokenization system complies with IT security requirements, audits should be conducted. The main protection objectives of IT security are confidentiality, integrity and availability. Regulation and cost-effectiveness should also be taken into account when defining audit objectives.

Confidentiality
Maintaining confidentiality requires that data cannot be viewed by unauthorized persons and, thus, cannot be compromised. Physical and logical access controls prevent unauthorized penetration into the area where the hardware of the tokenization system is located and into the virtual spaces where tokens are assigned to sensitive data. Network segmentation can be used to control and limit access from insecure network segments to the secure network segment in which the tokenization system resides. This can be achieved, for example, with a firewall that filters network traffic; routers with access controls that create a virtual local area network (VLAN) are also suitable.

Encryption of the data contained in the tokenization system prevents captured data from being read, for example, after the recording of network traffic or the theft of a hard drive from the tokenization system. Data packets can be encrypted individually, for example, with Pretty Good Privacy (PGP) encryption of files, or the data transfer can be encrypted completely by using an encrypted communication channel, for example, Secure Shell (SSH), a virtual private network (VPN) or Secure Sockets Layer/Transport Layer Security (SSL/TLS). Hard-disk encryption can be implemented with software, either from the operating system (OS) vendor, such as BitLocker, or from a third party, such as TrueCrypt, or with hardware containing encryption modules. Secure deletion ensures that deleted files cannot be recovered by unauthorized persons. In addition to rendering the physical media useless by destroying or degaussing them, there are software solutions that repeatedly overwrite the data.

By monitoring the logs of the tokenization system, irregularities in system behavior can be detected that indicate attacks or technical malfunctions (log management). Recognized irregularities can be reported through alerts to system administrators, who then initiate countermeasures. Automation is possible through the use of intrusion detection systems (IDS) for automatic monitoring and alerting and intrusion prevention systems (IPS) for responding to identified attacks, e.g., by dynamically adjusting access rights. Antivirus software prevents malicious software from starting, changing files or tapping data. Malicious software includes viruses, which become active when executed by the user; worms, which spread independently by exploiting vulnerabilities; and Trojans, which are disguised as harmless programs.

The components of the tokenization system must also be protected against software vulnerabilities; to this end, the system must be hardened. Standard parameters are adjusted, and all features and services that are not required are uninstalled or disabled so that they offer no unnecessary points of attack. Security updates from software vendors must be installed regularly (patch management), and vulnerability scans provide information on existing vulnerabilities of the system. Increased security awareness among users reduces the risk of users becoming victims of social engineering and of careless users storing sensitive data outside the secure environment. The measures that protect confidentiality also serve to protect integrity: if data are compromised by an attacker or malicious software, they can often be damaged or tampered with as well.
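A simple log check of this kind can be sketched as follows; the line format and the threshold are hypothetical, and in practice an IDS/IPS or a security information and event management (SIEM) system would perform this task.

from collections import Counter

THRESHOLD = 5  # hypothetical limit for failed detokenization attempts per user

def suspicious_users(log_lines):
    # Assumed log format: "<timestamp> <user> <action> <result>"
    failures = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) == 4 and parts[2] == "DETOKENIZE" and parts[3] == "DENIED":
            failures[parts[1]] += 1
    return [user for user, count in failures.items() if count > THRESHOLD]

# Users returned here would be reported to the administrators via an alert.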

Integrity
Integrity means that data are not tampered with or damaged by unauthorized persons. To ensure the integrity of data within the tokenization system, three principles should be applied. The need-to-know principle states that users should have only as many permissions on the tokenization system as they absolutely need to carry out their duties, which prevents unauthorized manipulations beyond their tasks. The separation-of-duties principle states that one person should not be responsible for all aspects of a business process, which ensures that unauthorized manipulations can be noticed by colleagues. The rotation-of-duties principle states that responsibilities are exchanged regularly among users, which ensures that any user can be replaced and that unauthorized manipulations by colleagues can be noticed.

Internal company policies and work instructions should be used to implement these principles.
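The need-to-know principle can, for example, be enforced at the detokenization interface. The role table in this sketch is hypothetical; in practice, these rights would come from the company's identity and access management.

# Hypothetical role table; only roles that truly need PANs may detokenize.
ROLE_RIGHTS = {
    "chargeback_clerk": {"detokenize"},
    "marketing_analyst": set(),  # works with tokens only, never with PANs
}

def authorize(role, action):
    if action not in ROLE_RIGHTS.get(role, set()):
        raise PermissionError(role + " is not permitted to " + action)

authorize("chargeback_clerk", "detokenize")     # allowed
# authorize("marketing_analyst", "detokenize")  # would raise PermissionError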

Availability
Availability means that users or systems that are authorized to access data can do so at any required time. The availability of a tokenization system can be guaranteed by hardware and infrastructure that are ready for use and have sufficient capacity to process all requests as quickly as necessary. Attackers can compromise availability by flooding the tokenization system with requests and, thus, causing a denial of service. Protection against such an attacker can be achieved with a web application firewall, which is designed specifically to protect web applications. Capacity planning can prevent overutilization of the systems caused, for example, by company growth. Capacities are also at risk from external influences, such as environmental disasters. Business continuity management is necessary to guarantee the operation of the tokenization processes in case of disruptions. Part of business continuity management is disaster recovery, which ensures the quickest possible restoration of the tokenization system after a total system failure.
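Request flooding can be throttled before it reaches the tokenization system, for example, with a token-bucket rate limiter. The sketch below is illustrative only; this function is normally provided by a web application firewall or load balancer.

import time

class RateLimiter:
    # Token-bucket rate limiter: allows short bursts, rejects sustained flooding.
    def __init__(self, rate_per_second, burst):
        self.rate = rate_per_second
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # request rejected to preserve availability for legitimate users

limiter = RateLimiter(rate_per_second=100, burst=20)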

Regulation and Sampling
Tokenization systems can be used in various fields, such as health care and finance, to implement data privacy requirements or to ensure PCI DSS compliance. So far, only the PCI SSC has published specific security requirements for tokenization systems.10 In addition, the general security requirements of PCI DSS apply.11 These requirements relate primarily to confidentiality; the protection of integrity and availability is the responsibility of the company after evaluating the cost/benefit aspects.

The control measures that result from the security requirements (see figure 2) can be verified by using sampling techniques. A basic distinction is made between statistical and nonstatistical sampling methods.12 In most cases, different sampling techniques can be used to verify a given control measure.


The selection of sampling techniques should be based on current risk assessments. When considering access controls, discovery sampling (statistical) can be used, in which samples are taken until a user account with excessive permissions is discovered. With compliance sampling (statistical), sufficient password complexity of user accounts can be verified. And with judgmental sampling (nonstatistical), risky user accounts, such as unnecessary administrator accounts, can be identified manually.
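Discovery sampling of user accounts can be sketched as follows; the account data and the exception criterion are hypothetical.

import random

def discovery_sample(accounts, is_exception, sample_size):
    # Draw accounts at random until an exception is found or the sample is exhausted.
    for account in random.sample(accounts, min(sample_size, len(accounts))):
        if is_exception(account):
            return account  # a single exception is enough to question the control
    return None

accounts = [{"name": "alice", "admin": False}, {"name": "bob", "admin": True}]
finding = discovery_sample(accounts, lambda a: a["admin"], sample_size=2)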

Cost-effectiveness
For credit card processing companies, it is necessary to set up a PCI-DSS-compliant environment because failing the annual PCI DSS audit would result in significant revenue losses. In addition, loss of reputation and possible fines by the credit card companies can be expected. However, there is design freedom in the determination of control measures for the PCI-DSS-compliant environment. Companies can decide among different technologies or products, such as PGP, SSH, VPN or SSL/TLS, for the encryption of external data transfers. Companies that are bound only by data privacy requirements and can define their own level of security have even greater design freedom.

To assess the cost-effectiveness of a tokenization system, the investment costs and running costs must be compared with the potential savings. The investment costs of a tokenization system include the costs of new hardware and software, installation, and network segmentation using routers or firewalls. In addition, organizational activities, such as the creation of work instructions and guidelines, have to be considered. The running costs include maintenance and administration of the system. Potential savings result from the fact that the scope of the secure network environment can be limited more tightly. Audits and reviews can focus on the secure network environment and, therefore, can be performed more efficiently. The administration effort in the less secure network environment is reduced because fewer requirements must be implemented there, for example, regarding hardening, encryption and logging.

If the implementation of tokenization is desired, its sustainability should also be taken into consideration. If planned business changes can influence the processed data, the tokenization system should be designed to be scalable. Such a change could be, for example, a planned outsourcing of the data processing, after which no tokenized data are processed internally anymore. In addition, technological developments, such as increases in available computing power, can render cryptographic algorithms insecure. Cryptographic algorithms, therefore, need to be evaluated regularly and replaced as necessary.

Use Case: E-Commerce

An exemplary use case for a tokenization system is its integration at an e-commerce merchant that accepts credit card payments through a web store. The flow of a transaction in e-commerce begins with the customer, who makes a purchase at the online store and pays with his/her credit card. After the customer has communicated his/her card information to the merchant, the transaction is routed through the processor to the card organization, which sends an authorization request to the card-issuing institution. If the authorization response is positive, the payment is approved. The merchant then receives a confirmation, and the payment amount is charged to the customer. The merchant then ships the purchased goods or provides the desired service. The settlement of the payment is made by the card-issuing institution, which charges the payment amount to the end user and credits it to the card organization. The card organization forwards the credit to the processor, which transfers the accumulated credits to the merchant in contractually agreed payment cycles.

The storage, processing or transmission of PANs by the merchant requires the application of PCI DSS. It is most advantageous for the merchant to keep payment data outside of its network by using tokenization, without having to change any technical processes.13 In a token-based method, the merchant must ensure that the web session is redirected to the systems of the processor, e.g., by using a plug-in, before the payment information is entered by the customer. The customer enters his/her PAN and, thus, sends it directly to the processor, which operates a tokenization system. The processor assigns the PAN to a multiuse token in its tokenization system and sends the token to the merchant (figure 3).

Specifications for the composition of a PAN are given in ISO 7813. According to these specifications, a PAN consists of a six-digit issuer identification number (IIN), a variable account number with at most 12 individual digits and a check digit, which is generated with the Luhn algorithm. For example, the PAN “4000300020001000” is converted by the SHA-1 hashing algorithm to the token “c4caec101d38c68005fa56806153bcbcb70586c0.” The technical processes of the merchant do not have to be changed if the length and format of the token do not infringe on any technical restrictions (e.g., specified data types in databases). Within the infrastructure of the merchant, the token can then be treated in the same way as the PAN. Based on the uniqueness of the token, the merchant can determine whether the same PAN is used again for a purchase without knowing the actual PAN. Subsequent transactions by existing customers can be handled without storing the PAN in the network of the merchant. In addition, customers who frequently cause chargebacks can be identified by the token before a transaction is completed. Chargebacks are reversals that are mandated by law in case of invalid authorizations (§ 675j and § 675p BGB); they are also performed as an optional service by the card organizations if requested by the cardholder.
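The two steps described here, checking the Luhn check digit and hashing the PAN with SHA-1, can be sketched in Python as follows (for illustration only; the PAN is the example value from the text).

import hashlib

def luhn_valid(pan):
    checksum = 0
    for i, digit in enumerate(int(d) for d in reversed(pan)):
        if i % 2 == 1:        # every second digit from the right is doubled
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return checksum % 10 == 0

pan = "4000300020001000"
print(luhn_valid(pan))                                 # check digit verification
print(hashlib.sha1(pan.encode("ascii")).hexdigest())   # hexadecimal SHA-1 digest used as the token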

Conclusion

The scope of systems that handle sensitive data and, therefore, must meet compliance and audit requirements can be reduced by using tokenization. Tokenization facilitates a more restrictive handling of sensitive data without adjusting business processes. Therefore, tokenization offers potential savings. When implementing a tokenization system, security provisions and cost-effectiveness should be taken into consideration.

Endnotes

1 Buchmann, J.; Einführung in die Kryptographie, 5th Edition, Germany, 2010
2 Schmeh, K.; Kryptografie: Verfahren, Protokolle, Infrastrukturen, 4th Edition, Germany, 2009
3 PCI Security Standards Council, PCI DSS Tokenization Guidelines, 2011, www.pcisecuritystandards.org/documents/Tokenization_Guidelines_Info_Supplement.pdf
4 Op cit, Buchmann
5 Op cit, Schmeh
6 Op cit, Buchmann
7 Stapleton, J.; R. S. Poore; “Tokenization and Other Methods of Security for Cardholder Data,” Information Security Journal: A Global Perspective, vol. 20, iss. 2, 2011, p. 91-99
8 Ertel, W.; Angewandte Kryptographie, 3rd Edition, Germany, 2007
9 PCI Security Standards Council, Glossary of Terms, Abbreviations, and Acronyms, Version 1.2, 2008, www.pcisecuritystandards.org/pdfs/pci_dss_glossary.pdf
10 Op cit, PCI Security Standards Council, 2011
11 PCI Security Standards Council, Payment Card Industry (PCI) Data Security Standard—Requirements and Security Assessment Procedures, Version 2.0, 2010, www.pcisecuritystandards.org/documents/pci_dss_v2.pdf
12 ISACA, IT Standards, Guidelines, and Tools and Techniques for Audit and Assurance and Control Professionals, 2013, www.isaca.org/Knowledge-Center/Standards/Documents/ALL-IT-Standards-Guidelines-and-Tools.pdf
13 ISO & Agent, “HP Upgrades Tokenization of Payment Data,” vol. 8, iss. 11, 2012, p. 17

Stefan Beissel, Ph.D., CISA, CISSP, is an IT security officer responsible for the management of security-related projects and compliance with the Payment Card Industry Data Security Standard (PCI DSS) at EVO Payments International.

 

