Most engineers—especially network engineers—are happy to leave user privacy to security and privacy professionals. At first glance, it does not seem as though network engineers should worry too much about users’ privacy.
The larger information technology context does not help this situation. Privacy often comes a distant second to solving problems that allow the organization to operate more efficiently or profitably. Common privacy tropes include:
• If you have done nothing wrong, you have nothing to hide.
• If people want my (or my user’s) data, there is not much I can do about it.
• The privacy statement users agree to when they use our service protects us (as an organization).
• Large providers and vendors say they care about my (and their) users’ privacy; this is enough.
These lines of argument are no longer valid. If you touch or transport a user’s data, you are responsible, at least in part, for keeping that data private. Far too often, privacy is reduced to the minimum level of legal compliance rather than treated as a matter of risk and harm reduction.
This section considers various aspects of user privacy, where user privacy intersects with network engineering, and the tools network engineers can use to support privacy.
Personally Identifiable Information
One of the most critical concepts in privacy is personally identifiable information (PII). PII is any piece of information, or collection of information, that can identify an individual user.
This definition might seem broad, so some examples may help:
• A government-issued personal identification number, such as the Social Security number in the United States or a driver’s license number
• Location history gathered from a cellular telephone, or any other tracking device or service
• Financial account numbers, such as those issued by banks and investment firms
• Medical history
These examples are, perhaps, obvious. Be cautious, however, of sharp or simple definitions of PII: many not-so-obvious combinations of data can still identify an individual user.
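To see why combinations matter, the following minimal Python sketch counts what fraction of records in a small, made-up dataset are unique on a chosen set of fields. The records, field names, and the `unique_fraction` helper are hypothetical. No single field here identifies anyone, yet together the fields single out every record.

```python
from collections import Counter

# Hypothetical records containing no "obvious" PII such as names or account numbers.
records = [
    {"zip": "02139", "birth_year": 1984, "gender": "F", "device": "phone"},
    {"zip": "02139", "birth_year": 1984, "gender": "F", "device": "laptop"},
    {"zip": "02139", "birth_year": 1990, "gender": "M", "device": "phone"},
    {"zip": "10001", "birth_year": 1984, "gender": "F", "device": "phone"},
    {"zip": "10001", "birth_year": 1975, "gender": "M", "device": "tablet"},
]

def unique_fraction(rows, fields):
    """Return the fraction of rows whose combined values for `fields` occur exactly once."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(rows)

# The more fields are combined, the more records become unique (and so identifiable).
for fields in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "gender", "device"]):
    print(fields, unique_fraction(records, fields))
```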
Network operators also collect PII. For instance, the IP address assigned to a user’s host or mobile device can be used to identify an individual user. Randomized IP addresses, regular renumbering, and Network Address Translation (NAT) might make connecting an IP address with an individual user more difficult. Even so, given an IP address and a time, most network operators can identify an individual user from the logs they keep.
Many law enforcement agencies require operators to maintain logs that can be used to correlate an IP address to an individual user.
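The sketch below illustrates this kind of correlation, assuming a hypothetical lease log (the `Lease` record and `who_had` function are illustrative, not any real system’s format) that records which subscriber held which address and when. Because addresses are reassigned, the same address maps to different subscribers at different times, so the timestamp matters as much as the address. Behind carrier-grade NAT, where many subscribers share one public address, a real lookup would typically also need the source port.

```python
from dataclasses import dataclass
from datetime import datetime
from ipaddress import IPv4Address, ip_address
from typing import Optional

@dataclass(frozen=True)
class Lease:
    """One hypothetical address-assignment record, such as a DHCP server might log."""
    subscriber: str
    address: IPv4Address
    start: datetime
    end: datetime

def who_had(leases, addr: str, when: datetime) -> Optional[str]:
    """Return the subscriber holding `addr` at time `when`, or None if no record covers it."""
    target = ip_address(addr)
    for lease in leases:
        if lease.address == target and lease.start <= when < lease.end:
            return lease.subscriber
    return None

# The same address was held by two different subscribers on the same day.
LEASES = [
    Lease("subscriber-17", ip_address("198.51.100.25"),
          datetime(2024, 3, 1, 8, 0), datetime(2024, 3, 1, 20, 0)),
    Lease("subscriber-42", ip_address("198.51.100.25"),
          datetime(2024, 3, 1, 20, 0), datetime(2024, 3, 2, 8, 0)),
]
```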
The history of locations from which a user has accessed a network is often also considered PII. People are generally creatures of habit, accessing the network from just a few places, most of them likely close to home.
The domain names, websites, and services users access across time can often be used to identify an individual user. Even if these sites or services do not include financial or medical institutions, people tend to share a lot of personal information with online services and interact with online services in easily identifiable patterns.
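One simple way to see how access patterns act as an identifier is to compare an unlabeled set of visited domains against per-user profiles built earlier. This is a minimal sketch using Jaccard similarity (size of the intersection over size of the union); the profiles, domain names, and helper functions are hypothetical, and real re-identification techniques are considerably more sophisticated.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Similarity of two sets: size of intersection divided by size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical per-user sets of domains observed during some earlier period.
known_profiles = {
    "user-a": {"news.example", "bank-one.example", "chess.example", "recipes.example"},
    "user-b": {"news.example", "video.example", "sports.example", "shop.example"},
}

def best_match(observed: set[str], profiles: dict[str, set[str]]):
    """Return (user, score) for the profile most similar to the observed domains."""
    return max(
        ((user, jaccard(observed, domains)) for user, domains in profiles.items()),
        key=lambda pair: pair[1],
    )

# An "anonymous" session with no names attached still points strongly at one user.
observed = {"chess.example", "recipes.example", "news.example"}
print(best_match(observed, known_profiles))
```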
It is always better to err on the side of caution when handling data about users.
Data Lifecycle
Many of the privacy protection tools available to network engineers will make sense only within the context of a data lifecycle. Figure 18-3 illustrates a data lifecycle from a privacy perspective.
Figure 18-3 Data Lifecycle
There are many different paths through the five data lifecycle steps shown in Figure 18-3. For instance:
• Data might be collected, processed, used, and destroyed without being stored.
• Data might be collected, validated, and stored for later analysis and use.
• Existing data might be analyzed, generating new data to retain.
Regardless of the path, each step has privacy risks and tools.
Collection requires consent and notice. Users should know what the data will be used for, how easy it will be to identify them from the data, with whom it may be shared, and when it will be destroyed.
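Consent is problematic in information technology. Notices about how data is used, stored, and destroyed must contain so much information that they tend to be long and complex, and few users read or fully understand them before agreeing.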
Processing can mean validating data and searching for patterns in data through analysis. Processing can also mean removing, adding, or altering data to make identifying individuals more challenging (or impossible), a process called deidentification.
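Two common deidentification techniques can be sketched in a few lines of Python: generalization (here, truncating an IPv4 address to its /24 prefix) and pseudonymization (here, replacing a value with a keyed HMAC-SHA256 token). The function names, the default prefix length, and the 16-character token length are illustrative choices, not a standard. Pseudonymized data may still be PII: whoever holds the key can recompute tokens for candidate values, and an unkeyed hash of an IPv4 address can be reversed simply by hashing all 2^32 possible addresses.

```python
import hashlib
import hmac
from ipaddress import ip_network

def truncate_ipv4(addr: str, prefix_len: int = 24) -> str:
    """Generalize an address by zeroing its host bits, keeping only the prefix."""
    return str(ip_network(f"{addr}/{prefix_len}", strict=False).network_address)

def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a keyed hash; the same value and key always give the same token."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]

# Example: a log line's address is generalized, and the user name is tokenized.
print(truncate_ipv4("192.0.2.77"))              # 192.0.2.0
print(pseudonymize("alice", b"secret-key-1"))   # stable 16-hex-digit token
```

Keyed tokens still let an analyst count or join events per user without seeing the original value; rotating or discarding the key later breaks that linkage.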
Disclosing data means using the data in some way, such as modifying a web page shown to a user. A data breach is, essentially, unauthorized consumption or use of data. Breaches are normally associated with external people or organizations, but insiders can also cause a data breach.
Primary data use is the purpose for which the data was collected, or the way the operator told the user the data would be used. Any other use of the data is secondary, and secondary use can be difficult to justify from a privacy perspective. Retaining data means storing it; destroying data means erasing it in a non-recoverable way.
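Primary versus secondary use, along with retention and destruction, can be expressed as a simple policy check. The sketch below assumes a strict, hypothetical policy: data may be used only for the purpose recorded at collection time, and only within a 90-day retention window. The `Record` type, `RETENTION` value, and helper functions are illustrative. Actually destroying expired records in a non-recoverable way depends on the storage system, including backups and replicas, and is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RETENTION = timedelta(days=90)  # hypothetical policy value, not a recommendation

@dataclass(frozen=True)
class Record:
    """A hypothetical stored record tagged with the purpose it was collected for."""
    collected_at: datetime
    purpose: str   # the primary use stated to the user at collection time
    payload: str

def may_use(record: Record, requested_purpose: str, now: datetime) -> bool:
    """Allow only primary use, and only while the record is inside its retention window."""
    within_retention = now - record.collected_at <= RETENTION
    return within_retention and requested_purpose == record.purpose

def expired(records, now: datetime):
    """Return records past the retention window: the candidates for destruction."""
    return [r for r in records if now - r.collected_at > RETENTION]
```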