The Illusion of Internet Privacy Part 1
Posted 5/17/2010 5:27:00 PM
The internet has become an integral part of many aspects of our lives, but most people have no idea (or desire to know) how it works, when it is secure, and when it is insecure. In this series of articles, I seek to address this by going over the many ways data can travel through the internet, how secure that data is, and how to make sure it stays secure. I will try to keep everything as basic as possible, so even if you only have a basic understanding of computers, you should be able to follow along.
Let's visit a website
Before we talk about security, let's talk about the simple act of visiting a website with your computer's web browser. For this example, I am going to use my favorite site, Google. I type "http://www.google.com" into my web browser, and in under a second the Google home page appears.
This is the behavior we have all come to expect when we type an address into our browser. But what exactly happened? Let's start with the address we entered, which breaks down into four parts. Let's examine them one at a time:
1. Protocol: this is the protocol specification. In this case it is "http://", or "Hypertext Transfer Protocol." There are many other protocols, but this one is mainly used for delivering web pages to your web browser. We will discuss other protocols later.
2. Server: this is the actual server we are trying to communicate with, and in this case its name is split by a period. This means we are communicating with the "www" server on the "google" domain. Without part 3, however, this information is almost useless.
3. Top Level Domain: this is the suffix at the end of the domain name. There are many different top level domains, such as ".com" or ".net," which help separate one server from another on the internet. The original idea was to divide the internet by type of website, with ".com" for commercial websites, ".org" for organizations, and many others. We usually think of ".com" because that is the one that took off. Combined with the server name, the top level domain points to a specific "IP Address" on the internet (large sites often use several). Think of an IP Address as being similar to a full street address: it is how computers find each other on the internet.
4. Folder: although the address above does not include one, this would be anything following a "/" after the full domain name. It is used to reference different "directories" or "folders" on the specific server, and is useful for visiting specific pages on the site.
So, those four parts combined make up a typical web address. Now let's go over what happens when you hit Enter in your web browser's address bar.
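To make the four parts concrete, here is a minimal Python sketch that splits an address the same way, using the standard library's `urllib.parse.urlsplit`. The function name `break_down_address` is hypothetical, and treating everything before the last period as the "server" is a simplification that mirrors the breakdown above rather than how DNS formally names things.

```python
from urllib.parse import urlsplit


def break_down_address(address):
    """Split a web address into protocol, server, top level domain, and folder.

    Simplifying assumption: the last period-separated piece of the host name
    is the top level domain, and everything before it is the "server".
    """
    parts = urlsplit(address)
    host = parts.hostname or ""  # e.g. "www.google.com" (always lowercased)
    server, dot, tld = host.rpartition(".")
    if not dot:  # a host name with no period at all, e.g. "localhost"
        server, tld = host, ""
    return {
        "protocol": parts.scheme + "://",               # e.g. "http://"
        "server": server,                               # e.g. "www.google"
        "top_level_domain": "." + tld if tld else "",   # e.g. ".com"
        "folder": parts.path or "/",                    # "/" when no folder is given
    }


if __name__ == "__main__":
    for address in ("http://www.google.com", "http://www.google.com/search"):
        print(address, "->", break_down_address(address))
```

Running it prints each address next to its parts. Note that `urlsplit` itself calls the folder the *path* and reports the protocol without the "://".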
Breakdown of a connection to a site
Note: I am going to leave out a lot of details here that I do not feel are relevant to the article, such as how your web browser handles mistyped addresses, how addresses are parsed and looked up, the actual anatomy of an HTTP "GET" request, etc. If you have questions, feel free to contact me.
So now we have typed out a full web address and hit the Enter key. What next? The process as we are going to analyze it looks roughly like this (don't worry, we will go over each step in more detail throughout the series if need be):
1. Your computer contacts one of your Domain Name System (DNS) servers to look up the IP Address of the server you are trying to connect to
2. Once the IP address has been found, your computer sends its web request to that IP address
3. Your request goes through the many switches, routers, and servers sitting between you and the server you are trying to contact
4. The server receives your request and sends back the page you want
5. This page goes through the many switches, routers, and servers sitting between the server and your computer
6. Your browser displays the web page
Obviously this process has been simplified, but the really important steps, which we will be focusing on, are numbers 3 and 5.
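For readers curious how steps 1, 2, 4 and 6 look from a program's side, here is a rough Python sketch using the standard `socket` module. It is not how a real browser is built and deliberately glosses over the anatomy of the request; the function name `fetch_page` is hypothetical, and it only handles plain "http://" addresses. The thing to notice is that the request is assembled as ordinary readable text before it is sent.

```python
import socket


def fetch_page(host, path="/", port=80):
    """Sketch of steps 1, 2, 4 and 6 of the list above for a plain http:// address.

    Speaks just enough HTTP/1.0 to get a response back; redirects, compression,
    and everything else a real browser handles are skipped.
    """
    # Step 1: ask DNS (through the operating system's resolver) for an IP address.
    family, type_, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    ip = address[0]

    # Step 2: send the web request to that IP address. These bytes leave
    # your computer as readable text, with no encryption at all.
    request = (f"GET {path} HTTP/1.0\r\n"
               f"Host: {host}\r\n"
               "\r\n").encode("ascii")

    # Steps 3 and 5 (the hops in between) happen out in the network;
    # nothing in this code controls them.
    with socket.socket(family, type_, proto) as sock:
        sock.connect(address)
        sock.sendall(request)

        # Step 4: the server sends back the page; read until it closes the connection.
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)

    # Step 6 is the browser's job; here we just return the IP we used, the exact
    # request bytes, and the raw response (status line, headers, then the page).
    return ip, request, b"".join(chunks)
```

Calling `fetch_page("www.google.com")` would return the IP address your resolver picked, the exact bytes that were sent, and the raw response. Many sites answer plain "http://" requests with a redirect to "https://", so the response may be a short redirect notice rather than the full page.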
Where does all that data go?
When data leaves your computer on its way to another computer (or server) on the internet, it passes through many points along the way, simply because we (usually) do not have a direct link to the sites we are trying to communicate with. The number of devices your data passes through varies with your Internet Service Provider (ISP) and the location of the remote server. It is quite simple to view the devices your request moves through using a trace route tool. On Windows it is called "tracert" (short for "Trace Route"); macOS and Linux include an equivalent command called "traceroute". It is usually run from the command line. I will explain how to use it in Windows; the steps should be about the same on other operating systems, using "traceroute" in place of "tracert":
- Open a command prompt window
- Type "tracert" followed by a space and then the domain you are connecting to (do not include anything before the name, such as "http://" or "www."; just the domain name, e.g. google.com)
- Press Enter
- Your results will be displayed
I ran tracert to "google.com" from the University of Wisconsin Stevens Point. In my results, I have redacted IP addresses and host names related to the University.
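How does tracert find each device? Roughly, every packet carries a counter called its "time to live" (TTL). Each router that forwards the packet lowers the TTL by one, and the router where it reaches zero throws the packet away and sends back a "time exceeded" message that reveals who it is. Tracert sends probes with a TTL of 1, then 2, and so on, until the destination itself answers. The Python sketch below only *simulates* this on a made-up list of devices rather than sending real packets; the function name `simulate_trace` and all device names are hypothetical.

```python
def simulate_trace(path, max_hops=30):
    """Simulate how tracert discovers the devices between you and a destination.

    `path` lists the devices in order, ending with the destination itself.
    Simplified model: real tracert sends several probes per hop, times them,
    and sometimes gets no answer at all from a device.
    """
    results = []
    for ttl in range(1, max_hops + 1):
        remaining = ttl  # each probe starts with a TTL one larger than the last
        for position, device in enumerate(path, start=1):
            if position == len(path):
                # The probe survived all the way to the destination, which answers it.
                results.append((ttl, device, "reply from destination"))
                return results
            remaining -= 1  # every router that forwards the probe lowers its TTL by one
            if remaining == 0:
                # The TTL ran out at this router, so it discards the probe and sends
                # back a "time exceeded" message, revealing itself to tracert.
                results.append((ttl, device, "time exceeded"))
                break
    return results  # gave up after max_hops probes without reaching the destination


if __name__ == "__main__":
    # Hypothetical device names standing in for a real route.
    route = ["home-router", "isp-router-1", "isp-router-2",
             "backbone-router", "www.example.com"]
    for ttl, device, outcome in simulate_trace(route):
        print(f"{ttl:>2}  {device:<18} {outcome}")
```

Each line of real tracert output corresponds to one of these probes, which is why counting the lines tells you how many hops your data takes. The `max_hops` default of 30 mirrors tracert's own default limit.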
Based on my tracert results, we can see that when the request leaves my computer it travels through 13 devices before it finally reaches its destination of google.com. Let me say that again: 13 devices! Any of these devices can intercept my data as it passes through, and if the data is not secured, it is ripe for the picking. Most of these "hops," as they're called, are routers, first those leaving my ISP and then those of the other ISPs between Google and me, but they could also be servers, computers, or pretty much anything else that ends up in the path. This is extremely important to understand, because for most web requests (ones that begin with "http://") your data is completely unencrypted and unsecured, and every single one of those hops has access to it. Here's a short list of important but (usually) unencrypted things that these hops can see:
- Facebook Chat and Facebook pages (excluding the login page)
- AOL Instant Messenger and pretty much every messaging program
- Google searches and their results
- YouTube searches and results
- Depending on the provider, your online e-mail
The list can go on and on, but in the end, if you see "http://" at the beginning of your address bar, your data is most likely being sent unencrypted. Now, I don't want to scare you away from the internet (yet). In almost every case, the hops between you and your target couldn't care less about what you are transmitting through them, and most of your web activity is pretty innocuous. The point of this demonstration, however, is to show how vulnerable we actually are on the internet without some sort of security.
Conclusion
In this article, I have discussed the anatomy of a web request and the many places your data goes before it reaches its target. In the next article, we will examine the secure version of "http://", including how it works and why we need it. Please comment and let me know if you have any corrections or questions.