“A long time ago, in a network, far far away, a great adventure took place!
Out of the chaos of new ideas for communication, the experiments, the tentative designs, and crucible of testing, there emerged a cornucopia of networks. Beginning with the ARPANET, an endless stream of networks evolved, and ultimately were interlinked to become the Internet.”
RFC 2468, Part 1
Before the internet, before domain names, and before I was born, there was the ARPANET, the little network that could. What started as a network that bridged four research centers in the western United States became the foundation of the internet we know today.
The ARPANET connected computers called hosts together using giant, wired routers called Interface Message Processors (IMPs). These routers served as the nodes of the network, and allowed the research centers to communicate with one another.
To send a message to another host on the network you needed to know its address, the unique number of the host and IMP. An address was like a telephone number: meaningless and impossible to remember. So researchers solved the problem the same way our phones do — they created an address book. This address book was called a hosts file.
By 1973, the ARPANET had grown tenfold, spanning the United States and adding IMPs across the Atlantic in Norway and London. Hosts files were becoming unwieldy. Programmer and composer L. Peter Deutsch wrote:
“It seems about time to put an end to the absurd situation where each site on the network must maintain a different, generally out-of-date, host list for the use of its own operating system or user programs.”
Instead, ARPANET researchers decided to give each host an official name and distribute one master hosts file, HOSTS.TXT, maintained by the researchers at the Network Information Center (NIC). This change was critical because it changed the purpose of the hosts file. Instead of acting like your cell phone’s list of contacts, it acted as a public phone book.
As the network grew, even HOSTS.TXT became unwieldy. In the mid-1980s, researchers created a new, decentralized system for managing and organizing host names at scale. They called this the Domain Name System (DNS), and it introduced two key concepts: hierarchy and resolution.
The first major concept DNS introduced was a new, hierarchical style of domain names. Instead of navigating to a flat host name like NYT-COOKING, you’d navigate to cooking.nytimes.com. This allowed related websites to be grouped together. In our New York Times example, their main site is www.nytimes.com, their cooking app is cooking.nytimes.com, and their Medium publication is open.nytimes.com. With the DNS hierarchy, it’s clear that all of these websites are related.
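To make the hierarchy concrete, here is a minimal Python sketch that reads a domain name from right to left, from the top-level domain down to the full name. The helper name `hierarchy` is made up for illustration and is not part of DNS or of any library.

```python
def hierarchy(name: str) -> list[str]:
    """Return the chain of domains from the TLD down to the full name."""
    # Domain names are case-insensitive; a trailing dot marks the root.
    labels = name.lower().rstrip(".").split(".")
    # Build successively longer suffixes, starting from the rightmost label.
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


for site in ("www.nytimes.com", "cooking.nytimes.com", "open.nytimes.com"):
    print(hierarchy(site))
```

All three names share the same first two levels (com, then nytimes.com), which is how the grouping of related sites shows up in the name itself.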
But what about the “.com” part? That’s a top-level domain (TLD). DNS was created with only a handful of top-level domains, meant to classify sites by purpose. Names like .gov, .edu, and .org are fairly straightforward. And .com, the most ubiquitous TLD of them all? It stands for “commercial.” The process to create a new TLD was exceptionally strict, and would remain that way for almost thirty years.
TLDs also served another purpose: they eased the migration from ARPANET host names to proper domain names. Every host name on the ARPANET was given a temporary domain name with the .arpa TLD. Host names like NYT-COOKING automatically became nyt-cooking.arpa when accessed using DNS, so the entirety of the ARPANET was still accessible when DNS was first implemented.
The second major concept DNS introduced was domain name resolution. Instead of relying on a hosts file on your own computer, your computer would ask a domain name server to resolve a name into an address on the network. That way, the address was (almost) always up to date.
But the hosts file persisted. It remained your own personal list of addresses, used to augment the DNS results. Programmers could add an entry to the hosts file if they wanted to override the domain name server, which became especially useful for testing network-related code.
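The override behavior can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular operating system implements it: the helper names `parse_hosts` and `resolve` are invented here, the sample entries are hypothetical (192.0.2.10 comes from an address range reserved for documentation), and real systems let you configure the lookup order.

```python
import socket


def parse_hosts(text: str) -> dict[str, str]:
    """Parse hosts-file text into a {name: address} mapping."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), address)  # first entry wins
    return table


def resolve(name: str, hosts: dict[str, str]) -> str:
    """Check the hosts table first, then fall back to the system resolver."""
    override = hosts.get(name.lower())
    if override is not None:
        return override
    return socket.gethostbyname(name)  # may raise socket.gaierror


# Hypothetical hosts-file contents for a test setup.
SAMPLE = """
127.0.0.1   localhost
# point a real name at a test machine
192.0.2.10  cooking.nytimes.com
"""

hosts = parse_hosts(SAMPLE)
print(resolve("cooking.nytimes.com", hosts))
```

Because the name is found in the table, the lookup never reaches a name server. That is what makes the hosts file handy for testing.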
Domain name servers also introduced a problem. What if you were writing code (or later, building a website) that wasn’t connected to the network yet? How could you test it? Programmers introduced the concept of a loopback address to solve this problem. With a loopback, a host could send traffic to an address and receive that same traffic as if it had come from another machine. This allowed them to test their code without connecting to any name servers.
When IP addresses were introduced, the loopback address was given the name localhost and added to every computer’s hosts file. It remains the de facto address for testing websites to this day.
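Here is a minimal Python sketch of loopback in action, in which one program plays both server and client on the same machine. It assumes 127.0.0.1, the conventional IPv4 loopback address that localhost usually maps to, and uses port 0 to let the operating system pick any free port.

```python
import socket

# Listen on the loopback address; port 0 asks the OS for any free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

# Connect to ourselves, as if we were another machine on the network.
client = socket.create_connection(("127.0.0.1", port))
conn, peer = server.accept()

# Traffic sent to the loopback address comes straight back to this host.
client.sendall(b"hello, loopback")
received = conn.recv(1024)
print(received, "from", peer[0])

for s in (client, conn, server):
    s.close()
```

No name server or physical network is involved. The traffic never leaves the machine, which is why this works on a computer with no connection at all.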