An Introduction to DNS

The Domain Name System (DNS) is an application-layer protocol that translates domain names to IP addresses. It’s an extremely important protocol for regular network operations, so much so that we IT folk have a common saying whenever something goes wrong: “It’s always DNS.” It’s often described as the “phonebook” of the Internet and like a phonebook, it’s important from a communications standpoint. From a security perspective, it’s an especially interesting target to attack as mitigations against attacks on DNS are still widely unadopted.

Domains

Domains are human-readable identifiers for locations, such as discord.com, shawnd.xyz, or mail.protonmail.com. These domains consist of “labels” such as discord, com, shawnd, and mail, which are separated by dots.

DNS uses a hierarchical naming structure. Commonly omitted from the domain is the root ., which denotes the very top of the hierarchy. A domain name that has the root and all labels associated with a precise location is referred to as a fully qualified domain name (FQDN). Thus, the FQDNs of Discord and my site are actually discord.com. and shawnd.xyz., respectively. These describe the exact endpoints on the Internet that a user may wish to communicate with.

However, routing requires IP addressing; domains alone are insufficient. A translation mechanism between domains and IP addresses could bridge the gap and allow humans to describe endpoints using human-readable domain names while computers could use computer-readable IP addresses.

hosts.txt and the Scalability Problem

Before DNS, hostname resolution used to be handled by a central machine on the network that had a hosts.txt file. This was analogous to the “contacts” list on your phone: you manually had to define the individual mappings between domains and IP addresses, much like how your “contacts” list requires you manually define the individual mappings between names and phone numbers.

This presented a few problems which would become more apparent with the growth of the Internet. Perhaps the greatest problem was scalability: every time someone new joined the Internet desiring a domain, all central machines had to manually update their hosts.txt file. Coordination was messy, and inconsistencies were rampant. While this may have been sufficient in the days of the ARPANET, a more scalable solution was needed in order to prevent hostname resolution from bottlenecking and impeding the growth of the Internet. Thus, DNS was born.

The Hierarchy

At the very top of the hierarchy is the root .* from which everything else on the hierarchy originates from. The next level in the hierarchy consists of top-level domains (TLDs) such as com, net, and org. At the next level are domains such as google, reddit, and wikipedia, and the level below them being more specific subdomains such as www, mail, and en. At each level, they are referred to as “subdomains” of the level above. For example, com, net, and org are all subdomains of .; google is a subdomain of com; and www is a subdomain of google.

The hierarchy tree can descend 127 levels deep, although this limit is rarely ever approached in practice. The tree is divided into zones, each consisting of one or more domains and subdomains. It is due to this zoning mechanism that the burden of administrative responsibility is divided in the total hierarchy of the Internet in what is a “divide and conquer” approach. The root need not know of all domains at all levels on the Internet to the maximum theoretical depth, but instead only needs to be concerned with TLDs; TLDs are authoritative in their own zones and need not be concerned with the zones of other TLDs; subdomains of TLDs are authoritative in their own zones and need not be concerned with the zones of other subdomains of their TLDs, nor subdomains of other TLDs; and so on and so forth. The authority of these zones are delegated to specific name servers within the zones.

* To be precise, the root does not actually have the label .; the root has a zero-length label. Recall that labels are separated by dots.

DNS Name Servers

In each zone resides one or more DNS name servers. These name servers contain resource records that describe the domain, and may include multiple types of entries:

Record Type Description
A A mapping between a domain name and an IPv4 address.
AAAA A mapping between a domain name and an IPv6 address.
CNAME An alias for a domain name, also known as a canonical name.
MX A mail exchange record denoting the mail server.
PTR A pointer to a domain used for a reverse DNS lookup (IP to domain).

There are many more types of DNS records, but these are just the most common ones. We’ll primarily be focusing on A and AAAA records as they provide the core functionality of DNS name resolution.

An authoritative name server is a special type of DNS name server that holds the original records of a zone. By contrast, a non-authoritative name server does not hold the original records of a zone and may just perform normal lookups and cache responses from other name servers. DNS responses may be marked as originating from an authoritative or non-authoritative source.

DNS Name Resolution

The process of a typical DNS name resolution process can be divided into two parts: between the client and the local name server, and between the local name server and other name servers. The client will ask the local name server using a recursive query, while the local name server will ask other name servers in a series of iterative queries. There are a couple of reasons why we do this:

  1. A recursive query allows the local name server to take advantage of caching.
  2. A series of iterative queries is fast and has a low memory footprint.

If a desired domain is not in the cache of the local name server, or if the cached data has expired, then the local name server will ask the root name server. The local domain server will not get the address of the desired domain, but will instead get the address of the next level in the hierarchy (TLDs) to send the query to. Through a series of iterative queries, the local name server gets closer and closer to the desired domain until finally reaching a name server that holds that information and can provide a precise IP address, finally concluding the series of iterative queries and allowing the local name server to respond to the client querier.

The resolution of catcourses.ucmerced.edu. may look like so:

Next: Implementing DNS in C

Now that we’re acquainted with the basic premise of DNS, let’s explore it further and more in-depth by breaking down the DNS specification and implementing it in C!

There will be a link here when the next article in this series is released.

Happy hacking!