The Akamai Blog

Architectural paths for evolving the DNS

The Domain Name System (DNS) is one of the Internet's fundamental systems, providing the top-level hierarchy for naming Internet resources. One of its purposes is to act as a sort of phonebook, allowing names such as "www.example.com" to be resolved to resource information, such as server IP addresses. It provides the hierarchical naming model that enables clients to "resolve" or "lookup" resource records associated with names. This naming hierarchy also ties into other systems such as the Public Key Infrastructure (PKI) certificate trust system (where trusted Certificate Authorities use demonstrated control over names to issue certificates for those names) and into the hostnames used within many types of URLs. Because it is so fundamental, future architectural shifts in how DNS services are implemented have the potential to have wide-ranging implications.

In this post, I aim to describe some of the motivators pushing towards shifts in the architecture of DNS services, discuss some use-cases and constituencies that may be impacted, and outline some potential paths and directions. One thing that becomes clear is that there may not be a one-size-fits-all solution, but rather a need for a family of solutions targeted at different use-cases.

DNS services have operated with a somewhat consistent deployment and usage model over the past few decades. This model has involved a "stub resolver" on clients (often with configuration details for the client received from the local network via a protocol such as DHCP) interacting with a "recursive caching resolver" which in turn talks to authoritative DNS servers ("authorities") that provide responses to queries.
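As a rough illustration of how little a stub resolver needs to do in this model, here is a minimal Python sketch that builds the wire-format query a stub would hand to its recursive resolver. The function names are hypothetical and the sketch only covers a single question with the "recursion desired" flag set; real stub resolvers also handle retries, truncation, and response parsing.

```python
import struct

def encode_qname(name: str) -> bytes:
    """Encode a hostname as length-prefixed DNS labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_query(name: str, qtype: int = 1, query_id: int = 0) -> bytes:
    """Build a one-question query (default type A, class IN)."""
    flags = 0x0100  # RD bit: ask the recursive resolver to do the full lookup
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_qname(name) + struct.pack("!HH", qtype, 1)
    return header + question

msg = build_query("www.example.com")
print(len(msg), msg.hex())
```

The stub sends these bytes to the recursive resolver (traditionally over UDP port 53) and leaves the work of walking the hierarchy of authorities to that resolver.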

As many other widely used Internet protocols have been evolving to incorporate end-to-end encryption and authentication (to enhance privacy and integrity), most widely deployed DNS services still remain unauthenticated and unencrypted. This leaves DNS requests and responses (which happen as part of the resolution process prior to visiting most services, such as web URLs) unsecured and vulnerable to surveillance and modification from on-path network attackers such as those involved with malware and pervasive monitoring.

Some security impacts of this are mitigated to a degree by end-to-end security with other services (such as HTTPS, now used for close to 75% of page loads from some web browsers); however, a generally unsecured DNS has significant privacy ramifications and can be leveraged by malicious actors to various ends.

The Domain Name System Security Extensions (DNSSEC) is sometimes used to provide integrity between recursive resolvers and authorities, but that only solves one piece of the puzzle. In particular, DNSSEC only authenticates answers (and even then only for zones signed by authoritative servers), and it does not encrypt queries or responses or provide any confidentiality for users.


In work related to the IETF's DNS Privacy (DPRIVE) and DNS-over-HTTPS (DoH) working groups, two new secure transports for DNS queries and responses have been specified: DNS-over-TLS (DoT) and the still-in-progress DNS-over-HTTPS (DoH). There is still significant debate and discussion on how DoH/DoT fit into the broader Internet architecture. As an example, there is not a well-defined path for clients to be configured with a DoH/DoT server in a secure manner. For prototype and research purposes, some client software has hard-coded specific secure DNS servers (such as Mozilla's use of Cloudflare's DoH service in some nightly and experimental builds of Firefox). While interesting as a proof-of-concept and to gain experience, as a long-term approach this has the risk of consolidating key parts of the Internet to rely on a few services, which goes against the fundamentally decentralized architecture of the Internet. There are also concerns that this has the risk of shifting and centralizing both trust and attack surfaces in ways that could have unfortunate consequences.
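To make the DoH transport concrete, the following Python sketch shows how a wire-format DNS query can be carried in an HTTPS GET request. It assumes the GET form that was later standardized in RFC 8484 (a "dns" parameter holding the unpadded base64url encoding of the query). The server URL is a hypothetical placeholder and the function name is my own; no request is actually sent.

```python
import base64
import struct

def doh_get_url(server_url: str, wire_query: bytes) -> str:
    """Return the GET URL carrying a wire-format DNS query in the 'dns' parameter."""
    # base64url without '=' padding
    encoded = base64.urlsafe_b64encode(wire_query).rstrip(b"=").decode("ascii")
    return f"{server_url}?dns={encoded}"

# A minimal A-record query for www.example.com. The message ID is 0 so that
# identical queries produce identical URLs, which is friendlier to HTTP caches.
header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
question = b"\x03www\x07example\x03com\x00" + struct.pack("!HH", 1, 1)

# Hypothetical DoH endpoint, used only for illustration.
url = doh_get_url("https://doh-default.example.com/doh", header + question)
print(url)
```

A client would fetch this URL over HTTPS with an "Accept: application/dns-message" header and parse the response body as an ordinary DNS message. Note that nothing in this exchange tells the client which server URL deserves its trust, which is exactly the configuration gap discussed above.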

Centralized DNS resolvers (such as some of the DoH servers deployed today) may provide increased privacy against a local network observer, but they can also result in significantly impaired Internet performance for users in some cases. For example, CDNs such as Akamai that map traffic based on DNS may lose the ability to direct end-user traffic to a nearby cluster in cases where a DNS service is being used that is not affiliated with the local network and which does not send "EDNS Client Subnet" (ECS) information to the CDN's DNS authorities. While Mozilla performed a study on how using DoH impacted DNS performance using a widely distributed DoH service (finding that most queries using a DoH cloud service were around 6 milliseconds slower, with some of the slowest transactions performing much better), this study only measured DNS performance and not the impact on overall site performance. A published study performed by Akamai showed that mapping traffic based on client location rather than DNS resolver location can substantially improve overall website and download performance, especially in the case of DNS resolvers not located near end-users.
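For readers unfamiliar with ECS, here is a minimal Python sketch of how a recursive resolver might encode the option, assuming the layout defined in RFC 7871 (option code 8, followed by address family, source prefix length, scope prefix length, and the truncated address). The function name and the choice of a /24 prefix are illustrative assumptions, not a description of any particular resolver's policy.

```python
import ipaddress
import struct

def ecs_option(client_ip: str, prefix_len: int) -> bytes:
    """Encode an EDNS Client Subnet option carrying only a prefix of the client IP."""
    addr = ipaddress.ip_address(client_ip)
    family = 1 if addr.version == 4 else 2  # 1 = IPv4, 2 = IPv6
    # Zero out the host bits so only the network prefix is disclosed.
    network = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    n_octets = (prefix_len + 7) // 8
    truncated = network.network_address.packed[:n_octets]
    # Scope prefix length is 0 in queries; the authority fills it in on responses.
    payload = struct.pack("!HBB", family, prefix_len, 0) + truncated
    return struct.pack("!HH", 8, len(payload)) + payload

opt = ecs_option("198.51.100.77", 24)
print(opt.hex())
```

Only the first three octets of the address appear in the option, so the authority learns roughly where the client is in the network without learning its full IP address.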

Many large network operators (both ISPs and Enterprises) rely on DNS resolutions using their DNS resolvers as a cornerstone of network operations and service offerings (for internal-only resolutions for services like VOIP, DNS64, anti-malware protections, custom ECS setups for improved CDN load balancing, government-mandated content filtering, visibility, and more). Given how vital a robust DNS service is to the end-user experience, network operators view DNS as a critical service to own and manage so they can provide the performance, scale, and reliability that their customers demand. Some of these network operators perceive any significant shift by clients to use DNS resolvers other than those provided by the operator as a risk to their ability to provide good service to their customers.

One of my observations is that many of the recent discussions about improving DNS privacy and security have been focused primarily on Web use-cases involving individual end-users using mobile clients (web browsers and applications on mobile devices, tablets, and laptops). Mobile clients pose a number of important and compelling problems as these clients connect to a wide mixture of friendly and hostile networks, such as cellular networks but also WiFi within homes, coffeeshops, conferences, workplaces, and perhaps even malicious WiFi endpoints masquerading as one of these.

Cleartext HTTP has long provided an attack surface in hostile networks, both by allowing malware to inject and modify content, as well as by exposing private data to attackers. To mitigate this, many sites have moved to encrypted-and-authenticated HTTPS. However, while HTTPS provides end-to-end security when connecting to Web services, it still leaks information to the network, most prominently via DNS and TLS SNI, allowing on-network attackers to see which sites are being visited. Full-tunnel VPNs provide a solution in some cases, but don't scale well and just concentrate traffic at another point from which it can be attacked or surveilled. Individual mobile users can't be expected to understand or clearly express trust relationships with every network they might connect to.

From this angle, it's attractive to encrypt DNS lookups to an authenticated DNS resolver not associated with the local network. That still has some downsides, such as the CDN traffic localization issue mentioned above, although there are paths to address them (such as if the trusted DNS resolver in turn had a way to pass along adequately pseudo-anonymized ECS information to the CDN DNS authority over an authenticated and encrypted TLS channel while also using approaches such as QNAME-minimization to avoid leaking the specific names being resolved to unaffiliated third-parties such as gTLDs). Using a trusted centralized DNS resolver also has the centralization downsides discussed above, so even here there may be a desire to find distributed solutions that still meet as many of the security goals as possible.
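The idea behind QNAME-minimization can be sketched in a few lines of Python. This is a simplified illustration that assumes a zone cut at every label, which is not true of real zones, and the function name is hypothetical. It shows which name a minimizing resolver would reveal to each level of the hierarchy instead of sending the full name everywhere.

```python
from typing import List, Tuple

def minimized_queries(name: str) -> List[Tuple[str, str]]:
    """Return (zone being asked, name revealed to it) pairs, from the root down."""
    labels = name.rstrip(".").split(".")
    steps = []
    for depth in range(1, len(labels) + 1):
        # The zone asked is the parent of the name being revealed.
        zone = ".".join(labels[-(depth - 1):]) if depth > 1 else "."
        asked = ".".join(labels[-depth:])
        steps.append((zone, asked))
    return steps

for zone, asked in minimized_queries("www.example.com"):
    print(f"ask servers for {zone!r} about {asked!r}")
```

With this approach the gTLD servers for "com" only see "example.com" rather than "www.example.com", which limits what unaffiliated third parties can learn from a resolver's upstream traffic.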

The broader issue is that while the "individual user with a mobile web client" use-case is a high-profile and important use-case, it is far from being the only use-case. When exploring many other common use-cases, various other constituencies and requirements also come into play. While improved privacy is one motivator for recent work, it is not the only objective in play.

Objectives and Requirements

There are a number of objectives and requirements in evolving how DNS services work, both in terms of improving on the status quo and in not breaking important existing use-cases. Some of these are complementary while others are in direct conflict, requiring trade-offs or varying architectural solutions within different environments.

Across these, an important objective will be to enable users (and client administrators for managed devices) to make informed choices. This is made more challenging because the complexities involved can overwhelm users, which makes it important to offer a choice between safe defaults.

  • Privacy/Confidentiality: There's a strong desire to improve end-user privacy and protect them against pervasive monitoring. In particular, this often comes down to making it harder for an observer (either on the network or operating a DNS recursive server) to use metadata from DNS requests to track the websites a user is visiting by extracting hostnames from the user's DNS queries, track a mobile user as they move around the network, or link a client IP address to an actual person. RFC 7626 (DNS Privacy Considerations) discusses this in great detail. Using DNS-over-HTTPS or DNS-over-TLS can hide DNS lookups from passive observers on the network. However, hiding DNS lookups from untrusted DNS resolvers and active adversaries relies on having a way to locate and authenticate a trusted DNS recursive server.
    • The privacy benefits of using DoH/DoT are only present when the DNS recursive server is itself trusted by the client from a privacy perspective. Switching from using an ISP's local in-country DNS resolver (with the ISP constrained by local privacy regulations) to a DNS resolver that is out-of-country (and operated by a third-party in a different national jurisdiction) could in some cases make privacy worse rather than better, regardless of what communications transport is being used for the DNS. Consolidating DNS lookups to a few services also introduces new risks for enabling the correlation of user activity, and these services potentially become highly attractive targets for subpoenas and extra-legal attacks. This is all made substantially more challenging as many users lack a way to judge the level of trust they have with various DNS service providers, making it hard for them to make an informed choice.
    • As DNS is an important part of application-level traffic routing, it is hard to hide it from parts of the network that need to route application-level traffic. For example, clients also send hostnames in-the-clear in the TLS Server Name Indication as part of initiating TLS sessions. There have been proposals for encrypted SNI (most recently draft-ietf-tls-esni), but the value of this has been limited as long as DNS lookups are in cleartext.
    • Protecting against traffic analysis is an inherently hard problem. Even if DNS questions and answers are encrypted, attackers also have a wide set of options for: performing timing analyses on these lookups; observing the IP addresses clients connect to and correlating those to services known to be running on those IPs; observing lookups from a recursive server to authorities and correlating these with lookups from the client to the recursive server; and many more. Users with extremely high privacy requirements or seeking to circumvent censorship are likely always going to prefer using a system designed to provide privacy (such as Tor or Psiphon), but even then those systems have limitations.
    • Some concerns have been raised that DoH and DoT can actually increase linkability between requests in some cases. For example, independent UDP DNS requests being translated through a NAT may be somewhat hard for a recursive server to correlate with each other without additional identifying attributes. However, both TLS and HTTP not only add persistent connections linking requests but also add enough additional attributes (session tickets, TLS ClientHello variations, HTTP header ordering variations, not to mention HTTP cookies) that a DNS recursive server desiring to do so can likely correlate multiple requests made by the same client.
  • Integrity: As a protocol initially deployed without authentication, DNS lacked a way to provide integrity for responses. This meant that an active attacker (or an attacker that could perform a cache poisoning attack) could cause a client to get and use a bogus DNS response. DNSSEC was introduced as a way to authenticate DNS responses, but it has faced challenges in its uptake and deployment, and most implementations are focused on securing lookups between resolvers and authorities. Another value of a DoH/DoT service is that it can help provide integrity for client-to-resolver communications, allowing resolvers to securely communicate DNSSEC-validated responses back to clients not performing DNSSEC validation, mitigating some of the operational deployment complexities around DNSSEC.
    • Unauthenticated UDP-based DNS has historically also been vulnerable to cache poisoning attacks from off-path attackers, such as via "Kaminsky attacks," with new similar attack vectors being found periodically. Using TCP and/or TLS helps mitigate many of these, but only for the leg of the DNS lookup where they are being used.
    • Note that for active network man-in-the-middle attacks between clients and resolvers, an attacker that can return invalid DNS responses to poison a client's cache can often also simply reroute IP-layer traffic instead or as well. (A notable example here is nation-state actors returning bogus DNS responses containing targets they'd like clients to "attack.")
  • Performance: Improving performance and the corresponding end-user experience has significant value. Using local DNS resolvers often performs better, especially if they can cache responses across pools of users. Recursive-DNS-as-a-service can either improve or hurt DNS resolution performance based on how it is configured and how close it is to end-users.
    • DNS lookup times are not the only factor in performance. DNS answers often contain the IP addresses of a service instance that a client is being directed to visit. When a user is using a recursive DNS server in a different location from them in the network, DNS-based traffic direction (such as used by some CDNs) may send the user to a service instance dictated by the location of the recursive DNS service rather than by the location of the end-user. ECS can be used to mitigate this, but only if it is sent by the recursive DNS service and only if the DNS-based traffic direction system takes ECS into account. Even though ECS only contains a prefix of the client IP address, some recursive DNS services may still be hesitant to send it in cleartext to authorities as this could be construed as a downgrade from TLS-to-cleartext. (In particular, an attacker monitoring the network around a recursive DNS server can more readily perform timing and correlation attacks when ECS is present: on a cache miss, an encrypted query is likely to have a causal relationship to triggering an unencrypted DNS lookup containing the client IP and the name from the encrypted query.) The net result here may be that when using a DNS service lacking ECS, the DNS lookup times could be fast but the overall user experience might be sub-optimal and more likely to go over congested internet links.
    • Some proposed future directions for DoH have included performing HTTPS requests and DNS lookups over coalesced HTTP/2 and/or QUIC connections to mitigate connection establishment times (i.e., when talking to "www.example.com" for HTTPS, giving it ways to push DNS records for other resources on the page, such as via HTTP/2 server push).
  • Policy Enforcement: Filtering and applying policies against DNS lookups, such as via RPZ or other proprietary mechanisms implemented in recursive DNS services, are important use-cases in many environments. These policies may be implemented by the administrator of a client or end-point as part of an Enterprise policy, anti-malware software, or parental control software.
    • DoH/DoT can improve the effectiveness of such methods by authenticating the trusted recursive resolver, helping to ensure that the request has gone through the policy enforcement filter. This benefit requires flexibility/configurability for the DoH/DoT resolver, but configuring a DoH/DoT resolver service that is trusted to apply policies (such as anti-malware) can enable a user to get these policies regardless of which network they're connected to, even in the face of a hostile local network. For example, ISPs that offer DNS-based security services to their consumer clients could enable users to securely get the benefits of these services even when not connected to the ISPs' network by using a configured DoH/DoT trusted resolver service offered by the ISP.
    • Disclaimer: Akamai has a number of offerings in this space. Akamai's Enterprise Threat Protector and Security and Personalization Service both leverage DNS filtering to protect against malware and other threats. Akamai's managed and licensed recursive DNS offerings also power the parental control and content control services offered by a number of ISPs.
  • Censorship Avoidance: Policy enforcement mechanisms also can be used by governments or other entities for Internet censorship. This is seen around the world and often implemented by governments providing lists of DNS names and IP addresses that ISPs are required to block in the DNS resolvers they provide to their users, as well as in their border routers (at the IP layer but also sniffing for cleartext Host header fields and TLS SNI indicators). Enabling DNS lookups over HTTPS via DoH has some potential to bypass some of these censorship filters by bypassing ISP DNS resolvers. This is an area full of risks and competing constituencies.
    • On the one hand, making DNS traffic opaque to censoring entities by mixing it into encrypted multi-purpose HTTPS channels does make it harder to block.
    • On the other hand, if DNS-over-HTTPS becomes a dominant mode of DNS lookups it seems likely that censoring governments will take more invasive approaches. For example, governments may start mandating TLS man-in-the-middle on all traffic (which would be a significant step backwards and would lose many of the benefits obtained by the shift to HTTPS). Alternately, governments may start mandating that users use ISP DNS resolvers and require ISPs to have dynamic firewalls that block traffic from a client to IP addresses not returned to that client via a DNS lookup of an allowed DNS name. Both of these might end up being worse than the status quo and could negate any gains potentially made here.
  • Offloading Complexity: DNS recursive resolvers enable the use of very simple client stub resolvers. This can be especially important in resource-limited clients and devices with long life-times, such as embedded IoT devices.
    • The cryptography underlying TLS can add significant complexity, both in terms of CPU resources in devices and in terms of managing cryptographic agility. The timescale in which cryptographic algorithms remain secure is often shorter than the timescale over which embedded devices are expected to continue functioning, as evidenced by the challenges faced by the retirements of MD5, SHA1, RSA1024, SSLv2, SSLv3, DES, RC4, various certificate authorities, and others. Each one of these resulted in abandoning support for non-updatable devices as the services they relied upon disabled old cryptographic primitives. In contrast, DNS stub resolver implementations in embedded clients have remained relatively stable over the past decades.
  • Cost: There are infrastructure, complexity, and human costs associated with deploying and managing DNS resolvers. It is almost certain that DoT and DoH will be more resource-intensive than unprotected DNS-over-UDP, both due to cryptography but also due to additional resources for holding TCP and HTTP state. Many ISPs have significant investments in DNS infrastructure which may not have sufficient capacity headroom to move directly to DoT and DoH. Significant capital expenditures will require strong business justifications.
  • Network-Local Behaviors: Some deployment scenarios have requirements for DNS results that are influenced by the local network environment. While some of these already need to be handled on the client (such as for DNSSD), the remainder might need special client-side handling in order to use a DNS resolver not associated with the local network. For example:
    • Enterprises and carriers often deploy "Split DNS" (often called "split-horizon DNS") where different views provide different results. For example, names internal to the Enterprise may resolve to private IP addresses within the Enterprise environment and private zones may be used to resolve names internal to the Enterprise. Using an off-net DNS resolver with no local context may cause performance issues and/or break applications.
    • Home networking has a need for names that resolve locally within the home, such as ".local" and "home.arpa" names resolved via DNSSD. A major architectural gap today is that there's not yet a sane way to establish trust between, and get TLS certificates for, endpoints within a home network.
    • DNS64 where an IPv6-only network provides access to IPv4-only services by synthesizing IPv6 "AAAA" record responses by prefixing a NAT64 IPv6 prefix onto DNS IPv4 "A" records.
    • Many DNS-based local and global load balancers make decisions based on the network location of a user, such as by sending a user to a local service when on-net or to a cloud-based service when off-net.
  • Mobility: Clients are increasingly connected to more than one network within short periods of time. Mobile devices may switch between WiFi and cellular networks as users move around, with a desire for a seamless experience that still maintains both Performance and Privacy properties. Homes and businesses may increasingly have Internet connectivity from multiple sources. Enterprise users may connect to split-tunnel VPNs for some of their work day. Relying on a single DNS service tied to a single network may be problematic in some of these cases, especially in the face of Network-Local Behaviors. Additionally, users and Enterprises configuring Policy Enforcement on devices (such as for parental controls or malware mitigation) may wish these policies to be enforced independent of the current access network provider.
  • Availability: The DNS needs to remain available, even in the face of DDoS attacks and massive events. There may also be trade-offs in the availability vs. integrity/privacy space in the face of on-path network attackers.
    • Being able to scale authoritative DNS server infrastructure to withstand large attacks is crucial but also challenging, and adding expensive cryptography inline with requests does not necessarily help.
    • Relying on any form of over-centralization or implementation monoculture also introduces availability risks, such as if most widely used DNS recursive servers were consolidated to be operated by only a few organizations.
    • For the client-to-recursive connections, an inherent challenge is that an on-path network "attacker" (such as the local ISP or WiFi provider) usually has the power to block secure connections as long as the end-points can be distinguished from other traffic. This can force clients to choose between losing availability and retaining it at the cost of integrity/privacy by talking to an unauthenticated or untrusted DNS service.
  • Supportability: Troubleshooting Internet connectivity and performance problems often involves tracing DNS lookups or investigating issues between clients and their DNS resolvers. Users, ISPs, and Enterprises will all face new customer support challenges if clients increasingly switch away from local DNS resolvers and as individual applications start using different DNS resolvers.
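Of the Network-Local Behaviors listed above, DNS64 is the easiest to show in code. The Python sketch below synthesizes an IPv6 "AAAA" address from an IPv4 "A" record by embedding the IPv4 address in the low 32 bits of a /96 NAT64 prefix. It assumes the well-known prefix 64:ff9b::/96 from RFC 6052; real networks may use their own network-specific prefix, which is precisely why a resolver with no knowledge of the local network cannot perform this step correctly. The function name is hypothetical.

```python
import ipaddress

# Well-known NAT64 prefix (RFC 6052). A given network may use a different one.
WELL_KNOWN_PREFIX = ipaddress.ip_network("64:ff9b::/96")

def synthesize_aaaa(ipv4: str, prefix: ipaddress.IPv6Network = WELL_KNOWN_PREFIX) -> str:
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 NAT64 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    v6 = ipaddress.IPv6Address(int(prefix.network_address) | int(v4))
    return str(v6)

print(synthesize_aaaa("192.0.2.33"))  # 64:ff9b::c000:221
```

An off-net resolver would return only the original "A" record, leaving a client on an IPv6-only network without a usable address unless the client discovers the prefix and performs the synthesis itself.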

Multiple Use-Cases

The Internet is far more than just web browsers and mobile devices. For each deployment use-case, the needs of different constituencies are also weighted differently. (For example, the "end-user" constituent is critical in many mobile device and home networking use-cases, but the "enterprise operator" constituent is critical in many data center, embedded device, and managed enterprise client use-cases.) Some of these use-cases are directly confronted by the "Tussle in Cyberspace" without a clear solution always being present. Each of these use-cases has different demands on the objectives outlined above. Here is a certainly far-from-complete sampling of use-cases, some of which have some overlap:

  • Mobile devices (phones and laptops) used by consumers / end-users: Privacy, integrity, and performance are some of the top objectives for this use-case. The balance between them will vary between end-users, with censorship-avoidance also being desired by some end-users. How to trade-off privacy vs. performance is not always clear here as many users care heavily about performance and don't realize they care about privacy until there is a situation causing them to need it desperately. These end-users are also often not technically sophisticated, so providing them multiple choices or requiring that they understand how the Internet works is not realistic. These users also move between a wide mix of networks, including networks that they may have reasonable trust in depending on applicable laws (e.g., a mobile telecom operator constrained by GDPR and other E.U. regulations) and totally untrusted networks (e.g., coffee shops and other forms of public WiFi). An ideal architecture here would be one that provided balanced privacy, integrity, and performance regardless of which network they are connected to, without requiring user intervention.
  • Home networks: Home networks bear similarities to mobile device networks in terms of having unsophisticated end-users who want things to "just work." Home networks may have somewhat more trust in their local network provider (as there is often just a single ISP they may be connected to through a channel secured by the ISP). Home networks may also contain consumer electronics and embedded IoT devices with their own requirements for performance (e.g., for OTT video streaming from a local CDN cache) and a desire to offload complexity into local network services. Operators of home networks may wish to enforce policy (such as "Parental Controls") on local devices, including mobile devices connected to or even affiliated with the home network. Home networks also have network-local behaviors for local service discovery, and at some point soon there will be a desire to have a way to establish trust between devices within a home to enable secure HTTPS communications between local services.
  • IoT embedded devices: Many "Internet of Things" (IoT) devices have long lifetimes, infrequent updates, and limited resource footprints. As such offloading complexity is often critical. For example, these devices must continue operating securely for longer than the lifetime of some cryptographic algorithms (and certainly longer than the lifetime of certificate authority and DNS root keys), and updating these devices can often be a challenge. Depending on the nature of the device, integrity is also often important. Some use-cases such as automotive may also have privacy requirements.
  • Data centers: Services running in Enterprise and Cloud data centers have a very different set of requirements than end-user devices. Integrity, performance, policy enforcement, and an ability to leverage network-local behaviors are often required. There is often much less mobility in these environments (beyond VMs and containers migrating between hosts).
  • Administratively managed devices: Enterprises, schools, and other organizations often have strong requirements for policy enforcement. Providing integrity protection for DNS responses and securing communications against malicious network actors is also often strongly desired, especially as some of these devices may be mobile. Network local behaviors are sometimes desired, but only from "trusted" networks and in conjunction with policy enforcement.
  • Devices and ISPs under government-mandated controls: Some governments require that ISPs limit which sites users can visit, often by requiring that ISPs implement some mixture of IP-based, DNS-based, and TCP flow-based filtering. There is an inherent tussle here between these controls mandated by local laws and end users' desires for privacy and censorship avoidance. As more traffic has been moving to HTTPS, DNS has remained as an easy and generally effective way for ISPs to implement these government requirements. If DNS moves to an encrypted channel and becomes indistinguishable from other HTTPS traffic, ISPs may be forced by their local governments to implement more draconian measures that would lessen privacy, security, and user choice compared to the status quo. For example, it is possible we'd see more of:
    • Mandated TLS man-in-the-middle for all traffic (which would be much worse from a privacy and integrity perspective for all traffic, not just DNS traffic).
    • Firewalling based on DNS responses to force users to use ISP DNS resolvers.
    • Mandated use of an ISP or government-provided DNS resolver, even when mobile or roaming.

From the above it should be clear that there's likely not a single technical solution that covers even just these use-cases, let alone the other corner cases not enumerated above.

Potential Architectural Directions

Based on the range of use-cases and objectives, it is likely that multiple solutions will be needed to satisfy them. In this section I discuss some possible directions, although some of these may turn out to not be practical or desirable once they have been explored in more detail.

Today we have DNSSEC and the DoT/DoH protocols:

  • DNSSEC helps with integrity for the Authoritative to Recursive path when configured on both sides. It has the potential to help with integrity from Recursive to Stub, but this is not widely deployed or configured, and it runs counter to Offloading Complexity use-cases due to the complex and evolving nature of DNSSEC. DNSSEC also doesn't interoperate well with Network-Local Behaviors. DNSSEC also has challenges for highly dynamic responses, as exist in some CDNs. DNSSEC helps *only* with the integrity of responses as it provides no encryption on the wire.
  • DNS-over-TLS (DoT) and DNS-over-HTTPS (DoH) just provide a secure transport for DNS. By themselves they do not provide a mechanism for how to securely provision their usage and establish trust. Other than explicit opt-in by technically savvy users, they also lack a way to be enabled with appropriate user consent, which becomes important but also challenging when performance and privacy and who-to-trust trade-offs must be made. The degree to which these help provide integrity, privacy, and censorship avoidance are directly related to the ways in which they are configured and used.

(There have also been independent DNSCrypt and DNSCurve proposals and implementations, but neither of them is part of active work streams in the IETF.)

Across all of these it may be necessary to extend the local configuration and policy APIs in operating systems and client software to allow central configuration of a trusted resolver (or of a local DNS proxy service which in turn uses its own set of trusted resolvers). Many operating systems and browsers already have frameworks for allowing managed device policies to be configured.

Tightly Managed Environments

For tightly managed environments such as Administratively Managed Devices and Datacenters, configuration becomes more straightforward, as there is a clear client administrator who configures policy for which DNS server to use, how to authenticate it, and what policies to apply. A few aspects will need to exist for each of these:

  • How to configure the trusted DNS resolver to use, both in terms of its endpoint (i.e., URL, IP addresses, or hostnames, as well as protocol) and how to authenticate it (i.e., what sort of certificate and how to validate it). For Administratively Managed Devices running some sort of software plugin that applies policy via DNS, such as Akamai's Enterprise Threat Protector (ETP) or similar products from other vendors, this can be managed from that application's configuration. As the objective of these products is to help Enterprises secure their user devices, it becomes natural to move DNS over a secure transport (DoT/DoH) to provide additional integrity and confidentiality when the application is talking to an integrated trusted resolver service.
  • How to configure local applications and the operating system to use the desired DNS resolver. Some operating systems and applications (such as web browsers) provide mechanisms for managed device policy configuration. Having some standardized local APIs here may be valuable so that a local device policy application can configure the trusted resolver used by applications (which may be a local DNS proxy service that does DoT/DoH itself).
  • A way to obtain DNS service configuration from the network in a trusted manner, such as for VPNs and data center networking configuration. A fundamental challenge here is that DHCP and similar protocols don't have bootstrapped trust. As such, they may be reasonable to leverage for local networking configuration but lack a way to configure trusted services (i.e., obtaining authentication and authorization configuration from an untrusted DHCP server provides no value beyond opportunistic security). One direction may be to configure onto devices which Provisioning Domains (PvDs) are trusted for which DNS names, and then to allow the PvD policies (which are fetched via HTTPS) to indicate which trusted resolver to use for which names. For example, a static device policy might indicate "use a trusted resolver PvD 'corp.example.com' for all names within ('*.example.com' and '*.example.org') if one is specified, otherwise use https://doh-default.example.com/doh". Bootstrapping this configuration as well as providing good defaults will be a challenge here. PvDs are currently defined only for IPv6, so an actual solution may end up differing in how it is designed and implemented.
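The static device policy quoted in the last bullet can be sketched as a small lookup. This is only an illustration under stated assumptions: the `DevicePolicy` type, the `select_resolver` function, and the corporate resolver URL in the usage lines are hypothetical, '*.example.com' is read as "names strictly below example.com", and names outside the listed suffixes are assumed to fall through to the default resolver.

```python
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class DevicePolicy:
    trusted_pvd: str             # PvD trusted for the suffixes below
    suffixes: Tuple[str, ...]    # e.g. ("example.com", "example.org")
    default_resolver: str        # fallback trusted resolver URL


def in_scope(name: str, suffixes: Tuple[str, ...]) -> bool:
    """True if name is strictly below one of the suffixes (assumed wildcard meaning)."""
    name = name.rstrip(".").lower()
    return any(name.endswith("." + suffix) for suffix in suffixes)


def select_resolver(name: str, policy: DevicePolicy,
                    pvd_resolvers: Mapping[str, str]) -> str:
    """Pick a resolver URL for name.

    pvd_resolvers maps a PvD identifier to the resolver URL that PvD
    currently specifies (assumed to have been fetched over HTTPS).
    """
    if in_scope(name, policy.suffixes):
        resolver = pvd_resolvers.get(policy.trusted_pvd)
        if resolver is not None:
            return resolver
    # PvD specified no resolver, or the name is out of scope (an assumption).
    return policy.default_resolver


policy = DevicePolicy(
    trusted_pvd="corp.example.com",
    suffixes=("example.com", "example.org"),
    default_resolver="https://doh-default.example.com/doh",
)
# Hypothetical resolver URL advertised by the corporate PvD.
advertised = {"corp.example.com": "https://doh.corp.example.com/dns-query"}
print(select_resolver("wiki.example.com", policy, advertised))
print(select_resolver("wiki.example.com", policy, {}))
```

Note that the PvD identifier only selects which network-provided configuration is believed; the trust decision itself lives in the policy installed on the device.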

A challenge here is that solutions that may work for large Enterprises may not scale down to small businesses. Some of the same technologies that could be used constructively to secure client devices could also be abused to further compromise user privacy and security in cases where they are installed surreptitiously on client devices.

End-User Environments

Mobile devices and home networks are a much larger challenge as there often isn't a knowledgeable administrator who can make good decisions about which trusted resolver service to use. As a result, good secure defaults are crucial.

Using opportunistic DoT/DoH has some value and is a starting point. For example, if a residential ISP's DNS resolvers offer DoT/DoH, then there is likely no harm in using it as opposed to using a cleartext channel. However, there isn't a good way to bootstrap trust here, and as such this should be looked at as opportunistic security.
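As a rough sketch of what "opportunistic" means in practice, the snippet below probes the network-provided resolver for DoT on TCP port 853 and falls back to cleartext if that fails. The function names and the transport labels are illustrative. The key assumption, and the reason this is only opportunistic security, is that certificate verification is deliberately disabled: the client has no trusted name or key to check the resolver against.

```python
import socket
import ssl
from typing import Callable


def probe_dot(resolver_ip: str, timeout: float = 2.0) -> bool:
    """Return True if a TLS handshake succeeds on the DoT port (853)."""
    ctx = ssl.create_default_context()
    # Opportunistic: encrypt without authenticating the resolver.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((resolver_ip, 853), timeout=timeout) as sock:
            with ctx.wrap_socket(sock):
                return True
    except OSError:  # covers refused connections, timeouts, and TLS errors
        return False


def choose_transport(resolver_ip: str,
                     probe: Callable[[str], bool] = probe_dot) -> str:
    """Prefer unauthenticated DoT when available, else plain DNS."""
    return "dot-opportunistic" if probe(resolver_ip) else "cleartext"
```

This protects against passive observers on the path but not against an active attacker, who can block port 853 to force the fallback or impersonate the resolver outright.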

Using a single trusted resolver service provided by the application or operating system may be one option, but this may have end-to-end performance trade-offs and has challenges for network-local behaviors. Moreover, approaches that would consolidate to a small number of trusted resolver services would likely be the wrong architectural direction for the Internet.

For network-local behaviors, there may be value in developing ways to build trust within home (and small business) networks. Providing a mechanism for clients to establish trust in a local trust root for local names might then enable local HTTPS usage in a secure manner. For example, a home router might be used as a trust root with clients using a QR code to register with it. This could then allow the home router to sign certificates for ".local" names valid and trusted only when connected to that network.

Domain-Associated Trusted Resolvers

For the more general case, a direction for exploration is to allow trusted resolvers to be associated with domains. In particular, clients would move from a model of using a single DNS resolver for all requests to using different resolvers for different requests. This would involve putting more DNS resolution logic into clients, but it also has attractive security and performance properties. From a privacy perspective, this could help limit the exposure of DNS lookups to just the service being looked up, and it has the potential to reduce the number of actors involved.

For example, clients might use Google's trusted resolver for Google hostnames, Facebook's trusted resolver for Facebook hostnames, and various CDNs' trusted resolvers for hostnames served off of those CDNs. An added advantage is that when used with DoH there might be possibilities for delivering both DNS and HTTPS requests over the same HTTP/2 TLS or QUIC connection. Browsers and/or operating systems would still need a way to configure a default trusted resolver (or perhaps use an opportunistic resolver provided by the local network?) for other requests as well as for bootstrapping.
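The client-side logic this implies can be sketched as a longest-suffix match from hostname to resolver, with a default for everything else. All names here are hypothetical: the `.example` domains and resolver URLs stand in for real services, and how the association table reaches the client is exactly the open design question discussed next. Grouping lookups by resolver also shows the privacy property: each resolver sees only the names associated with it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def resolver_for(name: str, associations: Mapping[str, str], default: str) -> str:
    """Return the resolver for the longest matching domain suffix."""
    labels = name.rstrip(".").lower().split(".")
    # Try the most specific suffix first: a.b.example, then b.example, ...
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in associations:
            return associations[suffix]
    return default


def partition_lookups(names: Iterable[str], associations: Mapping[str, str],
                      default: str) -> Dict[str, List[str]]:
    """Group names by the resolver that would see each lookup."""
    exposure: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        exposure[resolver_for(name, associations, default)].append(name)
    return dict(exposure)


# Hypothetical domain-to-resolver associations and default resolver.
associations = {
    "video.example": "https://doh.video.example/dns-query",
    "cdn.example": "https://doh.cdn.example/dns-query",
}
default = "https://doh-default.example.com/doh"
names = ["www.video.example", "img.cdn.example", "news.example.org"]
for resolver, seen in partition_lookups(names, associations, default).items():
    print(resolver, seen)
```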

The architectural details for a system such as this still need to be designed. For example, some options might include:

  • Having a collaborative governance structure for trusted resolvers, similar to the CA/Browser Forum (https://cabforum.org/), that would specify requirements around logging, security, privacy, and integrity in order for a trusted resolver to be used by browsers. There would then be a mechanism for domains to register one or more certified trusted resolvers as being associated with that domain, as well as a way to distribute the domain-to-resolver association to clients. For example, Mark Nottingham proposed using a Bloom filter to encode a digest of these associations as a way to help this scale to a large number of domains.
  • Having a decentralized model, such as with a resource record on domains specifying a trusted resolver to use for that domain. For example, a DNS domain could include a new DNSSEC-signed Trusted Resolver ("TR"?) resource record type indicating the URL for a DoH service to use for looking up names within that domain. This would be using DNS for bootstrapping recursive resolvers, with all of the associated integrity (e.g., via DNSSEC) and privacy problems. The design for this bootstrapping will be challenging, as will be avoiding leaking too much cleartext information to the network.
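To illustrate why a Bloom filter helps the first option scale, here is a generic sketch of the data structure; it is not a rendering of the actual proposal, and the sizes, hash construction, and domain names are assumptions chosen for readability. The filter answers "might this domain have an associated trusted resolver?" in a fixed amount of space. It never gives a false negative, but it can give false positives, so a hit would still need to be confirmed through some authoritative mechanism.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 4) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        """Derive num_hashes bit positions by salting SHA-256 with an index."""
        data = item.rstrip(".").lower().encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # All positions set: "possibly present". Any unset: "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Hypothetical set of domains that have registered a trusted resolver.
digest = BloomFilter()
for domain in ("video.example", "cdn.example"):
    digest.add(domain)

print("video.example" in digest)  # → True
print(len(digest.bits))           # → 128
```

With these parameters the whole digest is 128 bytes regardless of how long the domain names are; a deployment would size the filter to the number of domains and the false-positive rate it can tolerate.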

There are many subtleties and details that would need to be worked through for each of these. These approaches also add significant complexity so are likely not applicable for use-cases where offloading complexity is desired.

While Domain-Associated Trusted Resolvers might be a reasonable "safe automatic default", end users and device administrators would still need to be able to opt in and configure specific or fixed Trusted Resolvers and/or policy servers. Any secure Trusted Resolver model also does not have an obvious solution to enable Network Operators to enforce policy without user consent.

Next Steps

Given the critical nature of the DNS in the overall architecture of the Internet, many impacted stakeholders will likely want to have a say in where and how the architecture of the DNS evolves. Many of these changes will likely take place slowly over many years, while others could potentially be implemented by Browser or OS vendors as changes in default behavior over the course of weeks, with immediate and wide-ranging impact. It's no surprise that the latter is creating much anxiety, but even the slower changes could fundamentally shift markets and business models over time. Participating in Standards Body discussions, such as IETF working groups, can help influence directions and provide valuable feedback on technical details. The actual impact of these proposed standards, however, ends up being a function of how software vendors, service providers, and network operators adopt and implement the resulting technology. There doesn't yet appear to be a single solution that satisfies all use-cases and needs, but rather an evolution towards a family of technologies that will work in different environments. An ongoing risk here is that this will further increase the complexity of an area that is already extremely complex. However, it is likely that efforts in this space will continue so long as there is a desire to improve security and privacy for Internet users.

Thank you to many folks at Akamai for their contributions to this article, as well as to the individuals outside of Akamai whose discussions have helped shape my thinking on this topic. Conversations at and around the DRIU BOF at IETF 102, as well as on the IETF DNSOP mailing list, have been particularly valuable inputs.

While precautions have been taken in the preparation of this document, Akamai Technologies, Inc. assumes no responsibility for errors, omissions, or for damages resulting from the use of the information herein. The information herein is subject to change without notice. Akamai and the Akamai wave logo are registered trademarks or service marks in the United States (Reg. U.S. Pat. & Tm. Off). Cloudflare, Firefox, Google, and Mozilla are trademarks of respective entities and are used only for identification purposes and to the owner's benefit, without intent to infringe. Published October 16, 2018.