How to remove the DNS record takeover bug class?

Dangling DNS records are nothing new. They are simply out-of-date DNS records that may have served a purpose in the past. This DNS record trash has been around for ages and was not considered a security issue. Such records point to a resource (an IP address or another DNS record) that was owned or trusted in the past.

What makes a dangling DNS record dangerous is that others can seize the resource the record points to. DNS record takeover, better known as domain/subdomain takeover, is simply the exploitation of dangling DNS records. It falls under the security misconfiguration bug class, but I think it has matured enough to be a bug class of its own.

In this blog post, I will walk you through the (sub)domain takeover bug class, the different types of takeovers, and finally the mitigations. To keep this blog post simple, let’s assume that we use AWS for DNS management and all other workflows. The mitigations that I will show you are cloud provider agnostic. You can use the same methodology even if you use a different cloud provider such as GCP or Azure, or a mix of cloud providers.

Domain/subdomain takeover happens when an external entity acquires the resource (IP/DNS record) that a dangling DNS record points to. Whoever controls that resource controls the content served behind the DNS record.

There are many types of DNS records, but four of them can be affected by this bug class:

  • CNAME record
  • A record
  • MX record
  • NS record

Exploiting these dangling DNS record types forms the basic attack vectors for the (sub)domain takeover bug class. The attack vectors might look similar, but their impact and the probability of successful exploitation differ.

CNAME takeover is the most commonly exploited attack vector. There are multiple open-source tools to detect vulnerable subdomains and take them over. There’s a lot of competition among bug hunters to find this kind of bug because:

  1. It’s easy to judge whether a CNAME record is vulnerable because of known identifiers. In most cases, it’s just an error message that discloses that the subdomain is vulnerable.
  2. Most bug bounty programs consider this a Medium to High severity bug, so it’s a bug that all but assures a bounty.
  3. Detection and exploitation can be completely automated, and lots of open-source tools are readily available. It’s an easy bounty if found.
  4. This bug is not just a mistake on the company’s side; the SaaS providers share part of the blame. Lots of SaaS vendors don’t have proper verifications/controls in place, which lets another user acquire the same resource that the company or other users had in the past. CNAME takeovers will continue to exist until all SaaS providers have proper verifications.

The common reasons for this attack vector are:

  1. The company no longer uses the SaaS product, and the SaaS provider allows the reuse of the same subdomain
  2. The record points to an expired domain
  3. A typo in the resource name the record points to
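
The "identifier" check that takeover tools rely on can be sketched roughly as follows. This is a minimal illustration, not the logic of any particular tool: the function name `match_fingerprint` and the fingerprint table are assumptions made for this example, and the two error strings are illustrative entries that should be verified against each provider before use. The HTTP response body is passed in as a string, so fetching it is left out.

```python
from typing import Optional

# Illustrative fingerprints (assumption): CNAME target suffix -> error text
# that the provider returns when the resource is unclaimed. Verify and
# extend this table yourself; providers change their error pages.
FINGERPRINTS = {
    ".s3.amazonaws.com": "NoSuchBucket",
    ".github.io": "There isn't a GitHub Pages site here",
}


def match_fingerprint(cname_target: str, response_body: str) -> Optional[str]:
    """Return the matching provider suffix if the response looks unclaimed."""
    # DNS names are case-insensitive and may carry a trailing dot.
    target = cname_target.rstrip(".").lower()
    for suffix, marker in FINGERPRINTS.items():
        if target.endswith(suffix) and marker in response_body:
            return suffix
    return None
```

A match is only a hint that the record deserves a manual look; whether the resource can really be claimed depends on the provider’s verification controls.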

An A record takeover is possible when the A record points to one or more IP addresses that are no longer owned by the company. Dangling A records are more common than other dangling record types, mainly because of how cloud providers assign public IPv4 addresses to VMs. However, this attack vector is rarely exploited, and only by a few bug hunters and security researchers. The reasons it is rarely exploited are:

  1. It’s hard to determine whether a particular IP that an A record points to is vulnerable, and hard to exploit it. Unlike a CNAME takeover, there’s no clear identifier that tells you a domain is vulnerable. If an IP doesn’t have open ports, it can be

    1. behind a firewall
    2. reassigned to some other cloud user, so it can’t be exploited until they release it
    3. released to the cloud provider’s IP pool

    Exploitation is most likely when a subdomain points to a single IPv4 address that was released to the IP pool and is then assigned to the attacker by the cloud provider.

  2. There is less chance of success when targeting a single organization/bug bounty program.

  3. This attack vector also depends on the cloud provider. The takeover is possible if the cloud provider randomly assigns you a public IP each time you create a VM. If the cloud provider uses an algorithm that assigns you an IP from a small set of IPs instead of the entire IP pool, exploitation becomes much harder.
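
To get a feel for why targeting a single organization rarely pays off, here is a back-of-the-envelope sketch. It assumes a deliberately simplified model that is not taken from any provider’s documentation: every allocation is an independent, uniformly random draw from a pool of `pool_size` addresses, and `targets` of those addresses are still referenced by dangling A records. Real allocation behavior differs, so treat the result as intuition only.

```python
def hit_probability(pool_size: int, attempts: int, targets: int = 1) -> float:
    """Chance that at least one of `attempts` allocations lands on a target IP.

    Simplified model (assumption): each allocation is an independent uniform
    draw from the pool, so a single draw misses with probability
    1 - targets / pool_size.
    """
    if pool_size <= 0 or attempts < 0 or not 0 <= targets <= pool_size:
        raise ValueError("invalid pool_size, attempts or targets")
    miss_once = 1.0 - targets / pool_size
    return 1.0 - miss_once ** attempts
```

Under this model the odds grow with the number of dangling IPs in play, which is consistent with the observation that going after many organizations at once is more rewarding than going after one.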

MX takeover is similar to the CNAME takeover attack vector, and the vulnerable scenarios are the same. However, MX takeover has a comparatively low impact because claiming the resource in an MX record only allows you to receive emails.

Another complication is that there’s usually more than one MX record for a domain. If only 1 of 3 MX records is available for takeover, the impact depends on the priority of the vulnerable MX record. Successful exploitation of an MX record can lead to phishing attacks (with the reply-to address on a valid subdomain) and theft of intellectual property.
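
The priority point can be illustrated with a small sketch. In DNS, the MX record with the lowest preference value is tried first, so a dangling record with the lowest value matters most. The function name `assess_mx` and the input shapes are assumptions made for this example; the set of dangling hosts is expected to come from your own checks.

```python
from typing import Iterable, List, Set, Tuple


def _norm(host: str) -> str:
    # DNS names are case-insensitive and may carry a trailing dot.
    return host.rstrip(".").lower()


def assess_mx(
    mx_records: Iterable[Tuple[int, str]],
    dangling_hosts: Set[str],
) -> List[Tuple[int, str, bool]]:
    """List dangling MX hosts as (preference, host, is_primary).

    `is_primary` is True when the host has the lowest preference value,
    which is the exchange that sending servers try first.
    """
    records = sorted((pref, _norm(host)) for pref, host in mx_records)
    if not records:
        return []
    lowest = records[0][0]
    dangling = {_norm(h) for h in dangling_hosts}
    return [(pref, host, pref == lowest) for pref, host in records if host in dangling]
```

A dangling backup exchange still receives some mail (for example, when the primary is unreachable), so a `False` flag lowers the impact but does not remove it.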

NS takeover is also similar to the CNAME takeover attack vector. Its impact is comparatively high because taking over the name server address lets you become an authoritative name server for the domain. But as with MX records, there is usually more than one NS record for a registered domain.

Dangling NS records are the rarest because:

  1. Most domains have name servers that are configured to respond to all queries. Usually they are managed name servers of the domain registrars / cloud providers. Only a few companies have separate NS records for their subdomains.
  2. This vector tends to occur only when there are two or more NS records. If there’s a single NS record and it’s not set up properly, domain resolution would not work in the first place. This is also the reason why a successful NS takeover might not affect 100% of the users trying to connect to the site.

The most common reasons for this attack vector are:

  1. Expiry of the domain mentioned in NS record
  2. Typo in NS records

To learn more about these takeovers, look at the resources. Don’t forget to check Patrik Hudak’s blog posts, which beautifully describe these types of takeovers and their impact.

Based on this research and my experience with subdomain takeover detection, I can say the following:

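| Type           | Occurrence | Effort to Detect & Exploit | Impact |
|----------------|------------|----------------------------|--------|
| A takeover     | MEDIUM     | HIGH                       | HIGH   |
| CNAME takeover | MEDIUM     | LOW                        | HIGH   |
| MX takeover    | MEDIUM     | LOW                        | LOW    |
| NS takeover    | LOW        | LOW                        | HIGH   |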

When an attacker takes over a domain or subdomain, they can do several things. According to a Medium blog post, the impact includes:

  1. The ability to host malicious files that can be used to steal user data.
  2. The ability to launch phishing campaigns on behalf of the targeted company.
  3. The ability to chain other vulnerabilities to take over user accounts and access other sensitive information.
  4. The ability to display malicious content that would degrade the usability and accountability of that company.

Other impacts include:

The mitigation for this bug class cannot be achieved only using technology. You need to add people and processes for this mitigation formula to work.

You can imagine writing an amazing tool that gets all registered domains of your company and their DNS records, and makes sure none of them is vulnerable to takeover. In reality, however, it’s not as easy as writing code. If there are no guidelines or processes that tell employees where to register domains for the company and how to manage DNS records, they are going to do it whichever way they find easiest (this is more common in startups). This means that your amazing tool secures only part of the domains owned by the company.

“You can’t protect what you can’t see”

Centralize DNS management, including both domain registration and DNS record management. If that’s not possible, or you are already using multiple providers, have a strict process to limit the number of providers. By doing this, you will not come across a subdomain takeover bug report for a subdomain that you never knew existed.

Once the process is set then you can solve the rest of this problem with automated audits. The audit on a high level will look like the following:

  1. Get the list of all public IPs you own

    If you are using AWS, almost all public IPs are found in the AWS EC2 service (public instance IPs, Elastic IPs, NAT gateways, elastic network interfaces). Create a script to dynamically get the public IPs.

  2. Get the list of cloud provider services that create custom subdomains

    Understand what cloud provider services are being used in the company. If a service creates a custom subdomain, update the script to dynamically fetch the custom subdomains. The list of AWS services that create custom subdomains includes Amazon CloudFront, Amazon S3, Elastic Load Balancing (ELB), etc.

  3. Get a list of DNS records that you own

    Fetch the list of all DNS records that can be affected by this bug class, namely A, CNAME, MX and NS records. In AWS, you get it from Route53. Here are some tips that help you write the logic of the script faster:

    • Only collect the A records which point to public IPs. A records pointing to private IPs cannot be taken over.
    • In the case of CNAME, MX and NS records, it’s hard to determine if the pointed resource is private or public, so note them all. There’s no problem if the domain/subdomain pointed to by a DNS record is private and hosted in our AWS zones.
  4. If any DNS records are not pointing to IPs / DNS records / Cloud Provider services that you own, flag it

    Cross-check each DNS record against the IPs and other subdomains you own. In the case of A records, check if the public IP is in the list of IPs you own. For other DNS record types, check if the record points to a DNS record that you own. If the DNS record doesn’t match a resource that you own, it is pointing to an external resource.

  5. Check whether the flagged DNS records are whitelisted (known 3rd party domains and IPs). If not, then report them

    For each DNS record that doesn’t point to a resource owned by the company, check if it is whitelisted (a known DNS record). If not, report it. If the DNS record was intentional, add it to the whitelist; otherwise, delete the DNS record.

    You can further increase the effectiveness of detection by the following:

    1. For CNAME records, you can run available takeover detection tools to find if they are vulnerable.
    2. Whitelisted 3rd party resources are not permanent; they need to be rechecked regularly, though not frequently. For the whitelisted domains, run a periodic check to make sure that the 3rd party resource has not expired or been abandoned by the teams.
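
Steps 3 to 5 of the audit can be sketched as a single function. This is a minimal illustration under stated assumptions, not a complete implementation: the names `audit_records`, `owned_ips`, `owned_names` and `whitelist` are made up for this example, and the record shape (`name`, `type`, `values`) is a simplification you would build yourself, for instance from Route53’s `list_resource_record_sets` output and EC2’s `describe_network_interfaces` output. Alias records and service-specific hostnames need extra handling that is not shown.

```python
import ipaddress
from typing import Dict, Iterable, List, Set

# Record types affected by this bug class.
TAKEOVER_TYPES = {"A", "CNAME", "MX", "NS"}


def _norm(name: str) -> str:
    # DNS names are case-insensitive and may carry a trailing dot.
    return name.rstrip(".").lower()


def _is_owned_name(target: str, owned_names: Set[str]) -> bool:
    # Owned if it equals an owned name or is a subdomain of one.
    return any(target == n or target.endswith("." + n) for n in owned_names)


def audit_records(
    records: Iterable[Dict],
    owned_ips: Iterable[str],
    owned_names: Iterable[str],
    whitelist: Iterable[str],
) -> List[Dict[str, str]]:
    """Return records that point at resources that are neither owned nor whitelisted."""
    ips = set(owned_ips)
    names = {_norm(n) for n in owned_names}
    allowed = {_norm(w) for w in whitelist}
    findings = []
    for rec in records:
        if rec["type"] not in TAKEOVER_TYPES:
            continue
        for value in rec["values"]:
            if rec["type"] == "A":
                # Skip private and other non-global IPs: they cannot be taken over.
                if not ipaddress.ip_address(value).is_global:
                    continue
                target, owned = value, value in ips
            else:
                # An MX value looks like "10 mail.example.com."; keep the host part.
                host = value.split()[-1] if rec["type"] == "MX" else value
                target = _norm(host)
                owned = _is_owned_name(target, names)
            if not owned and target not in allowed:
                findings.append({"name": rec["name"], "type": rec["type"], "target": target})
    return findings
```

Keeping the data collection separate from this comparison means the same check works regardless of which cloud provider or DNS service the inputs come from. Every finding is either a candidate for the whitelist or a record to delete.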

While researching solutions for the (sub)domain takeover bug class, I found alternative solutions that aren’t effective. Here are some ways you SHOULD NOT solve this problem:

  1. Hiring companies to report/mitigate subdomain takeovers

    This is what I call the “Cutting it from the outside in” approach. Any external bug hunter/security company can only gather the subdomains from different sources, like Shodan, Censys, VirusTotal, etc. However, such a list is never 100% complete. Even after analyzing these subdomains, there may still be leftover subdomains that are vulnerable to takeover.

    DNS management services like AWS Route53 are a single source of truth for all DNS records owned by the company. Do not follow this method unless your company does not allow you to access the DNS management service.

  2. Trying to use static IPs for all the public instances and deleting DNS records before releasing the IP

    One idea to remediate A record takeovers would be to use static IPs for all VMs in the cloud. Then, before releasing an IP, check for all the A records pointing to it and remove them. Because a static IP stays yours until you explicitly release it, a record does not become dangling the moment a VM is terminated, which buys time to remove the DNS record. This might sound promising in theory. In reality, however, the number of static IPs that can be assigned depends on the cloud provider.

    In AWS, by default, you can only have 5 Elastic IPs per region. To make things worse, Elastic IPs are charged separately if they are not assigned to any VM. And even with Elastic IPs, if you release one without deleting its DNS records, the records remain vulnerable. So this is a bad solution and can’t be relied on.