DNS issues can be tricky and involved to pinpoint, and there are many approaches to creating more redundant and resilient DNS records and fallbacks.
To help with this discussion, it’s useful for everyone to know a little more about how domain names and DNS records work. Here are a few starter articles on how it all comes together.
It’s important to understand what it means if, during an outage, the IP address that the subdomain resolves to still responds to our tests, or to tests run from outside the affected region. In that case, the server and hosting location of the site are most likely not the problem, and some other issue is at hand.
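To make that distinction concrete, here is a minimal Python sketch that separates "the name would not resolve" from "the name resolved but the server did not answer." The function name `check_site`, the default port 443, and the `example.com` hostname are illustrative assumptions, not part of any tool we use.

```python
import socket


def check_site(hostname, port=443, timeout=5.0,
               resolve=socket.gethostbyname,
               connect=socket.create_connection):
    """Report whether a failure happens at the DNS step or the connection step.

    `resolve` and `connect` default to the standard-library calls; they are
    parameters only so the check can be exercised without a live network.
    """
    try:
        ip = resolve(hostname)  # uses whatever DNS servers this machine is set up with
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return {"stage": "dns", "ok": False, "ip": None, "detail": str(exc)}

    try:
        conn = connect((ip, port), timeout=timeout)
        conn.close()
    except OSError as exc:  # covers timeouts and refused connections
        return {"stage": "connect", "ok": False, "ip": ip, "detail": str(exc)}

    return {"stage": "connect", "ok": True, "ip": ip, "detail": "reachable"}


if __name__ == "__main__":
    # example.com is a placeholder; substitute the subdomain being tested.
    print(check_site("example.com"))
```

A result with `stage` equal to `"dns"` points toward a lookup problem on the network running the check, while a failure at `"connect"` with a known IP points more toward the server or the route to it. Running the same check from a second network is what tells the two situations apart.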
Often, the fact that some users in a local area could load the subdomain successfully while others could not shows that the issue was inconsistent, and that it could have been network-specific, carrier-specific, or even regional.
We had a local client who experienced this at a California trade show: they couldn’t load their site from CA, but everyone in NC could load the domain just fine. We found out later that their internet provider in CA was having major DNS issues, which cleared up later that day.
There’s a useful tool called traceroute, which lets you see all the steps (or hops) taken on the way to a website when you look it up by domain name. It can help troubleshoot an issue of this type, and we'll often recommend that clients test with it. When an outage is reported, we'll run a visual traceroute tool that same day to check whether the route completes successfully from some locations while dying or stopping partway from others. In those circumstances, calling the site by domain name does not work, and the user receives an error message along the lines of “domain/DNS record cannot be found.”
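If a client sends us the text output of a traceroute, a small script can summarize where the trace stopped answering. The sketch below assumes the common Unix-style layout (a hop number followed by either addresses and timings or `* * *`). The sample output is made up for illustration and uses reserved documentation addresses, and `summarize_traceroute` is a hypothetical helper name.

```python
import re

# A hop line starts with the hop number, e.g. " 3  * * *".
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")

# Illustrative output only; not captured from a real outage.
SAMPLE_OUTPUT = """\
traceroute to example.com (192.0.2.10), 30 hops max, 60 byte packets
 1  198.51.100.1  1.2 ms  1.1 ms  1.0 ms
 2  203.0.113.5  8.4 ms  8.1 ms  8.3 ms
 3  * * *
 4  * * *
"""


def summarize_traceroute(output):
    """Return the hop count, the last hop that answered, and trailing silent hops."""
    hops = []
    for line in output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # skip the header line and blank lines
        hop_number = int(match.group(1))
        # A hop counts as silent only if every probe came back as "*".
        responded = any(token != "*" for token in match.group(2).split())
        hops.append((hop_number, responded))

    responding = [number for number, ok in hops if ok]
    trailing_silent = 0
    for _, ok in reversed(hops):
        if ok:
            break
        trailing_silent += 1

    return {
        "total_hops": len(hops),
        "last_responding_hop": max(responding) if responding else None,
        "trailing_silent_hops": trailing_silent,
    }


if __name__ == "__main__":
    print(summarize_traceroute(SAMPLE_OUTPUT))
```

Silent hops in the middle of a trace are often harmless, since some routers simply do not reply to probes. A run of silent hops at the end is only a hint about where to look, so it helps to compare traces taken from two or more locations.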
Another complication is that DNS information can be read from multiple locations and stored (cached) in others as well. This makes for a scenario that can be arduous to troubleshoot, though it is possible with time. For example, the DNS records you see when browsing from your phone can differ from the cached records pulled when loading from within an office network, and can differ again on another carrier.
Inside an Office Network – There can be local DNS records that are involved with local (internal-only) subdomains, email, etc.
Outside World – The DNS records for the domain can be held with services such as CloudFlare, Hubspot, etc., as well as with the main registrar.
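When lookups are collected from several places (a phone, the office network, another carrier), it helps to line the answers up side by side. Here is a small Python sketch for doing that with results gathered by hand. The vantage-point names and addresses are hypothetical, and `group_by_answer` is an illustrative helper rather than an existing tool.

```python
from collections import defaultdict


def group_by_answer(observations):
    """Group vantage points by the set of IP addresses each one resolved.

    `observations` maps a vantage-point label to the list of IPs it saw.
    An empty list means the lookup failed or returned nothing.
    """
    groups = defaultdict(list)
    for vantage, ips in observations.items():
        key = tuple(sorted(set(ips)))  # () stands for "no answer"
        groups[key].append(vantage)
    return {key: sorted(names) for key, names in groups.items()}


def is_consistent(observations):
    """True when every vantage point saw the same answer."""
    return len(group_by_answer(observations)) <= 1


if __name__ == "__main__":
    # Hypothetical notes taken during an outage report.
    notes = {
        "phone on carrier A": ["192.0.2.10"],
        "office network": ["192.0.2.10"],
        "phone on carrier B": [],
    }
    for answer, places in group_by_answer(notes).items():
        print(answer or "no answer", "<-", ", ".join(places))
    print("consistent:", is_consistent(notes))
```

If the groups differ, the places that disagree are the ones worth investigating first, since a stale cache or a struggling resolver usually affects one network or carrier rather than all of them.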
How To Troubleshoot? - One of the main external testing tools that we recommend is a service called “down for everyone or just me,” which calls the domain name from its own location. This takes local or regional issues, network issues, and local computer problems out of the picture, and tests whether a website is "down for everyone else or just for me." If the service says that a website is up, then the issue is likely a local one that needs further investigation.
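The reasoning behind that kind of check can be written down as a simple decision rule. This Python sketch assumes you already know whether the site loads locally and have one or more results from outside checks. The function name and the four labels are our own shorthand for illustration, not output from any particular service.

```python
def classify_outage(local_ok, external_results):
    """Classify an outage from one local check and a list of external checks.

    `local_ok` is True if the site loads from your own connection.
    `external_results` is a list of booleans, one per outside check.
    """
    if not external_results:
        return "unknown"  # nothing to compare the local result against

    up_count = sum(1 for ok in external_results if ok)

    if up_count == len(external_results):
        # Everyone outside can load it, so only the local result matters.
        return "up" if local_ok else "local"
    if up_count == 0 and not local_ok:
        return "down"
    # Mixed results: some places work and others do not.
    return "inconsistent"


if __name__ == "__main__":
    print(classify_outage(False, [True, True]))   # local
    print(classify_outage(False, [True, False]))  # inconsistent
```

A "local" result suggests looking at the user's own network, carrier, or device, while "inconsistent" fits the carrier-specific or regional pattern described earlier and is worth following up with checks from more locations.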
So, when testing at the different levels turns up a mix of successful and unsuccessful results, another element that can be involved is the local carrier and the “path to the internet/DNS” that users experience. That path depends on how and where they connect to the internet, and on which DNS servers those services query on the way to the final DNS record location.
These local types of outages, whether they affect whole regions or only certain carriers, can happen from time to time, and recent ones associated with DDoS attacks have made the news.
So, perhaps this helps explain a domain-name-related local outage, or perhaps it makes it more confusing! In any case, if a DNS-related issue is temporary, it will usually clear up within an afternoon. If it happens frequently, or even a few times, then we would recommend further action to investigate and address the issue. Just let us know what you're seeing and experiencing, and keep good notes! It will help us all work together to troubleshoot.