Broken links send a strong negative signal to visitors: “This website is broken and outdated!” Clearly, that’s not the impression you want your website to make. Dr. Link Check is a simple yet powerful solution that helps you identify broken links before you lose your reputation and your clients.
Dr. Link Check starts with the URL you provide and then crawls, page by page, through the entire website. In the process, each link has to pass several crucial tests:
The first step is to check if a link conforms to the rules for valid URLs. This weeds out links like
https://www.example,com/ (comma instead of a period) and
http:images/example.jpg (missing host). Valid URLs with schemes not verifiable by the crawler (such as
tel:+555 1234 5678 or
file://server/file.docx) are marked as “Unsupported,” with the recommendation to check them manually. Currently supported URL schemes are
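The syntax check described above can be sketched in a few lines of Python. This is only an illustration, not Dr. Link Check's actual implementation: the `classify_url` helper, the "Invalid URL" label, and the choice of `http` and `https` as the verifiable schemes are assumptions made for the example. The hostname rule is deliberately strict and rejects internationalized names that a real crawler would first convert to their ASCII form.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Assumption: the real list of verifiable schemes is not given in the text;
# http and https are used here purely for illustration.
VERIFIABLE_SCHEMES = {"http", "https"}

# One DNS label: letters, digits, and inner hyphens only.
_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")


def _is_valid_host(host: str) -> bool:
    """Accept literal IP addresses and plain ASCII hostnames."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        pass
    return all(_LABEL.fullmatch(label) for label in host.split("."))


def classify_url(url: str) -> str:
    """Return 'Invalid URL', 'Unsupported', or 'OK' for an absolute link target."""
    try:
        parts = urlsplit(url)
        host = parts.hostname  # lowercased by urllib, None if missing
    except ValueError:
        return "Invalid URL"
    if not parts.scheme:
        return "Invalid URL"
    if parts.scheme not in VERIFIABLE_SCHEMES:
        return "Unsupported"  # valid, but needs a manual check
    if not host or not _is_valid_host(host):
        return "Invalid URL"
    return "OK"
```

With these assumptions, `classify_url("https://www.example,com/")` and `classify_url("http:images/example.jpg")` both yield "Invalid URL", while the `tel:` and `file:` examples are reported as "Unsupported".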
In the next step, the hostname extracted from the URL (if available) is translated to an IP address. For this step, the crawler asks a DNS server for the A (IPv4) or AAAA (IPv6) records of the hostname. If no records are available, or the domain’s nameserver doesn’t respond in time, a “Host not found” error is reported.
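As a rough sketch of this lookup, Python's standard `socket.getaddrinfo` returns both IPv4 and IPv6 addresses for a hostname. The `resolve_host` helper is a hypothetical name, and the example relies on the system resolver's own timeout rather than an explicit one, which a real crawler would probably control itself.

```python
import socket


def resolve_host(hostname: str, port: int = 443) -> list:
    """Return the sorted IPv4/IPv6 addresses for a hostname.

    Raises LookupError("Host not found") when the resolver has no answer.
    """
    try:
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        raise LookupError("Host not found") from exc
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Collecting the addresses into a set removes duplicates that the resolver may return for the same host.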
Now that the IP address is known, a TCP connection to the server is established. At this stage, two errors are possible: a “Connect error” if the connection attempt fails, or a “Timeout error” if the connection cannot be established within 40 seconds.
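The connection step might look like the following sketch, which maps the two failure modes to the error names used above. The `try_connect` function is an assumed helper for illustration; only the 40-second limit comes from the text.

```python
import socket

CONNECT_TIMEOUT_SECONDS = 40  # limit stated in the text


def try_connect(ip: str, port: int, timeout: float = CONNECT_TIMEOUT_SECONDS) -> str:
    """Open and immediately close a TCP connection, returning a status label."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return "OK"
    except (socket.timeout, TimeoutError):
        # Must be caught before OSError, because timeouts are OSError subclasses.
        return "Timeout error"
    except OSError:
        # Connection refused, network unreachable, and similar failures.
        return "Connect error"
```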
For HTTPS links, the crawler verifies four things about the SSL certificate returned by the server: 1) it is valid, 2) it has been issued by a trusted certificate authority (CA), 3) it has not expired, and 4) it actually belongs to the hostname. If the server supports only obsolete protocols (such as SSLv2 or SSLv3) or obsolete cipher suites, the link is marked as broken with an “SSL handshake error.”
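A minimal sketch of such a check using Python's standard `ssl` module is shown below. The default context already validates the certificate chain, the expiry date, and the hostname. The function names, the "Certificate error" label, and the TLS 1.2 minimum are assumptions for this example, not details confirmed by the text.

```python
import socket
import ssl


def build_tls_context() -> ssl.SSLContext:
    """Create a client context that verifies the chain, expiry, and hostname."""
    context = ssl.create_default_context()
    # Assumption: treat anything older than TLS 1.2 as obsolete.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def check_tls(hostname: str, port: int = 443, timeout: float = 40.0) -> str:
    """Attempt a TLS handshake and return a status label."""
    context = build_tls_context()
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=hostname) as tls:
                return f"OK ({tls.version()})"
    except ssl.SSLCertVerificationError as exc:
        # Untrusted issuer, expired certificate, or hostname mismatch.
        return f"Certificate error: {exc.verify_message}"
    except ssl.SSLError:
        # No protocol version or cipher suite in common with the server.
        return "SSL handshake error"
```

Connection failures (`OSError`) are intentionally left uncaught here, since they belong to the earlier TCP step.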
When examining the response received from the server, the crawler first checks the value of the HTTP status code: values in the
2xx and 3xx ranges indicate a correct and successful response, while all other values are considered an error. The most common HTTP status codes are
301 (Permanent redirect),
302 (Temporary redirect),
404 (Not Found), and
500 (Internal server error). Redirects are automatically followed up to 15 times before the crawler gives up with a “Too many redirects” error.
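The status and redirect handling can be illustrated with a small loop. To keep the sketch free of network code, it takes a hypothetical `fetch` callable that returns the status code and the `Location` header for a URL. It also assumes that 2xx codes count as success alongside 3xx, and the "HTTP error" label is invented for the example.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

MAX_REDIRECTS = 15  # limit stated in the text

# Hypothetical transport: maps a URL to (status code, Location header or None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]


def check_link(url: str, fetch: Fetch, max_redirects: int = MAX_REDIRECTS) -> str:
    """Follow redirects up to the limit and classify the final status code."""
    followed = 0
    while True:
        status, location = fetch(url)
        if 300 <= status < 400 and location:
            if followed == max_redirects:
                return "Too many redirects"
            followed += 1
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
            continue
        if 200 <= status < 400:
            return "OK"
        return f"HTTP error {status}"
```

Passing the transport in as a function also makes the redirect limit easy to test with canned responses.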
If the server returns an HTML or CSS document, the source code is parsed and discovered links are added to the queue to be checked. The crawler not only looks for
<a href> page links, but it also extracts URLs from
<iframe src> and dozens of other HTML tags and CSS properties.
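Link discovery of this kind can be sketched with Python's built-in `html.parser`. The example covers only the two tag and attribute pairs named above; the `URL_ATTRIBUTES` set and the `extract_links` helper are assumptions for illustration, and a real crawler would track many more pairs and parse CSS as well.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: a tiny illustrative subset of the attributes that can carry URLs.
URL_ATTRIBUTES = {("a", "href"), ("iframe", "src")}


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the tracked tag/attribute pairs."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        for name, value in attrs:
            if value and (tag, name) in URL_ATTRIBUTES:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list:
    """Return the links found in an HTML document, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Each discovered URL is resolved against the page's own address, so relative links such as `/about` can be queued as absolute URLs.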