Dr. Link Check examines each link thoroughly and then reports one of the following results.


OK

The link works as intended. No action is needed.

For https or http links (like https://www.example.com/), this means that the web server returned an HTTP status code in the 2xx range, indicating a successful request. The most common success status code is 200 (OK).

For data links (like data:text/plain;charset=utf-8;base64,SGVsbG8sIFdvcmxkIQ==), our crawler makes sure that the URL is syntactically valid and the data can be correctly decoded.
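To show what "syntactically valid and correctly decodable" can mean for a data link, here is a minimal Python sketch. It assumes the RFC 2397 layout `data:[<mediatype>][;base64],<data>`. The function name `decode_data_url` is hypothetical, and the code is not Dr. Link Check's actual implementation.

```python
import base64
import binascii
from urllib.parse import unquote_to_bytes


def decode_data_url(url: str) -> bytes:
    """Decode a data: URL and return its payload bytes.

    Raises ValueError if the URL is malformed or the payload cannot be decoded.
    """
    if url[:5].lower() != "data:":
        raise ValueError("not a data URL")
    # Everything before the first comma is metadata; everything after it is data.
    header, sep, payload = url[5:].partition(",")
    if not sep:
        raise ValueError("data URL is missing the ',' separator")
    is_base64 = header.lower().endswith(";base64")
    # Percent-escapes may appear in both plain and base64 payloads.
    raw = unquote_to_bytes(payload)
    if is_base64:
        try:
            # validate=True rejects characters outside the base64 alphabet.
            return base64.b64decode(raw, validate=True)
        except binascii.Error as exc:
            raise ValueError(f"invalid base64 payload: {exc}") from exc
    return raw
```

With the example link from above, `decode_data_url("data:text/plain;charset=utf-8;base64,SGVsbG8sIFdvcmxkIQ==")` returns `b"Hello, World!"`.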

For mailto links (like mailto:mail@example.com), it is verified that the email address’s domain name actually exists and has an MX record. An MX record is a DNS entry that specifies which mail server is responsible for receiving emails for a domain. Emails sent to domains without MX records will usually be returned as undeliverable.


Invalid URL

The link address is not properly formatted, possibly due to a typing or copy-and-pasting error.

For instance, https://www..example.com/ is flagged as an invalid URL due to the second dot. Here are a few other typical examples:

  • https://www.example.com): The URL ends with a closing parenthesis instead of a slash (/).
  • https://insert link here: A placeholder text was turned into an invalid http URL.
  • mailto:jane@example.com&subject=Hello World: The mail subject is mistakenly delimited by & instead of ?.

Please note that Dr. Link Check is somewhat stricter than browsers like Chrome or Firefox when it comes to validating the syntax of a URL. For instance, https:///www.example.com/ works in most browsers but is marked as an Invalid URL by our crawler due to the third slash (which is not allowed according to the official specification).
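The examples above can be caught with a rough check like the one below. This is only an approximation for illustration, not Dr. Link Check's validator. It assumes ASCII host names and does not handle IP-address hosts or internationalized domain names. The helper name `is_plausible_http_url` is hypothetical.

```python
import re
from urllib.parse import urlsplit

# One DNS label: 1-63 letters, digits or hyphens, not starting or ending with "-".
LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def is_plausible_http_url(url: str) -> bool:
    """Rough syntax check for http(s) URLs whose host is an ASCII domain name."""
    try:
        parts = urlsplit(url)
        _ = parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        return False
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname  # lower-cased; None if the authority part is empty
    if not host:
        # e.g. "https:///www.example.com/", where the third slash empties the authority
        return False
    if host.endswith("."):
        host = host[:-1]  # allow a single trailing dot (fully qualified name)
    # "www..example.com" yields an empty label; "example.com)" or "insert link here"
    # contain characters that are not allowed in a label.
    return all(LABEL.fullmatch(label) for label in host.split("."))
```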


Unsupported scheme

The link’s URL is well-formed but uses a scheme (the part at the beginning before the colon) that our crawler doesn’t support. Only http, https, data, and mailto links can be fully checked by the crawler.

Examples of unsupported schemes include:

  • tel: Used to link to a phone number that is dialed when clicking the link (example: tel:+1-123-555-5555).
  • javascript: Used to specify JavaScript code that is executed when clicking the link (example: javascript:alert('Hello, world!')).
  • ftp: Used to link to a file on an FTP (File Transfer Protocol) server (example: ftp://speedtest.tele2.net/10MB.zip). Since FTP is deprecated in modern browsers, we recommend replacing FTP links with links to HTTP(S) servers where possible.
  • file: Used to link to a file on the user’s local file system (example: file:///c:/path/to/the%20file.txt). If you see a file link on a website, it’s almost always a mistake from copy/pasting content from a local computer to the web server. In this case, update the link to point to the correct location of that content on the server.

Sometimes our crawler encounters URLs like htttps://www.example.com/ or httphttp://www.example.com/. These URLs are probably incorrect and contain a typo. Nevertheless, they are not counted as broken and instead are marked as Unsupported because in theory htttps and httphttp could be valid URL schemes supported by some app or browser extension. It’s up to you to decide whether URLs like these are intentional and correct or not.

In general, we recommend manually reviewing each link that is marked as Unsupported.
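The scheme is simply the text before the first colon, so the Unsupported result can be sketched as a set lookup. The sketch assumes that the supported set is exactly http, https, data, and mailto, as stated in this section. The name `scheme_status` is hypothetical.

```python
from urllib.parse import urlsplit

# Schemes that can be fully checked, as listed in this section.
SUPPORTED_SCHEMES = {"http", "https", "data", "mailto"}


def scheme_status(url: str) -> str:
    """Return 'supported', 'unsupported', or 'no scheme' for a link URL."""
    scheme = urlsplit(url).scheme  # urlsplit lower-cases the scheme
    if not scheme:
        # A relative link: it would first be resolved against the page's URL.
        return "no scheme"
    return "supported" if scheme in SUPPORTED_SCHEMES else "unsupported"
```

Typos like `htttps` produce a syntactically valid but unknown scheme. That is why such links land in the "unsupported" bucket rather than being reported as broken.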


Host not found

The domain name used in the URL (such as www.example.com) could not be resolved to an IP address (like 93.184.216.34 or 2606:2800:220:1:248:1893:25c8:1946).

This means that our crawler either could not find a name server responsible for the domain, or could not obtain at least one A (IPv4 address) or AAAA (IPv6 address) record for the domain from that server.

Possible causes for this error include the following:

  • The domain name is not registered, either because it expired, or it was never registered in the first place (in which case it likely contains a typo).
  • The domain name is registered, but the DNS lookup didn’t yield any A or AAAA records for the domain name. This means that no IPv4 or IPv6 address is currently associated with the domain name. If this is your own domain, check with your domain manager or hosting company to see if the domain’s DNS records are correctly configured.
  • The name server is not reachable. This usually shouldn’t happen, because most domain names have at least two separate name servers for redundancy, but it’s a possibility. It’s also possible that there is a temporary network issue somewhere along the way between our servers and the name servers.
  • The name server returns bad or inconsistent data, probably due to a configuration error on the server.
  • Technically possible, but very rare: The domain name is registered with a domain name registrar but its existence was not yet announced to the global Domain Name System.

Please note that Host not found can be a transient error that resolves itself after some time. It’s possible that the problem has already been fixed, but the old DNS records are still cached somewhere because their Time-to-Live (TTL) has not expired yet. This error can also be caused by a temporary problem with the internet connection or an overloaded name server.

A useful website for diagnosing domain and DNS issues is GWhois.org. It simply performs a WHOIS lookup to check if a domain is registered and also displays the DNS records for the domain.

If you are not afraid of the command line, you can also use dig (macOS, Linux) or nslookup (Windows) to perform DNS lookups:

  • On macOS or Linux, open a terminal window and type dig www.example.com A to get the A records (IPv4 addresses), or dig www.example.com AAAA to get the AAAA records (IPv6 addresses).
  • On Windows, open the Command Prompt, type nslookup -type=a www.example.com to get the IPv4 addresses, or nslookup -type=aaaa www.example.com for the IPv6 addresses.
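If you prefer a script, Python's standard library can do a similar lookup with `socket.getaddrinfo`. This is a minimal sketch, and the helper name `resolve_host` is hypothetical. Unlike dig, it goes through the operating system's resolver, so results can come from a local cache or the hosts file. It also returns IPv4 and IPv6 addresses together instead of querying A and AAAA records separately.

```python
import socket
from typing import List


def resolve_host(host: str) -> List[str]:
    """Return the IPv4/IPv6 addresses for host, or raise socket.gaierror."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

For example, calling `resolve_host("no-such-host.invalid")` raises `socket.gaierror`, which corresponds to a Host not found result. The `.invalid` top-level domain is reserved and never resolves.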

Connect error

Our crawler could not establish a connection to the target server.

This means that there appears to be a server available at the target address, but the connection attempt fails for one of the following reasons:

  • There is no service listening for new HTTP connections on the link’s IP address and port. This happens, for instance, when the web server software is temporarily shut down for maintenance.
  • The server actively refuses the connection request. This is often the result of a misconfigured firewall. Some servers also flag our crawler’s activity as suspicious and start blocking new connection requests after a while.
  • A network problem prevents our crawler from reaching the server.

Quite often, connect errors are only temporary and disappear when re-running the check.
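The difference between a refused connection and other failures can be observed directly when opening a TCP connection. Below is a minimal Python sketch. The function name `try_connect` and its return strings are hypothetical labels chosen to match the result names in this article. The 40-second default mirrors the connection limit described under Timeout.

```python
import socket


def try_connect(host: str, port: int, timeout: float = 40.0) -> str:
    """Attempt a TCP connection and describe the outcome in this article's terms."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # The host answered, but nothing accepted the connection on that port
        # (or a firewall actively rejected it).
        return "Connect error (refused)"
    except socket.timeout:
        # No answer at all within the time limit, e.g. a firewall silently dropping packets.
        return "Timeout"
    except socket.gaierror:
        return "Host not found"
    except OSError as exc:
        # Other network failures, such as "network is unreachable".
        return f"Connect error ({exc.strerror or exc})"
```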


SSL handshake error

The crawler failed to complete an SSL/TLS handshake with the target server.

An SSL/TLS handshake is the first step in establishing an HTTPS connection. During the handshake, our crawler and the target server try to agree on which SSL/TLS version and cipher they will use to encrypt and authenticate the communication. A handshake typically fails when the two sides cannot find a common protocol version or cipher, for example because the server only supports outdated versions such as SSLv2 or SSLv3.

If you see an SSL handshake error for an outbound link, you can try contacting the website’s owners and asking them to update their servers to support a current and secure TLS version.

If your own website still only supports SSLv2 or SSLv3, you should strongly consider enabling TLS 1.2 and 1.3. This might be as simple as editing a configuration file, or it may require upgrading your web server software and SSL/TLS library.

A helpful online tool for debugging SSL configuration issues is Qualys SSL Server Test. It simulates several SSL/TLS handshakes and rates the security and quality of the server’s configuration in terms of common SSL vulnerabilities.


SSL certificate problem

The SSL certificate presented by the target server failed verification.

Common certificate issues include:

  • The certificate has expired or is not yet valid.
  • The certificate was not issued by a trusted certificate authority (CA). This is the case with self-signed certificates.
  • The value of the Common Name (CN) field in the certificate doesn’t match the domain name in the URL.
  • The certificate has an invalid signature.

For a deeper analysis of a server’s SSL certificate, see Namecheap’s SSL Checker online tool.
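Both SSL results described above can be reproduced with Python's standard `ssl` module. A failed certificate check surfaces as `ssl.SSLCertVerificationError`, while other handshake failures raise the more general `ssl.SSLError`. This is a minimal sketch. The helper names `classify_tls_error` and `check_tls` are hypothetical, and the mapping to result names is an assumption made for illustration.

```python
import socket
import ssl


def classify_tls_error(exc: Exception) -> str:
    """Map a Python ssl exception to the result names used in this article."""
    # SSLCertVerificationError is a subclass of SSLError, so test it first.
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "SSL certificate problem"
    if isinstance(exc, ssl.SSLError):
        return "SSL handshake error"
    return "other error"


def check_tls(host: str, port: int = 443, timeout: float = 40.0) -> str:
    """Perform a verified TLS handshake and report the negotiated protocol version."""
    # The default context verifies the certificate chain, validity dates and host name.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return f"OK ({tls.version()})"  # e.g. "OK (TLSv1.3)"
    except ssl.SSLError as exc:
        return classify_tls_error(exc)
```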


Send/receive error

An error occurred while transferring data between the crawler and the target server.

This issue typically results from an abrupt interruption of the connection, possibly due to a server outage or a network hiccup.

Send/receive errors are often temporary and clear up on their own.


Timeout

The target server didn’t respond in time.

A timeout error can be the result of any of the following:

  • The crawler isn’t able to establish a connection to the server within 40 seconds.
  • The entire transfer (which involves connecting to the server, sending the request, and receiving the response) doesn’t complete within two minutes.
  • The server’s response data rate drops below 100 bytes per second over a period of 30 seconds.
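The three limits above can be expressed as a small watchdog function. This is a minimal sketch using the thresholds stated in this article. The function name `timeout_reason` is hypothetical, and measuring the data rate as an average over the last 30 one-second samples is an assumption, not a documented detail of our crawler.

```python
from typing import Optional, Sequence


def timeout_reason(
    elapsed: float,
    connected: bool,
    bytes_per_second: Sequence[int],
    connect_timeout: float = 40.0,
    total_timeout: float = 120.0,
    low_speed_limit: float = 100.0,
    low_speed_window: int = 30,
) -> Optional[str]:
    """Return why a transfer should be aborted, or None to keep going.

    bytes_per_second holds the number of bytes received in each one-second
    interval since the connection was established (oldest first).
    """
    if not connected and elapsed >= connect_timeout:
        return "connect timeout"
    if elapsed >= total_timeout:
        return "total timeout"
    recent = list(bytes_per_second)[-low_speed_window:]
    # Only judge the data rate once a full window of samples is available.
    if len(recent) == low_speed_window and sum(recent) / low_speed_window < low_speed_limit:
        return "low speed"
    return None
```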

Server timeouts can be caused by different things:

  • The server doesn’t exist or is currently offline.
  • The server or a part of the network (router, firewall, etc.) is overloaded or otherwise not running properly.
  • The server purposely slows down its responses because it identified requests from our crawler as unwanted bot traffic. We sometimes see this behavior with websites using content delivery networks (CDNs) like Akamai. If this applies to your own website, please contact us for a list of IP addresses to whitelist in your CDN’s control panel.

HTTP error code

The target server returned an HTTP status code outside the 2xx and 3xx range.

Every HTTP response begins with a status line, consisting of the protocol version, a numeric status code, and a text phrase that explains the status code (example: HTTP/1.1 200 OK). The status code is typically from one of the following ranges:

  • 2xx: Indicates a successful operation. The most common code from this range is 200 (OK).
  • 3xx: Redirects the client to a different location, usually because the resource was moved.
  • 4xx: Tells the client that it did something wrong, like trying to access a resource that is restricted (403 Forbidden) or doesn’t exist (404 Not Found).
  • 5xx: Indicates that something went wrong on the server’s side. 500 (Internal Server Error) is the most commonly used code from this range.

If the HTTP status code returned by the server is outside the range of 200 to 399, our crawler considers the link to be broken and reports an HTTP error code.
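The status line and the 200 to 399 rule can be sketched in a few lines of Python. This sketch also treats 429 as "blocked", as described in the 429 Too Many Requests section below. The function names `parse_status_line` and `classify_status` are hypothetical.

```python
from typing import Tuple


def parse_status_line(line: str) -> Tuple[str, int, str]:
    """Split a status line like 'HTTP/1.1 200 OK' into version, code and reason."""
    version, code, *reason = line.strip().split(" ", 2)
    if not version.startswith("HTTP/") or len(code) != 3 or not code.isdigit():
        raise ValueError(f"malformed status line: {line!r}")
    return version, int(code), reason[0] if reason else ""


def classify_status(code: int) -> str:
    """Classify a status code using the rules described in this article."""
    if code == 429:
        return "blocked"  # rate limiting is not counted as a broken link
    if 200 <= code <= 299:
        return "ok"
    if 300 <= code <= 399:
        return "redirect"  # followed by the crawler, not counted as broken
    return "HTTP error code"
```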

Below you can find a list of the most common HTTP error status codes and their meanings.


400 Bad Request

The target server rejected the request as malformed or incorrect.

This issue is normally due to one of two causes:

  • The URL contains disallowed characters or otherwise has invalid syntax. We often see Bad Request errors for URLs with unencoded special characters like < or & (which should be encoded as %3C and %26). Sometimes servers also use this error to complain about missing required query parameters.
  • The server didn’t “like” one of the HTTP headers our crawler sent. This is often the case if the server is not configured to handle the domain name that was sent in the Host header.

Even though the specification considers 400 Bad Request a client error, it doesn’t necessarily mean that the URL is incorrect or that our crawler is doing something wrong. More often than not, this error is triggered by a configuration or programming issue on the server side.
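If you generate links yourself, percent-encoding query values avoids the unencoded-character problem. The sketch below uses Python's standard `urllib.parse` helpers. The parameter names and the search URL are hypothetical examples.

```python
from urllib.parse import quote, urlencode

# Percent-encode a single value: "<" becomes %3C, ">" becomes %3E, "&" becomes %26.
encoded_value = quote("<b>&", safe="")

# Build a whole query string from unencoded parameters (spaces become "+").
params = {"q": "fish & chips", "tag": "<new>"}
search_url = "https://example.com/search?" + urlencode(params)
```

Here `encoded_value` is `"%3Cb%3E%26"` and `search_url` ends with `q=fish+%26+chips&tag=%3Cnew%3E`.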


401 Unauthorized

The requested resource is restricted and requires authentication.

This isn’t necessarily an error that needs fixing. 401 Unauthorized is often used to indicate that a visitor is currently not logged in and therefore doesn’t have permission to access the content. This is perfectly fine as long as the website provides a way to log in.

However, it can also mean that something went wrong on the server. Microsoft’s IIS web server, for instance, reports an Unauthorized error if it can’t access a local file due to missing read permissions on the folder.


402 Payment Required

Some type of payment is required to access the resource.

Different websites use this status code in different contexts:

  • Shopify servers return a 402 if the shop owner didn’t pay the bill.
  • The website of The New Yorker magazine uses the status code to indicate that an article is not available until the reader signs up for a paid subscription.
  • Several APIs and services report this code if an account doesn’t have sufficient funds to complete the requested operation.

403 Forbidden

The server doesn’t allow access to the resource.

This error code is widely used for anything that’s “not allowed,” including the following:

  • The request is triggering a prohibited action, like deleting a record or listing the content of a directory.
  • The visitor doesn’t have the right permissions for the resource, possibly because a login is required.
  • The content is not available in the visitor’s country.
  • The server recognizes our crawler as an unwanted bot and blocks the request. In this case the link will probably work when you check it out in your browser, so you can ignore the error, as it doesn’t affect regular visitors.

404 Not Found

The requested resource was not found.

This is by far the most common error code. It simply means that the server cannot find any content at the requested URL.

It’s possible that the resource was deleted, moved (without setting up a proper redirect), or that it was never available and the URL was incorrect in the first place.

If you want to replace a previously working link that now returns a 404 Not Found error and are struggling to find a suitable alternative, give the Wayback Machine a try. It might have an archived copy of the original page that you can link to instead.

Please note that some websites return a 404 status code in the HTTP header but still deliver normal-looking content. This typically indicates a server configuration issue.


405 Method Not Allowed

The request method (GET, POST, etc.) is not allowed for the resource.

Our crawler usually sends a GET request to a server to ask for the desired document. If the server responds with 405 Method Not Allowed, it means that GET is not an appropriate HTTP method for this resource and that a POST, PUT, PATCH, or DELETE request was expected instead.

We sometimes see this error with websites that mistakenly use an <a href> element to link to a form URL that requires data to be POSTed and should have been used within a <form> element.


406 Not Acceptable

The resource is not available in the requested form.

When our crawler makes a request, it sends various Accept headers indicating the type, encoding, and language of the content it would prefer to receive from the server. The headers typically look like this:

Accept: text/html,application/xhtml+xml,*/*
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9

If the server cannot produce a response matching the acceptable formats and is unwilling to supply a default representation, it returns a 406 Not Acceptable error.
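Content negotiation can be sketched as "pick the available format with the highest q-value, or give up". The sketch below is simplified and labeled as such. It ignores the rule that a more specific media range takes precedence over a wildcard, and it only looks at the Accept header, not Accept-Encoding or Accept-Language. The function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q-value) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        q = 1.0  # default quality value when no q parameter is given
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media:
            ranges.append((media, q))
    return ranges


def matches(media_range: str, content_type: str) -> bool:
    """True if a media range such as 'text/*' or '*/*' covers content_type."""
    if media_range == "*/*":
        return True
    r_type, _, r_sub = media_range.partition("/")
    c_type, _, c_sub = content_type.lower().partition("/")
    return r_type == c_type and r_sub in ("*", c_sub)


def negotiate(accept: str, available: Sequence[str]) -> Optional[str]:
    """Return the best available type, or None (the 406 Not Acceptable case)."""
    ranges = parse_accept(accept)
    best, best_q = None, 0.0
    for ctype in available:
        q = max((rq for r, rq in ranges if matches(r, ctype)), default=0.0)
        if q > best_q:  # q=0 means "not acceptable", so it never wins
            best, best_q = ctype, q
    return best
```

Because our crawler's Accept header ends with `*/*`, a strictly standards-following server should always find some acceptable representation. A 406 in response to such a header therefore often points to an overly strict server configuration.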


408 Request Timeout

The server timed out waiting for the request to complete.

This means that the server didn’t receive a full HTTP request within the time it was prepared to wait. Consequently, the server gave up and killed the connection.

Possible reasons for this error include an unstable network connection, an overloaded system or a server configuration problem. Quite often, this is a temporary issue that resolves on its own after a time.


409 Conflict

The request could not be processed due to a conflict on the server.

This status code is supposed to be used in cases where the request conflicts with another request or with the server configuration. For example, you may see this error when multiple simultaneous updates cause version conflicts. Ideally, the response should include enough information for the user to resolve the issue.

In practice, we most often see this error with websites served by Cloudflare (a large content delivery network) when there’s a problem/conflict in the domain’s DNS settings.


410 Gone

The requested resource is no longer available and will not be available again.

This error is a more specific version of 404 Not Found. It indicates that a resource was available in the past but has intentionally been removed and will no longer be available at any URL.

If an outgoing link on your website generates this error, it’s best to remove the link or replace it with a link to a different site.


414 URI Too Long

The request URL is too long for the server to process.

Some web servers impose a limit on the length of URLs they are willing to accept. If a request for a URL longer than that limit comes in, they refuse the request with a 414 URI Too Long error.

This error often occurs with dynamically generated pages whose links are built from the current page’s URL. If each page takes its own URL and appends something new, the resulting URLs get longer and longer.


421 Misdirected Request

The request was sent to the wrong server.

This error indicates that the server is unable or unwilling to generate responses for the requested combination of scheme, host, and port.

For instance, some web servers use the 421 code to signal that access is no longer available via http and further requests should use https instead.


423 Locked

The requested resource is locked.

This status code is only intended to be used with WebDAV (a protocol for transferring files) and shouldn’t occur when browsing the web. Nevertheless, we sometimes see 423 Locked errors on websites that have been temporarily suspended by their web hosting companies.


429 Too Many Requests

Too many requests were sent in too short a period of time.

Many web servers have rate limits in place that affect how many requests a client is allowed to make in a given amount of time. If the rate limit is exceeded, a 429 status code is returned.

When our crawler comes across a 429 code, it doesn’t consider the link to be broken, but instead marks it as blocked.


451 Unavailable For Legal Reasons

The server refused to fulfill the request for legal reasons.

The code 451 signals censorship and is a reference to the dystopian novel Fahrenheit 451, in which books are banned and burned. It is typically used in one of the following situations:

  • The website is not allowed to be accessed from the visitor’s country, possibly due to violating a specific national law. We often see 451 errors being served by websites that block access from European visitors for GDPR (General Data Protection Regulation) compliance reasons.
  • The operator of the server received a legal order to take down the requested content. The video platform Vimeo, for instance, uses the 451 error code to indicate that a video was removed for copyright infringement.

500 Internal Server Error

The server encountered an unexpected error.

This is a generic “catch-all” error that is used for all kinds of different server problems, including the following:

  • A configuration issue, e.g. a syntax error in the .htaccess file.
  • Incorrect permissions on a file or directory on the server.
  • A bug in a script or a server component.
  • An overloaded or otherwise unavailable database.

501 Not Implemented

The server lacks the functionality required to fulfill the request.

This error is often used to indicate that a resource or function is not available yet but planned for later. In that case you can consider it a “coming soon” response.

Some web hosting companies return a 501 Not Implemented status if a customer’s domain or website has not yet been fully set up.


502 Bad Gateway

The server was acting as a proxy for a different server, and that server didn’t respond as expected.

It’s a common setup to have a proxy server accept incoming requests and forward them to one or more other servers. A specific example is an NGINX web server that proxies requests and passes them on to a PHP-FPM service running a PHP application. Another example is a load balancer that distributes incoming traffic across multiple web servers.

If the origin server returns an invalid response (like a malformed HTTP header) or is not available (due to being overloaded or down for maintenance), the proxy typically returns a 502 Bad Gateway error back to the requester.


503 Service Unavailable

The server is temporarily unable to process the request.

It’s possible that the server is overloaded, down for maintenance, or has some other problem that is expected to be resolved soon.

We sometimes also see this status code being used to indicate that a request has been blocked. This can be the case if the server flagged our crawler as an unwanted bot.


504 Gateway Timeout

The server was acting as a proxy for a different server, and that server didn’t respond in time.

This error is similar to 502 Bad Gateway, except that 502 is typically returned when the proxy received an invalid response, while 504 should be used to indicate that the origin server didn’t respond at all (within the time the proxy was willing to wait).

Possible reasons are that the upstream server is currently overloaded or temporarily down for maintenance.


509 Bandwidth Limit Exceeded

The website has been temporarily suspended for exceeding its allowed traffic limit.

Status code 509 is often used by web hosting providers that limit the amount of bandwidth customers can use for their sites. If this limit is reached, the website is automatically suspended for the remainder of the customer’s billing period.


520 Unknown Error

The origin server returned an unknown error.

This error code is not officially specified anywhere, but is used by several CDNs (content delivery networks) to indicate an unspecific problem with the origin server that the request was forwarded to. It’s possible that the origin server unexpectedly reset the connection or that it returned an invalid HTTP response to the CDN servers.


521 Web Server Is Down

The origin server refused the connection.

This status code is not specified by any standard, but it’s used by Cloudflare to indicate that the origin server (that the request was supposed to be forwarded to) is down or blocking requests from the Cloudflare network. Specifically, it means that Cloudflare tried to connect to the origin server but received a connection refused error.


522 Connection Timed Out

The attempt to connect to the origin server timed out.

This is another unofficial status code used by Cloudflare. It signals that the Cloudflare server couldn’t establish a full TCP connection to the origin web server within the time it was prepared to wait. There might be a firewall in place that is blocking Cloudflare’s requests, or the server is currently overloaded and unresponsive.


523 Origin Is Unreachable

The origin server could not be reached.

This status code is not defined in any official specification but is Cloudflare-specific. If Cloudflare servers respond with this code, it typically means that the DNS records for the origin server (which was supposed to provide the actual response) are incorrect.


524 A Timeout Occurred

The origin server didn’t provide an HTTP response in time.

This Cloudflare-specific status code indicates that Cloudflare successfully connected to the origin server, but it didn’t reply with an HTTP response within the time the Cloudflare server was willing to wait (typically 100 seconds).

Most likely the origin web server is overloaded and therefore too slow or unable to respond.


525 SSL Handshake Failed

The SSL/TLS handshake with the origin server failed.

This response code is used by Cloudflare when one of their servers fails to negotiate an SSL/TLS handshake with an origin web server.

A missing SSL certificate is a common cause of the 525 SSL Handshake Failed error.


526 Invalid SSL Certificate

The origin’s SSL certificate could not be validated.

Cloudflare servers return this error when they are unable to successfully validate the SSL certificate presented by the origin web server.

This typically happens for one of the following reasons:

  • The certificate is expired.
  • The certificate is self-signed instead of being signed by a trusted certificate authority.
  • The certificate has been revoked.
  • The certificate is issued for a different domain name.

527 Railgun Error

The connection between Cloudflare and the origin’s Railgun Listener was interrupted.

Railgun is a service that speeds up the delivery of dynamic content from an origin server to the Cloudflare network. Taking advantage of the fact that dynamic content is usually mostly static, Railgun compares the generated content to its previous version and sends only the changes. For this to work, a so-called “Listener” needs to be installed on the origin server. This Listener communicates with the Sender component that runs on the Cloudflare servers all over the world.

Error 527 indicates an interrupted connection between the Sender (on a Cloudflare server) and the Listener (on the origin server). Common causes include TLS/SSL-related errors, firewall blocks, and other network issues.


530 Error

An error occurred with the proxied website.

530 is used by Cloudflare as a general error code for different types of issues. The specific error is included in the response and displayed in the browser when visiting the URL.

We see the 530 status code mostly being used to indicate an “Origin DNS error” (Cloudflare-specific error code 1016). This occurs when Cloudflare is unable to resolve the origin server’s IP address via DNS.


Other HTTP Status Code

The server returned an unrecognized HTTP status code.

Although there is an official registry of HTTP status codes, which is maintained by the Internet Assigned Numbers Authority (IANA), nothing prevents programmers from inventing and using their own 3-digit codes.

If the code returned by the server is not in the 2xx or 3xx range, our crawler considers it to be an error code and reports the corresponding link as broken.


Too many redirects

The link was redirected more than 20 times.

A redirect occurs when the web server responds with a redirect HTTP status code (301, 302, 303, 307, or 308) and gives our crawler a new URL at which it can find the requested resources. Our crawler then sends a new request to this URL, to which the server might again respond with a redirect instruction. If this goes on and on, our crawler gives up after following 20 redirects and reports a Too many redirects error.

This issue is often caused by a redirect loop, where a page either redirects to itself or redirects to another page that then redirects back to the original page. If you are getting this error for your own website, you should check your web server’s configuration for errors. Here are a few pointers that may help your troubleshooting efforts:

  • If you use an Apache web server, take a look at your site’s .htaccess file. It might contain a faulty RewriteRule, possibly the one for redirecting http:// URLs to https:// URLs.
  • If you use NGINX, inspect your website’s configuration file for rewrite and return directives.
  • If you are using WordPress, make sure that the WordPress Address (URL) and Site Address (URL) settings are correctly configured. If you are unable to access your site’s admin area, you will need to edit the wp-config.php file directly on the server.
  • Try disabling plugins in WordPress, or whatever content management system you are using, one by one.
  • If your website is served through a content delivery network (CDN) like Cloudflare or Akamai, look into the CDN’s HTTPS settings and try clearing the CDN cache.
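The redirect limit described above can be sketched as a loop that counts redirects and keeps the chain of visited URLs, which makes loops easy to spot. In this Python sketch, `fetch` is a hypothetical stand-in for a real HTTP request that returns the status code and the Location header. The names `follow_redirects` and `TooManyRedirects` are also hypothetical.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

MAX_REDIRECTS = 20
REDIRECT_CODES = {301, 302, 303, 307, 308}


class TooManyRedirects(Exception):
    def __init__(self, chain: List[str]):
        super().__init__(f"more than {MAX_REDIRECTS} redirects")
        self.chain = chain  # every URL visited, useful for spotting a loop


def follow_redirects(url: str, fetch: Callable[[str], Tuple[int, Optional[str]]]) -> str:
    """Follow redirects starting at url and return the final URL."""
    chain = [url]
    redirects = 0
    while True:
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return url
        if redirects == MAX_REDIRECTS:
            raise TooManyRedirects(chain)
        # The Location header may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        redirects += 1
        chain.append(url)
```

With a simulated two-page loop (`a` redirects to `b`, and `b` redirects back to `a`), the exception's `chain` simply alternates between the two URLs. That alternation is the pattern to look for in your server configuration.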

Bad content encoding

The encoding of the HTTP response body could not be recognized.

A web server typically includes a Content-Encoding header in the response to indicate whether and how the data is compressed (example: Content-Encoding: gzip). If no Content-Encoding header is provided, it is assumed that the response body is not compressed.

The following Content-Encoding values are supported by our crawler:

  • (empty or missing value), identity, none: No compression is used.
  • deflate: The data is compressed using the deflate algorithm as implemented by zlib.
  • gzip, x-gzip: The data is compressed using the gzip algorithm. This is the most common server compression method.
  • br: The data is compressed using the Brotli algorithm.

Our crawler reports a Bad content encoding error in the following situations:

  • The server returns a Content-Encoding value not included in the list above. We sometimes see servers returning Content-Encoding: UTF-8, which is apparently the result of a mix-up of Content-Encoding and Content-Type.
  • The server specifies that the body is encoded in one format, but then actually delivers it using a different encoding.
  • The server’s payload data is corrupt and cannot be decoded using the specified algorithm.

In any case, this issue is usually caused by a configuration or programming error on the server side.
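The supported encodings and the failure cases above can be sketched with Python's standard `gzip` and `zlib` modules. The sketch makes a few assumptions. It handles a single coding only, not a comma-separated list. It tolerates raw deflate data without the zlib header, which some servers send. It relies on the third-party `brotli` package for `br`. The names `decode_body` and `BadContentEncoding` are hypothetical.

```python
import gzip
import zlib


class BadContentEncoding(Exception):
    pass


def decode_body(body: bytes, content_encoding: str = "") -> bytes:
    """Undo the Content-Encoding of a response body (single coding only)."""
    coding = content_encoding.strip().lower()
    try:
        if coding in ("", "identity", "none"):
            return body
        if coding in ("gzip", "x-gzip"):
            return gzip.decompress(body)
        if coding == "deflate":
            try:
                return zlib.decompress(body)  # zlib-wrapped deflate
            except zlib.error:
                # Fall back to raw deflate data without a zlib header.
                return zlib.decompress(body, -zlib.MAX_WBITS)
        if coding == "br":
            import brotli  # third-party package (assumption: installed separately)
            return brotli.decompress(body)
    except ImportError:
        raise
    except Exception as exc:  # corrupt data: OSError, EOFError, zlib.error, ...
        raise BadContentEncoding(f"corrupt {coding} data: {exc}") from exc
    # Anything else, e.g. the mix-up "Content-Encoding: UTF-8".
    raise BadContentEncoding(f"unsupported Content-Encoding: {content_encoding!r}")
```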


Crawler trap

The link appears to be part of a so-called crawler trap.

A crawler trap is an issue with a website that causes crawlers to discover a never-ending number of new links as they navigate from page to page and follow each new link. Below are three common examples:

  • Infinite calendar: Some websites have online calendars of events or announcements with automatically generated “Next week” and “Next month” links. If the date range is not limited, a crawler will go through every calendar page it can find, up to the year 5000 and beyond.
  • Expanding URLs: Simply forgetting a slash and linking to “page2/” instead of “/page2/” can result in an infinite chain of continuously expanding URLs such as https://example.com/, https://example.com/page2/, https://example.com/page2/page2/, https://example.com/page2/page2/page2/, etc.
  • Filter combinations: Online stores often allow shoppers to filter and sort products by category, price, brand, rating, color, and a myriad of other criteria. If filter values can be combined arbitrarily and each combination leads to a new page with a new URL, this results in a virtually infinite number of links to crawl.
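The "expanding URLs" trap can be reproduced with Python's `urljoin`, which resolves relative links the way browsers and crawlers do. The repeated-segment check below is a hypothetical heuristic for illustration only. It is not the detection algorithm our crawler uses. The helper names `expand` and `has_repeated_segment` are also hypothetical.

```python
from typing import List
from urllib.parse import urljoin, urlsplit


def expand(start: str, href: str, steps: int) -> List[str]:
    """Follow the same relative link repeatedly and collect the visited URLs."""
    urls = [start]
    for _ in range(steps - 1):
        urls.append(urljoin(urls[-1], href))
    return urls


def has_repeated_segment(url: str, max_repeats: int = 3) -> bool:
    """Hypothetical heuristic: flag URLs in which one path segment repeats too often."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return any(segments.count(s) >= max_repeats for s in set(segments))
```

Following `page2/` from `https://example.com/` four times yields `/`, `/page2/`, `/page2/page2/`, and `/page2/page2/page2/`. The root-relative `/page2/` always resolves to the same URL.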

Our crawler attempts to detect traps like these by looking for characteristic patterns in a link’s URL structure and the overall structure of the website. The algorithm is designed to identify as many crawler traps as possible, while not classifying legitimate links as traps. If you think a link was mistakenly flagged, please get in touch to let us know.

In general, there are good reasons for avoiding crawler traps:

  • They cause search engines to waste crawl budget on irrelevant pages. If a search engine bot gets lost in a crawler trap, it may never get around to indexing other pages with real, new content.
  • A crawler trap consisting of a large number of identical pages can hurt the website’s SEO by diluting the authority of the original pages across the duplicates.
  • A crawler falling deeper and deeper into a crawler trap can burden a web server to the point of an outage.

It’s always best to resolve crawler traps by making changes to the website’s code. If that’s not possible or too cumbersome, you can instead block the trap URLs in the website’s robots.txt file. For instance, if your online store uses query parameters named “category” and “color” for filtering (as in https://example.com/products?category=shoes&color=black), you can instruct crawlers to ignore filter links with the following robots.txt instructions:

User-agent: *
Disallow: /*?*category=
Disallow: /*?*color=
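To test such rules before deploying them, you need a matcher that understands `*` wildcards. Python's built-in `urllib.robotparser` compares path prefixes and does not treat `*` as a wildcard. The sketch below translates each Disallow value into a regular expression, with `*` matching any characters and a trailing `$` anchoring the end. It ignores Allow lines and longest-match precedence, and the function names are hypothetical.

```python
import re
from typing import Iterable


def rule_to_regex(rule: str):
    """Translate a Disallow path with * and an optional trailing $ into a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape everything, then turn each escaped "*" back into "match anything".
    pattern = re.escape(rule).replace(r"\*", ".*")
    return re.compile(pattern + (r"\Z" if anchored else ""))


def is_disallowed(path: str, disallow_rules: Iterable[str]) -> bool:
    """True if path (including any query string) matches one of the rules."""
    # re.match anchors at the start, so rules behave like path prefixes.
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules if rule)
```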

MX record not found

There is no mail server configured for the email address’s domain name.

When our crawler discovers a mailto link (like mailto:john.doe@example.com), it performs a DNS query to retrieve the MX records for the recipient’s domain name. An MX (short for mail exchange) record contains the address of the mail server responsible for handling emails for a domain. If there is no MX record present, it’s likely that emails sent to the specified address will bounce.
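The check can be sketched in two steps: extract the recipient domain from the mailto URL, then look up its MX records. This is an illustration, not our crawler’s code. The DNS lookup assumes the third-party dnspython package and is passed in as a function, so any other resolver can be swapped in. A single “.” host is treated as a “null MX” (RFC 7505), which explicitly signals that a domain accepts no email:

```python
from typing import Callable, List
from urllib.parse import urlsplit, unquote

def mailto_domain(url: str) -> str:
    """Extract the (first) recipient's domain from a mailto URL."""
    parts = urlsplit(url)  # the "?subject=..." part ends up in parts.query
    if parts.scheme != "mailto":
        raise ValueError("not a mailto URL")
    recipient = unquote(parts.path).split(",")[0]
    local, sep, domain = recipient.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("no valid email address in URL")
    return domain.lower()

def dnspython_mx_hosts(domain: str) -> List[str]:
    """Query MX records with dnspython (assumed installed via: pip install dnspython)."""
    import dns.resolver  # imported lazily so the rest of the sketch works without it
    try:
        answer = dns.resolver.resolve(domain, "MX")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []
    return [str(record.exchange) for record in answer]

def has_mx_record(domain: str, resolve_mx: Callable[[str], List[str]] = dnspython_mx_hosts) -> bool:
    """True if the domain publishes at least one usable (non-null) MX host."""
    return any(host != "." for host in resolve_mx(domain))
```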


Unknown error

There was an error of unknown type when checking the link.

This is an error you should ideally never see, because it means that something unexpected happened. It’s possible that the target server sent an invalid response that our crawler didn’t know how to handle or that the crawler crashed while checking the link.

If you get an Unknown error and suspect the problem to be on our crawler’s side, please let us know and we will look into it.


Blacklisted

The link is blacklisted for hosting phishing or malware content.

Phishing is a scheme where scammers clone the websites of well-known organizations (like banks or e-commerce sites) in order to lure visitors into entering their login credentials. Malware is any kind of software that is designed with malicious intent, such as viruses, ransomware, or spyware.

Websites hosting phishing or malware attacks are often hacked without the owners ever knowing that their servers have been compromised. If you want to inspect a blacklisted website, be careful, because it might try to exploit vulnerabilities in your browser.

In order to determine if a URL is safe, our crawler checks it against up to four different blacklists (depending on your subscription plan):

  • Google Safe Browsing: A blacklist of phishing and malware URLs that is used by Google Search and web browsers like Chrome, Safari, and Firefox to protect users from visiting dangerous websites. You can use Google’s Site Status tool to request more details on a blacklist entry.
  • PhishTank: A blacklist of phishing URLs that is powered by a community of volunteers who submit and review suspected phishing sites.
  • OpenPhish: A blacklist of phishing URLs that were automatically detected using a proprietary phishing detection algorithm.
  • URLhaus: A blacklist of malware URLs that is operated by abuse.ch.
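Conceptually, checking a link against several blacklists means normalizing the URL and looking it up in each list. The sketch below uses plain in-memory sets with hypothetical list names. It does not model how the services above are actually queried (Google Safe Browsing, for instance, uses its own URL canonicalization and hashed URL prefixes):

```python
from typing import Dict, List, Set
from urllib.parse import urlsplit, urlunsplit

def normalize(url: str) -> str:
    """Lowercase scheme and host, default an empty path to '/', and drop the fragment."""
    p = urlsplit(url)
    return urlunsplit((p.scheme.lower(), p.netloc.lower(), p.path or "/", p.query, ""))

def blacklist_hits(url: str, blacklists: Dict[str, Set[str]]) -> List[str]:
    """Return the names of all lists that contain the normalized URL."""
    key = normalize(url)
    return [name for name, entries in blacklists.items() if key in entries]
```

Because a URL may appear on one list but not another, reporting which lists matched helps when judging a possible false positive.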

Please note: If a URL appears on a blacklist, it doesn’t necessarily mean that it’s actually dangerous. There is always a chance that a safe website is mistakenly identified as risky.


Soft error

Even though the server responded with a success code (such as 200 OK), this link is considered broken based on the page’s content.


For sale

The link points to a domain or website for sale.

This often happens when a domain name’s registration expires and the domain is purchased by a professional domain investor. The domain is then used to host a “For sale” page that lets visitors know that the domain name is available to buy.


Ads only

The link points to a parked domain filled with nothing but ads.

This is typically the result of a domain expiring and being purchased by someone else. The new owner then monetizes the existing traffic by serving ads to visitors.

Even though the original content on the domain is no longer available, parked domains still return 200 OK. Domain parkers have a clear interest in not having their domains identified as parked, because that would put them at risk of losing valuable backlinks and organic traffic.


Placeholder

The link points to a page with placeholder content.

This can be the default page that comes with the web server or a page with a generic “Coming Soon” or “Under Construction” message.
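A simple way to illustrate this check is a phrase match restricted to short pages. The phrase list is an assumption for illustration: “Welcome to nginx!” is the heading of NGINX’s stock default page and “It works!” that of Apache’s, while the word limit keeps long pages that merely mention “coming soon” from being flagged:

```python
import re

PLACEHOLDER_PHRASES = ["coming soon", "under construction", "welcome to nginx!", "it works!"]

def looks_like_placeholder(html: str, max_words: int = 200) -> bool:
    """Heuristic: a short page containing a typical placeholder phrase."""
    # Replace tags with spaces, collapse whitespace, and compare case-insensitively.
    text = " ".join(re.sub(r"<[^>]+>", " ", html).split()).lower()
    if len(text.split()) > max_words:
        return False
    return any(phrase in text for phrase in PLACEHOLDER_PHRASES)
```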


Out of service

The link points to a website that is no longer in service.

It’s possible that the domain name registration expired, the hosting account was suspended, or maybe the website owner simply shut down the site and put up a “Closed” sign.


No content

The link points to a page with little or no content.

A completely empty page together with a 200 OK HTTP status code is often the result of a configuration error on the server.

We also sometimes see pages with nothing more than a “Test” or “Hello World” message. Even if these pages are set up intentionally, they are not worth linking to.
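Detecting such pages essentially comes down to counting the visible words. The following is a minimal sketch under simple assumptions: script and style blocks are ignored, tags are stripped with regular expressions (good enough for an illustration, not a full HTML parser), and the `min_words` threshold is arbitrary:

```python
import re

def visible_text(html: str) -> str:
    """Drop <script>/<style> blocks and all tags, then collapse whitespace."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"<[^>]+>", " ", html)
    return " ".join(text.split())

def has_no_content(html: str, min_words: int = 5) -> bool:
    """Flag pages with fewer than min_words visible words (empty, "Test", "Hello World")."""
    return len(visible_text(html).split()) < min_words
```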


Directory listing

The link points to a default page that lists the contents of the current directory on the server.

It’s possible to configure web servers (like Apache or NGINX) to automatically list the content of directories that don’t have an index file.

Although there can be reasons to provide directory listings, more often than not a directory listing page is an indicator of something missing on the server, which is why we consider it an error.
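Auto-generated listings are usually easy to recognize. Apache’s mod_autoindex and NGINX’s autoindex module both title their generated pages “Index of /…”, so a sketch of the check can look for that pattern in the title or main heading. Other servers use different layouts that this example does not cover:

```python
import re

# Matches e.g. "<title>Index of /files/</title>" or "<h1>Index of /</h1>".
LISTING_RE = re.compile(r"<(title|h1)>\s*Index of\s+/", re.IGNORECASE)

def looks_like_directory_listing(html: str) -> bool:
    """Return True if the title or main heading looks like an auto-generated listing."""
    return LISTING_RE.search(html) is not None
```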


Error message

The linked page looks like an error page for a 4xx or 5xx HTTP status code (such as “404 Not Found”, “500 Internal Server Error”, or “503 Service Unavailable”).

This means that the web server sent a 2xx success status code in the header, but the text in the response body indicates that some kind of client or server error occurred.

For example, some sites incorrectly send a 200 OK status even though the message displayed in the browser clearly states that the requested file is not available. This is called a soft 404.
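The core idea behind spotting a soft 404 is to compare the status code with the page text. The sketch below flags a 2xx response whose body contains a typical error phrase. The phrase list is a small, hypothetical sample, and real error pages vary widely in wording and language:

```python
import re

ERROR_PHRASES = re.compile(
    r"\b(404 not found|page not found|500 internal server error|503 service unavailable)\b",
    re.IGNORECASE,
)

def is_soft_error_page(status: int, body: str) -> bool:
    """True if the server claims success (2xx) but the body reads like an error page."""
    return 200 <= status < 300 and ERROR_PHRASES.search(body) is not None
```

A real 404 response is not a soft error, because the status code already reports the problem correctly.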


Blocked

The link could not be checked because the target server blocked our crawler’s request.

Many websites have measures in place to identify and block unwanted bot traffic. Unfortunately, this means that sometimes our crawler gets blocked as well. In cases like these, the server typically returns an error message (like “Request denied” or “Too many requests”) and one of the following HTTP status codes:

  • 403 (Forbidden): This is the most common status code used when blocking requests. We often see this error with websites served through Akamai and Cloudflare.
  • 429 (Too Many Requests): Some servers use this error code to indicate that our crawler is acting too fast and should reduce the rate of requests.
  • 430: This is a non-standard status code used by Shopify servers if too many requests are made from a single IP address in too short a time.
  • 503 (Service Unavailable): Amazon servers return this error when detecting automated requests. Another example is Cloudflare using this code when serving their “I’m Under Attack” DDoS protection page.
  • 999 (Request Denied): This non-standard code is used by LinkedIn servers, which have very strict crawling limits.
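The status codes above can be mapped to a “possibly blocked” hint, as in the following sketch. The helper names are hypothetical, and a match is only a hint: a 403 or 503 can just as well indicate a genuine permission problem or outage. For 429 responses, servers may send a Retry-After header; this sketch only handles the numeric (seconds) form and falls back to an assumed default otherwise:

```python
from typing import Dict, Optional

# Codes that, per the list above, often indicate bot blocking (but not always).
POSSIBLE_BLOCK_CODES: Dict[int, str] = {
    403: "Forbidden",
    429: "Too Many Requests",
    430: "Shopify rate limit (non-standard)",
    503: "Service Unavailable",
    999: "LinkedIn Request Denied (non-standard)",
}

def possible_block_reason(status: int) -> Optional[str]:
    """Return a description if the status code often signals blocking, else None."""
    return POSSIBLE_BLOCK_CODES.get(status)

def retry_delay(headers: Dict[str, str], default: float = 60.0) -> float:
    """Seconds to wait before retrying, from a numeric Retry-After header.

    Retry-After may also be an HTTP date; this sketch uses `default` in that case.
    """
    value = headers.get("Retry-After", "").strip()
    return float(value) if value.isdigit() else default
```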

Most of the time, the blocking appears on one of these levels:

  • Content delivery network (CDN): Large CDNs like Akamai, Fastly, and Cloudflare offer so-called “bot management” solutions. They use their massive scale to detect, classify, and block automated requests over their entire networks.
  • Firewall: Hardware or software firewalls like F5 Advanced Web Application Firewall and Fortinet FortiWeb include bot detection and mitigation capabilities.
  • Web server: All major web servers like Apache, NGINX, and IIS have built-in or plugin-based support for rate-limiting and blocking requests.
  • Security plugin: There are a variety of plugins for WordPress and other content management systems that promise to protect websites against security threats, including bot attacks. Wordfence and Sucuri Security are typical examples.

If you notice that your website blocks our crawler, you can try the following:

  • Temporarily disable the bot management feature for the duration of the check.
  • Contact us for a list of crawler IP addresses to whitelist.
  • Ask us to slow down the crawler for your website to reduce the risk of being blocked.
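Slowing down a crawler typically means enforcing a minimum interval between requests to the same host. The following is an illustrative per-host throttle, not our crawler’s implementation; `min_interval` is an assumed setting, and the clock and sleep functions are injectable so the behavior can be tested without real waiting:

```python
import time
from typing import Callable, Dict
from urllib.parse import urlsplit

class HostThrottle:
    """Enforce a minimum delay between two requests to the same host."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last_request: Dict[str, float] = {}

    def wait(self, url: str) -> float:
        """Sleep if needed before requesting `url`; return the delay applied."""
        host = urlsplit(url).hostname or ""
        now = self.clock()
        last = self.last_request.get(host)
        delay = 0.0 if last is None else max(0.0, last + self.min_interval - now)
        if delay:
            self.sleep(delay)
        self.last_request[host] = now + delay  # time at which this request is sent
        return delay
```

Requests to different hosts are throttled independently, so slowing down the crawler for one website does not affect checks of links pointing elsewhere.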