1. Getting Started

Welcome to Dr. Link Check, a web-based service that scans your website and reports links that need your attention. If a link on your website no longer works or leads to a site with malicious or unwanted content, Dr. Link Check makes sure you are the first to know.

Getting started with Dr. Link Check is easy – simply go to the home page, enter the address of your website, and click on Start Check.

Start Link Check

This will automatically create a temporary account with a free Lite subscription and start a link check for your site. After familiarizing yourself with Dr. Link Check, you should finalize your account by providing an email address and password.

With your email address and password, you can always log in to your account via the Login link in the top menu. (If you forget your password, use this link to request a password reset email.)

2. Reports

After logging in, you will by default see the Overview report, which is a dashboard providing a visual summary of the results of the link check.

Overview Report

Other reports can be selected from the sidebar on the left:

  • All Links: All links found on the website
  • Issues
    • All Issues: All broken, blacklisted, or parked links
    • Broken: Non-working links
    • Blacklisted: Links that are blacklisted for hosting phishing or malware attacks
    • Parked: Links leading to parked domains with ads or placeholder content
  • Outbound: Links to external websites
  • New: Links that were added to the website since the last check (if available)
  • Unsupported: Links that use a URL scheme other than http:, https:, data:, or mailto: (typical examples include javascript: and tel: links); it’s recommended to check these links manually on the linking page
  • Blocked: Links that could not be checked because requests from our servers were blocked (some hosts, like linkedin.com, have systems in place to detect and prevent automated activity)

3. Link Details

By hovering over an entry in the link table and clicking on the Details button, you can request more information about a link, including the sources where the link was found and any available redirection URLs.

Link Details

If you want to know where exactly Dr. Link Check found the link in the code, click the Source button next to one of the entries in the Linked from section. The source document will be fetched from the server and will be displayed in a popup with the link locations highlighted.

By clicking on the down arrow next to the Source button and selecting Source link details, you can request the source link’s details to be loaded from the server and displayed in the Link Details dialog. This feature is especially useful for retracing how our crawler arrived at a document because it allows you to jump from link to link up to the URL used to start the check.

4. Custom Reports

A particularly powerful feature of Dr. Link Check is the ability to define filters to show only the links you are interested in. You can use filters for a variety of purposes, including the following:

  • Report all email address links on your website
  • Report internal links that generate a 5xx server error
  • Report outbound links that permanently redirect to new locations
  • Report outbound links that are dofollow
  • Report all URLs within a specific subdirectory
  • Report all JavaScript files loaded from external servers

To filter the results, click the Add button in the Filter bar on top of the link table and select the criteria you want to filter for:

  • Issue: Filter for links that are broken, blacklisted, and/or parked
  • URL: Filter for links whose URLs match or contain a particular text string
  • Scheme: Filter links based on their URL scheme (such as "https" or "mailto")
  • Host: Filter for links whose hostnames match or contain a certain string (such as "www.example.com")
  • Path: Filter links based on the path parts of their URLs (such as "/path/to/page")
  • Direction: Filter for internal or outbound links
  • Is (not) new: Filter for links that were added to the site since the last link check
  • Is (not) changed: Filter for links based on whether the linked document was significantly updated since the last check
  • Redirect type: Filter for links that temporarily (HTTP status codes 302 and 307) or permanently (301, 308) redirect to a new location
  • Link type: Filter links based on the location where they were found in the code (<a href>, <img src>, etc.)
  • Media type: Filter links based on the content type of the linked resource (HTML, image, CSS, etc.)
  • Nofollow/Dofollow: Filter for links with or without rel="nofollow" attributes (which instruct search engines to ignore the links for ranking purposes)
  • robots.txt status: Filter for internal links that are not allowed to be crawled (based on the allow/disallow rules found in the website’s robots.txt file)
  • Broken check result: Filter links based on the results of checking whether they are functional
  • HTTP status code: Filter http:// and https:// links based on the HTTP status code received from the server

Link Filter

When adding multiple criteria, only links matching all those criteria will be returned. If you want to remove a criterion from the filter, click the x icon next to it.

Once you have defined a custom filter, you will notice a new button, Save as Custom Report, at the top right-hand corner of the reports table. This button lets you save the current report and add a shortcut to the sidebar.

4.1 Advanced Filter Rules

Instead of creating a filter by point and click, you can also specify it in textual form. This feature is intended for advanced users who need to define filters that cannot be expressed via the normal user interface. To switch the filter to text-edit mode, double-click an empty spot in the filter bar.

Link Filter Rule

A simple filter rule follows the pattern below:

<Property> <Comparison Operator> <Value>

<Property> can be any of the following:

  • Url: The full URL of the link (as a string)
  • Scheme: The scheme part of the URL, e.g. "https" or "mailto" (string)
  • Host: The hostname part of the URL, e.g. "example.com" or "www.example.com" (string)
  • Port: The port number part of the URL, e.g. 80 or 443 (number value)
  • Path: The absolute path of the URL, e.g. "/path/to/page" or "/" (string)
  • Query: The query string part of the URL, including the leading question mark, e.g. "?name=ferret&color=purple" (string)
  • Status: The current status of the link as part of the link check (as one of the enumeration values below)
    • Queued: The link is queued to be checked
    • InProgress: The link is currently being checked
    • Checked: The link was successfully checked
    • Unsupported: The link was not checked because it has an unsupported URL scheme (like “tel” in “tel:+1-555-1234567”)
    • Aborted: The check of the link was aborted
    • Failed: An error occurred while checking the link
    • Blocked: The link could not be checked because the request from our server was blocked
  • Direction: The direction in which the link points (enumeration value)
    • Internal: The link points to an internal resource
    • Outbound: The link points to a resource outside of the current website
  • IsNew: Specifies whether the link was added to the website since the last link check (boolean value)
  • IsChanged: Specifies whether the linked document was significantly updated since the last link check (boolean value)
  • RedirectType: Specifies how the link was redirected to a new location, if applicable (enumeration value)
    • Permanent: The redirection is permanent (HTTP status code 301 or 308)
    • Temporary: The redirection is temporary (HTTP status code 302 or 307)
  • LinkType: The location where the link was found in the code (enumeration value)
    • AuthUrl: The URL which was used to authenticate with the server
    • StartUrl: The URL with which the link check was started
    • Ahref: Anchor element (<a href="URL">)
    • ImgSrc: Image element (<img src="URL">)
    • LinkStylesheet: Link stylesheet element (<link rel="stylesheet" href="URL">)
    • ScriptSrc: Script element (<script src="URL">)
    • MetaRefresh: Meta refresh element (<meta http-equiv="refresh" content="0; url=URL">)
    • FrameSrc: Frame element (<frame src="URL"> or <iframe src="URL">)
    • SocialMetaTag: Open Graph (Facebook) or Twitter Card meta tag
    • CssImport: CSS @import rule
    • SitemapLoc: The URL was found in an XML sitemap file
    • CssUrl: CSS url() function
    • Other: The URL was found somewhere else in the code
  • MediaType: The content type of the linked resource (enumeration value)
    • Html: HTML document
    • Image: Image file
    • Css: CSS (style sheet) file
    • JavaScript: Script file
    • Json: JSON document
    • Font: Font file
    • Xml: XML document
    • XmlSitemap: XML sitemap
    • Text: Human-readable text file
    • Audio: Audio file
    • Video: Video file
    • Binary: Other binary (not human-readable) file
    • Unknown: File of unknown content
  • NoFollow: Specifies whether the link has a rel="nofollow" attribute (boolean value)
  • NoIndex: Specifies whether a noindex directive was found, instructing web crawlers not to index the document (boolean value)
  • DisallowedByRobotsTxt: Specifies whether the website’s robots.txt file instructs crawlers to ignore the link (boolean value)
  • BrokenCheckResult: The result of checking whether the link is functional (enumeration value)
    • Ok: The link works fine
    • InvalidUrl: The link’s URL is not properly formatted
    • UnsupportedScheme: The link’s URL uses a scheme that is not supported
    • HostNotFound: The hostname could not be resolved
    • ConnectError: Failed to establish a connection to the server
    • SslHandshakeError: Failed to complete an SSL/TLS handshake with the server
    • SslCertProblem: The server’s SSL certificate failed verification
    • SendReceiveError: An error occurred while sending the request to the server or receiving the response from it
    • Timeout: The server didn’t respond in time
    • HttpErrorCode: The server returned an HTTP error status code
    • TooManyRedirects: The link was redirected more than 20 times
    • BadContentEncoding: The transfer encoding could not be recognized
    • CrawlerTrap: A so-called crawler trap was detected, meaning the website produced an unusually high number of irrelevant links without any new unique content
    • MxRecordNotFound: No MX record was found for the email address’s domain name
    • UnknownError: An error of unknown type occurred
  • HttpResponseCode: The final HTTP status code received from the server, if available (number value)
  • BlacklistCheckResult: The result of checking whether the link is blacklisted (enumeration value)
    • Ok: The link is not blacklisted
    • Blacklisted: The link is blacklisted for hosting a phishing or malware attack
  • ParkedCheckResult: The result of checking whether the link points to a parked domain (enumeration value)
    • Ok: The link doesn’t point to a parked domain
    • Parked: The link points to a parked domain

As <Comparison Operator>, you can choose from these:

  • =: Is equal to
  • !=: Is not equal to
  • CONTAINS: Contains string
  • STARTSWITH: Begins with string
  • ENDSWITH: Ends with string
  • >: Is greater than
  • <: Is less than
  • >=: Is greater than or equal to
  • <=: Is less than or equal to

And <Value> is either a string enclosed in double quotes ("example.com"), a number (404), a boolean value (true, false), or an enumeration value depending on the chosen property (see above).

With this information, you can construct simple filters like the ones below:

Url STARTSWITH "https://www.example.com/path/"

Direction = Internal

MediaType != Html

Logical operators (AND, OR) and parentheses allow you to create more complex filters:

HttpResponseCode >= 500 AND HttpResponseCode <= 599

Direction = Outbound AND (LinkType = ScriptSrc OR LinkType = LinkStylesheet)

Expressions can also be negated by prepending them with NOT:

NOT (MediaType = Image OR MediaType = Audio OR MediaType = Video)
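
To make the way these operators combine more concrete, the sketch below mirrors three of the rules above as plain Python predicates over a few made-up link records. The record layout (dictionaries keyed by property name), the sample data, and the helper apply_filter are assumptions for illustration only; the actual filtering happens inside Dr. Link Check.

```python
# Hypothetical link records; the keys reuse the property names listed above.
links = [
    {"Url": "https://www.example.com/", "Direction": "Internal",
     "LinkType": "StartUrl", "MediaType": "Html", "HttpResponseCode": 200},
    {"Url": "https://cdn.example.net/app.js", "Direction": "Outbound",
     "LinkType": "ScriptSrc", "MediaType": "JavaScript", "HttpResponseCode": 200},
    {"Url": "https://www.example.com/logo.png", "Direction": "Internal",
     "LinkType": "ImgSrc", "MediaType": "Image", "HttpResponseCode": 503},
]


def server_error(link):
    # HttpResponseCode >= 500 AND HttpResponseCode <= 599
    return link["HttpResponseCode"] >= 500 and link["HttpResponseCode"] <= 599


def external_asset(link):
    # Direction = Outbound AND (LinkType = ScriptSrc OR LinkType = LinkStylesheet)
    return link["Direction"] == "Outbound" and (
        link["LinkType"] == "ScriptSrc" or link["LinkType"] == "LinkStylesheet"
    )


def not_media(link):
    # NOT (MediaType = Image OR MediaType = Audio OR MediaType = Video)
    return not (
        link["MediaType"] == "Image"
        or link["MediaType"] == "Audio"
        or link["MediaType"] == "Video"
    )


def apply_filter(predicate):
    # Hypothetical helper: return the URLs of all sample links matching the rule.
    return [link["Url"] for link in links if predicate(link)]
```

Calling apply_filter(server_error) on this sample returns only the logo URL, because it is the only record with a status code in the 500 to 599 range.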

5. Export

If you are on the Professional or Premium plan, you can export the complete results of a report to CSV or PDF format. The Export to CSV option generates a file that can be opened in spreadsheet software like Microsoft Excel or Apple Numbers for further processing. If you are looking for a document suitable for printing, choose the Export to PDF option.
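
As an example of such further processing, here is a minimal Python sketch that tallies the values in one column of a CSV file. The column names and the sample rows are made up for illustration; check the header row of your own export and pass the column name you actually find there.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_text, column):
    """Count how often each value occurs in the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# Made-up sample data; a real export has its own header row and columns.
sample = (
    "URL,Result\n"
    "https://www.example.com/a,Ok\n"
    "https://www.example.com/b,HttpErrorCode\n"
    "https://www.example.com/c,HttpErrorCode\n"
)
counts = count_by_column(sample, "Result")
```

To process a real file, read it with open(path, newline="", encoding="utf-8") and pass its contents to count_by_column.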

Export Options

6. Rerun a Link Check

After a check has completed or been aborted, you can restart it via the Rerun Check button on the Overview page.

Rerun Check Button

7. Projects

In Dr. Link Check, a project comprises the settings and check results for a specific website.

7.1 Add Project

To add a new project (and start a new link check), click the + button at the top of the sidebar.

Add Project Button

You will need to provide the following input:

New Project Dialog

  • URL to check: The check starts with the address you enter into this field. If you have several URLs that you need checked, enter them on separate lines – the input field will automatically expand and will allow you to submit up to 10,000 URLs.
  • URLs to crawl: This setting determines which URLs are considered “internal,” that is, belonging to your website. Internal links will be followed and scanned for additional links.
    • Same root domain (*.example.com/*): URLs with the same root domain as the start URL (or one of the start URLs if several are present) are crawled. For instance, if you enter https://www.example.com/ as the start URL, our crawler considers https://subdomain.example.com/ as belonging to your website. This is the default option and typically the right choice if you want your entire website, including all subdomains, to be checked.
    • Same domain (www.example.com/*): URLs with the exact same domain name as the start URL (or one of the start URLs if several are present) are crawled. For instance, if you enter https://www.example.com/ as the start URL, our crawler considers https://subdomain.example.com/ as an outbound link not belonging to your website.
    • Same folder (www.example.com/folder/*): URLs with the same domain name and the same directory path prefix as the start URL (or one of the start URLs if several are present) are crawled. For instance, if you enter https://www.example.com/path/to/page1.html as the start URL, our crawler considers https://www.example.com/path/to/page2.html as internal and https://www.example.com/index.html as outbound.
    • Exact URL (www.example.com/folder/page.html): Only the provided start URLs are crawled. This means that the start URLs and all links one hop away are checked.
    • None: No URLs are crawled. This means that only the provided start URLs are checked without trying to discover any further links. If you have a list of different URLs and want to find out which of them are working, this is the option you want.
    • Custom rule: Choosing this option allows you to enter your own rule to determine which URLs to consider “internal” and crawl for more links. See below for examples and detailed information on how to write custom crawl rules.
  • Check frequency: This allows you to schedule how often to run the link check.
    • Just once: The check is run only once and will not be repeated automatically.
    • Recurring: The check is run immediately after creating the project, followed by automatic checks on a monthly, biweekly, weekly, or daily basis.
      • Monthly/biweekly checks are only available in the Standard plan and above.
      • Weekly checks are only available in the Professional plan and above.
      • Daily checks are only available in the Premium plan.
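
The built-in “URLs to crawl” options described above can be sketched in Python as follows. This is only an approximation under stated assumptions: the scope names are hypothetical labels, the function handles a single start URL, and root_domain simply takes the last two hostname labels, which is wrong for suffixes such as .co.uk. How our crawler determines the root domain is not specified here.

```python
from urllib.parse import urlsplit


def root_domain(host):
    # Simplification (assumption): the last two labels form the root domain.
    return ".".join(host.split(".")[-2:])


def folder(path):
    # Directory part of a path, e.g. "/path/to/page1.html" -> "/path/to/"
    return path[: path.rfind("/") + 1]


def is_internal(url, start_url, scope):
    """Decide whether url would be crawled for a given start URL and scope."""
    u, s = urlsplit(url), urlsplit(start_url)
    u_host, s_host = u.hostname or "", s.hostname or ""
    if scope == "same_root_domain":
        return root_domain(u_host) == root_domain(s_host)
    if scope == "same_domain":
        return u_host == s_host
    if scope == "same_folder":
        return u_host == s_host and u.path.startswith(folder(s.path))
    if scope == "exact_url":
        return url == start_url
    if scope == "none":
        return False
    raise ValueError(f"unknown scope: {scope}")
```

With https://www.example.com/ as the start URL, https://subdomain.example.com/ is internal under "same_root_domain" but outbound under "same_domain", matching the examples in the list.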

Additional settings are available in the Advanced Settings section:

  • Project Name: This is the name of the project as it appears in the sidebar and on the Overview page. If you leave this field empty, the project name will be automatically generated from the (first) start URL.
  • Respect robots.txt allow/disallow rules?: Many websites have a robots.txt file to tell search engine bots and other crawlers what they are allowed and not allowed to do on the site. Our crawler processes the Disallow and Allow rules for the user agents “Googlebot” (Google’s web crawler) and “*” (a wildcard for all other bots). Internal links that are disallowed for both “Googlebot” and “*” are excluded from the check. The start URL(s) and external links (pointing to other websites) are never excluded, regardless of what the robots.txt file specifies. Other robots.txt directives, such as Crawl-delay and Sitemap, are currently not supported. If you don’t want our crawler to obey the rules found in a site’s robots.txt file, activate the Ignore robots.txt option.
  • Ignore links if…: This is a rule that defines which links to completely ignore and not to include in the report. If no rule is specified, all found links will be checked. For information on the rule syntax, including examples, see the section below.
  • Email results of recurring check to…: This setting only applies if you have selected Recurring as the check frequency. For each completed scheduled check, an overview of the results is sent to the email address listed here. If you want the email to be delivered to multiple recipients, enter the recipient addresses separated by commas.
    • If you don’t want to be notified about checks that didn’t reveal any broken, blacklisted, or parked links, check the Only send if issues were found option.
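
The robots.txt behavior described above (a link is excluded only if it is disallowed for both “Googlebot” and “*”) can be illustrated with Python's standard urllib.robotparser module. The robots.txt content and the function name excluded_by_robots are made up for this sketch, and Python's parser may resolve overlapping Allow and Disallow rules differently from our crawler, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /tmp/ is blocked for "*" but not for Googlebot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/

User-agent: Googlebot
Disallow: /private/
"""


def excluded_by_robots(url, robots_txt=ROBOTS_TXT):
    """Return True if url is disallowed for both Googlebot and the wildcard."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked_for_googlebot = not parser.can_fetch("Googlebot", url)
    blocked_for_wildcard = not parser.can_fetch("*", url)
    return blocked_for_googlebot and blocked_for_wildcard
```

In this sample, a URL under /private/ is excluded, while a URL under /tmp/ is still checked because Googlebot is allowed to fetch it.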

7.1.1 Crawl and Ignore Rules

A simple crawl or ignore rule follows the pattern

<Property> <Comparison Operator> <Value>

with <Property> being one of the following:

  • Url: The full URL of the link
  • Scheme: The scheme part of the URL, e.g. "https" or "mailto"
  • Host: The hostname part of the URL, e.g. "example.com" or "www.example.com"
  • Port: The port number part of the URL, e.g. 80 or 443
  • Path: The absolute path of the URL, e.g. "/path/to/page" or "/"
  • Query: The query string part of the URL, including the leading question mark, e.g. "?name=ferret&color=purple"
  • PathAndQuery: The path and query parts of the URL, e.g. "/path/to/page?name=ferret&color=purple"
  • HtmlElement: A CSS-like element selector to match a link’s HTML tag, e.g. "div.sidebar > a" (see this blog post for more information)
  • LinkDepth: The distance between the link and the start URL, i.e. 0 for the start URL itself, 1 for links found directly on the start page, 2 for links found on pages linked from the start page, etc.
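
To illustrate how LinkDepth values come about, the following Python sketch walks a tiny, made-up site breadth-first from the start URL. It assumes that the depth of a link is the shortest number of hops from the start URL; the site map and the function name link_depths are hypothetical.

```python
from collections import deque


def link_depths(start_url, links_on_page):
    """Breadth-first walk: 0 for the start URL, 1 for links found on it, etc."""
    depths = {start_url: 0}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        for target in links_on_page.get(page, []):
            if target not in depths:  # keep the first (shortest) distance
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Made-up site: each page maps to the URLs it links to.
site = {
    "https://www.example.com/": [
        "https://www.example.com/a",
        "https://www.example.com/b",
    ],
    "https://www.example.com/a": ["https://www.example.com/c"],
    "https://www.example.com/c": ["https://www.example.com/"],
}
depths = link_depths("https://www.example.com/", site)
```

Here the pages a and b end up at depth 1 and page c at depth 2, so a rule such as LinkDepth > 1 would match only page c.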

The supported <Comparison Operator>s are:

  • =: Is equal to
  • !=: Is not equal to
  • CONTAINS: Contains string
  • STARTSWITH: Begins with string
  • ENDSWITH: Ends with string
  • >: Is greater than
  • <: Is less than
  • >=: Is greater than or equal to
  • <=: Is less than or equal to

<Value> can be either a string enclosed in double quotes ("example") or a number (123).

This allows you to construct simple rules like the ones below:

Scheme = "https"

Url STARTSWITH "https://www.example.com/path/"

Path ENDSWITH ".html"

Port = 81

HtmlElement = "img"

LinkDepth > 2

For more complex rules, the rule language offers support for logical operators (AND, OR) as well as parentheses for grouping expressions:

(Host = "example.com" OR Host ENDSWITH ".example.com") AND Path STARTSWITH "/path/"

You can also negate a rule by prepending NOT:

NOT (Path ENDSWITH ".png" OR Path ENDSWITH ".jpg" OR Path ENDSWITH ".gif")

7.2 Edit Project

If you want to make changes to the settings of a project, select the project from the drop-down menu in the sidebar and click the wrench icon next to it.

Edit Project Button

7.3 Delete Project

You can delete a project by opening the drop-down menu in the sidebar, hovering over the project’s menu item, and clicking on the trash icon.

Delete Project Button

8. Subscription

Up-to-date details on your subscription are always visible at the bottom of the sidebar. You can see the plan you are currently on and the available link quota.

Subscription Details

A click on the wrench icon opens the Subscription Settings dialog, which allows you to upgrade, downgrade, or cancel your subscription as well as update your billing information.

When upgrading to a more expensive plan, the change will take effect immediately, and you will be billed for the price difference. A downgrade or cancellation takes effect at the end of the current term.

Subscription Settings

Your subscription payments are securely processed by our e-commerce partner Paddle, which accepts all major credit cards and PayPal.

9. Account

You can verify and change your account settings by clicking on Account at the right-hand side of the title bar and selecting Account Settings from the drop-down menu.

Account Menu

A complete account requires your full name, email address, and a password that’s at least six characters long.

10. FAQ

10.1 What type of links does Dr. Link Check find and check?

Dr. Link Check finds links in HTML documents (supporting HTML tags like <a>, <area>, <frame>, <iframe>, <img>, <script>, <audio>, <video>, and several more) and CSS files (supporting @import and url(...)). The crawler currently isn’t able to execute JavaScript code and search for links in JavaScript-generated pages.

Supported URL schemes include “http,” “https,” “data,” and “mailto.” Links with “http” and “https” schemes are checked by connecting to the server and requesting the resource, “data” URLs are checked for syntax errors, and for “mailto” links, the crawler verifies that the email address’s domain exists and has MX records.

10.2 Why is a link reported as broken even if it works fine in my browser?

Sometimes problems are just temporary: maybe the target server was overloaded at the time of the check, or there was a hiccup somewhere in the network. Issues that often resolve themselves over time include the following: “Timeout,” “Connect error,” “Send error,” “Receive error,” and HTTP 5xx server errors.

Other times, web servers block or limit requests from our servers. For instance, linkedin.com servers deny all requests originating from the Amazon cloud (where our servers are located) with a “999 Request Denied” response. Many servers also have a rate-limiting mechanism in place that blocks requests or slows down responses after a certain number of hits. In these cases, you will typically see 429 (Too Many Requests) or sometimes 403 (Forbidden) and 503 (Service Unavailable) HTTP error codes.

10.3 How much stress does a link check place on my server?

Dr. Link Check limits parallel downloads from a single host to a maximum of two. This is less than what is generally allowed by modern web browsers, which typically allow up to six (Chrome, Firefox) or more (Internet Explorer) connections per host.
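
A per-host limit of this kind can be sketched in Python with one asyncio semaphore per hostname. This is a generic illustration of the idea, not our crawler's actual code; fetch_all and its fetch parameter are hypothetical names, and fetch stands for any coroutine function that downloads a single URL.

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlsplit

MAX_PER_HOST = 2  # the per-host limit stated above


async def fetch_all(urls, fetch):
    """Run fetch(url) for every URL, at most MAX_PER_HOST at a time per host."""
    limits = defaultdict(lambda: asyncio.Semaphore(MAX_PER_HOST))

    async def limited(url):
        # Wait for a free slot for this URL's host before downloading.
        async with limits[urlsplit(url).hostname]:
            return await fetch(url)

    # Results are returned in the same order as the input URLs.
    return await asyncio.gather(*(limited(url) for url in urls))
```

URLs on different hosts use different semaphores, so a slow host does not hold up downloads from other hosts.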

10.4 What if I have a question that is not answered here?

You may contact us at any time with any questions you have. We are glad to help!
