Getting Started

Welcome to Dr. Link Check, a web-based service that scans your website and reports links that need your attention. If a link on your website no longer works or leads to a site with malicious or unwanted content, Dr. Link Check makes sure you are the first to know.

Getting started with Dr. Link Check is easy: simply go to the home page, enter the address of your website, and click on Start Check.

You will automatically be taken to the Overview report, where you can monitor the progress while the crawler goes through the site and examines all links it can find.

If you weren’t already logged into an account, a new one with a free Lite subscription will have been created for you. This account is temporary and will be deleted once you log out, unless you make it permanent by providing your name, email address and a password.

Account

With your email address and password, you can log in to your account via the Login link from the top menu.

If you don’t remember your account password, use this link to request a password reset email.

After logging in, you can access your account settings by clicking on Account at the right-hand side of the title bar and selecting Account Settings from the drop-down menu.

A complete account requires your full name, email address, and a password that’s at least six characters long.

Please note that changing your account’s email address will not automatically update your billing email address (which is listed under Subscription Settings).

Subscription

Your subscription determines which features are available and how many links can be checked per website.

The subscription plan you are currently on and the maximum number of links allowed per website are always visible at the bottom of the sidebar.

Clicking on the wrench icon opens the Subscription Settings dialog, which allows you to upgrade, downgrade, or cancel your subscription as well as update your billing information.

When you upgrade to a more expensive plan, the change takes effect immediately, and you are billed for the price difference. A downgrade or cancellation takes effect at the end of the current term. This means that after clicking the Cancel Subscription button, you still have full access to the service until the end of the current billing period.

Your subscription payments are securely processed by our e-commerce partner Paddle, which accepts all major credit cards and PayPal.

If you need your billing contact email address to be updated, please send us a message and we will make the change in Paddle’s ordering system – unfortunately, this is something that cannot be automated and has to be done manually.

Project

A project comprises the settings and check results for a specific website.

The name of the currently active project is displayed at the top of the left sidebar. If you have several projects, you can switch between them by selecting an item from the menu.

To add a new project (and start a new link check), click the + button at the top of the sidebar.

Project Settings

You can configure the following settings for your project:

  • URL to check: The check starts with the address you enter into this field. If you have several URLs that you need checked, enter them on separate lines. The input field will automatically expand and allow you to submit up to 10,000 URLs.
  • URLs to crawl: This setting determines which URLs are considered “internal” (belonging to your website). Internal links will be followed and scanned for additional links.
    • Same root domain (*.example.com/*): URLs with the same root domain as the start URL (or one of the start URLs if several are present) are crawled. For instance, if you enter https://www.example.com/ as the start URL, our crawler considers https://subdomain.example.com/ to belong to your website. This is the default option and typically the right choice if you want your entire website, including all subdomains, to be checked.
    • Same domain (www.example.com/*): URLs with the exact same domain name as the start URL (or one of the start URLs if several are present) are crawled. For instance, if you enter https://www.example.com/ as the start URL, our crawler considers https://subdomain.example.com/ to be an outbound link not belonging to your website.
    • Same folder (www.example.com/folder/*): URLs with the same domain name and the same directory path prefix as the start URL (or one of the start URLs if several are present) are crawled. For instance, if you enter https://www.example.com/path/to/page1.html as the start URL, our crawler considers https://www.example.com/path/to/page2.html to be internal and https://www.example.com/index.html to be outbound.
    • Exact URL (www.example.com/folder/page.html): Only the provided start URLs are crawled. This means that the start URLs and all links one hop away are checked.
    • None: No URLs are crawled. This means that only the provided start URLs are checked, without trying to discover any further links. If you have a list of different URLs and want to find out which of them are working, this is the option you want.
    • Custom rule: Choosing this option allows you to enter your own rule to determine which URLs to consider “internal” and crawl for more links. See below for examples and detailed information on how to write custom crawl rules.
  • Check frequency: This allows you to schedule how often to run the link check.
    • Just once: The check is run only once and will not be repeated automatically.
    • Recurring: The check is run immediately after creating the project, followed by automatic checks on a monthly, biweekly, weekly, or daily basis.
      • Monthly/biweekly checks are only available in the Standard plan and above.
      • Weekly checks are only available in the Professional plan and above.
      • Daily checks are only available in the Premium plan.
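
To make the difference between the "URLs to crawl" options more concrete, the following Python sketch classifies a URL as internal or outbound for a given start URL. It is an illustration only, not Dr. Link Check's actual implementation: the function names, the mode strings, and in particular the naive "last two labels" root-domain logic are assumptions made for this example (a real crawler would need a public suffix list to handle domains such as example.co.uk).

```python
from urllib.parse import urlsplit


def naive_root_domain(host: str) -> str:
    # ASSUMPTION: the root domain is the last two labels of the hostname.
    # This is wrong for suffixes like "co.uk" and only serves the illustration.
    return ".".join(host.lower().split(".")[-2:])


def is_internal(start_url: str, url: str, mode: str) -> bool:
    """Return True if `url` would be treated as internal (hypothetical helper)."""
    start, target = urlsplit(start_url), urlsplit(url)
    start_host = (start.hostname or "").lower()
    target_host = (target.hostname or "").lower()

    if mode == "same_root_domain":
        return naive_root_domain(start_host) == naive_root_domain(target_host)
    if mode == "same_domain":
        return start_host == target_host
    if mode == "same_folder":
        # Directory prefix of the start URL, e.g. "/path/to/" for "/path/to/page1.html".
        folder = start.path.rsplit("/", 1)[0] + "/"
        return start_host == target_host and target.path.startswith(folder)
    if mode == "exact_url":
        return url == start_url
    if mode == "none":
        return False
    raise ValueError(f"Unknown mode: {mode}")
```

With https://www.example.com/ as the start URL, https://subdomain.example.com/ is internal under "same_root_domain" and outbound under "same_domain", matching the examples in the list above.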

Additional settings are available in the Advanced Settings section:

  • Project Name: This is the name of the project as it appears in the sidebar and in the Overview report. If you leave this field empty, the project name will be automatically generated from the (first) start URL.
  • Respect robots.txt allow/disallow rules?: Many websites have a robots.txt file to tell search engine bots and other crawlers what they are allowed and not allowed to do on the site. Our crawler processes the Disallow and Allow rules for the user agents “Googlebot” (Google’s web crawler) and “*” (a wildcard for all other bots). Internal links that are disallowed for both “Googlebot” and “*” are excluded from the check. The start URL(s) and external links (pointing to other websites) are never excluded, regardless of what the robots.txt file specifies. Other robots.txt directives, such as Crawl-delay and Sitemap, are not supported. If you don’t want our crawler to obey the rules found in a site’s robots.txt file, activate the Ignore robots.txt option.
  • Ignore links if…: This is a rule that defines which links to exclude from being checked. If no rule is specified, all found links will be checked. For information on the rule syntax, including examples, see the section below.
  • Email results of recurring check to…: This setting only applies if you have selected Recurring as the check frequency. For each completed scheduled check, an overview of the results will be sent to the email address listed here. If you want the email to be delivered to multiple recipients, enter the recipient addresses separated by commas.
    • If you don’t want to be notified about checks that didn’t reveal any problematic links, check the Only send if issues were found option.
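
The robots.txt behavior described above (an internal link is excluded only if it is disallowed for both "Googlebot" and "*") can be approximated with Python's standard urllib.robotparser module. This is a rough sketch under assumptions: the helper name and the sample robots.txt content are made up for illustration, the exceptions for start URLs and external links are not modeled, and Python's parser may resolve conflicting Allow/Disallow rules differently from Dr. Link Check's crawler.

```python
from urllib.robotparser import RobotFileParser


def is_excluded_by_robots(robots_txt: str, url: str) -> bool:
    """Hypothetical helper: True if `url` is disallowed for both user agents."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked_for_googlebot = not parser.can_fetch("Googlebot", url)
    blocked_for_wildcard = not parser.can_fetch("*", url)
    return blocked_for_googlebot and blocked_for_wildcard


# Sample robots.txt invented for this example.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /private/
Disallow: /tmp/
"""
```

In this sample, a link to /private/ is disallowed for both user agents and would be skipped, while a link to /tmp/ is disallowed only for "*" and would still be checked.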

Crawl and Ignore Rules

Dr. Link Check supports a rule language that allows you to define exactly which links to crawl for more links and which to ignore entirely. A simple crawl or ignore rule follows the pattern

<Property> <Comparison Operator> <Value>

with <Property> being one of the following:

  • Url: The full URL of the link
  • Scheme: The scheme part of the URL, e.g. "https" or "mailto"
  • Host: The hostname part of the URL, e.g. "example.com" or "www.example.com"
  • Port: The port number part of the URL, e.g. 80 or 443
  • Path: The absolute path of the URL, e.g. "/path/to/page" or "/"
  • Query: The query string part of the URL, including the leading question mark, e.g. "?name=ferret&color=purple"
  • PathAndQuery: The path and query parts of the URL, e.g. "/path/to/page?name=ferret&color=purple"
  • HtmlElement: A CSS-like element selector to match a link’s HTML tag, e.g. "div.sidebar > a" (see this blog post for more information)
  • LinkDepth: The distance between the link and the start URL, i.e. 0 for the start URL itself, 1 for links found directly on the start page, 2 for links found on pages linked from the start page, etc.
  • LinkType: The location where the link was found in the code (enumeration value)
    • Ahref: Anchor element (<a href="URL">)
    • ImgSrc: Image element (<img src="URL">)
    • LinkStylesheet: Link stylesheet element (<link rel="stylesheet" href="URL">)
    • ScriptSrc: Script element (<script src="URL">)
    • MetaRefresh: Meta refresh element (<meta http-equiv="refresh" content="0; url=URL">)
    • FrameSrc: Frame element (<frame src="URL"> or <iframe src="URL">)
    • SocialMetaTag: Open Graph (Facebook) or Twitter Card meta tag
    • CssImport: CSS @import rule
    • CssUrl: CSS url() function
    • JavaScriptLocation: JavaScript-triggered location change
    • JavaScriptOpen: JavaScript open(…) function
    • RobotsTxtSitemap: Link to an XML sitemap found in a robots.txt file
    • SitemapLoc: The URL was found in an XML sitemap file
    • Other: The URL was found somewhere else in the code
  • NoFollow: Specifies whether the link has a rel="nofollow" attribute (boolean value)
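
As a rough illustration of how the URL-based properties relate to each other, the following Python sketch splits a URL into parts named after the properties above. The function name is hypothetical, and filling in the default port for http and https when the URL does not state one explicitly is an assumption of this example, not documented behavior.

```python
from urllib.parse import urlsplit

# ASSUMPTION: a missing port is replaced by the scheme's default port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def url_properties(url: str) -> dict:
    """Split a URL into parts named after the rule properties (illustrative only)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # The Query property includes the leading question mark.
    query = f"?{parts.query}" if parts.query else ""
    return {
        "Url": url,
        "Scheme": parts.scheme,
        "Host": parts.hostname or "",
        "Port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "Path": path,
        "Query": query,
        "PathAndQuery": path + query,
    }
```

For https://www.example.com/path/to/page?name=ferret&color=purple, this yields the Scheme "https", the Host "www.example.com", the Path "/path/to/page", and the Query "?name=ferret&color=purple".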

The supported options for <Comparison Operator> are:

  • =: Is equal to
  • !=: Is not equal to
  • CONTAINS: Contains string
  • STARTSWITH: Begins with string
  • ENDSWITH: Ends with string
  • >: Is greater than
  • <: Is less than
  • >=: Is greater than or equal to
  • <=: Is less than or equal to

<Value> can be either a string enclosed in double quotes ("example") or a number (123).

This allows you to construct simple rules like the ones below:

Scheme = "https"

Url STARTSWITH "https://www.example.com/path/"

Path ENDSWITH ".html"

Port = 81

HtmlElement = "img"

LinkDepth > 2

For more complex rules, the rule language offers support for logical operators (AND, OR) as well as parentheses for grouping expressions:

(Host = "example.com" OR Host ENDSWITH ".example.com") AND Path STARTSWITH "/path/"

You can also negate a rule by prepending NOT:

NOT (Path ENDSWITH ".png" OR Path ENDSWITH ".jpg" OR Path ENDSWITH ".gif")
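
If you want to try out how such rules combine before entering them, the following Python sketch evaluates a rule against a dictionary of link properties. It is a simplified model written for this guide, not the parser Dr. Link Check uses: the function names are hypothetical, string comparisons are assumed to be case-sensitive, NOT is assumed to bind more tightly than AND, and AND more tightly than OR.

```python
import operator
import re

TOKEN_RE = re.compile(
    r'\s*(?:(?P<paren>[()])|"(?P<string>[^"]*)"|(?P<number>\d+)'
    r'|(?P<op>!=|>=|<=|=|>|<)|(?P<word>[A-Za-z]+))'
)

COMPARATORS = {
    "=": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
    "CONTAINS": lambda a, b: b in a,
    "STARTSWITH": lambda a, b: a.startswith(b),
    "ENDSWITH": lambda a, b: a.endswith(b),
}


def tokenize(rule):
    """Turn a rule string into (kind, text) tokens."""
    tokens, pos = [], 0
    rule = rule.strip()
    while pos < len(rule):
        match = TOKEN_RE.match(rule, pos)
        if match is None:
            raise ValueError(f"Unexpected character at position {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def evaluate(rule, link):
    """Evaluate `rule` against `link`, a dict mapping property names to values."""
    tokens, pos = tokenize(rule), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_or():
        result = parse_and()
        while peek() == ("word", "OR"):
            take()
            right = parse_and()  # parse first so the tokens are always consumed
            result = result or right
        return result

    def parse_and():
        result = parse_not()
        while peek() == ("word", "AND"):
            take()
            right = parse_not()
            result = result and right
        return result

    def parse_not():
        if peek() == ("word", "NOT"):
            take()
            return not parse_not()
        if peek() == ("paren", "("):
            take()
            result = parse_or()
            if take() != ("paren", ")"):
                raise ValueError("Missing closing parenthesis")
            return result
        return parse_comparison()

    def parse_comparison():
        _, prop = take()
        _, op = take()
        kind, raw = take()
        if prop not in link or op not in COMPARATORS or kind not in ("string", "number"):
            raise ValueError("Malformed comparison")
        value = int(raw) if kind == "number" else raw
        return COMPARATORS[op](link[prop], value)

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("Unexpected trailing input")
    return result
```

For a link with Host "www.example.com" and Path "/path/to/page.html", both the grouped rule and the negated rule shown above evaluate to true in this model.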

Rerun Check

If you want to check your website again, you don’t need to create a new project. Simply go to the Overview report and click the Rerun Check button.

Delete Project

Your subscription includes a limited number of projects. If you reach this number, you can either upgrade to a higher plan or make room by deleting one of the existing projects.

To delete a project, open the drop-down menu in the sidebar, hover over the project’s menu item, and click on the trash icon.

Reports

In the sidebar on the left, you can select one of the following reports.

Overview Report

The Overview report provides a high-level summary of the results of the link check.

  • Total Links: The total number of unique URLs found during the crawl.
  • Links with Issues: The number of links that are broken, blacklisted or in some other way problematic. You should aim to get this number down to zero.
  • New Links: The number of new links found on the website since the last crawl. This value is only available after re-running a check and can be a useful change indicator.
  • Issue Types: A breakdown of links by error type.
  • Link Types: A breakdown of links by their location in the HTML or CSS code.
  • Top Hosts: A breakdown of links by their host, which is either a domain name or an IP address.
  • Link Schemes: A breakdown of links by their URL scheme (the part at the beginning of the URL, like http, https, or tel).
  • Redirects: A breakdown of links by redirect type.
    • Permanent HTTP redirect: Redirects via HTTP response status codes 301 and 308. If a redirect is permanent, it makes sense to update the link to point to the new location. This avoids unnecessary HTTP requests and lowers the risk of the link breaking in the future.
    • Temporary HTTP redirect: Redirects via HTTP response status codes 302, 303, and 307.
    • HTTP refresh redirect: Redirects triggered by Refresh HTTP response headers.
    • Meta refresh redirect: Redirects triggered by meta refresh HTML elements.
    • JavaScript redirect: Redirects initiated by JavaScript code.
    • Frame redirect: Redirects implemented by loading the redirect target in a full-size frame.
  • Dofollow/Nofollow: A breakdown of links according to whether they are marked as nofollow (telling search engines to ignore the link for ranking purposes) or not. This information can be useful for SEO (search engine optimization) purposes, since dofollow links should only be set for high-quality and verified targets.
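
The grouping of HTTP status codes into permanent and temporary redirects described above can be written down as a small Python sketch. The function names and the input format (pairs of URL and status code) are assumptions for this illustration; only the status code grouping comes from the list above.

```python
PERMANENT_CODES = {301, 308}
TEMPORARY_CODES = {302, 303, 307}


def classify_http_redirect(status_code):
    """Map an HTTP status code to the redirect category used in the Overview report."""
    if status_code in PERMANENT_CODES:
        return "Permanent HTTP redirect"
    if status_code in TEMPORARY_CODES:
        return "Temporary HTTP redirect"
    return None  # not an HTTP redirect status code


def links_worth_updating(links):
    """Return the URLs from (url, status_code) pairs that redirect permanently."""
    return [
        url
        for url, code in links
        if classify_http_redirect(code) == "Permanent HTTP redirect"
    ]
```

Links returned by links_worth_updating are the ones where pointing the link directly at the new location is most likely to pay off.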

Clicking one of these items takes you to a tabular report of matching links.

Link Reports

Dr. Link Check provides several tabular reports listing links by different criteria.

  • All Links: A report of all links found on the website.

  • Issues

    • All Issues: All broken, blacklisted, or soft error links.
    • Broken: Non-working links.
    • Blacklisted: Links that are blacklisted by Google or other services for hosting phishing or malware attacks.
    • Soft errors: Links we consider broken, based on the content of the linked page, even though the server returned a 2xx success status code.
  • Outbound: Links to other websites/domains.
  • New: Links that were added to the website since the last check (if available).
  • Unsupported: Links that use a URL scheme other than http, https, data, or mailto (typical examples include javascript and tel links), which cannot be checked by our crawler. We recommend that you check these links manually.
  • Blocked: Links that could not be checked because requests from our servers were blocked (some hosts, like linkedin.com, have systems in place to detect and prevent automated activity).

Each report has two columns:

  • Result: The current status of the link (Queued, Checking…, etc.) or the check result, which is either OK or an error such as 404 Not found. When hovering over the text, a tooltip gives you more details.
  • URL: The absolute URL of the link. This URL is not necessarily in the same format as it was found in the source code. Relative URLs like ../page2.html are expanded into full absolute URLs with a scheme (https://www.example.com/pages/page2.html) and fragments (#fragment) are removed.
    • Next to the URL you can find small labels such as Start URL or Outbound that provide additional details about the type of link. Hover over a label to see a tooltip with an explanation.
    • Linked from tells you the URL of the first document the link was found in.
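
The URL normalization described above (expanding relative URLs and removing fragments) can be reproduced with Python's standard urllib.parse module. This is a minimal sketch; the function name is hypothetical, and the crawler may apply further normalization steps that are not shown here.

```python
from urllib.parse import urldefrag, urljoin


def to_report_url(page_url: str, href: str) -> str:
    """Resolve `href` against the page it was found on and drop the fragment."""
    absolute = urljoin(page_url, href)
    return urldefrag(absolute).url
```

For example, the relative link ../page2.html#fragment found on https://www.example.com/pages/sub/page1.html resolves to https://www.example.com/pages/page2.html.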

Link Details

By hovering over an entry in the link report and clicking on the Details button, you can view more information about the link, including the sources where it was found and any available redirection URLs.

If there is an issue with the link, click the error message (such as 404 Not found) to open the documentation with more in-depth information, including tips on how to fix the problem.

If you want to know where exactly Dr. Link Check found the link in the code, click the Source button next to one of the entries in the Linked from section. The source document will be fetched from the server and displayed in a popup with the link locations highlighted.

By clicking on the down arrow next to the Source button and selecting Source link details, you can request the source link’s details to be loaded from the server and displayed in the Link Details dialog. This feature is especially useful for retracing the path our crawler took to arrive at a document because it allows you to jump from link to link up to the URL used to start the check.

Filter

A particularly powerful feature of Dr. Link Check is the ability to define filters to show only the links you are interested in. You can use filters for a variety of purposes, including the following:

  • Report all email address links on your website
  • Report internal links that generate a 5xx server error
  • Report outbound links that permanently redirect to new locations
  • Report outbound links that are dofollow
  • Report all URLs within a specific subdirectory
  • Report all JavaScript files loaded from external servers

To filter the results, click the Add button in the Filter bar on top of the link table and select the criteria you want to filter for:

  • Issue: Filter for links that are broken, blacklisted, and/or indicate a soft error.
  • URL: Filter for links whose URLs match or contain a particular text string
  • Scheme: Filter links based on their URL scheme (such as "https" or "mailto")
  • Host: Filter for links whose hostnames match or contain a certain string (such as "www.example.com")
  • Path: Filter links based on the path parts of their URLs (such as "/path/to/page")
  • Link depth: Filter links based on their distance from the start URL (0 for the start URL itself, 1 for links found directly on the start page, 2 for links found on pages linked from the start page, etc.)
  • Direction: Filter for internal or outbound links
  • Is (not) new: Filter for links that were added to the site since the last link check
  • Is (not) changed: Filter for links based on whether the linked document was significantly updated since the last link check
  • Redirect type: Filter based on how a link was redirected to a new location
  • Link type: Filter links based on the location where they were found in the code (<a href>, <img src>, etc.)
  • Media type: Filter links based on the content type of the linked resource (HTML, image, CSS, etc.)
  • Nofollow/Dofollow: Filter for links with rel="nofollow" attributes (which instruct search engines to ignore the links for ranking purposes)
  • robots.txt status: Filter for internal links that are not allowed to be crawled (based on the allow/disallow rules found in the website’s robots.txt file)
  • Broken check result: Filter links based on the results of checking whether they are functioning
  • HTTP status code: Filter http:// and https:// links based on the HTTP status code received from the server

When adding multiple criteria, only links matching all those criteria will be returned. If you want to remove a criterion from the filter, click the x icon next to it.

Free-form Filter

Instead of creating a filter with the point and click method outlined above, you can also specify the filter in textual form. This feature is intended for advanced users who need to define filters that cannot be expressed via the normal user interface. To turn the filter into text mode, double-click on an empty spot in the filter bar.

A simple filter rule follows the pattern below:

<Property> <Comparison Operator> <Value>

<Property> can be any of the following:

  • Url: The full URL of the link (as a string)
  • Scheme: The scheme part of the URL, e.g. "https" or "mailto" (string)
  • Host: The hostname part of the URL, e.g. "example.com" or "www.example.com" (string)
  • Port: The port number part of the URL, e.g. 80 or 443 (number value)
  • Path: The absolute path of the URL, e.g. "/path/to/page" or "/" (string)
  • Query: The query string part of the URL, including the leading question mark, e.g. "?name=ferret&color=purple" (string)
  • Status: The current status of the link as part of the link check (as one of the enumeration values below)
    • Queued: The link is queued to be checked
    • InProgress: The link is currently being checked
    • Checked: The link was successfully checked
    • Unsupported: The link was not checked because it has an unsupported URL scheme (like “tel” in “tel:+1-555-1234567”)
    • Aborted: The check of the link was aborted
    • Failed: An error occurred while checking the link
    • Blocked: The link could not be checked because the request from our server was blocked
  • LinkDepth: The distance between the link and the start URL, also known as click or page depth (number value)
  • Direction: The direction in which the link is pointing (enumeration value)
    • Internal: The link points to an internal resource
    • Outbound: The link points to a resource outside of the current website
  • IsNew: Specifies whether the link was added to the website since the last link check (Boolean value)
  • IsChanged: Specifies whether the linked document was significantly updated since the last link check (Boolean value)
  • RedirectType: Specifies how the link was redirected to a new location, if applicable (enumeration value)
    • Http301: HTTP redirect using a 301 (Moved Permanently) status code
    • Http302: HTTP redirect using a 302 (Moved Temporarily) status code
    • Http303: HTTP redirect using a 303 (See Other) status code
    • Http307: HTTP redirect using a 307 (Temporary Redirect) status code
    • Http308: HTTP redirect using a 308 (Permanent Redirect) status code
    • HttpRefresh: Redirect triggered by a Refresh HTTP header
    • MetaRefresh: Redirect triggered by a meta refresh HTML element
    • JavaScript: Automatic location change triggered by JavaScript code
    • Frame: Redirect implemented by loading the redirect target in a full-size frame
  • RedirectUrl: The final URL in the redirect chain, if available (string)
  • LinkType: The location where the link was found in the code (enumeration value)
    • AuthUrl: The URL which was used to authenticate with the server
    • StartUrl: The URL with which the link check was started
    • Ahref: Anchor element (<a href="URL">)
    • ImgSrc: Image element (<img src="URL">)
    • LinkStylesheet: Link stylesheet element (<link rel="stylesheet" href="URL">)
    • ScriptSrc: Script element (<script src="URL">)
    • MetaRefresh: Meta refresh element (<meta http-equiv="refresh" content="0; url=URL">)
    • FrameSrc: Frame element (<frame src="URL"> or <iframe src="URL">)
    • SocialMetaTag: Open Graph (Facebook) or Twitter Card meta tag
    • CssImport: CSS @import rule
    • CssUrl: CSS url() function
    • JavaScriptLocation: JavaScript-triggered location change
    • JavaScriptOpen: JavaScript open(…) function
    • RobotsTxtSitemap: Link to an XML sitemap found in a robots.txt file
    • SitemapLoc: The URL was found in an XML sitemap file
    • Other: The URL was found somewhere else in the code
  • MediaType: The content type of the linked resource (enumeration value)
    • Html: HTML document
    • Image: Image file
    • Css: CSS (style sheet) file
    • JavaScript: Script file
    • Json: JSON document
    • Font: Font file
    • Xml: XML document
    • XmlSitemap: XML sitemap
    • Text: Human-readable text file
    • Audio: Audio file
    • Video: Video file
    • Binary: Other binary (not human-readable) file
    • Unknown: File of unknown content
  • NoFollow: Specifies whether the link has a rel="nofollow" attribute (boolean value)
  • NoIndex: Specifies whether a noindex directive was found, instructing web crawlers not to index the document (boolean value)
  • DisallowedByRobotsTxt: Specifies whether the website’s robots.txt file instructs crawlers to ignore the link (boolean value)
  • BrokenCheckResult: The result of checking whether the link is functional (enumeration value)
    • Ok: The link works fine
    • InvalidUrl: The link’s URL is not properly formatted
    • UnsupportedScheme: The link’s URL uses a scheme that is not supported
    • HostNotFound: The hostname could not be resolved
    • ConnectError: Failed to establish a connection to the server
    • SslHandshakeError: Failed to complete an SSL/TLS handshake with the server
    • SslCertProblem: The server’s SSL certificate failed verification
    • SendReceiveError: An error occurred while sending the request to the server or receiving the response from it
    • Timeout: The server didn’t respond in time
    • HttpErrorCode: The server returned an HTTP error status code
    • TooManyRedirects: The link was redirected more than 20 times
    • BadContentEncoding: The transfer encoding could not be recognized
    • CrawlerTrap: A so-called crawler trap was detected, meaning the website produced an unusually high number of irrelevant links without any new unique content
    • MxRecordNotFound: No MX record was found for the email address’s domain name
    • UnknownError: An error of unknown type occurred
  • HttpResponseCode: The final HTTP status code received from the server, if available (number value)
  • BlacklistCheckResult: The result of checking whether the link is blacklisted (enumeration value)
    • Ok: The link is not blacklisted
    • Blacklisted: The link is blacklisted for hosting a phishing or malware attack
  • SoftErrorCheckResult: The result of analyzing the page content for signs that indicate an error although the server returned a 2xx HTTP status code (enumeration value)
    • Ok: The page content doesn’t indicate an error
    • ForSale: The link points to a domain or website for sale
    • AdsOnly: The link points to a parked site filled with ads
    • Placeholder: The link points to a common placeholder page
    • OutOfService: The link points to an expired, suspended, or otherwise closed website
    • NoContent: The link points to a page with no or very low content
    • DirectoryListing: The link points to a default page that lists the directory’s contents
    • ErrorMessage: The page content indicates a 4xx or 5xx error (such as 404 Not Found or 500 Internal Server Error)
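
To illustrate the LinkDepth property from the list above, the following Python sketch computes depths with a breadth-first traversal. It assumes that the depth is the shortest click distance from the start URL; the function name and the dictionary that maps each page to the links found on it are made up for this example.

```python
from collections import deque


def link_depths(start_url, links_on_page):
    """Return a dict mapping each reachable URL to its distance from the start URL.

    `links_on_page` maps a page URL to the list of URLs linked from that page.
    """
    depths = {start_url: 0}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        for target in links_on_page.get(page, []):
            if target not in depths:  # the first discovery is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths
```

In this model, the start URL has depth 0, links found directly on the start page have depth 1, and so on, which is the value a filter such as LinkDepth > 2 compares against.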

The options for <Comparison Operator> are:

  • =: Is equal to
  • !=: Is not equal to
  • CONTAINS: Contains string
  • STARTSWITH: Begins with string
  • ENDSWITH: Ends with string
  • >: Is greater than
  • <: Is less than
  • >=: Is greater than or equal to
  • <=: Is less than or equal to

Finally, <Value> is either a string enclosed in double quotes ("example.com"), a number (404), a boolean value (true, false), or an enumeration value depending on the chosen property (see above).

With this information, you can construct simple filters like the ones below:

Url STARTSWITH "https://www.example.com/path/"

Direction = Internal

MediaType != Html

Logical operators (AND, OR) and parentheses allow you to create more complex filters:

HttpResponseCode >= 500 AND HttpResponseCode <= 599

Direction = Outbound AND (LinkType = ScriptSrc OR LinkType = LinkStylesheet)

Expressions can also be negated by prepending them with NOT:

NOT (MediaType = Image OR MediaType = Audio OR MediaType = Video)

Custom Reports

Once you have defined a custom report filter, you will notice a new button, Save as Custom Report, in the top right-hand corner. This button lets you save the current report and add a shortcut to the sidebar.

To delete a custom report, open it from the sidebar and click on Delete Custom Report in the upper right corner of the report.

Export

If you are on the Professional or Premium plan, you can export the entire results of a report to CSV or PDF format. The Export to CSV option generates a file that can be opened in spreadsheet software like Microsoft Excel or Apple Numbers for further processing. If you are looking for a document suitable for printing, choose the Export to PDF option.
