Dr. Link Check lets you export link reports in CSV format so you can analyze the results using third-party tools.
One of our larger customers recently asked why he couldn’t open and analyze his CSV export file in Microsoft Excel. I explained that Excel has a limit of 1,048,576 rows, which means it can’t handle CSV files that go beyond that number. My suggestion was to either use a dedicated CSV viewer or, even better, import the CSV file into a database and work with it using SQL.
In this post, I’ll walk you through how to create an SQLite database, import a CSV file, and run SQL queries to get the data you need. SQLite is a lightweight and powerful database engine that’s easy to set up and free to use.
To keep things simple, I’ll be using SQLiteStudio as a graphical client for working with SQLite. It’s much easier than using the command line, especially when you want features like a visual schema editor, a spreadsheet-style data browser, and a built-in SQL editor with autocomplete.
Go to sqlitestudio.pl and click the Download button. SQLiteStudio is free and available for Windows, macOS, and Linux. It comes with SQLite bundled in, so there’s nothing else you need to install.
Once the installation is complete, launch SQLiteStudio to get started.
The first step is to create a new database file.
Click the Add a database button in the toolbar.
In the dialog that opens, click the green Create a new database file button. Choose a folder where you want to save the database, then enter a filename (for example, test.db).
Click OK to finish. Your new SQLite database is now ready to use.
Double click the database entry in the Databases tree to establish a connection to the database.
With your database created and opened, the next step is to import your CSV data into a new table.
Open the Tools menu and select Import.
In the import wizard, enter a name for the new table. For example, you can name it export, then click Next to continue.
Select your CSV file and review the import settings. If the data does not line up correctly, try changing the Field separator setting. Some files use commas, while others may use semicolons depending on how they were generated.
Click Finish to begin the import. Once it is complete, the new table will appear under Tables in the database tree on the left.
You can now start querying your data using SQL. Go to the Tools menu and choose Open SQL editor. This opens a new window where you can enter SQL statements. To run a statement, click the blue Execute command button at the top.
For the following examples, I’ll assume that you created a table named export from a CSV file exported using Dr. Link Check’s Export to CSV feature.
Let’s start simple by querying all URLs that, according to Dr. Link Check, returned a 404 error:
SELECT *
FROM export
WHERE BrokenCheck = '404 Not found';
The next example returns all links with any issues, ordered by URL:
SELECT *
FROM export
WHERE BrokenCheck <> 'OK' OR BlacklistCheck <> 'OK' OR ParkedDomainCheck <> 'OK'
ORDER BY Url;
If you want to query all internal links that were permanently redirected, use this SQL statement:
SELECT *
FROM export
WHERE Direction = 'Internal' AND RedirectType = 'Permanent';
Using the LIKE
operator, you can get all mailto:
links:
SELECT *
FROM export
WHERE Url LIKE 'mailto:%';
With a GROUP BY
clause, you can get all URL schemes and the count for each scheme in descending order:
SELECT substr(Url, 0, instr(Url, ':')) AS scheme, count(*) as count
FROM export
WHERE scheme <> ''
GROUP BY scheme
ORDER BY count DESC;
Another useful query returns all non-secure http://
links that redirect to the same URL using the https://
scheme:
SELECT *
FROM export
WHERE Url LIKE 'http://%' AND RedirectUrl = 'https://' || substr(RedirectUrl, length('https://') + 1);
Now it gets a bit more advanced. The following query counts the number of links by file extension:
SELECT
-- Everything right to the last dot in the URL
lower(substr(url_without_query_string, length(rtrim(url_without_query_string, replace(url_without_query_string, '.', ''))) + 1)) AS file_ext,
count(*) as count
FROM (
-- Remove query string from URL
SELECT CASE WHEN instr(url, '?') > 0 THEN substr(url, 0, instr(url, '?')) ELSE url END AS url_without_query_string
FROM export
WHERE url REGEXP '^(https?|ftp)://' -- Only query http://, https://, and ftp:// URLs
)
WHERE file_ext <> '' AND instr(file_ext, '/') = 0
GROUP BY file_ext
ORDER BY count DESC;
The last practical example allows you to compare the results of two exports and find which URLs were added or removed. This assumes that you imported the first export into a table called export_a and the second into a table called export_b:
SELECT 'Removed' as action, a.url
FROM export_a a
LEFT JOIN export_b b ON b.url = a.url
WHERE b.url IS NULL
UNION ALL
SELECT 'Added' as action, b.url
FROM export_b b
LEFT JOIN export_a a ON a.url = b.url
WHERE a.url IS NULL;
If you are new to SQL, visit sql-easy.com for a short, interactive course that covers the basics.
SQLite is a powerful tool for working with CSV files, especially when you need to handle large datasets that other tools like Excel cannot manage. When paired with SQLiteStudio as a graphical front end, it becomes easy to set up and use.
If you are exporting reports from Dr. Link Check and want more control over how you filter, sort, or analyze the data, give this workflow a try.