Blog – Dr. Link Check

WordPress XML-RPC: Disable or Don’t Disable?

Have you ever wondered if you can post content to your WordPress blog using your phone or tablet? The answer is yes, but you need XML-RPC enabled on the WordPress blog. If you read about cyber security and WordPress, you might come across the idea that XML-RPC is a security threat and it should be disabled. Here are some facts to help you decide.

XML-RPC

What is XML-RPC?

XML-RPC is a remote procedure call (RPC) protocol that works over HTTP(S) and uses XML to transport data to and from the WordPress blog. XML-RPC isn’t for WordPress only. You can find other applications that use it, although it is an older protocol not used much in newer applications. XML has largely been replaced by JSON, but WordPress is an old platform that still uses many traditional protocols and procedures.

When you log into your WordPress account, you use regular HTTP and form submissions. XML-RPC allows you to log in using something other than a web browser and perform any action allowed by the WordPress API. You can create and edit posts, review tags and categories, read and post comments, and even get statistics on your blog. If you have a mobile app that does any of these actions, it probably uses XML-RPC.
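To make the "XML over HTTP" idea concrete, here is a minimal sketch that uses Python's standard xmlrpc.client module to build the XML body of a call to wp.getUsersBlogs, one of the methods in the WordPress XML-RPC API. The credentials are hypothetical placeholders, and the sketch only serializes the request instead of sending it. In a typical installation the body would be POSTed to the blog's /xmlrpc.php endpoint.

```python
import xmlrpc.client

# Hypothetical credentials, for illustration only.
USERNAME = "editor"
PASSWORD = "app-password"

# Serialize a call to wp.getUsersBlogs into the XML document that a client
# would POST to the blog's XML-RPC endpoint (commonly /xmlrpc.php).
payload = xmlrpc.client.dumps((USERNAME, PASSWORD), methodname="wp.getUsersBlogs")
print(payload)

# Round trip: decode the XML again to see what the server would receive.
params, method = xmlrpc.client.loads(payload)
```

Printing the payload shows a methodCall element that contains the method name and one param element per argument, which is all an XML-RPC request consists of.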

What Advantages Do You Have with XML-RPC?

Before you disable anything on your WordPress blog, you should understand what it’s used for. Disabling the wrong protocol or service can leave your blog unusable by certain applications, and some parts might no longer be visible to your users.

XML-RPC is needed if you want to remotely publish any content. Suppose that you have an application on your tablet that allows you to write down ideas and upload them to your blog. You create new posts based on these ideas, but you leave them as drafts until you can write better and more thorough content. For the application to communicate with your blog, you need XML-RPC.

The plugin Jetpack is one of the most common applications that requires XML-RPC to be enabled. Jetpack is a popular plugin for site analytics and sharing on social media sites such as Facebook, Twitter, LinkedIn, and Reddit. It also lets you communicate with your blog using WhatsApp. Jetpack has several features, and many of them rely on XML-RPC. Once you disable the protocol, the plugin no longer functions properly.

What Cyber Security Issues Does XML-RPC Bring?

The negative issues that come with XML-RPC are related to cyber security. It should be noted that WordPress has worked long and hard on combating cyber security issues related to its API and any remote procedures. For the most part, any cyber security issues have been patched. Make sure you look at the date before judging any WordPress feature based on published reports. A cyber security issue could have been a problem years ago, but it only takes WordPress a few months before the issue is patched.

The main issues that come with XML-RPC are DDoS attacks and brute-force attacks on the administration password. You can still combat these issues by installing cyber security plugins such as Wordfence or Sucuri.

DDoS attacks using XML-RPC are mostly on the pingback system. When your article is mentioned and you have pingbacks enabled, the remote site sends your WordPress blog an alert. This alert is called a pingback, and you can get thousands of them a day when an article goes viral. If an attacker wants to DDoS your WordPress blog, they can flood your site with pingbacks until it can’t handle them anymore. The result is a crash on your site, which makes it unavailable to readers.

You can stop DDoS attacks using Akismet, an anti-spam tool that blocks spam comments until you review them. It’s even included with the WordPress installation, so you just need to sign up and receive an API key to use the software. Akismet should be included on any blog unless you have a better anti-spam tool for your WordPress comment system.

The second type of attack is a brute-force attack on the administrator username and password. Before attackers can guess the password, they need the administrator’s username. The traditional default is “admin,” so brute-force scripts try this account name first. They then try the name of your blog. These two account names are common on many blogs, and it’s a mistake to use either name for your administrator account.

A brute-force attack is one in which the attacker takes a list of dictionary terms and runs them through a script that attempts to log in to your site. If the first attempt fails, the script continues with the next term, and then the next, until all “guesses” are exhausted. Given enough time, the attacker may guess your password and log in to your administration dashboard.

The best way to defend against a brute-force attack is to block an IP address after a certain number of failed login attempts. You can do this using the plugin Wordfence. It allows a certain number of login attempts (usually 5) and then blocks the IP address from any more attempts for several minutes. The attacker can still try, but finding a password takes far too long when the IP address is blocked for several hours of the day.
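The lockout idea can be sketched in a few lines of Python. This is not how any particular plugin implements it. The class name, the limit of 5 attempts, and the 300-second lockout are assumptions chosen for illustration.

```python
import time
from collections import defaultdict


class LoginLimiter:
    """Blocks an IP address for a while after too many failed logins."""

    def __init__(self, max_attempts=5, lockout_seconds=300):
        self.max_attempts = max_attempts          # assumed limit
        self.lockout_seconds = lockout_seconds    # assumed lockout duration
        self.failures = defaultdict(int)          # failed attempts per IP
        self.blocked_until = {}                   # IP -> time the block ends

    def is_blocked(self, ip, now=None):
        now = time.time() if now is None else now
        return self.blocked_until.get(ip, 0) > now

    def record_failure(self, ip, now=None):
        now = time.time() if now is None else now
        self.failures[ip] += 1
        if self.failures[ip] >= self.max_attempts:
            # Limit reached: start the lockout and reset the counter.
            self.blocked_until[ip] = now + self.lockout_seconds
            self.failures[ip] = 0

    def record_success(self, ip):
        # A successful login clears the failure counter for this IP.
        self.failures.pop(ip, None)
```

A login handler would call is_blocked before checking the password and record_failure after every wrong guess. With these numbers, an attacker gets at most five guesses per five-minute window from a single address.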

When you give a password to your administrator account, use a complex one. It should be at least 10 characters and use upper and lowercase letters and special characters.

Should You Disable XML-RPC?

If you know that no plugins or apps use the protocol, then disabling XML-RPC is an option. Disable it only if you are sure you don’t use it, and keep in mind that doing so could cause problems later. If you’ve forgotten that it’s disabled and try to work with a plugin that needs it, you could get frustrated with the plugin’s issues until you realize that the fault lies with XML-RPC being disabled.

WordPress has also fixed many of the issues with this protocol, so it doesn’t carry the negative cyber security issues that it used to. You can keep the protocol on to ensure that you can remotely work with your WordPress blog, as long as you have at least one cyber security plugin installed that blocks pingback DDoS attacks and brute-force scripts.

By the way, don’t forget to check out Dr. Link Check and run a free broken link check on your WordPress blog.

301, 302, 303, 307, and 308: Which HTTP Redirect Status Code is for What?

HTTP redirects are a way of forwarding visitors (both humans and search bots) from one URL to another. This is useful in situations like these:

Restructuring: After moving content to a different address, you want visitors who use the old link to be automatically redirected to the new location (instead of having them land on a 404 error page).
HTTP to HTTPS: You want to keep your visitors’ data safe by redirecting them from the unencrypted HTTP version of the website to the secure HTTPS version.
Geotargeting: You want to redirect visitors to localized pages based on their geographic location (inferred from the IP address) or their browser’s language settings.
Device targeting: You want to redirect users of smartphones and tablet devices to the mobile-friendly version of your website.
A/B testing: You want to redirect different visitors to different pages and then compare visitors’ behavior to see which page performs best.
Aliases: You want to create a short URL that is easy to remember and redirects to a longer one.
Maintenance: You want to temporarily redirect visitors to a static “Under Construction” page while you are working on the web server.

How Does an HTTP Redirect Work?

HTTP is a request/response protocol. The client, typically a web browser, sends a request to a server and the server returns a response. Below is an example of how Firefox requests the home page of the drlinkcheck.com website:

HTTP Request and Response

In this example, the server responds with a 200 (OK) status code and includes the requested page in the body.

If the server wants the client to look for the page under a different URL, it returns a status code from the 3xx range and specifies the target URL in the Location header.

HTTP Response Status Code 301
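As a rough illustration of what a client does with such a response, the following Python sketch reads the status code and the Location header from a raw HTTP response. The function name and the example URL are made up for this sketch, and the parsing is deliberately simplified (no folded headers, no relative URLs).

```python
def redirect_target(raw_response):
    """Return (status, location) for a 3xx response, otherwise None."""
    head = raw_response.split("\r\n\r\n", 1)[0]   # drop the body
    lines = head.split("\r\n")
    status = int(lines[0].split()[1])             # "HTTP/1.1 301 Moved ..." -> 301
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    if 300 <= status <= 399 and "location" in headers:
        return status, headers["location"]
    return None


# Hypothetical response, modeled on a permanent redirect.
raw = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: https://www.example.com/new-page\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
print(redirect_target(raw))
```

For a 200 response the function returns None, which corresponds to the client simply displaying the page it received.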

Permanent Redirects: 301 and 308

HTTP status codes 301 and 308 are used if a resource is permanently moved to a new location. A permanent redirect is the right choice when restructuring a website or migrating it from HTTP to HTTPS.

The difference between code 301 and 308 is in the details. If a client sees a 308 redirect, it MUST repeat the exact same request on the new location, whereas the client may change a POST request into a GET request in the case of a 301 redirect.

This means that, if a POST with a body is made and the server returns a 308 status code, the client must do a POST request with the same body to the new location. In the case of a 301 status code, the client may do this but is not required to do so (in practice, almost all clients proceed with a GET request).

The problem with HTTP status code 308 is that it’s relatively new (introduced in RFC 7538 in April 2015) and, therefore, it is not supported by all browsers and crawlers. For instance, Internet Explorer 11 on Windows 7 and 8 doesn’t understand 308 status codes and simply displays an empty page, instead of following the redirect.

Due to still limited support of 308, the recommendation is to always go with 301 redirects, unless you require POST requests to be redirected properly and are certain that all clients understand the 308 response code.

Temporary Redirects: 302, 303, and 307

The 302, 303, and 307 status codes indicate that a resource is temporarily available under a new URL, meaning that the redirect has a limited life span and (typically) should not be cached. An example is a website that is undergoing maintenance and redirects visitors to a temporary “Under Construction” page. Marking a redirect as temporary is also advisable when redirecting based on visitor-specific criteria such as geographic location, time, or device.

The HTTP/1.0 specification (released in 1996) only included status code 302 for temporary redirects. Although it was specified that clients are not allowed to change the request method on the redirected request, most browsers ignored the standard and always performed a GET on the redirect URL. That’s the reason HTTP/1.1 (released in 1999) introduced status codes 303 and 307 to make it unambiguously clear how a client should react.

HTTP status code 303 (“See Other”) tells a client that a resource is temporarily available at a different location and explicitly instructs the client to issue a GET request on the new URL, regardless of which request method was originally used.

Status code 307 (“Temporary Redirect”) instructs a client to repeat the request with another URL, while using the same request method as in the original request. For instance, a POST request must always be repeated using another POST request.

In practice, browsers and crawlers handle 302 redirects the same way as specified for 303, meaning that redirects are always performed as GET requests.

Even though status codes 303 and 307 were standardized in 1999, there are still clients that don’t implement them correctly. Just like with status code 308, the recommendation, therefore, is to stick with 302 redirects, unless you need a POST request to be repeated (use 307 in this case) or know that intended clients support codes 303 and 307.
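The method handling for all five status codes can be summarized in a small Python function. It is a sketch of typical client behavior as described in this article, not a complete implementation of the specifications. The function name is hypothetical, and keeping HEAD as HEAD for a 303 is an assumption modeled on common browser behavior.

```python
def method_after_redirect(status, method):
    """Return the HTTP method a typical client uses for the follow-up request."""
    method = method.upper()
    if status == 303:
        # Always a retrieval request, regardless of the original method.
        return "HEAD" if method == "HEAD" else "GET"
    if status in (301, 302):
        # Allowed to keep POST, but in practice clients switch to GET.
        return "GET" if method == "POST" else method
    if status in (307, 308):
        # The method (and body) must be repeated unchanged.
        return method
    raise ValueError(f"{status} is not one of the redirect codes covered here")


for code in (301, 302, 303, 307, 308):
    print(code, method_after_redirect(code, "POST"))
```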

SEO Considerations

When Google sees a permanent 301 or 308 redirect, it removes the old page from the index and replaces it with the page from the new location. The question is how this affects the ranking of the page. In this video, Matt Cutts explains that you lose only “a tiny little bit” of link juice if you do a 301 redirect. Therefore, permanent redirects are the way to go if you want to restructure your site without negatively affecting its Google rankings.

Temporary redirects (status codes 302, 303, and 307), on the other hand, are more or less ignored by Google. The search engine knows that the redirect is only temporary and keeps the original page indexed without transferring any link juice to the destination URL.

The Cost of a Redirect

Another aspect to consider when using HTTP redirects is the performance impact. Each redirect requires an extra HTTP request to the server, typically adding a few hundred milliseconds to the loading time of the page. This is bad from a user experience perspective and puts unnecessary stress on the web server. While a single redirect doesn’t hurt too much, redirect chains in which one redirect leads to another redirect should definitely be avoided.
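To see how quickly chains add up, the following Python sketch walks a redirect map and returns every URL visited on the way to the final page. The redirect map, the URLs, and the hop limit are assumptions for illustration. A real check would issue HTTP requests instead of looking up a dictionary.

```python
def redirect_chain(start_url, redirects, max_hops=10):
    """Follow redirects from start_url and return the list of visited URLs."""
    chain = [start_url]
    while chain[-1] in redirects:
        target = redirects[chain[-1]]
        if target in chain:
            raise ValueError(f"redirect loop at {target}")
        if len(chain) > max_hops:
            raise ValueError("too many redirects")
        chain.append(target)
    return chain


# Hypothetical site: HTTP -> HTTPS, then old path -> new path.
redirects = {
    "http://example.com/old": "https://example.com/old",
    "https://example.com/old": "https://example.com/new",
}
chain = redirect_chain("http://example.com/old", redirects)
print(" -> ".join(chain))
print("extra requests:", len(chain) - 1)
```

In this example the visitor pays for two extra round trips. Pointing the first URL directly at https://example.com/new would cut that to one.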

If you want to identify all redirects on your website, our link checker can help – just enter the URL of your website and hit the Start Check button. Once the check is complete, you will see the number of found temporary and permanent redirects under Redirects. Click on one of the items to get a list of all redirected links. If you hover over a link item and click the Details button, you can see the entire redirect chain.

Conclusion

Five different redirect status codes – no wonder many website owners get confused when it comes to redirects. My advice is the following:

Use a 301 redirect when a page or file is now permanently available under a new URL and you want search engines to recognize that fact.
Use a 302 redirect when a page is only temporarily available under a different URL and you don’t want search engines to replace the original URL in their indexes.
Use 303, 307, and 308 redirects only if you really need them and know what you are doing.
Also, keep redirects to a minimum because each redirect requires an additional HTTP request.

I hope this sheds some light onto how HTTP redirects work and which HTTP status code to choose in which situation.

AWS EC2: T3 vs. M5 Instances (with Benchmark)

Our link checker heavily relies on AWS and EC2 instances in particular. One of the more difficult decisions when dealing with EC2 is choosing the right instance type. Will burstable and (potentially) cheap T3 instances do the job, or should you pay more for general purpose M5 instances? In this blog post, I will try to shed some light on this and provide answers to the following questions:

How fast is a T3 instance compared to an equivalently sized M5 instance?
How does the T3 CPU credit model work, exactly?
What is T3 Unlimited and how reliable is it?

CPU Performance

Without further ado, let me dive directly into some actual testing. I create a t3.large instance (without Unlimited mode) in the us-east-2 region and select Ubuntu 18.04 as the operating system. The first thing I check is the processor the machine is running on:

$ cat /proc/cpuinfo | grep -E 'processor|model name|cpu MHz'

processor       : 0
model name      : Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
cpu MHz         : 2500.000
processor       : 1
model name      : Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
cpu MHz         : 2500.000

As you can see from the output, the T3 instance has access to two vCPUs (= hyperthreads) of an Intel Xeon Platinum 8175M processor.

In order to test the performance of the processor, I install sysbench …

$ sudo apt-get install sysbench

… and start the benchmark (which stresses the CPU for ten seconds by calculating prime numbers):

$ sysbench --threads=2 cpu run

sysbench 1.0.11 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  1623.24

General statistics:
    total time:                          10.0012s
    total number of events:              16237

Latency (ms):
         min:                                  1.16
         avg:                                  1.23
         max:                                  9.89
         95th percentile:                      1.27
         sum:                              19993.46

Threads fairness:
    events (avg/stddev):           8118.5000/0.50
    execution time (avg/stddev):   9.9967/0.00

The events per second (1623 in this case) is the number you should care about. The higher this number, the higher the performance of the CPU. As points of reference, here are the results from the same test on several other cloud machines as well as some desktop and mobile processors:

Instance Type	Processor	Threads	Burst	Baseline
t3.micro	Intel Xeon Platinum 8175M	2	1654	175
t3.medium	Intel Xeon Platinum 8175M	2	1653	343
t3.large	Intel Xeon Platinum 8175M	2	1623	510
m5.large	Intel Xeon Platinum 8175M	2	1647	1647
t3a.nano	AMD EPYC 7571	2	1510	75
t2.large	Intel Xeon E5-2686 v4	2	1830	538
a1.medium	AWS Graviton	1	2281	2281
t4g.micro	AWS Graviton2	2	5922	592
m4.large	Intel Xeon E5-2686 v4	2	1393	1393
c4.large	Intel Xeon E5-2666 v3	2	1583	1583
GCP f1-micro	Intel Xeon (Skylake)	1	956	190
GCP e2-medium	Intel Xeon	2	1434	717
GCP n1-standard-2	Intel Xeon (Skylake)	2	1435	1435
Linode 2GB	AMD EPYC 7501	1	1236	1170
Hetzner CX11	Intel Xeon (Skylake)	1	970	860
Dedicated Server	Intel Xeon E3-1271 v3	8	7530	7530
Desktop	Intel Core i9-9900K	16	19274	19274
Laptop	Intel Core i7-8565U	8	8372	8372

While bursting, T3 instances provide the same CPU performance as equally sized M5 instances, which answers my first question from above.

Investigating the T3 CPU Credit Model

For the next test, I let the t3.large instance sit unused for 24 hours until it has accrued the maximum number of CPU credits.

t3.large CPU Credit Balance after 24 hours

T3 instances always start with zero credits and earn credits at a rate determined by the size of the instance. A t3.large instance earns 36 CPU credits per hour with a maximum of 864 credits. According to the AWS documentation, one CPU credit is equal to one vCPU running at 100% for one minute. So how long should my t3.large instance be able to burst to 100% CPU utilization?

If the instance has 864 CPU credits, uses 2 credits per minute, and refills its credits at a rate of 0.6 (= 36/60) per minute, it should have the capacity to burst for 864 / (2 - 0.6) = 617 minutes = 10.3 hours. Let me put that to a test.

$ sysbench --time=0 --threads=2 --report-interval=60 cpu run

Just as expected, the performance drops to about 30% (the baseline performance for t3.large instances) after about ten and a half hours.

t3.large CPU Utilization
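The credit arithmetic from this section can be written as a small Python calculation. The function names are made up for this sketch, and the model assumes that all vCPUs run at 100% for the whole burst, as in the test above.

```python
def burst_minutes(credit_balance, vcpus, credits_per_hour):
    """Minutes an instance can run all vCPUs at 100% before credits run out."""
    spend_per_minute = vcpus                 # 1 credit = 1 vCPU at 100% for 1 minute
    earn_per_minute = credits_per_hour / 60  # credits keep accruing while bursting
    return credit_balance / (spend_per_minute - earn_per_minute)


def baseline_utilization(vcpus, credits_per_hour):
    """Sustainable CPU utilization per vCPU once the credit balance is empty."""
    return (credits_per_hour / 60) / vcpus


# t3.large values from this section: 864 credits, 2 vCPUs, 36 credits per hour.
minutes = burst_minutes(864, 2, 36)
print(f"{minutes:.0f} minutes = {minutes / 60:.1f} hours")  # 617 minutes = 10.3 hours
print(f"baseline: {baseline_utilization(2, 36):.0%}")       # baseline: 30%
```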

T3 Unlimited Mode

T3 instances with activated Unlimited Mode are allowed to burst even if no CPU credits are available. This comes at a price: a T3 instance that continuously bursts at 100% CPU costs approximately 1.5 times the price of an equally sized M5 instance. However, how reliable is Unlimited Mode? My worry is that AWS puts too many instances on a single physical machine, so not enough spare burst capacity is available. To answer this question, I launch a t3.nano instance with Unlimited Mode and let it run at full steam for about four days.

t3.nano CPU Utilization (Unlimited Mode)

As promised, there is no drop in CPU performance. The t3.nano instance delivers the full capacity of 2 vCPUs (almost) all the time. Quite impressive!

Network Performance

Instead of running my own network performance tests, I rely on the results that Andreas Wittig published on the cloudonaut blog. He used iperf3 to determine the baseline and burst network throughput for different EC2 instance types. Here are the values for different T3 instances and an m5.large instance:

Instance Type	Burst (Gbit/s)	Baseline (Gbit/s)
t3.nano	5.06	0.03
t3.micro	5.09	0.06
t3.small	5.11	0.13
t3.medium	4.98	0.25
t3.large	5.11	0.51
m5.large	10.04	0.74

Although an m5.large instance costs only about 15% more than a t3.large instance, it provides a roughly 45% higher baseline throughput and nearly double the burst capacity.

When determining how much network throughput you need, consider that EBS volumes are network-attached.

Conclusion and Recommendation

T3 instances are great! Even a t3.nano instance at a monthly on-demand price of less than $4 gives you access to the full power of two hyperthreads on an Intel Xeon processor and at burst runs as fast as a $70 m5.large instance. By activating Unlimited Mode, you can easily insure yourself against running out of CPU credits and being throttled.

If you don’t need the 8 GiB of memory (RAM) that an m5.large instance provides and can live with the lower network throughput, one of the smaller T3 instances with activated Unlimited Mode might be the much more cost-effective choice. In the end, it depends on how high the average CPU usage of your instance is. The table below lists the CPU utilizations up to which bursting T3 instances remain cheaper than an m5.large instance. Please note that the calculations are based on the on-demand prices and might be different when using reserved instances.

Instance Type	Memory	Cost-effective if average CPU usage is less than
t3.nano	0.5 GiB	95.8%
t3.micro	1 GiB	95.6%
t3.small	2 GiB	95.2%
t3.medium	4 GiB	74.4%
t3.large	8 GiB	42.5%

Last but not least, let me state that you shouldn’t rely solely on the benchmarks and comparisons in this post. You are welcome to use my findings as a first guide when choosing the right instance type, but don’t forget to run your own tests and make your own calculations.

How to Find Flash Content on a Website

The days of Flash are numbered. Chrome has blocked Flash content by default since version 76 (released in July 2019) and Firefox since version 69 (September 2019), and Google has announced it will stop indexing SWF files by the end of 2019. The final nail in the coffin will come later in 2020, when Adobe officially end-of-lifes the Flash Player.

According to W3Techs, only about 3 percent of all websites utilize Flash nowadays. This doesn’t sound like very much, but considering the huge number of websites on the web, it still means that millions of sites rely on Flash. If your website is among them, it’s definitely time to act!

The first step in migrating a website away from Flash is identifying the pages that need to be updated. In this post I will demonstrate how to use Dr. Link Check to crawl a website in order to find all SWF files and the pages linking to them.

Step 1: Start a link check

Navigate to the Dr. Link Check home page, enter the address of your website into the input field, and click the Start Check button.

Step 2: Show all links

The service immediately starts crawling through your website, which may take a while. Once the crawl is complete, click the All Links item in the sidebar on the left.

All links

Step 3: Filter links by file extension .swf

Click the Add button in the Filter bar, select URL from the drop-down menu, enter .swf into the input field, and press Enter to confirm the input.

Add filter

The list will now only display links containing “.swf” in their URL. In order to see the pages linking to the Flash file, hover over the link and click the Details button.
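If you want to check a single page by hand, the same idea can be sketched with Python's standard html.parser module. This is not how Dr. Link Check works internally. The class name, the set of attributes that are inspected, and the sample markup are assumptions, and the sketch matches URLs whose path ends in .swf, which is slightly stricter than the "contains .swf" filter above.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class SwfLinkFinder(HTMLParser):
    """Collects URLs from a page that point to .swf files."""

    # Attributes that commonly carry URLs (a, embed, object, param).
    URL_ATTRS = {"href", "src", "data", "value"}

    def __init__(self):
        super().__init__()
        self.swf_urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.URL_ATTRS and value:
                # Compare the path only, so query strings don't get in the way.
                if urlparse(value).path.lower().endswith(".swf"):
                    self.swf_urls.append(value)


# Hypothetical page content.
html = (
    '<a href="/media/intro.swf">Intro</a>'
    '<embed src="banner.SWF?v=2">'
    '<a href="/about.html">About</a>'
)
finder = SwfLinkFinder()
finder.feed(html)
print(finder.swf_urls)
```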

Conclusion

Although Dr. Link Check is primarily a broken link checker, its flexibility also makes it an excellent tool for finding specific files on a website. You can not only use it to search for Flash, but also for Java, Silverlight, or any other type of content that’s identifiable by filename. Give it a try!

The Four Pillars of Modern SEO

Social media may have been the star of the digital marketing scene in recent times, but reports of SEO’s demise are greatly exaggerated. Effective search engine optimization can still drive plenty of profitable traffic to a website, but it needs to be approached correctly. What do you need to consider when developing a successful modern strategy?

4 Pillars of SEO

1. Link Building

Links have always been the foundation of the Google algorithm. Although the specifics have changed over time, a good link will always be a positive for your ranking. But what should you be looking for when building links?

Always strive for links that appear natural and freely given by the source site. If your link looks as if it’s been bought or traded, it will be downgraded in value, or may even work against you in extreme cases.
Links from highly trafficked sites are always preferable. Not only will these links tend to have a greater algorithmic impact, they’ll also send you traffic directly.
Your incoming links should have a wide range of anchor texts (the text used within the link itself). Aim for a mixture of descriptive keywords and phrases: generic terms such as “click here,” website addresses, and longer sentences which describe your page content. Also, mix it up by including links based around images rather than text.
Your link profile should be as diverse as possible. Aim for links from sites with high and low traffic, closely related themes and more general blogs, and from a variety of countries and organizations. The greater the mix, the more natural your profile will look, and the greater the impact your best links will have.

2. On-Page Optimization

On-page optimization is all about getting your page’s ducks in a row to leave the algorithm with no doubt about its theme. Although it’s nowhere near the make-or-break factor it once was, it’s still vital.

Optimize your page title and meta description so that they are readable, interesting, and accurately descriptive, and also include your target keywords.
Structure your content well, using headings and subheadings where appropriate, again including relevant keywords where they naturally fit.
Use image alt attributes to describe the image content accurately, while also adding the keywords for which you want to rank.
Tweak your content to naturally include your main terms and a good spread of related phrases. However, never do this at the expense of readability.
Add links to other pages on your website and to respected external sites, where they fit the text and add value for the user.

3. Technical SEO

Technical SEO is in some ways the ugly duckling of the optimization family. It’s rarely exciting, but it provides the foundation on which other parts of the discipline can build. In essence, it means ensuring your site is easily understood by the search engine algorithm, with no technical glitches or confusion to trip the spiders up. Here are the most important things to consider.

Identify and remove as much content as possible that’s duplicated across multiple pages.
Ensure that several different URLs can’t reach the same page. That is an easy trap to fall into when the URL is parsed by your website software to build a page from database entries.
Check that your pages load quickly and reliably with no server errors, even under heavy load. Google’s PageSpeed Insights tool and Pingdom’s Website Speed Test can help you identify performance bottlenecks and improvement opportunities.
Ensure that your site works appropriately across a wide range of devices. Google now actively demotes sites which offer a poor mobile experience.
Use a broken link checker to track down and fix broken internal and external links. Broken links send poor quality signals to search engines, implying sloppy maintenance and a frustrating user experience.
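The point about several URLs reaching the same page can be illustrated with a short Python sketch that reduces URL variants to one canonical form, so that duplicates become easy to spot. The normalization rules used here (lowercase host, no trailing slash, sorted query parameters, no fragment) are assumptions. Whether two variants really show the same page depends on your website software.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url):
    """Reduce a URL to a normalized form for duplicate detection."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"                # treat /shop/ like /shop
    query = urlencode(sorted(parse_qsl(parts.query)))   # parameter order is irrelevant
    # Drop the fragment, since it is never sent to the server.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))


# Hypothetical variants of the same page.
variants = [
    "https://Example.com/shop/?b=2&a=1#top",
    "https://example.com/shop?a=1&b=2",
]
print({canonical_url(u) for u in variants})
```

If several crawled URLs collapse to the same canonical form, they are candidates for a permanent redirect to a single address.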

4. Quality Content

The final part of the optimization jigsaw is high-quality content. Search engines strive to direct users to genuinely useful sites, and if your content is poor, you won’t fit this criterion. Not only that, but high-quality content increases user engagement, and this feeds directly back into the ranking system via tracking through Google’s advertising, analytics, and social media platforms.

Make your content as unique as possible. While it’s difficult to create something original in a crowded market, publishing rehashed material isn’t going to make you rise above your competitors.
Make your content genuinely useful, entertaining, or otherwise attractive. Think beyond the search engine spiders – user satisfaction should be your prime aim. Content that doesn’t engage or convert in some way is worth virtually nothing.
Make sure your content is easily readable and clearly presented. Your visitors immediately need to recognize that they’ve found what they’re seeking. If they don’t, they’ll hit the back button, inflating your bounce rates. Google will notice this and lower your rank accordingly.

Modern online marketing offers countless ways of driving traffic to a website, but search engine optimization remains one of the most powerful and cost-effective. But it’s not something you can take for granted or approach half-heartedly. Pay attention to these four pillars, and you’ll be giving your SEO efforts an essential underpinning for profitable success.

New Notification Features

We have added two new options to the Project Settings dialog that allow you to configure where and when to send notification emails about completed link checks. Previously, the results of a recurring check were always sent to the account holder’s email address. Now you can use different recipients for different projects.

Email notification settings

In order to have the emails delivered to multiple recipients, simply enter the addresses separated by a comma. The first email address will be used for the email’s To field, the remaining addresses go to the Cc field.
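The rule for distributing the addresses can be sketched as follows. The function name is hypothetical and the code only illustrates the described behavior. It is not the service's actual implementation.

```python
def split_recipients(field):
    """Split a comma-separated address list into a To address and Cc addresses."""
    addresses = [a.strip() for a in field.split(",") if a.strip()]
    if not addresses:
        return None, []
    # The first address goes to To, the remaining ones go to Cc.
    return addresses[0], addresses[1:]


to, cc = split_recipients("dev@example.com, seo@example.com, boss@example.com")
print("To:", to)
print("Cc:", ", ".join(cc))
```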

If you are not interested in getting emails about checks that didn’t identify any dead or bad links, tick the checkbox next to Only send if issues were found. Dr. Link Check will then keep quiet until something worth reporting is found.
