AidanMontareDotNet

You are on the old part of aidanmontare.net, which I am no longer maintaining. Newer versions of some of this content can be found on the active part of my site, which you can reach from my homepage.

Finding Broken Links

(last updated

Having broken links on your website is really annoying, especially in WordPress, where simple errors like forgetting http:// will leave you with a messed-up URL (the browser treats the link as relative and appends it to your site’s URL). And if you move pages on a large site, the number of broken links to deal with will soon be astonishing.
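To see why a missing http:// produces such a mangled address, here is a small sketch using Python's standard urljoin, which resolves a link against a page URL following the same relative-reference rules browsers use. The page URL and link targets are made-up examples, not addresses from this site.

```python
from urllib.parse import urljoin

# Hypothetical page URL, chosen only for illustration.
page_url = "http://example.com/blog/my-post/"

# Without a scheme, the href is treated as a relative path and is
# resolved against the page it appears on.
broken = urljoin(page_url, "www.wikipedia.org")

# With the scheme, the href is an absolute URL and is used as-is.
working = urljoin(page_url, "http://www.wikipedia.org")

print(broken)   # http://example.com/blog/my-post/www.wikipedia.org
print(working)  # http://www.wikipedia.org
```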

Luckily, there are many automated tools that allow you to crawl a website and generate a list of broken links.
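As a rough idea of what such tools do under the hood, here is a minimal sketch that checks the links on a single page using only the Python standard library. It is not how any of the programs below are implemented; the names LinkExtractor, http_status, and find_broken_links are my own, and a real crawler would also follow links from page to page, respect robots.txt, and skip things like mailto: links (which this sketch would report as broken).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import urllib.error
import urllib.request


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_status(url):
    """Return the HTTP status code for url, or None if no response arrives."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, ValueError, OSError):
        # Unknown scheme, DNS failure, refused connection, timeout, ...
        return None


def find_broken_links(page_url, html, get_status=http_status):
    """Return (url, status) pairs for links with status >= 400 or no response."""
    parser = LinkExtractor()
    parser.feed(html)
    broken = []
    for href in parser.links:
        target = urljoin(page_url, href)  # resolve relative links
        status = get_status(target)
        if status is None or status >= 400:
            broken.append((target, status))
    return broken
```

You would call find_broken_links with a page's URL and its HTML source. The get_status parameter exists so the checking logic can be tried out with a stand-in function instead of real network requests.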

There are too many solutions available for me to list all of them, but these are the applications I have tried and found useful.

For WordPress Users

Broken Link Checker

While checking for broken links doesn’t require a plugin (and it’s good security practice to install plugins only when necessary), the Broken Link Checker plugin for WordPress is nice because it displays a single list of all the broken links and also lets you fix the links right then and there. If you have a lot of pages, the inconvenience of having to go to every page, find the broken link, and fix it might be enough to warrant installing this plugin.

The plugin’s support forum at WordPress.org is filled with people reporting that every link is flagged as broken. I too experienced this error initially, but I managed to get the plugin working after doing the following:

  1. Install php5-curl. If you had problems with this plugin and looked at the logs in the settings, you might see that it reports curl as not being installed. That seems like an issue, and it is. So sudo apt-get install php5-curl and then sudo service apache2 restart will give the plugin everything it needs.
  2. Add an exception to Bad Behavior (and possibly other spam/security software). The Bad Behavior WordPress plugin is a simple way to help prevent spam on your site. However, it will also block the Broken Link Checker plugin from accessing your site. The easiest fix is to add an exception so that requests from your own server will not be blocked by Bad Behavior. Open your Bad Behavior whitelist and add your server’s IP address to the IP Address whitelist (adding localhost or 127.0.0.1 did not work, only the actual public IP of my server).
  3. Add an exception to WP-Statistics. If you have the WP-Statistics plugin, you might notice a spike in traffic when using Broken Link Checker. Unfortunately, this is not because your site has suddenly become more popular, but because the plugin is counting Broken Link Checker’s requests as it checks for broken links. In your WP-Statistics settings, find the “Access/Exclusions” tab, and the “Excluded IP address list:” field. Add (for redundancy) both your server’s public IP address and localhost to this list:
    50.116.56.154/255.255.255.255
    127.0.0.1/255.255.255.255
  4. Add an exception to Fail2ban. If you have the Fail2ban reactive IP banning software set up on your server, check to see if the filter apache-proxy is enabled. For some reason, this filter will wind up blocking your local system when you run the link checker. If you are using this filter, find the line in your /etc/fail2ban/jail.local file that says ignoreip =. Add a space at the end of this line and then your server’s public IP, as in the previous steps. I do not know why this filter is causing issues, so any suggestions are welcome.
  5. Install the Broken Link Checker plugin. Then open up the settings, go to the advanced tab, and select “Re-check all pages” to force a scan of your entire site.
  6. Go make sure the plugin works. See if the links it returns are actually broken. The above fixes were the only things I needed to do to get my server working.
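A note on the address/netmask notation in step 3: a mask of 255.255.255.255 means the entry matches exactly one address. The sketch below uses Python's standard ipaddress module to illustrate this. It only shows how the notation works in general and makes no claim about how WP-Statistics parses the field internally; the is_excluded helper is my own.

```python
import ipaddress

# The two exclusion entries from step 3, written as address/netmask.
exclusions = [
    ipaddress.ip_network("50.116.56.154/255.255.255.255"),
    ipaddress.ip_network("127.0.0.1/255.255.255.255"),
]


def is_excluded(ip):
    """Return True if ip falls inside any of the exclusion entries."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in exclusions)


print(exclusions[0])                 # 50.116.56.154/32
print(exclusions[0].num_addresses)   # 1
print(is_excluded("50.116.56.154"))  # True
print(is_excluded("50.116.56.155"))  # False
```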

If the plugin works, you will now be able to see a list of broken links, with the option to edit them without leaving the page. It’s great for fixing careless URL errors, as well as cleaning up after moving pages around.

Note: I did not test the emailing feature of the plugin, so I don’t know how well that works. Also, the plugin found my test broken link but refused to check it, even when I manually asked for a recheck. This did not happen with test broken link #2, likely because the first was misspelled as “htttp://” and the plugin did not know how to check that protocol.
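Misspelled protocols like this can be caught without making any request at all, just by looking at the part of the link before the colon. This is only a sketch of the idea, not what the plugin does; the link_problem function and the set of schemes it accepts are assumptions of mine.

```python
from urllib.parse import urlparse

# Assumption: only web links are of interest here. Other legitimate
# schemes such as mailto: would also be reported by this check.
CHECKABLE_SCHEMES = {"http", "https"}


def link_problem(href):
    """Return a short description of a scheme problem, or None if it looks fine."""
    scheme = urlparse(href).scheme
    if not scheme:
        return "no scheme (a relative link, or a forgotten http://)"
    if scheme not in CHECKABLE_SCHEMES:
        return f"unrecognized scheme: {scheme!r}"
    return None


print(link_problem("htttp://example.com"))  # unrecognized scheme: 'htttp'
print(link_problem("www.example.com"))      # no scheme (a relative link, or a forgotten http://)
print(link_problem("http://example.com"))   # None
```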

For Linux Users

LinkChecker

LinkChecker is available on most Linux systems with something like sudo apt-get install linkchecker-gui. It is a relatively simple program, but does the job well.

KLinkStatus

KLinkStatus is available with KDE, or can be installed with the klinkstatus package on most Linux distributions. It is a little nicer than the LinkChecker program, though both accomplish the same task.

This is the program I use to scan non-WordPress sites.