Wget is a pretty awesome tool for many reasons, and there’s a handy trick I picked up today while trying to scan a website for broken links.
wget --spider -o wget.log -e robots=off --wait 1 -r -p http://example.com
This runs a recursive scan of the site’s HTML and checks that every link resolves; any broken links get reported in wget.log. Briefly: --spider tells wget to check that pages exist without saving them, -o wget.log writes the output to a log file, -e robots=off ignores robots.txt, --wait 1 pauses a second between requests, -r recurses through the site, and -p also checks page requisites such as images and stylesheets.
The downside is that the log doesn’t tell you which page a broken link was found on, so you’ll also need to grep the contents of the website for the broken URL, or watch the server logs while the scan is running.
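For example, once the scan finishes you can pull the flagged URLs out of the log and then search the site’s source for each one. This is only a rough sketch: the log layout varies between Wget versions, and the broken URL and /var/www/example.com below are placeholders for your own site.

# List the URLs wget reported as 404. The URL line appears a few
# lines above the response line in the log, so -B 3 may need
# adjusting for your Wget version.
grep -B 3 'awaiting response... 404' wget.log | grep -oE 'https?://[^ ]+' | sort -u

# Then find the pages that reference a broken URL (path and URL
# are placeholders).
grep -rn 'http://example.com/broken-page' /var/www/example.com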
The spider also can’t check links inside CSS files, so it’s worth running the W3C Link Checker (https://validator.w3.org/checklink) over those as well; it should give you a good result too.
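If you’d rather stay on the command line, the same checker is also distributed on CPAN as W3C-LinkChecker, which installs a checklink script. This is only a sketch assuming a working Perl toolchain; the exact options and how CSS is handled may differ between versions.

# Install the W3C Link Checker (assumes Perl and the cpan client are set up).
cpan W3C::LinkChecker

# Check a stylesheet URL (a placeholder); checklink can also take a
# page URL with -r to recurse through a site from that starting point.
checklink http://example.com/css/style.css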