A quick summary of some of the data collected:

- Errors – Client & server errors (no responses, 4XX, 5XX).
- Redirects – 3XX, permanent or temporary.
- External Links – All followed links and their subsequent status codes.
- URI Issues – Non-ASCII characters, underscores, uppercase characters, dynamic URIs, and URIs longer than 115 characters.
- Duplicate Pages – Hash value (MD5 checksum) lookup for pages with duplicate content.
- Page Title – Missing, duplicate, over 70 characters, same as H1, multiple.
- Meta Description – Missing, duplicate, over 156 characters, multiple.
- Meta Keywords – Mainly for reference, as they are only (barely) used by Yahoo.
- H1 – Missing, duplicate, over 70 characters, multiple.
- H2 – Missing, duplicate, over 70 characters, multiple.
- Meta Robots – Index, noindex, follow, nofollow, noarchive, nosnippet, noodp, noydir, etc.
- Meta Refresh – Including target page and time delay.
- Canonical link element & canonical HTTP headers.
- Outlinks – All pages a URI links out to.
- Follow & Nofollow – At link level (true/false).
- Images – All URIs with the image link & all images from a given page; images over 100 KB, missing alt text, alt text over 100 characters.
- User-Agent Switcher – Crawl as Googlebot, Bingbot, or Yahoo! Slurp.
- Custom Source Code Search – The spider lets you find anything you want in a website's source code, whether that's analytics code, specific text, or other code. (Please note: this is not a data extraction or scraping feature yet.)
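The Duplicate Pages check relies on hash values (MD5 checksums). The idea can be illustrated with a minimal sketch that hashes each crawled page body and groups URIs sharing the same digest. This is not the spider's actual implementation: it assumes the raw response bytes are hashed exactly as received, and the names `md5_of` and `find_duplicate_pages` are hypothetical.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def md5_of(body: bytes) -> str:
    """Return the hex MD5 digest of a page body (hypothetical helper)."""
    return hashlib.md5(body).hexdigest()


def find_duplicate_pages(pages: dict[str, bytes]) -> list[list[str]]:
    """Group URIs whose bodies produce the same MD5 hash.

    `pages` maps each crawled URI to its raw response body (an assumption
    for this sketch). Only groups of two or more URIs are returned, i.e.
    pages whose bytes are exactly identical.
    """
    by_hash: dict[str, list[str]] = defaultdict(list)
    for uri, body in pages.items():
        by_hash[md5_of(body)].append(uri)
    return [sorted(uris) for uris in by_hash.values() if len(uris) > 1]


# Usage with made-up example.com pages: the first two bodies are identical.
crawled = {
    "https://example.com/a": b"<html><body>Hello</body></html>",
    "https://example.com/a?ref=nav": b"<html><body>Hello</body></html>",
    "https://example.com/b": b"<html><body>Other</body></html>",
}
print(find_duplicate_pages(crawled))
# [['https://example.com/a', 'https://example.com/a?ref=nav']]
```

Because any single-byte difference (a timestamp, a session ID) changes the digest, a byte-level hash like this only catches exact duplicates, not near-duplicate content.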