Why do I get the message ‘We have been unable to find a content article on this page’ when using the Simplify Tool?
- This means that the tool could not identify the ‘main body of text’ or ‘article’ on the page
- The tool works better with valid HTML. You can check a web page with the W3C validator at http://validator.w3.org/
- HTML element names, CSS class names and ids are important!
Any element (other than a BODY or A tag) whose class name or id contains any of the following will be ignored:
combx|comment|community|disqus|extra|foot|header|menu|remark|rss|shoutbox|sidebar|side|sponsor|ad-break|agegate|pagination|pager|popup|tweet|twitter
unless its class name or id also contains any of:
and|article|body|column|main|shadow
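<div class="comment1">...</div> would be ignored (class contains "comment")
<div id="main" class="comment1">...</div> would NOT be ignored (id contains "main")
<div id="leftSide">...</div> would be ignored (id contains "side"; the match is not case-sensitive)
<div id="leftSide" class="main">...</div> would NOT be ignored (class contains "main")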
Any element whose class name or id contains any of the following is more likely to be found (the algorithm gives it extra weighting):
article|body|content|entry|hentry|main|page|pagination|post|text|blog|story
- Not enough text: the main article must contain at least 1 paragraph (or div) with at least 25 characters
- Forms and iframes will be ignored
- The algorithm also takes into account link density, the number of embedded objects (videos) and the depth of elements within the HTML
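
To check your own markup against the class name and id rules above, the following Python sketch may help. It is not the Simplify Tool's actual code: the function names is_ignored and gets_extra_weight are made up for illustration, the word lists are copied from this answer, and it assumes that the class name and id are checked together and that matching is not case-sensitive (as the leftSide example suggests).

```python
import re

# Word lists copied from the answer above. Case-insensitive matching is an
# assumption, inferred from the 'leftSide' example.
UNLIKELY = re.compile(
    r"combx|comment|community|disqus|extra|foot|header|menu|remark|rss|"
    r"shoutbox|sidebar|side|sponsor|ad-break|agegate|pagination|pager|"
    r"popup|tweet|twitter",
    re.IGNORECASE,
)
EXCEPTIONS = re.compile(r"and|article|body|column|main|shadow", re.IGNORECASE)
POSITIVE = re.compile(
    r"article|body|content|entry|hentry|main|page|pagination|post|text|"
    r"blog|story",
    re.IGNORECASE,
)


def is_ignored(tag, class_name="", element_id=""):
    """Return True if an element would be skipped under the rule above.

    Hypothetical helper: the class name and id are checked as one string,
    which is an assumption about how the tool combines them.
    """
    if tag.lower() in ("body", "a"):
        return False  # BODY and A tags are never skipped by this rule
    names = f"{class_name} {element_id}"
    return bool(UNLIKELY.search(names)) and not EXCEPTIONS.search(names)


def gets_extra_weight(class_name="", element_id=""):
    """Return True if the class name or id contains a favoured word."""
    return bool(POSITIVE.search(f"{class_name} {element_id}"))


if __name__ == "__main__":
    # The four examples from the answer above
    print(is_ignored("div", class_name="comment1"))
    print(is_ignored("div", class_name="comment1", element_id="main"))
    print(is_ignored("div", element_id="leftSide"))
    print(is_ignored("div", class_name="main", element_id="leftSide"))
```

If is_ignored returns True for the element that wraps your article, renaming its class or id (or adding one of the excepted words such as main or article) is the simplest thing to try. The sketch covers only the naming rules; it does not model the 25 character minimum, link density or element depth.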