SEO CHECK DUPLICATE CONTENT


For ‘near duplicates’, click the ‘Duplicate Details’ tab at the bottom, which populates the lower window pane with the ‘near duplicate address’ and similarity of each near-duplicate URL discovered.

The SEO Spider will identify near duplicates with a 90% similarity match, which can be adjusted to find content with a lower similarity threshold.
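The SEO Spider's internal similarity algorithm isn't exposed, but the thresholding idea can be sketched with Python's standard-library difflib. This is only an illustration of how a similarity cut-off works, not the Spider's actual implementation, and the URLs and page text below are invented:

```python
from difflib import SequenceMatcher

def similarity(text_a: str, text_b: str) -> float:
    """Return a 0-100 similarity score between two content strings."""
    return SequenceMatcher(None, text_a, text_b).ratio() * 100

def near_duplicates(pages: dict, threshold: float = 90.0):
    """Yield (url_a, url_b, score) pairs at or above the threshold."""
    urls = sorted(pages)
    for i, a in enumerate(urls):
        for b in urls[i + 1:]:
            score = similarity(pages[a], pages[b])
            if score >= threshold:
                yield a, b, round(score, 1)

# Hypothetical crawled pages: two near-identical, one unrelated.
pages = {
    "/page-a": "Widgets are great. Buy widgets today from our store.",
    "/page-b": "Widgets are great. Buy widgets today from our shop.",
    "/page-c": "Contact us for a quote on industrial sprockets.",
}
matches = list(near_duplicates(pages, threshold=90.0))
```

Lowering the `threshold` argument mirrors lowering the similarity threshold in the Spider's configuration: more loosely related page pairs will be reported.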

Only ‘exact duplicates’ can be viewed in real time during a crawl. ‘Near duplicates’ require calculation at the end of the crawl via post-crawl ‘Crawl Analysis‘ before the filter is populated with data.

You can now view the populated near-duplicate filter and columns.

You’re able to filter by the following – ‘Exact Duplicates’ and ‘Near Duplicates’.

7) View Duplicate URLs Via The ‘Duplicate Details’ Tab.

The guide above should illustrate how to use the SEO Spider as a duplicate content checker for your website. For the most accurate results, refine the content area for analysis and adjust the threshold for different groups of pages.

Duplicate content identified by any tool, including the SEO Spider, needs to be reviewed in context. Watch our video, or continue to read our guide below.

You can also untick other items that also require post crawl analysis to make this step quicker.

For example, if there are 4 near-duplicates discovered for a URL in the top window, these can all be viewed.

In the above screenshot, each URL has a corresponding exact duplicate due to a trailing slash and non-trailing slash version.
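A hash-based approach like the sketch below is one way to reason about exact-duplicate detection: pages whose raw source is byte-for-byte identical produce the same checksum. The crawl data is hypothetical, and this is an illustration of the principle rather than the Spider's internal code:

```python
import hashlib

def content_hash(html: str) -> str:
    """MD5 checksum of the raw page source."""
    return hashlib.md5(html.encode("utf-8")).hexdigest()

# Hypothetical crawl: trailing-slash and non-trailing-slash versions
# of the same URL returning identical source.
crawl = {
    "https://example.com/services": "<html><body>Our services</body></html>",
    "https://example.com/services/": "<html><body>Our services</body></html>",
    "https://example.com/about": "<html><body>About us</body></html>",
}

# Bucket URLs by checksum; any bucket with more than one URL is a
# group of exact duplicates.
buckets = {}
for url, html in crawl.items():
    buckets.setdefault(content_hash(html), []).append(url)

exact_duplicates = [urls for urls in buckets.values() if len(urls) > 1]
```

In a case like this, the usual fix is a redirect or canonical from one slash variant to the other, so only one version is indexable.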

If the ‘Duplicate Details’ tab shows any duplicate content that you don’t wish to be part of the duplicate content analysis, exclude or include the relevant HTML elements, classes or IDs (as highlighted in point 2), and re-run crawl analysis.

This tutorial walks you through how you can use the Screaming Frog SEO Spider to find both exact duplicate content and near-duplicate content, where some text matches between pages on a website.

The right-hand side of the ‘Duplicate Details’ tab will display the near duplicate content discovered from the pages and highlight the differences between the pages when you click on each ‘near duplicate address’.

When crawl analysis has completed, the ‘analysis’ progress bar will be at 100% and the filters will no longer show the ‘(Crawl Analysis Required)’ message.

Preventing duplicate content puts you in control over what’s indexed and ranked – rather than leaving it to the search engines. You can limit crawl budget waste and consolidate indexing and link signals to help in ranking.

2) Adjust ‘Content Area’ For Analysis Via ‘Config > Content > Area’

Please also read our Screaming Frog SEO Spider FAQs and full user guide for more information on the tool.

The SEO Spider will also only check ‘Indexable’ pages for duplicates (for both exact and near-duplicates).

By excluding the ‘mobile-menu__dropdown’ in the ‘Exclude Classes’ box under ‘Config > Content > Area’, the mobile menu is removed from the content preview and near-duplicate analysis.
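To illustrate what excluding a class from the content area does, here is a rough sketch using Python's standard-library html.parser. It is not the Spider's parser, and the HTML snippet and class name usage are invented for demonstration:

```python
from html.parser import HTMLParser

class ContentExtractor(HTMLParser):
    """Collect visible text, skipping any element with an excluded class."""

    def __init__(self, exclude_classes):
        super().__init__()
        self.exclude_classes = set(exclude_classes)
        self.skip_depth = 0          # >0 while inside an excluded subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        if self.skip_depth or self.exclude_classes.intersection(classes):
            self.skip_depth += 1     # track nesting inside the excluded subtree

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())

html = """<body>
  <div class="mobile-menu__dropdown"><a href="/">Home</a><a href="/blog">Blog</a></div>
  <p>Main body text of the page.</p>
</body>"""

parser = ContentExtractor(exclude_classes=["mobile-menu__dropdown"])
parser.feed(html)
text = " ".join(parser.chunks)
```

With the menu's class excluded, only the main body text survives, so boilerplate navigation no longer inflates similarity scores between pages.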

It’s worth remembering that duplicate and similar content is a natural part of the web, which often isn’t a problem for search engines who will, by design, canonicalise URLs and filter them where appropriate. However, at scale it can be more problematic.

You’re able to configure the content used for near-duplicate analysis. For a new crawl, we recommend using the default set-up and refining it later when the content used in the analysis can be seen, and considered.

To populate the ‘Near Duplicates’ filter, the ‘Closest Similarity Match’ and ‘No. Near Duplicates’ columns, you just need to click a button at the end of the crawl.

This means if you have two URLs that are the same, but one is canonicalised to the other (and therefore ‘non-indexable’), this won’t be reported – unless this option is disabled.

4) View Duplicates In The ‘Content’ Tab.

Both exact and near-duplicates can be exported in bulk via the ‘Bulk Export > Content > Exact Duplicates’ and ‘Near Duplicates’ exports.

To get started, download the SEO Spider which is free for crawling up to 500 URLs. The first 2 steps are only available with a licence. If you’re a free user, then skip to number 3 in the guide.

A crawl of a larger website, such as the BBC, will reveal many more.

The right-hand ‘overview’ pane displays a ‘(Crawl Analysis Required)’ message against filters that require post-crawl analysis to be populated with data.

For example, the Screaming Frog website has a mobile menu outside the nav element, which is included within the content analysis by default. While this isn’t much of an issue, its class name ‘mobile-menu__dropdown’ can be input into the ‘Exclude Classes’ box to help focus the analysis on the main body text of the page.

As outlined earlier, the Screaming Frog website has a mobile menu outside the nav element, which is included within the content analysis by default. The mobile menu can be seen in the content preview of the ‘duplicate details’ tab.

If you’re interested in finding crawl budget issues, then untick the ‘Only Check Indexable Pages For Duplicates’ option, as this can help find areas of potential crawl waste.

Open up the SEO Spider, type or copy in the website you wish to crawl in the ‘Enter URL to spider’ box and hit ‘Start’.

This will give you all the on-site URLs that are longer than 115 characters and can help you identify issues with overly long URLs.
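The same 115-character filter is trivial to reproduce on an exported URL list. A sketch with invented URLs:

```python
# Hypothetical list of crawled URLs; flag any over 115 characters,
# mirroring the SEO Spider's 'Over 115 Characters' filter.
urls = [
    "https://example.com/blog/post",
    "https://example.com/category/subcategory/another-level/"
    "yet-another-level/a-very-long-seo-unfriendly-slug-that-keeps-going-and-going",
]

over_limit = [u for u in urls if len(u) > 115]
```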

Understanding how content is segmented or syndicated within a site is useful for distinguishing original content from syndicated content, especially when syndication is a heavy feature of the site.

Here’s what you need to check for and how to do it.

Check and investigate the following:

In Screaming Frog, after you identify the page you want to check outbound links on, click on the URL in the main window, then click on the Outlinks tab.

URL Length.

As if that weren’t enough, you have other WordPress-specific duplicate content issues to worry about, such as duplicate content on product pages and category pages.

Identifying duplicate content issues is a crucial part of your SEO audit.

It gives an easy-to-see view that shows you which pages have a match percentage, and which pages match other pages.

If you see something strange going on in terms of the quantities of links, it merits further investigation into both their quality and quantity.

If you’re obsessed with your competitors, you could go as far as performing a crawl on them every month and keeping this data on hand to determine what they’re doing.

To identify the number of internal links pointing to a page, click on the URL in the main Screaming Frog window then click on the Inlinks tab.

If you aren’t a professional writer, use the Hemingway App to edit and write your content.

Using the tool Siteliner.com (made by Copyscape) can help identify duplicate content issues on your site quickly.

In this check, we want to identify all of the 400 errors, 500 errors, and other page errors.

If the goal of your audit is to identify and remove affiliate links from an affiliate-heavy website, then the next tip is a good path to follow.

How to Check.

The number of outbound links on a page can interfere with a page’s performance.

This trick is especially useful for identifying thin content and creating custom filters for finding helpful supplementary content.

It would be pretty easy to analyze and keep this data updated in an Excel table, and identify historical trends if you want to see what competitors are doing in terms of developing their content.

To identify URLs over 115 characters in Screaming Frog, click on the URL tab, click on Filter then click on Over 115 Characters.

In Screaming Frog, click on the H1 tab then take a look at the H1, H2, and H3 tags.

Using the exported Excel document from the step where we bulk exported the links, it’s easier to judge the quality of internal links pointing to each page on the site:

This will organize pages in descending order so you can see all of the error pages before the live 200 OK pages.

How to Check.

Different types of content issues can plague a site, from URL-based content issues to physical duplicate content that is replicated from page to page without many changes.

For a high-level overview of page categories, it’s useful to identify the top pages of the site via Screaming Frog’s site structure section, located on the far right of the spider tool.

You can also click on Bulk Export > All Inlinks to identify site-wide inlinks to all site pages.

There has been a study by RebootOnline.com that contradicts this one:

In addition, using conditional formatting in Excel, you can filter out affiliate links and identify where they are in the bulk exports from Screaming Frog.
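Outside of Excel, the same filtering can be sketched in a few lines of Python. The affiliate URL patterns below are purely hypothetical examples; substitute the networks and tracking parameters your site actually uses:

```python
# Hypothetical destination URLs from a bulk link export, and example
# affiliate patterns to match against (illustrative only).
AFFILIATE_PATTERNS = ("/go/", "ref=", "aff_id=", "amzn.to")

links = [
    "https://example.com/blog/best-widgets",
    "https://example.com/go/widgetco",
    "https://amzn.to/abc123",
    "https://example.com/pricing?ref=partner42",
]

affiliate_links = [u for u in links if any(p in u for p in AFFILIATE_PATTERNS)]
```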

In Screaming Frog, scroll all the way to the right, and you’ll find a Last Modified column. This can help you:

Once Screaming Frog has finished your site crawl, click on the Internal tab, select HTML from the Filter: dropdown menu, and sort the pages by status code.
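The equivalent sort and grouping on an exported URL list can be sketched as follows; the rows are invented examples of a (URL, status code) export:

```python
from collections import defaultdict

# Hypothetical rows from an 'Internal: HTML' export: (URL, status code).
rows = [
    ("https://example.com/", 200),
    ("https://example.com/old-page", 404),
    ("https://example.com/moved", 301),
    ("https://example.com/broken-api", 500),
]

# Group URLs by status code for a quick per-code tally.
by_status = defaultdict(list)
for url, status in rows:
    by_status[status].append(url)

# Sorting by status code descending surfaces 5xx/4xx errors
# before redirects and live 200 OK pages.
rows_desc = sorted(rows, key=lambda r: r[1], reverse=True)
```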

Alternatively, you can also click on the H2 tab. In addition, you can set up a custom filter to identify H3 tags on the site.
