Duplicate content is still a threat to a site's rankings, years after Google claimed to have solved the issue. Over the past decade or so, one more source of duplicate content has emerged: competitors trying to steal a site's rankings by copying the site, cloaking and redirecting their copy, and waiting until Google merges the original site's rankings with their own. Because the copying site is often cloaked, such content theft can be very difficult to detect. It happens often, however, and in competitive search verticals almost every client site we audit shows traces of having been copied at some point.

Google has traditionally not been very good at telling the original source of content from a copy, nor has it been very helpful in detecting existing issues. For example, there is no report in Google Search Console (GSC) warning site owners that their content has been found on a different domain. However, there are at least two unintended ways to spot an attempt to copy your site in GSC:

1. GSC’s Links report does not look very informative, and it provides only partial data about the link sources Google knows for your site, but it can be useful in cases of stolen content. Sometimes you can spot in the Links report an unusually high number of incoming links from a domain you are not familiar with. If your site has been copied in full and the thief has not yet changed the absolute URLs of images, CSS, scripts, internal links, etc., those references still point to your domain and show up as links from another domain to yours. (This is yet another reason to use absolute paths on your site – make your site more difficult to steal!)

2. Check the Page Indexing report for your site. This report groups the URLs that are not indexed by the reason they are not indexed. The grouping is very approximate, and sometimes completely different issues get thrown into the same group, but one group is useful in our case. If some of your pages are not indexed because Google chose a canonical different from the one you set, check which URL Google shows as the canonical; sometimes it will be on a different domain! Then check the Google cache of the offending URL on the third-party domain. More likely than not, you will see an exact copy of your site, and the cache window may even say it is the cache of your site.
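
Both checks above can be done by eye, but on a large site it may help to script them. The following is only a minimal Python sketch under stated assumptions: GSC does not hand you data in this shape, so the input formats (a list of page/canonical pairs and a dictionary of link counts per domain), the function names, the threshold of 100 links and the example domains are all hypothetical. You would fill them from whatever you copy or export from the Links and Page Indexing reports.

```python
from urllib.parse import urlsplit


def normalize_host(host):
    """Lowercase a hostname and drop a leading 'www.'."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def host_of(url):
    """Return the normalized hostname of a URL ('' if it has none)."""
    return normalize_host(urlsplit(url).hostname or "")


def is_own(host, own_host):
    """True if host is own_host itself or one of its subdomains."""
    own = normalize_host(own_host)
    return host == own or host.endswith("." + own)


def foreign_canonicals(rows, own_host):
    """Return the (page_url, google_canonical) pairs whose Google-selected
    canonical sits on a host other than own_host.

    rows is a hypothetical input: an iterable of
    (page_url, google_canonical_url) tuples collected by hand or from an export.
    """
    flagged = []
    for page_url, google_canonical in rows:
        host = host_of(google_canonical)
        if host and not is_own(host, own_host):
            flagged.append((page_url, google_canonical))
    return flagged


def unfamiliar_linking_domains(link_counts, known_domains, min_links=100):
    """Return (domain, links) pairs for domains that are not in known_domains
    and send at least min_links links, largest first.

    link_counts is a hypothetical dict of linking domain -> number of links;
    min_links is an arbitrary threshold to tune per site.
    """
    known = {normalize_host(d) for d in known_domains}
    hits = [
        (domain, count)
        for domain, count in link_counts.items()
        if normalize_host(domain) not in known and count >= min_links
    ]
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample data on reserved example domains.
    canonical_rows = [
        ("https://www.example.com/a", "https://www.example.com/a"),
        ("https://www.example.com/b", "https://mirror.example.net/b"),
    ]
    print(foreign_canonicals(canonical_rows, "example.com"))

    links = {"news.example.org": 300, "mirror.example.net": 5200}
    print(unfamiliar_linking_domains(links, ["news.example.org"]))
```

Anything either function flags is only a candidate for a closer manual look, not proof of theft: a legitimate syndication partner or a site-wide footer link can produce the same pattern.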

How do you deal with these issues and protect your site from losing rankings?

  • If the offending site is no longer live (which is sometimes the case), you can try the Remove Outdated Content tool to request its removal from Google’s index, then check over the next few weeks whether the offending URLs disappear;
  • If the site is live, there are some things you can do:
    • File a DMCA removal request via this form: https://support.google.com/legal/troubleshooter/1114905
    • Contact the host and registrar of the offending domain, informing them that the domain in question is copying your content and a DMCA removal request has been filed.
