Andrew joined the State of Digital Publishing team in 2021, bringing with him more than a decade and a half of editorial experience in B2B...
Plagiarism is a rarely discussed reality within the digital publishing industry. A quick search online will turn up any number of academic discussions about the hidden cost of plagiarism in higher education, both to academic institutions and the wider economy.
For digital publishers, however, plagiarism is often pitched in terms of safeguarding against writers passing off another publisher’s material as their own. Indeed, there are several programs on the market that can check individual articles for plagiarism.
Where these programs miss the mark is that they fail to provide content owners with a) the means to ensure their content hasn’t been stolen, and b) a simplified means of addressing any IP breaches that are discovered.
These are problems that PlagiaShield was created to address. In addition to offering a traditional document scanner, the software promises not only to help publishers find stolen content but also to simplify the process of having it taken down.
It’s a lot to offer, but does it deliver on those promises? And, if it does, how easily does it fit within a publisher’s workflow? Let’s find out.
What Is PlagiaShield?
PlagiaShield is a browser-based, online plagiarism checking tool that automatically scans the internet looking for stolen content.
It was designed to help brands, agencies and publishers root out instances of plagiarism and help protect their positioning within the search engine results pages (SERPs).
The software’s unique selling point (USP) is that after scanning a domain it will continue to monitor the internet for instances of plagiarism and duplicate content and will provide email updates on any discoveries it makes.
The distinction between plagiarism and duplicate content may seem unnecessarily academic, but it’s actually quite important here.
PlagiaShield provides a granular level of detail on infringements that enables users to see whether their entire site or a single page has been scraped. Indeed, it goes as far as showing if single sentences have been lifted from a particular article and even offers a percentage match on those sentences.
It’s easy to see whether small parts of a sentence have been edited (a word here, some punctuation there) to help avoid traditional plagiarism checkers. This means the tool is one step closer to the holy grail of semantic plagiarism detection.
This level of detail offers some immediate advantages. By using a percentage model of duplicated content, users can instantly see where their biggest risks are, giving them a starting point for the review process.
It also helps publishers understand whether a site is actively plagiarizing their content or has simply failed to provide correct citations. This means that instead of submitting a takedown request, publishers can ask for an appropriate backlink.
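To make the idea of a per-sentence percentage match concrete, here is a minimal sketch using Python's standard `difflib` module. It is not PlagiaShield's algorithm, which the company has not published as far as we know. The `percent_match` helper and the example sentences are our own illustrative assumptions.

```python
from difflib import SequenceMatcher


def percent_match(original: str, suspect: str) -> float:
    """Return a 0-100 similarity score between two sentences (illustrative only)."""
    # Compare case-insensitively and word by word, so a swapped word or a
    # changed punctuation mark lowers the score instead of zeroing it.
    a = original.lower().split()
    b = suspect.lower().split()
    return round(100 * SequenceMatcher(None, a, b).ratio(), 1)


original = "Plagiarism is a rarely discussed reality within the digital publishing industry."
edited = "Plagiarism is a seldom discussed reality within the digital publishing industry!"

print(percent_match(original, original))  # identical sentences score 100.0
print(percent_match(original, edited))    # lightly edited copy scores lower, but still high
```

A lightly edited sentence still scores well above an unrelated one, which is the kind of signal that lets a reviewer separate near-copies from coincidental overlap.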
Duplicate Content’s SEO Risk
As noted above, PlagiaShield’s stated goal is to help prevent publishers from losing visibility in the SERPs.
This is because, when it comes to duplicate content, search engines can struggle to know which version to include in their index and which to exclude. Even if search engines index every version, they’ll still just choose one to display in response to a search query in the name of providing the best search experience. There’s a reason why SEO professionals advise the use of canonical tags during the content syndication process.
Picking just one page dilutes the other pages’ SERP visibility, which is a major problem if the algorithm doesn’t pick the original piece. Google has even acknowledged that its systems favor high-ranking pages to the extent that, even if it identifies the content’s original creator, it could still pick a higher-ranked site.
But Google has also said that it values original content, with the company rolling out a core SERP algorithm update in August 2022, dubbed the Helpful Content Update, which it described as “part of a broader effort to ensure people see more original, helpful content written by people, for people, in search results.”
Digital publishers are not only investing in high-quality, original content but also increasingly in content optimization, and plagiarism represents a very real threat to that investment. This brings us back to PlagiaShield, which ran its own study into how much content had been stolen from leading news publishers in 2022, finding that 62% of the articles scanned were no longer unique.
PlagiaShield Pricing and Features
PlagiaShield offers a free version that includes a single monthly scan of up to 100 web pages on a single domain as well as 10 free plagiarism scans of up to 2,000 words per month. This tier is useful either as a demo for existing publishers or as a way for new publishers to check that their content is not being duplicated by higher-authority websites.
The company’s three paid tiers offer significantly more functionality, however.
The Pro tier starts at $29 per month and is pitched at brands and content agencies, offering monthly scans of up to 1,000 pages across five domains. Users have the option to add an extra 1,000 pages for $20 per month. As part of the package, users also gain access to chat support and the company’s DMCA Filler Chrome Extension, which speeds up the process of filling in DMCA takedown requests that are then submitted to Google via the Search Console.
The Publisher tier is the next jump in pricing, starting at $499 per month. This tier is aimed at larger news outlets, offering weekly scans for up to 50 domains and up to 25,000 pages in addition to the Pro benefits. An additional 10,000 pages per month can be purchased for $99. This level also provides team management and API access.
PlagiaShield also offers an Enterprise plan that comes with custom features, support and billing options. For example, this will suit publishers that use a subscription model, as PlagiaShield can integrate directly with their site and protect paywalled content.
Getting Started With PlagiaShield’s Dashboard
The PlagiaShield dashboard is an exercise in minimalism, consisting of just three main sections: Domains, Documents and Your Account.
The first deals with website plagiarism monitoring, the second with analyzing individual documents for plagiarism and the third with billing and plan settings as well as team management.
Let’s have a closer look at each.
Once a publisher has signed up for an account they’re invited to add a domain. Each of the paid subscriptions allows for several domains to be added.
However, it’s important to remember that each account shares from a communal pool of monitored web pages. What does that mean? Well, users who opt for the base-level Pro package can monitor a total of 1,000 pages shared across five domains.
While users can limit the number of pages monitored on each domain added, the lowest page limit they can set is 1,000. This effectively means that to be able to monitor all five domains available with the Pro plan, users need to pay an additional $80 per month for those extra 4,000 pages.
When talking about this quota, it is worth noting that PlagiaShield only monitors pages that contain more than 500 characters. Moreover, the platform offers users the option to exclude sections of their site that aren’t worth monitoring, such as category and author pages. Both of these features will help to preserve the page quota.
While it’s up to individual users to decide whether they’re willing to pay $109 per month to monitor five domains, we don’t consider the price to be a sticking point. Rather, what we’d like to see is greater transparency around what’s required to get the most out of each subscription tier.
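The arithmetic behind that $109 figure can be sketched in a few lines of Python. The prices and limits are the ones quoted in this review. The constant names, the `pro_monthly_cost` helper, and the assumption that extra pages are sold only in whole 1,000-page blocks are ours.

```python
import math

# Figures quoted in this review; names are illustrative, not PlagiaShield's.
PRO_BASE_PRICE = 29          # USD per month
PRO_INCLUDED_PAGES = 1_000   # pages included in the base Pro plan
EXTRA_BLOCK_PAGES = 1_000    # size of each add-on block (assumed whole blocks only)
EXTRA_BLOCK_PRICE = 20       # USD per month per add-on block
MIN_PAGES_PER_DOMAIN = 1_000 # lowest per-domain page limit a user can set


def pro_monthly_cost(domains: int) -> int:
    """Estimate the monthly Pro cost of monitoring `domains` domains."""
    pages_needed = domains * MIN_PAGES_PER_DOMAIN
    extra_pages = max(0, pages_needed - PRO_INCLUDED_PAGES)
    extra_blocks = math.ceil(extra_pages / EXTRA_BLOCK_PAGES)
    return PRO_BASE_PRICE + extra_blocks * EXTRA_BLOCK_PRICE


print(pro_monthly_cost(1))  # 29
print(pro_monthly_cost(5))  # 109
```

In other words, each domain beyond the first adds $20 per month on the Pro plan under these assumptions.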
Once the domain selection process has been completed, publishers can easily see their websites under the “List domains” tab on the right-hand side.
Clicking on the domain image to the right of the menu brings up a detailed overview of the potential copyright infringements.
It’s from this page that users can begin to establish whether their content has been duplicated elsewhere on the web. PlagiaShield offers both micro and macro review filters — Review Pages and Review Domains respectively — for users to identify potential infringements. Let’s take a closer look at how the micro filter works first.
This option allows users to drill down into specific pages that PlagiaShield has found to be duplicated. Users can filter the results by either selecting the number of similar pages or by the percentage of common content found.
The first filter is useful for quickly identifying whether the software has flagged the duplication of open-source, boilerplate copy such as privacy declarations or member FAQs. Ideally, though, users should have already filtered these out during the sign-up process.
If users miss this, they can tell PlagiaShield to ignore that page, but will have to wait until the end of their plan period for it to reset their page quota.
The second filter is where things start to get interesting. By filtering based on the amount of data that’s shared between your page and pages suspected of IP theft, you’re able to see which pages have been scraped.
As you can see below, SODP’s directory of prominent publishing companies in Los Angeles shares 92% of its content with another page.
Clicking on the page in question brings up a dialogue box with a more detailed breakdown of the infringement. As can be seen below, there are 156 sentence matches between the two pages, going far beyond content similarity.
The offending page has scraped all of SODP’s content, even going as far as to include State of Digital Publishing in the URL and the page title. Looking at the sources section shows a more detailed breakdown of the plagiarized content, allowing users to see at a glance which sentences are direct copies (highlighted in red), which are similar (yellow) and which have no match (gray).
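As a rough illustration of how a red, yellow and gray breakdown like this could be produced, the sketch below labels each sentence of a page by its best match on a suspect page. The thresholds, function names and sample sentences are hypothetical. We don't know how PlagiaShield assigns its colors internally.

```python
from difflib import SequenceMatcher

# Hypothetical cut-offs, chosen only for illustration.
COPY_THRESHOLD = 0.95     # at or above: treated as a direct copy ("red")
SIMILAR_THRESHOLD = 0.60  # at or above: treated as similar ("yellow")


def best_ratio(sentence: str, candidates: list[str]) -> float:
    """Highest word-level similarity between `sentence` and any candidate."""
    words = sentence.lower().split()
    return max(
        (SequenceMatcher(None, words, c.lower().split()).ratio() for c in candidates),
        default=0.0,
    )


def label_sentences(own_page: list[str], suspect_page: list[str]) -> dict[str, str]:
    """Map each of our sentences to 'red', 'yellow' or 'gray'."""
    labels = {}
    for sentence in own_page:
        score = best_ratio(sentence, suspect_page)
        if score >= COPY_THRESHOLD:
            labels[sentence] = "red"
        elif score >= SIMILAR_THRESHOLD:
            labels[sentence] = "yellow"
        else:
            labels[sentence] = "gray"
    return labels


own = [
    "The dashboard has three main sections.",
    "Users can add several domains.",
    "Support was quick to respond.",
]
suspect = [
    "The dashboard has three main sections.",
    "Users may add several domains.",
    "Bananas are rich in potassium.",
]

for sentence, color in label_sentences(own, suspect).items():
    print(f"{color:6} {sentence}")
```

Counting the red and yellow labels against the total number of sentences gives a page-level share of common content comparable in spirit to the percentage shown in the Review Pages filter.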
We have no interest in pillorying another website here, which is why we’ve blurred its URL. What we will talk about is PlagiaShield’s takedown request tool, which helps tackle such issues.
It’s here, however, that we arrive at a minor UX misstep for the software. Oddly enough the takedown request can’t be started from the Review Pages section, requiring users to navigate to the Review Domains section instead.
We felt it would be more user friendly to be able to address such issues from whichever filter they happen to be in and it’s really not clear to us why that’s not the case. Anyway, let’s look at the Review Domains.
Once here, users can start using the tools they need to address infringements.
For example, clicking on a domain allows users to classify whether the content has been duplicated or not and to see the pages where the offending material appears, the potential contact information of site owners along with an email template, and a guide to the DMCA infringement process.
PlagiaShield recommends attempting to contact offending sites before opting for the more serious route of filing a DMCA through the Google Search Console (GSC). The company claims that, in its experience, around 70% of the time directly contacting the site will deliver the desired outcome.
Here’s an example of the email template PlagiaShield has drafted for its users.
If there aren’t any contact details available, as was the situation in our case, then users should move straight to submitting a DMCA takedown request to Google.
PlagiaShield has developed a system to speed up the completion and submission of Google’s DMCA forms, using a Chrome extension. When we started using the tool, this step proved to be extremely difficult to navigate, owing to the fact that PlagiaShield hadn’t updated its user guide.
This was thankfully rectified during the course of the review and there’s now a detailed user guide, which simplified the process greatly.
Users simply download a spreadsheet — which is in .json format — and then upload it to the extension, which will do the heavy lifting for them. The extension will both fill in the DMCAs and then slowly submit them to avoid triggering Google’s suspicions that a robot is handling the process on your behalf.
We recommend keeping the full guide in the extension’s FAQ section handy when tackling this task for the first time.
PlagiaShield also offers a plagiarism detector for documents, which works similarly to many other such tools.
However, it does come with some filtering options that allow users to restrict the search to pages solely from their own domains or limit to pages from external websites.
We tried out the tool on a first draft of a story we’d commissioned from a writer and received the expected result that it was a completely original article.
However, we also tested the first five sentences of a CNN story on the state of the UK’s National Health Service (NHS), which had been published just seven hours earlier. The tool tracked down the original copy in a matter of seconds.
What surprised us wasn’t the fact that the tool correctly identified the infringement but the level of information we received. Not only did the tool identify CNN’s copy, but it also showed the more than 20 other sites that had published the exact same story.
This is a useful tool for news publishers looking to quickly check the deluge of stories that are submitted on a daily basis.
However, its 2,000-word count limit does mean that any outlet publishing longer features, investigations, analysis and research papers, or even op-ed pieces will find splitting these articles up to be something of an annoyance.
Another oddity is that access to the tool is tied to the account’s page quota, so users who reach their page monitoring limit are not allowed to use the tool.
We’re not quite sure why PlagiaShield designed the tool this way, given that users can simply set up a free account at any time to circumvent this issue.
Help and Support
PlagiaShield doesn’t have much in the way of a support guide or advice on the best way to use the tool. There is a guidance button on the top right of each domain screen that gives a brief overview of each section.
The company has said it prefers to guide users with behavior-based emails and that, depending on what the user did or didn’t do, it sends emails to help them take their next step. In practice, however, we’d have appreciated something more strategic and upfront to help us become as productive as possible as quickly as possible.
We’re not saying the absence of documentation is a critical misstep, especially after the support team proved so helpful guiding us through any issues that cropped up. But if the goal is to help publishers quickly identify and address plagiarism, anything that helps us scale the initial learning curve more quickly would be welcome.
That said, once we had a few hours under our belt we felt much more confident about how to leverage the platform.
PlagiaShield in Review
Despite the inevitable teething problems that arise when using any new piece of software, we couldn’t help but be impressed with the suite of tools PlagiaShield has developed. It truly is a complete package when it comes to checking for plagiarism, with both commercial and academic applications.
What We Love About PlagiaShield
- Multi-domain monitoring
- Detailed breakdowns of suspected plagiarism
- Protection for paywalled content
- DMCA automation
- Inclusion of a document checker
- Minimalist interface
- Laser focus on plagiarism
- Responsive support team
Where There’s Room for Improvement
- Greater clarity over page monitoring quotas
- Smoother UX when handling takedown requests
- More detailed productivity guides
We think PlagiaShield has cracked the nut with its suite of tools. Sure, we encountered some issues when going hands-on with the platform, but as we spent more time with it and began to see it in action, we gained an impressive insight into the reach of the online plagiarism checker.
The company takes pride in the fact that it was developed with contributions from the SEO community, and rightly so. For a tool to address as serious an issue as content theft as comprehensively as this one does is no mean feat.
While there’s room for PlagiaShield to smooth out some bumps in its user journey, that should in no way detract from what the team has managed to pack into the platform.