Posted on Friday 14th of September 2007 at 17:13 in Web Development

How I could steal all your content from your RSS

Do you use a full RSS feed? Many bloggers feel that publishing a full feed only helps people steal their content, but that's not strictly true. It's awfully easy to steal your content if you have any kind of RSS feed at all, and here's how it could be done.

As I documented in my tutorial on scraping website content with PHP using curl, it's very easy to get the full generated source of a web page. Take the script discussed in that tutorial, load in a URL (http://www.thepcspy.com for example) and you have the full code for that page, content and all.
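To make the "load in a URL" step concrete, here is a minimal sketch of fetching a page's source. It is not the script from the tutorial (that one uses PHP and curl); it is a Python standard-library stand-in, and the function name fetch_source and the User-Agent string are placeholders I've made up for illustration.

```python
from urllib.request import Request, urlopen


def fetch_source(url: str, timeout: float = 10.0) -> str:
    """Return the response body for `url`, decoded to text."""
    # The User-Agent value is a made-up placeholder; some servers refuse
    # requests that arrive without one.
    request = Request(url, headers={"User-Agent": "example-fetcher/0.1"})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage (performs a real network request, so it is left commented out):
# html = fetch_source("http://www.thepcspy.com")
```

Like curl, this returns whatever the server sends back, so anything a page builds in the browser with JavaScript would not be in the result.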

Yes, a full RSS feed does make life easy for the thief because you're providing all your content right there for them. However, by definition your RSS feed has a URL in it and a description - that's probably enough.

RSS feeds are structured in a specific way (theoretically). Each item in your feed will have a "link" element as well as a limited "description" element, and these are where we can do the damage. Your description is almost certainly the start of your content and the link is where we can find it.
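As a rough illustration of how little work it takes to pull those two fields out, here is a sketch in Python (not the PHP from my tutorial). It assumes a plain RSS 2.0 feed with no namespaces; the sample feed, the FeedItem type and the feed_items function are all invented for the example, and an Atom feed would need different element names.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple

# A made-up RSS 2.0 feed, inlined so the example runs without a network.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <link>http://example.com/</link>
    <description>A made-up feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/first-post</link>
      <description>RSS feeds are structured in a specific way.</description>
    </item>
  </channel>
</rss>"""


class FeedItem(NamedTuple):
    link: str
    description: str


def feed_items(feed_xml: str) -> List[FeedItem]:
    """Return the link and description of every <item> in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        # findtext returns None when the element is missing.
        link = (item.findtext("link") or "").strip()
        description = (item.findtext("description") or "").strip()
        if link:  # an item without a link gives us nothing to fetch
            items.append(FeedItem(link, description))
    return items


for entry in feed_items(SAMPLE_FEED):
    print(entry.link, "|", entry.description)
```

Only the item-level link is collected here; the channel-level link points at the site itself rather than at an article.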

Using the script outlined in the tutorial mentioned above, I can curl the contents of the URL you've supplied. All I need to do then is look through the page for something that indicates the start of the article (such as the text provided in the description element of the RSS feed).
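The search itself can be sketched in a few lines. This is Python rather than the tutorial's PHP, and it assumes the opening characters of the description appear verbatim in the page source, which won't hold if the page wraps those words in extra markup. The sample page, the find_article_start name and the 30-character probe length are my own assumptions.

```python
from typing import Optional

# Made-up page source and feed description for the example.
PAGE = (
    "<html><body><div id='nav'>Home | About</div>"
    "<div class='post'><p>Do you use a full RSS feed? Many bloggers do.</p></div>"
    "</body></html>"
)
DESCRIPTION = "Do you use a full RSS feed? Many bloggers..."


def find_article_start(page_html: str, description: str,
                       probe_length: int = 30) -> Optional[int]:
    """Return the offset in page_html where the description's opening
    characters first appear, or None if they are not found."""
    # Feeds often truncate the description, so only its first few
    # characters are used as the search string.
    probe = description.strip()[:probe_length]
    if not probe:
        return None
    index = page_html.find(probe)
    return index if index != -1 else None


start = find_article_start(PAGE, DESCRIPTION)
if start is not None:
    print(PAGE[start:start + 27])
```

Everything before the returned offset (navigation, headers and so on) can then be discarded.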

I can now happily curl your content, search for a specific string of characters, and find the start of your content. The real trick is knowing where it ends, but that's not necessarily a big issue because I don't really *need* to steal ALL your content. If I can take half of it and get it indexed by the search engines before you do, then I win.

This is just a passing thought about how thieves could be a bit more "intelligent" about their theft. Most of the ones I see just copy and paste from my site. It'd be more flattering if they used their brains and wrote a PHP script to scrape the URLs from my feed, tokenize the content, match the start against the description, pull the content into a database and publish it. They'd still get a cease and desist, but at least I'd have a little more respect for them.
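For the curious, the "tokenize and match" step might look something like the sketch below. It is written in Python rather than PHP, it leaves out the database and publishing parts entirely, and every name in it (tokenize, match_start, the six-word probe, the sample page) is hypothetical. The point of comparing words instead of raw characters is that the match survives markup sitting in the middle of the opening sentence.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the bodies of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def tokenize(markup: str) -> List[str]:
    """Strip tags and return the lower-cased words of the remaining text."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    # Joining with a space keeps words in adjacent blocks apart, at the
    # cost of splitting a word that has a tag in the middle of it.
    return re.findall(r"\w+", " ".join(parser.parts).lower())


def match_start(page_tokens: List[str], description_tokens: List[str],
                probe_size: int = 6) -> Optional[int]:
    """Return the index in page_tokens where the first probe_size words of
    the description occur in sequence, or None if they never do."""
    probe = description_tokens[:probe_size]
    if not probe:
        return None
    for i in range(len(page_tokens) - len(probe) + 1):
        if page_tokens[i:i + len(probe)] == probe:
            return i
    return None


# Made-up inputs: the page wraps one word in <em>, so a raw substring
# search for the description's opening would fail here.
PAGE = ("<div id='nav'>Home About</div>"
        "<p>Do you use a <em>full</em> RSS feed? Many bloggers do.</p>")
DESCRIPTION = "Do you use a full RSS feed? Many..."

print(match_start(tokenize(PAGE), tokenize(DESCRIPTION)))
```

The result is a word index rather than a character offset, so mapping it back to a position in the original markup would need extra bookkeeping that this sketch doesn't attempt.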

 
