Nov 08 2007
Scraping, Splogs and Spam
Scraping sites and splogs are ever on the rise, and I’m noticing more and more pingbacks of the “so and so had an interesting item [short excerpt] read the full story here…” type.
Some even repost entire articles in a foreign language.
But today I got a Google Alert with my name on it (as you do), leading me to this unusual snippet of my recent post on the 2007 Weblog awards. Those links are to adbrite ads, and there is no link to me (not that this worries me) or to the awards. I’m not linking to the site (it’s got intrusive pop-up ads), but from the screenshot it’s not too hard to work out.
Update: I just realised this is a rewording of Anthony’s Post (which has links):
Meg Tsiamis has a full run down of the people in the running and how to vote. It’s the day of voting so go show your support and vote for your favourite Australian bloggers.
It makes a little more sense now: just change a few words here and there!
Digging a bit deeper
This site was established in March 2007 and was sold in August for $1,500. Prior to the sale, the author maintained a legitimate blog with genuine content. The blog took a break after the sale, but started posting again in November, and from what I can see all the posts are these bizarre reworded snippets.
The site has a PageRank of 4, an Alexa rank of 132,496 and a Technorati rank of 11,198 – nothing to sneeze at!
Most of the other posts are similar “quality” to the one with my name.
Why am I even bothering to mention this?
I’m not having a shot at bloggers whose first language is not English.
I just wanted to point out that this type of site makes a mockery of PageRank, Technorati and Alexa rank (probably preaching to the converted here), and to show what a site can degenerate into when flipped.
Spam
I’m sure you get enough of your own. But I can’t believe the length of this one spam message! Yes, it even came complete with exactly 100 links to “unsavoury” sites.
I swear, if you’re into p0rrn, no need to go looking for it – start a blog and all the links you’ll ever need will be delivered free to your blog.
Those spammers are seriously determined!
9 Responses to “Scraping, Splogs and Spam”
LMAO I’d love to know what quality translation service they used. Wonder what on earth crapper occurrences started out as.
A new kind of spam I have noticed over the past few days is a bunch of spam links in an RSS feed. It seems they hack your blog and inject the spam into your feed, but your blog looks fine. Another good reason to subscribe and keep an eye on your own feeds.
You know, I nearly stopped blogging because of the spam and p(rn. It drove me to distraction in the beginning.
Our days started with me greeting Paul not with a coffee, a kiss, or a “Good morning, I love you my darling” (I get up long before he does), but with a moan: “207 pieces of p(rn this morning.”
Leigh I am too traumatised to look at my feed now! lol….you may well remember me whinging to you about the p(rn previously.
Crapper Occurrences indeed Meg, crapper, crapper and double crapper to the spammers.
And…as a total aside, I rather like my new way of spelling p(rn. It sums up my mood around it quite well.
Actually, they are likely to be machine-translated using software developed by people with a linguistic background. Say, translated to French and then translated back to English again, so Google would not be able to detect duplicated content. Using machine translation also makes sense, as spammers need to make hundreds or thousands of these splogs to make any real income.
And yeah. They are bad.
Articles written online and published via RSS are pretty much copied, mangled and published elsewhere straight away these days. Fortunately Google has been doing a half-decent job of weeding these spam sites out of the SERPs.
On the other hand, if people simply reported these sites to Google and had their AdSense pulled, they might just go away.
Leigh,
hehe I think Scott answered your question.
Wow – hacking an RSS feed is pretty sad. I agree, it’s always a good idea to subscribe to your own. I’ve seen a few hacked templates lately with hidden text links (not possible to see unless you look at the source code) – very sneaky.
Megan
Sorry you’ve had so many problems. Though I do like your p(rn variation 😉 I’m trying to mix up the spelling a bit myself!
Scott
Insidious all right. Thanks for the insight on how it’s done. With a PR of 4, this site obviously hasn’t come onto the radar yet.
FB
Yeah, they EVENTUALLY go away, and crop up on a new domain the next day!
I’m using TanTan Noodles spam filter, and since my blog has been open that has blocked over 2290 spams.
I tend to get a lot of drug spam, and some bizarre Chinese car websites.
TanTan has the option of adding “regex” patterns. I can’t make head or tail of it, but I have asked the other half to check it out and see if there is a way for me to add the following patterns –
urls with .cn/ in them
urls with drug names in them (though they will spell the drug name wrong in the comment it is *always* spelt right in the URL)
urls with shop or pharmacy in them <– I don’t know anyone who has shop or pharmacy in their url legitimately so this should be ok.
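The three patterns in the list above can be sketched as regular expressions. This is only an illustration in Python, not the TanTan Noodles configuration format (which isn't described here); the drug names, the `spam_urls` helper and the rough URL extractor are all assumptions made for the example.

```python
import re
from typing import List

# Hypothetical list for illustration; a real filter would carry its own names.
DRUG_NAMES = ["viagra", "cialis", "xanax"]

# One compiled pattern per rule in the list above, all case-insensitive.
SPAM_URL_PATTERNS = [
    re.compile(r"\.cn/", re.IGNORECASE),                               # urls with .cn/ in them
    re.compile("|".join(map(re.escape, DRUG_NAMES)), re.IGNORECASE),   # urls with drug names in them
    re.compile(r"shop|pharmacy", re.IGNORECASE),                       # urls with shop or pharmacy in them
]

# Rough URL extractor (an assumption): http(s) up to whitespace, a quote or an angle bracket.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def spam_urls(comment: str) -> List[str]:
    """Return the URLs in a comment that match at least one spam pattern."""
    return [
        url
        for url in URL_RE.findall(comment)
        if any(pattern.search(url) for pattern in SPAM_URL_PATTERNS)
    ]
```

Because the patterns are applied to the URLs only, a misspelt drug name in the comment text doesn't matter as long as the URL spells it correctly. Two caveats: the `.cn/` rule needs the trailing slash to match, and the bare `shop` rule will also catch innocent addresses such as one containing "workshop", so it may need tightening.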
Akismet is supposed to be able to “learn” but I haven’t seen much sign of it to be honest. It always lets comments through which have random combinations of numbers and letters. I keep marking them as spam, but they keep getting through.
Actually, as far as free p)rn goes, it is very difficult to find the good stuff. I like to send it to the Nigerian scammers (in particular big burly hairy nak3d men if possible, never ladies), because they are often in the internet cafe with all the other scammers when reading their emails, and being into men is frowned upon in that culture. Seriously! I spent ages looking and built up a reasonable collection with the help of other scam baiters, but you can’t just go out there and find that stuff. There are a lot of websites with popup ads and nasties, and ones that are just there to grab email addresses etc.
I did find a fantastic website one time – we ran “search for a p)rnstar” and I needed some legit p)rn sites to pretend like they belonged to my company. It was called smoking 5lut5 (change the 5 for s) and it was all about half nak3d women smoking – many of them dressed as french maids. I laughed for some hours and the memory of it still causes much laughter. Sadly, the site no longer exists. Who knew there wasn’t a market for that? 😉
Cheers,
Snoskred
I believe there are some rubbish scripts you can run that rewrite content for you, replacing and re-ordering words etc. Perhaps that’s what has happened here? So basically the splogger feeds in the text they’ve cut and pasted from a legit blog, and it then spits out a revised version (that rarely makes sense) in order to avoid a Google duplicate content smacking.
Wow, that’s some hardcore splogging!
It’s amazing the software sploggers have nowadays… those posts could pass as being by someone who doesn’t know how to write in English well, but it’s just ripped-off content. And it may not be detected by those who use services to detect ripped content, because some words are changed.
Spam technology is advancing.
-Mike
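To illustrate the point about changed words slipping past detection, here is a minimal sketch comparing an exact-match fingerprint with a near-duplicate measure (word shingles plus Jaccard similarity). It is one common textbook technique, not a claim about how Google or any detection service actually works, and the function names and the shingle size `k=3` are assumptions for the example.

```python
import hashlib
import re


def words(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def exact_fingerprint(text):
    """Hash of the normalised text; any single changed word gives a different hash."""
    return hashlib.sha256(" ".join(words(text)).encode("utf-8")).hexdigest()


def shingles(text, k=3):
    """Set of all runs of k consecutive words in the text."""
    w = words(text)
    return {tuple(w[i:i + k]) for i in range(len(w) - k + 1)}


def jaccard(a, b, k=3):
    """Shared shingles divided by total distinct shingles (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Made-up sample pair: the second sentence swaps a single word.
original = "it is the day of voting so go show your support and vote for your favourite bloggers"
reworded = "it is the day of voting so go show your backing and vote for your favourite bloggers"

print(exact_fingerprint(original) == exact_fingerprint(reworded))
print(round(jaccard(original, reworded), 2))
```

The exact fingerprints differ after a one-word swap, while the shingle overlap stays well above what two unrelated sentences would score. Heavier rewording or a translation round trip would push that overlap much lower, which is presumably why the mangled posts read so badly.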
Oh, I feel your pain! Three different spam filters, and there are still a sizable number getting through – pingbacks to low-rent scraper sites are the worst, but I must say this example of yours really takes the cake! What really troubles me is clearing out the spam logs – when you’re bulk-deleting upwards of 500 spam comments and pingbacks a day, there’s such a risk of accidentally deleting a valid comment from a real live person who just happened to get filtered out as spam. What I just don’t get is what these people hope to achieve in the long run… but you’re right, it makes all the hoopla and fretting over things like PageRank look pretty silly. 😛
(Thanks for letting me vent, er, commiserate on your blog!)