Link & Comment Spamming - A possible solution.
Recently one of the sites I am developing for my self was link spammed. Some unpleasant individual decided that it would be fun to post 160 ?comments? spread over all the blog posts. All the comments contained was URL?s. Even more stupid they used BB tags, but as I wrote the site it doesn?t use them.
Any way, obviously this isn?t some thing I want, so I deleted them all with a quick bit of SQL. No one else has posted a comment to the site because like I said, its still under development.
However, it happened once so there is no reason to think it wont happen again. I thought about the problem for a while, and the only solution is to incorporate some kind of humanity check. Because lets face it, its not like some one sat there and entered them all in. Its was some kind of bot.
Now, I don?t really like the ones that ask you to type the letters from some hard to read image. I can do that, no problem but they look ugly, and if the user was colour blind, or any other sight related issue, then you buggered.
So, I have come up with a different solution. The idea is to ask a random question, some thing that?s so easy any every one will know the answer, but unless you can read, you wont know what the answer is.
While I was busy implementing this solution, and believe me it didn?t take very long, another 20 comments of a very similar nature where posted. How annoying is that?
The solution seems to work for now. There have been no more comments since I completed the changes, but then maybe its only time until the bot gets adjusted, time will tell. But I thought other might benefit from having it so here goes, how to add random questions to your site.
Oh, one thing, I am not going to list my questions here, for a start it took me a shockingly long time to think of 30 really really easy questions, and I also don?t want to give a list of the question text and answers away.
So? on with the show. First off you will need 2 tables, one for the questions and one to keep track of what questions you have asked each user.
The questions table is easy, 3 coloums. Question_id, question and answer. The question id is just a unique number, the question and answer are both varchar.
CREATE TABLE `capture_questions` (
`question_id` bigint(20) unsigned NOT NULL auto_increment,
`question` varchar(255) NOT NULL default '',
`answer` varchar(255) NOT NULL default '',
PRIMARY KEY (`question_id`)
) ;
Then we have the table used to store the asked questions. This is even less complicated. All you need is 2 fields, one for the question that was asked ID, and one to store the users Session ID.
CREATE TABLE `capture_question_asked` (
`sid` varchar(60) NOT NULL default '0',
`qid` bigint(20) unsigned NOT NULL default '0'
);
So far, that?s all easy enough. You fill in the question table with as many questions as you can think of, along the lines of ?is the sky blue? with an answer of ?yes? or perhaps ?What is 25 + 30? answer, obviously 55.
Of course you can create the tables and this isn?t going to do a hell of a lot. So, you?ll need a class to deal with it all. This is a copy of the code, though you will notice the use of the functions ?performQuery? and ?fetchRow?, these are from my own DB layer. They replace the standard MySQL commands by using a wrapper. It makes it easy to port code from one RDBMS to the other. I personally like this solution because its light weight and simple. But it?s a bit beyond the scope of this post.
So, the class? Its got 2 methods: getQuestion and getAnswer. They both take the a single parameter of ?sid?. This is the session id, but for compatibility it is passed in to the function so it can basically be any thing.
class captureClass extends baseClass {
function getQuestion($sid) {
$question = false ;
$deleteSQL = "DELETE FROM capture_question_asked " ;
$deleteSQL .= " WHERE sid = '$sid' " ;
performQuery($deleteSQL) ;
$selectQuestionSQL = "SELECT question_id, question " ;
$selectQuestionSQL .= "FROM capture_questions " ;
$selectQuestionSQL .= "ORDER BY rand(NOW()) LIMIT 0 , 1 " ;
$selectQuestionQuery = performQuery($selectQuestionSQL) ;
if ($selectQuestionQuery) {
if ($row=fetchRow($selectQuestionQuery)) {
$question = $row['question'] ;
$qid = $row['question_id'] ;
$insertSQL = "INSERT INTO capture_question_asked (sid, qid) " ;
$insertSQL .= "VALUES ('$sid', '$qid') " ;
performQuery($insertSQL) ;
}
}
return $question ;
}
function getAnswer($sid) {
$answer = '' ;
$selectQuestionIdSQL = "SELECT qid FROM capture_question_asked " ;
$selectQuestionIdSQL .= "WHERE sid = '$sid' " ;
$selectQuestionIdQuery = performQuery($selectQuestionIdSQL) ;
if ($selectQuestionIdQuery) {
if ($row=fetchRow($selectQuestionIdQuery)) {
$qid = $row['qid'] ;
$selectAnswerSQL = "SELECT answer FROM capture_questions " ;
$selectAnswerSQL .= " WHERE question_id = '$qid' " ;
$selectAnswerQuery = performQuery($selectAnswerSQL) ;
if ($selectAnswerQuery) {
if ($answerRow=fetchRow($selectAnswerQuery)) {
$answer = $answerRow['answer'] ;
}
}
}
}
return $answer ;
}
}
How do you use it? Well, when your page displays the form you make a call to getQuestion and display it. For a while I thought about putting the question ID into the page, but only for a couple of seconds as I realised any half decent attempt to beat the system would just replace the ID with one with a known answer, infact as I suspect that the form is not used, simple data ?posted? to the page, then it wouldn?t even matter.
That?s why we keep that information in the database.
Any way, once the form is submitted you then ask the class to get the answer for the current session, and compare what the user entered to the correct answer. I?d suggest forcing lower case, or upper case if you want, but basically make the comparison case insensitive.
If the answers don?t match then the person is, well an idiot or a bot. If no answer is available, then some ones messed with the session, or never even used the form. Doesn?t matter which, either way its an error.
I don?t think I will bother to explain the code it self, its really not that complicated. I think maybe the only bit that might seem a bit strange is the sql used to select a question:
SELECT question_id, question FROM capture_questions
ORDER BY rand(NOW()) LIMIT 0 , 1
This simply selects a random record from the table, because its ordered by ?rand?. This basically means that for each record in the table a random number is generated, and then the records are ordered by the value. Because we only want one question we use the limit to only select the first record, how ever because each time the records are selected they will be in a different order, each time you get a different record? cool ha?
I hope this proves to work over time. I?ll have to keep any eye on it. Just to see how it goes. If any one can see any thing wrong with it, well, let me know.