So I have dug out the Akismet module I was working on a long time back and decided to investigate it a little further (it gives me something to do while I'm away).
Basically I'm after some thoughts, ideas, and abuse on what should be there and what shouldn't be there; any thoughts that may help me. (And don't be shy to yell out if you're interested in helping.)
The easiest way is a list of thoughts so far, I suppose.
a way to submit missed spam
a way to submit false positives (ham)
Key Verification
Include all of $_SERVER, i.e. the data Akismet would like to have
I was using a class I found but decided to start fresh with functions. I have got as far as:
checking a connection is possible to akismet servers.
checking key verification
checking if a comment is spam or ham (all $_SERVER is included)
This function sends the POST request to Akismet with a timeout of 3 seconds (the timeout could be made configurable?). The HTTP request is constructed with full headers. The response headers are discarded and the function returns the body of the response.
Code:
function rnAkismet_http_post($request, $host, $path) {
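The body of that function did not make it into the post, so here is only a minimal sketch of what the description above could look like, not the original code. The sketch_* names, the default User-Agent string, port 80 and the use of fsockopen are all assumptions of mine. Building the request and splitting off the headers are kept as separate helpers so they can be checked without touching the network.

```php
<?php
// Hypothetical helpers; none of these names come from the original module.

// Build the raw POST request. $request is an already URL-encoded body.
// HTTP/1.0 is assumed here, which also avoids chunked responses.
function sketch_build_post($request, $host, $path, $user_agent = 'rnAkismet-sketch/0.1') {
    $http  = "POST {$path} HTTP/1.0\r\n";
    $http .= "Host: {$host}\r\n";
    $http .= "Content-Type: application/x-www-form-urlencoded; charset=utf-8\r\n";
    $http .= "Content-Length: " . strlen($request) . "\r\n";
    $http .= "User-Agent: {$user_agent}\r\n";
    $http .= "\r\n";
    return $http . $request;
}

// Discard the response headers and return only the body.
function sketch_response_body($response) {
    $parts = explode("\r\n\r\n", $response, 2);
    return isset($parts[1]) ? $parts[1] : '';
}

// Send the request with a configurable timeout (3 seconds by default).
// Returns the response body, or false if the connection cannot be opened.
function sketch_http_post($request, $host, $path, $timeout = 3) {
    $fs = @fsockopen($host, 80, $errno, $errstr, $timeout); // port 80 is an assumption
    if ($fs === false) {
        return false;
    }
    stream_set_timeout($fs, (int) $timeout);
    fwrite($fs, sketch_build_post($request, $host, $path));
    $response = '';
    while (!feof($fs)) {
        $chunk = fgets($fs, 1024);
        if ($chunk === false) {
            break; // read timed out or the stream ended
        }
        $response .= $chunk;
    }
    fclose($fs);
    return sketch_response_body($response);
}
```

Making the timeout a parameter with a default of 3 answers the "configurable?" question without changing how existing callers use the function.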
The key verification call should be made before beginning to use the service. It requires two variables, key and blog.
key (required)
The API key being verified for use with the API
blog (required)
The front page or home URL of the instance making the request. For a blog, site, or wiki this would be the front page. Note: Must be a full URI, including http://.
The call returns "valid" if the key is valid. This is the one call that can be made without the API key subdomain.
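As a rough sketch of the key verification call described above: the function name sketch_verify_key and the $post callable are my own inventions for illustration, and the host rest.akismet.com and path /1.1/verify-key are how I remember the Akismet 1.1 documentation, so treat them as assumptions. $post stands in for anything with the same arguments as rnAkismet_http_post that returns the response body.

```php
<?php
// Hypothetical sketch: $post is any callable taking ($request, $host, $path)
// and returning the response body as a string (or false on failure).
function sketch_verify_key($key, $blog, $post) {
    $request = http_build_query(array('key' => $key, 'blog' => $blog), '', '&');
    // verify-key is the one call made without the API key subdomain.
    $body = call_user_func($post, $request, 'rest.akismet.com', '/1.1/verify-key');
    return is_string($body) && trim($body) === 'valid';
}

// Stand-in transport so the sketch runs without network access.
$fake_post = function ($request, $host, $path) {
    parse_str($request, $fields);
    return $fields['key'] === 'good-key' ? 'valid' : 'invalid';
};

var_dump(sketch_verify_key('good-key', 'http://example.com/', $fake_post));
var_dump(sketch_verify_key('bad-key', 'http://example.com/', $fake_post));
```

Passing the transport in as a callable keeps the check testable; in the module itself the real POST function would simply be handed in by name.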
OK, this is where I'm having design difficulties. Akismet requests the following:
Quote:
This is basically the core of everything. This call takes a number of arguments and characteristics about the submitted content and then returns a thumbs up or thumbs down. Almost everything is optional, but performance can drop dramatically if you exclude certain elements. I would recommend erring on the side of too much data, as everything is used as part of the Akismet signature.
blog (required)
The front page or home URL of the instance making the request. For a blog or wiki this would be the front page. Note: Must be a full URI, including http://.
user_ip (required)
IP address of the comment submitter.
user_agent (required)
User agent information.
referrer (note spelling)
The content of the HTTP_REFERER header should be sent here.
permalink
The permanent location of the entry the comment was submitted to.
comment_type
May be blank, comment, trackback, pingback, or a made up value like "registration".
comment_author
Submitted name with the comment
comment_author_email
Submitted email address
comment_author_url
Commenter URL.
comment_content
The content that was submitted.
Other server environment variables
In PHP there is an array of environment variables called $_SERVER which contains information about the web server itself as well as a key/value for every HTTP header sent with the request. This data is highly useful to Akismet, as how the submitted content interacts with the server can be very telling, so please include as much information as possible.
This call returns either "true" or "false" as the body content. True means that the comment is spam and false means that it isn't spam. If you are having trouble triggering a spam response, you can send "viagra-test-123" as the author and it will trigger a true response, always.
Idea: use the function akismet_prepare_comment_data to prepare the data such as user IP, referrer, username, etc. The function also checks whether the user is registered and inserts some info automatically. We could then send the data via the function rnAkismet_comment_check. The $sLink would be the link to the particular content, e.g. if you were looking at a news article here
article1.html
or probably more like
article.html'.$sid.'
The function;
Code:
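function akismet_prepare_comment_data($sAuthor = 'Anonymous', $sEmail = '', $sLink = '', $sComment = '') {
    global $user;
    // is_user() and getusrinfo() are the CMS's own helpers.
    if (is_user($user)) {
        $userinfo = getusrinfo($user);
    } else {
        $userinfo = array();
    }
    // Prepare data that is common to nodes/comments.
    $comment_data = array();
    // IP address of the comment submitter.
    $comment_data['user_ip'] = $_SERVER['REMOTE_ADDR'];
    // User agent information of the comment submitter (the header may be absent).
    $comment_data['user_agent'] = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    // The content of the HTTP_REFERER header; note that Akismet spells the key "referrer".
    $comment_data['referrer'] = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
    // Submitted name with the comment.
    $comment_data['comment_author'] = (isset($userinfo['username']) ? $userinfo['username'] : $sAuthor);
    // The posted code is cut off here. The lines below are an assumed completion
    // that only maps the remaining arguments to the field names quoted above.
    $comment_data['comment_author_email'] = $sEmail;
    $comment_data['permalink'] = $sLink;
    $comment_data['comment_content'] = $sComment;
    return $comment_data;
}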
The following function would send the data via the function rnAkismet_http_post. The rnAkismet_comment_check function also ties in the server variables Akismet requested (see further down).
Other server environment variables
In PHP there is an array of environment variables called $_SERVER which contains information about the web server itself as well as a key/value for every HTTP header sent with the request. This data is highly useful to Akismet, as how the submitted content interacts with the server can be very telling, so please include as much information as possible.
We could achieve this with the following function, which I borrowed from eGroupWare.
Code:
function rnAkismet_include_request() {
    // You may add more elements here, but they are often related to internal server
    // data that makes little sense to check whether a comment is spam or not.
    // Be sure to not send HTTP_COOKIE as it may compromise your user's privacy!
    static $safe_to_send = array(
        'CONTENT_LENGTH',
        'CONTENT_TYPE',
        'HTTP_ACCEPT',
        'HTTP_ACCEPT_CHARSET',
        'HTTP_ACCEPT_ENCODING',
        'HTTP_ACCEPT_LANGUAGE',
        'HTTP_REFERER',
        'HTTP_USER_AGENT',
        'REMOTE_ADDR',
        'REMOTE_PORT',
        'SCRIPT_URI',
        'SCRIPT_URL',
        'SERVER_ADDR',
        'SERVER_NAME',
        'REQUEST_METHOD',
        'REQUEST_URI',
        'SCRIPT_NAME'
    );
    // The contents of $_SERVER do not change during a request,
    // so the filtered copy can be cached in static storage for repeated calls.
    static $server_data = null;
    if ($server_data === null) {
        $server_data = array();
        foreach ($_SERVER as $key => $value) {
            if (in_array($key, $safe_to_send)) {
                $server_data[$key] = $value;
            }
        }
    }
    return $server_data;
}
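The rnAkismet_comment_check function itself is not shown in the post, so the following is only a sketch of how the two arrays (from akismet_prepare_comment_data and rnAkismet_include_request) might be combined and sent. The name sketch_comment_check, the $post callable, and returning null for an unexpected answer are my assumptions; the key subdomain host and the /1.1/comment-check path are as I remember them from the Akismet 1.1 documentation.

```php
<?php
// Hypothetical sketch. $post is any callable taking ($request, $host, $path)
// and returning the response body; the two data arrays are plain key/value arrays.
function sketch_comment_check($key, $blog, array $comment_data, array $server_data, $post) {
    // Later arrays win, so comment fields override server variables on a name clash.
    $fields = array_merge($server_data, $comment_data, array('blog' => $blog));
    $request = http_build_query($fields, '', '&');
    // Every call except verify-key goes to the API key subdomain.
    $body = call_user_func($post, $request, $key . '.rest.akismet.com', '/1.1/comment-check');
    if (!is_string($body)) {
        return null; // transport failure: the caller decides what to do
    }
    $body = trim($body);
    if ($body === 'true') {
        return true;  // spam
    }
    if ($body === 'false') {
        return false; // ham
    }
    return null; // anything else is treated as "unknown"
}

// Stand-in transport that mimics the documented test author name.
$fake_post = function ($request, $host, $path) {
    parse_str($request, $fields);
    return $fields['comment_author'] === 'viagra-test-123' ? 'true' : 'false';
};

$comment = array('user_ip' => '127.0.0.1', 'user_agent' => 'test', 'comment_author' => 'viagra-test-123');
var_dump(sketch_comment_check('abc123', 'http://example.com/', $comment, array(), $fake_post));
```

Returning null rather than false on failure lets the caller choose between letting the comment through and holding it for moderation when Akismet cannot be reached.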
Next, all comments classified as spam would be held in the database so an administrator could resubmit a comment as ham (not spam) in case of false positives. Accordingly, each comment should have a new link that allows administrators to mark the comment as spam in case it gets through.
I guess we could now tie all this in with a function in mainfile.
Anything I'm missing... or can anyone see any problems with this?
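For the two admin actions (mark as spam, resubmit as ham), a minimal sketch could be one function that picks the endpoint. The name sketch_submit_feedback and the $post callable are hypothetical, and the /1.1/submit-spam and /1.1/submit-ham paths are from my memory of the Akismet 1.1 documentation rather than from anything in this thread. The comment data is assumed to be the same array that was stored when the comment was first checked.

```php
<?php
// Hypothetical sketch. $post is any callable taking ($request, $host, $path)
// and returning the response body.
function sketch_submit_feedback($key, $blog, array $comment_data, $is_spam, $post) {
    // Missed spam goes to submit-spam, false positives go to submit-ham.
    $path = $is_spam ? '/1.1/submit-spam' : '/1.1/submit-ham';
    $fields = array_merge($comment_data, array('blog' => $blog));
    $request = http_build_query($fields, '', '&');
    return call_user_func($post, $request, $key . '.rest.akismet.com', $path);
}

// Stand-in transport that just echoes the path it was asked to call.
$fake_post = function ($request, $host, $path) {
    return $path;
};

$stored = array('user_ip' => '127.0.0.1', 'user_agent' => 'test', 'comment_content' => 'hello');
var_dump(sketch_submit_feedback('abc123', 'http://example.com/', $stored, true, $fake_post));
var_dump(sketch_submit_feedback('abc123', 'http://example.com/', $stored, false, $fake_post));
```

This only works if the held comment keeps the original IP, user agent and so on in the database, which is one more reason to store the prepared array alongside the comment.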
Joined: Aug 28, 2003 Posts: 6373 Location: Vsetin, Czech Republic
Posted: Sun Nov 08, 2009 5:55 am
I would, because you can just 'include' it whenever you need it; you don't have to have it loaded all the time. All the code is in one place, so if Akismet ever changes, it's easier to find the bit of code you need to change.
I used it in a couple of modules but since I have not had any spam since RN 2.x it didn't seem worthwhile incorporating it any more.
Sorry I missed this originally. I was looking at this for a series of standalone form-to-email applications. I reviewed 4 libraries (either class, include functions, or both).
I remembered that Guardian was working on this, and asked him about it. His response might help others, so I am re-posting part of it here (thanks, G):
Guardian2003 wrote:
I think I would probably look at Project Honeypot as an initial filter and then add a manually updated blacklist, as it would be more viable in terms of the man hours spent updating the blacklist once a month, I should think.
If you're using a class or developing your own code to access the Akismet API, use the HTTP 1.0 protocol, NOT HTTP 1.1; 1.0 is much faster and works just as well.
Since Project Honeypot is IP-based, and IPs are so easy to spoof, I'm wondering how effective that would be. As far as stopping feedback / comment / forum spam, a content analysis approach like Akismet seems more effective. Checking the IP first might weed out a few spammers, but it would not be long before even those bottom-feeders circumvent that. Of course, Guardian has been looking at this MUCH longer than I - so I'd be especially interested in further discussion on this (hence this post)...
Also, I'm interested in further feedback on Akismet 1.0 vs. 1.1 - everything I've seen uses 1.1, though that could be just a "latest-is-greatest" mentality. Is there any difference in the interface - or just the Akismet version parameter that gets passed?
I think it would be a nice pre-registration mod on any site, which also takes real-time operation out of the equation to some extent. I've been too lazy to pursue it, as it's not been an issue for me either.
I'd say either could apply but I was thinking of Akismet.
Realtime as in checking the validity of a poster's email on an anon reply to an active comment thread. That operation could be way too slow, whereas an onsite database would be feasible as an option.
(Not that allowing anon posters doesn't lead to trolls and flame wars, but... it is done frequently.)
A registration confirmation delay, on the other hand, would hardly be noticed.
Am I just confusing the use of Akismet and topic?
Could be just too tired to comprehend anything.
Edit, some hours of sleep later:
OK, so someone registered on my site using a suspected spammer name/email, so I decided to look at another option for fun.
Much like NukeSentinel is: [link hidden by the forum]
I'm testing it now.
Likes:
Easy to update csv file for ip bans.
Fast load
Dislikes:
Installer seems quirky; easier to upload files to the live site after setting them up locally.
Not a native solution; duplicates some native checking.
Off topic, so I will start a new thread after some testing.