Monday, March 8, 2010

Testing a spam filter

This article discusses "Why spam-filter testing is largely a disaster."

It raises some good points, but I have two main issues with it:
1) It reads more like a press release than anything else - even though it is clearly trying to look informational. I'm not sure which it is supposed to be, but my guess is that it is a press release.

2) It isn't really "testing" of a spam filter they are talking about - it is "training" of one, as it learns and its settings are tweaked.

One of the key points it makes, which I 100% agree with, is that when testing or training, forwarding spam through the server to an address is not a smart idea for either purpose. Forwarding changes the headers, so the message now comes from an address which you might have whitelisted as a trusted domain/user, even though the content is full of spam.
In terms of testing, that is useless to you, since the message is coming from a good user and shouldn't be marked as spam.
In terms of training, it is again useless, because the data which should be seen as spam is going to be associated with a "good" person and will therefore lessen its severity on future hits for that spam. (Note that this only applies if it is a learning system using Bayesian methods or something along those lines - if it isn't one of those, then you don't have to worry as much - although I don't see much point in using one that isn't.)
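
To make the forwarding problem concrete, here is a rough sketch in Python using only the standard library. Everything in it is made up for illustration: the addresses, the TRUSTED_SENDERS whitelist, and the forward() and features() helpers are my own assumptions, not how any particular filter works. It only shows that a forwarded copy carries the forwarder's From address instead of the spammer's, which is what both a whitelist check and a learning filter would end up seeing.

```python
from email.message import EmailMessage
from email.utils import parseaddr

# Hypothetical whitelist of trusted senders (assumption for illustration).
TRUSTED_SENDERS = {"colleague@example.com"}


def make_message(sender, subject, body):
    """Build a simple plain-text message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "spamtest@example.com"
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def forward(original, forwarder):
    """Simulate a user forwarding a message: same body, new From header."""
    fwd = EmailMessage()
    fwd["From"] = forwarder
    fwd["To"] = str(original["To"])
    fwd["Subject"] = "Fwd: " + str(original["Subject"])
    fwd.set_content(original.get_content())
    return fwd


def is_whitelisted(msg):
    """Return True if the From address is on the trusted list."""
    _, addr = parseaddr(str(msg["From"]))
    return addr.lower() in TRUSTED_SENDERS


def features(msg):
    """Toy feature extraction: the sender address plus the body words."""
    _, addr = parseaddr(str(msg["From"]))
    return ["from:" + addr.lower()] + msg.get_content().lower().split()


spam = make_message("deals@spammer.example", "Cheap pills", "Buy cheap pills now")
forwarded = forward(spam, "colleague@example.com")

# Testing: the original is judged on its real sender, the forwarded copy is not.
direct_whitelisted = is_whitelisted(spam)
forwarded_whitelisted = is_whitelisted(forwarded)

# Training: the spammer's address is gone from the features, and the
# trusted address now sits alongside the spam content.
forwarded_features = features(forwarded)

print(direct_whitelisted, forwarded_whitelisted)
print(forwarded_features)
```

If you want test or training samples that mean anything, the message has to reach the filter with its original headers intact rather than as a forwarded copy.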
