SpamAssassin and non-English languages

March 29th, 2012

As a US-centric business, we really only care about English messages. Messages in any other language are spam for us, so we decided to mark non-English emails as spam more accurately.

There are two methods of marking non-English messages as spam. One looks at the stated charset of the message (the ok_locales setting); the other tries to determine the language used within the message itself (the ok_languages setting, which relies on the TextCat plugin). You will want to implement both of them.

Edit /etc/mail/spamassassin/local.cf so that English is the only accepted language:

# Mail written in the languages listed here (by language code) will not
# be marked as being possibly spam in a foreign language.
# en = English
ok_languages en

# Mail using character sets from the locales listed here will not be
# marked as being possibly spam in a foreign language.
ok_locales en
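
These two settings only decide when the foreign-language rules fire; how much a hit counts toward the spam threshold depends on the rule scores. A minimal sketch of raising them in the same local.cf, assuming the stock rule names UNWANTED_LANGUAGE_BODY (language check) and CHARSET_FARAWAY (charset check). The score values are illustrative assumptions, not the ones we run with:

```
# Illustrative scores only (assumed values); tune against your own mail.
# Fires when TextCat guesses a language that is not in ok_languages
score UNWANTED_LANGUAGE_BODY 5.0
# Fires when the message charset belongs to a locale not in ok_locales
score CHARSET_FARAWAY 5.0
```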

Edit /etc/mail/spamassassin/v310.pre so that the TextCat plugin is enabled (make sure the loadplugin line is not commented out):

# TextCat - language guesser
#
loadplugin Mail::SpamAssassin::Plugin::TextCat

Be sure that Mail::SpamAssassin::Plugin::TextCat is installed:

perl -MMail::SpamAssassin::Plugin::TextCat -e 1

If you get no output then you already have the TextCat plugin installed. If an error message is displayed, the plugin is missing; it ships as part of the Mail::SpamAssassin distribution, so you’ll need to install or upgrade that using CPAN.

Don’t forget to restart spamd if you’re running it. If you are not running spamd and call spamassassin directly, these changes take effect immediately.
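
Before restarting, it can be worth confirming that the edited files still parse. SpamAssassin has a --lint option for this; the wrapper below is a sketch, and check_sa_config is a hypothetical helper name. Any problems that lint finds are printed to the terminal by spamassassin itself.

```shell
# Hypothetical helper: reports whether the SpamAssassin config parses.
check_sa_config() {
    if ! command -v spamassassin >/dev/null 2>&1; then
        echo "spamassassin not found"
        return 0
    fi
    # --lint parses the rule and config files and exits non-zero on problems
    if spamassassin --lint >/dev/null; then
        echo "config OK"
    else
        echo "config has errors"
    fi
}

check_sa_config
```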

The TextCat plugin will look at the body of the message and attempt to determine the language used. You can specify any number of languages to allow; see the SpamAssassin documentation below:

ok_languages & TextCat
ok_locales
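
As a sketch of allowing more than one language, both settings take a space-separated list. The extra languages and locale here are hypothetical examples, not something we use ourselves:

```
# Hypothetical example: accept English, Spanish and French text
ok_languages en es fr
# Hypothetical example: accept Western and Japanese character sets
ok_locales en ja
```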
