spamrunner.pl fixes and activating spam training

  • spamrunner.pl fixes and activating spam training

    Hi everybody,

    I'm not quite sure if this is the place to post or if the "OX ... related" or maybe administration/configuration (for a part of my post) subforum fits better. If so, admins please move my post.

    When you install Hyperion with the community installer, you will find a Perl script called "spamrunner.pl" linked into your "/etc/cron.hourly" directory. This script is responsible for training the SpamAssassin Bayesian filter with the mails in your "confirmed-spam" and "confirmed-ham" folders. Maybe this script works with the commercial distribution (OX EE), where it seems to come from, but for me it does not work.

    There are mainly two problems:

    1. If you use login2user.uid names containing dots ("."), Cyrus rewrites each dot to "^", e.g. if your login name is "firstname.lastname", Cyrus will use "firstname^lastname" in the file system for your mailbox.

    So the spamrunner.pl can't find the directory for your mailbox.

    2. spamrunner.pl has problems if you use special characters in the folder names for "confirmed-spam" and "confirmed-ham". You can change these names in the "User.properties" configuration file of the OX admindaemon. For example, I use customized German folder names:

    - confirmed-spam = "Junk-E-Mail/Bestätigt Spam"
    - confirmed-ham = "Junk-E-Mail/Bestätigt Ham"

    The German umlauts are the problem. Cyrus encodes them in IMAP-UTF-7 (the modified UTF-7 used for IMAP mailbox names), which means the directory names will be:

    - "Junk-E-Mail/Best&AOQ-tigt Spam" and
    - "Junk-E-Mail/Best&AOQ-tigt Ham"

    below your mailbox/INBOX.
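
    Both problems boil down to how a login and a folder name map to a directory in the Cyrus spool. Here is a minimal sketch of that mapping, assuming the spool layout used later in this post (first letter of the uid, then "/user/") and PHP's mbstring extension. The helper name cyrusFolderDir() is only made up for illustration:

```php
<?php
// Sketch only: map an OX login and a folder name to the on-disk Cyrus directory.
// cyrusFolderDir() is a hypothetical helper; it needs the mbstring extension.
function cyrusFolderDir(string $spool, string $uid, string $folder): string
{
    // Problem 1: Cyrus stores "." in a mailbox name as "^" on disk.
    $mailbox = str_replace(".", "^", $uid);
    // Problem 2: folder names are stored in IMAP modified UTF-7 ("&" instead of "+").
    $encoded = mb_convert_encoding($folder, "UTF7-IMAP", "UTF-8");
    // Assumed layout: <spool>/<first letter of uid>/user/<mailbox>/<folder>
    return rtrim($spool, "/") . "/" . substr($uid, 0, 1) . "/user/" . $mailbox . "/" . $encoded;
}

// "\u{00E4}" is the umlaut "ä", written as an escape so the file encoding does not matter.
echo cyrusFolderDir(
    "/var/spool/cyrus/mail/",
    "firstname.lastname",
    "Junk-E-Mail/Best\u{00E4}tigt Spam"
), "\n";
// prints /var/spool/cyrus/mail/f/user/firstname^lastname/Junk-E-Mail/Best&AOQ-tigt Spam
```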

    I tried to fix both problems directly in Perl, but I'm not a Perl programmer, and Perl does not seem to know the "IMAP-UTF-7" encoding (Perl's Encode has only "UTF-7", which uses "+" instead of "&"). Installing an old Perl extension for "Encode::IMAPUtf7" did not solve the problem either, because the extension seems to be faulty.

    So I decided to rewrite the script as a PHP-CLI command line script. Here is the result:

    PHP Code:
    #! /usr/bin/php
    <?php
    $DATASOURCE = "mysql:host=localhost;dbname=open-xchange-db";
    $DRIVER     = "mysql";
    $CFPROP     = "/opt/open-xchange/etc/admindaemon/configdb.properties";
    $IMAPPROP   = "/etc/imapd.conf";
    $CYRSPOOL   = "/var/spool/cyrus/mail/";

    # DO NOT CHANGE BELOW
    # -------------------------------------------------------------------

    $QUERY = 'SELECT login2user.uid,user_setting_mail.bits,
                     user_setting_mail.confirmed_spam,
                     user_setting_mail.confirmed_ham
              FROM user_setting_mail,login2user
              WHERE user_setting_mail.user=login2user.id
              AND login2user.cid=1';
    $SPAM_ENABLED_BIT = 4096;

    $DBUSER     = FALSE;
    $DBPASS     = FALSE;
    $HIERSEP    = FALSE;
    $USEUTF8    = FALSE;
    $ENCODING   = FALSE;
    $UIDSEP     = array(".", "^");   # "." in the login becomes "^" on disk

    # make sure the PDO MySQL driver is available
    $mysqldriver = FALSE;
    foreach (PDO::getAvailableDrivers() as $entry) {
      if ($entry == $DRIVER) {
        $mysqldriver = $entry;
      }
    }

    if ($mysqldriver === FALSE) {
      die("mysqldriver not installed, exiting");
    }

    # read the database credentials from the OX configuration
    $PROP = file_get_contents($CFPROP) or die("unable to open $CFPROP");
    if (preg_match("/readProperty.1=user=(.*)/", $PROP, $tmp) == 1) {
      $DBUSER = $tmp[1];
    }
    if (preg_match("/readProperty.2=password=(.*)/", $PROP, $tmp) == 1) {
      $DBPASS = $tmp[1];
    }
    if (preg_match("/readProperty.3=useUnicode=(.*)/", $PROP, $tmp) == 1) {
      $USEUTF8 = $tmp[1];
    }
    if (preg_match("/readProperty.4=characterEncoding=(.*)/", $PROP, $tmp) == 1) {
      $ENCODING = $tmp[1];
    }

    # read the hierarchy separator from the Cyrus configuration
    $PROP = file_get_contents($IMAPPROP) or die("unable to open $IMAPPROP");
    if (preg_match("/unixhierarchysep:\s*(\w*)\s*/", $PROP, $tmp) == 1) {
      $sep = strtoupper($tmp[1]);
      if ( ($sep == "YES") || ($sep == "1") || ($sep == "ON") ) {
        $HIERSEP = "/";
      } else {
        $HIERSEP = ".";
      }
    }

    if ( ($HIERSEP === FALSE) || ($DBUSER === FALSE) || ($DBPASS === FALSE) || ($USEUTF8 === FALSE) || ($ENCODING === FALSE) ) {
      die("unable to determine required system parameters");
    }

    #print "using \"$HIERSEP\" as IMAP separator\n";
    #print "using \"$DBUSER\" as db user\n";

    try {
      $dbh = new PDO($DATASOURCE, $DBUSER, $DBPASS);

      $stmt = $dbh->prepare($QUERY);
      if ($stmt->execute()) {
        while ($row = $stmt->fetch(PDO::FETCH_NUM)) {
          list($uid, $bits, $cspam, $cham) = $row;
          # print "$uid $bits $cspam $cham\n";
          if ( ($bits & $SPAM_ENABLED_BIT) == $SPAM_ENABLED_BIT ) {
            #print "checking for spam and ham for $uid\n";

            # folder names are stored in IMAP modified UTF-7 on disk
            $cspam = mb_convert_encoding($cspam, "UTF7-IMAP");
            $cham  = mb_convert_encoding($cham, "UTF7-IMAP");

            $userdir  = $CYRSPOOL.substr($uid, 0, 1)."/user/".str_replace($UIDSEP[0], $UIDSEP[1], $uid);
            $cspamdir = $userdir."/".$cspam;
            $chamdir  = $userdir."/".$cham;

            # learn spam
            if (file_exists($cspamdir)) {
              $foundfiles = 0;

              $fileList = getListOfMails($cspamdir);
              foreach ($fileList as $file) {
                $foundfiles++;
                $file = $cspamdir."/".$file;
                pipeSALearn("spam", $file, $uid);
                unlink($file);
              }

              if ($foundfiles > 0) {
                cyrReconstruct($uid, $cspam);
              }
            }

            # learn ham
            if (file_exists($chamdir)) {
              $foundfiles = 0;

              $fileList = getListOfMails($chamdir);
              foreach ($fileList as $file) {
                $foundfiles++;
                $file = $chamdir."/".$file;
                pipeSALearn("ham", $file, $uid);
                unlink($file);
              }

              if ($foundfiles > 0) {
                cyrReconstruct($uid, $cham);
              }
            }
          }
        }
      }

      $dbh = null;
    } catch (Exception $e) {
      echo "Failed: " . $e->getMessage();
      $dbh = null;
    }

    # returns array containing all mails in folder
    #
    function getListOfMails($folder) {

      $mlist = array();

      if (is_dir($folder)) {
        if ($dh = opendir($folder)) {
          while (($file = readdir($dh)) !== false) {
            $name = $folder . "/" . $file;
            # Cyrus message files are named "<number>."
            # (preg_match instead of ereg, which was removed in PHP 7)
            if ( (is_file($name)) && (preg_match('/^[0-9]{1,20}\.$/', $file)) ) {
              $mlist[] = $file;
            }
          }
          closedir($dh);
        }
      }

      return($mlist);
    }

    # pipe mail into sa-learn
    # arg1 == spam or ham
    # arg2 == abs path to file
    # arg3 == user to run sa-learn as
    #
    function pipeSALearn($type, $file, $uid) {
      $sacmd = "/bin/su $uid -c \"/usr/bin/sa-learn --$type --no-sync\" 2> /dev/null";
      $SAOUT = popen($sacmd, "w");
      if ($SAOUT) {
        $co = file($file);
        foreach ($co as $row) {
          fwrite($SAOUT, $row);
        }
        # handles opened with popen() must be closed with pclose()
        pclose($SAOUT);
      } else {
        die("unable to start sa-learn");
      }
    }

    # calling cyrus reconstruct for specific folder
    # arg1 == user name
    # arg2 == folder name
    #
    function cyrReconstruct($user, $folder) {
      global $HIERSEP;
      $cyrcmd = "/bin/su cyrus -c \"/usr/sbin/cyrreconstruct -r user".
                escapeshellarg($HIERSEP.$user.$HIERSEP.$folder)."\"";
      $retVal = exec($cyrcmd, $output);
      if (!$retVal) {
        die("unable to start cyrus reconstruct");
      }
    }

    ?>
    The script stays as close to the original "spamrunner.pl" as I could manage. To activate it, install "PHP-CLI", PDO and "pdo_mysql" on your server. Copy the script to a location of your choice and link it into "/etc/cron.hourly". Then make it executable with "chmod 755 spamrunner.php".
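
    Before linking the script into cron, it may help to check that the PHP side is complete. The following is only a rough sketch; missingExtensions() is a name made up for this example, and "mbstring" is on the list because the script calls mb_convert_encoding():

```php
<?php
// Sketch: report which PHP extensions needed by spamrunner.php are not loaded.
// missingExtensions() is a hypothetical helper, not part of spamrunner.php.
function missingExtensions(array $required): array
{
    $missing = array();
    foreach ($required as $ext) {
        if (!extension_loaded($ext)) {
            $missing[] = $ext;
        }
    }
    return $missing;
}

$missing = missingExtensions(array("PDO", "pdo_mysql", "mbstring"));
if (PHP_SAPI !== "cli") {
    echo "warning: not running under PHP-CLI\n";
}
echo $missing
    ? "missing: " . implode(", ", $missing) . "\n"
    : "all required extensions loaded\n";
```

    Run it once by hand with the same php binary that cron will use.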

    But there is more to do to fully activate spam training if you installed with the community installer. At least I had to do more (Debian 4.0r3 and Hyperion from 2008-02-06).

    Here is a short description what I did:

    Activate Spam training

    1. Install Hyperion with community edition installer

    2. Modify the following files in /opt/open-xchange/etc/admindaemon

    - In User.properties change the *_MAILFOLDER variables as you like

    - In User.properties set "UID_NUMBER_START" to a value >0 eg. 5000
    - In Group.properties set "GID_NUMBER_START" to a value >0 eg. 5000

    This will activate real uids/gids for your users and groups. Be careful to choose a start value for both that is not already in use on your system (see /etc/passwd and /etc/group). If these variables are set to 0, OX will use the same UID/GID for all users and groups.

    - In User.properties set "CREATE_HOMEDIRECTORY=true", because SpamAssassin will save the bayesian data in the home directory of the users.

    Then restart OX! All changes will take effect after a restart and only for newly created users.

    If you already have a running Hyperion (with created users and groups) you will have to modify the MySQL database manually (give them uidNumber/gidNumber and create home dirs).

    3. libnss-mysql was not correctly configured on my system after installation. You need to do the following things:

    - In "/etc/nsswitch.conf" modify the lines

    Code:
    passwd:         compat
    group:          compat 
    shadow:         compat
    to

    Code:
    passwd:         compat mysql
    group:          compat mysql
    shadow:         compat mysql
    - In "/etc/libnss-mysql.cfg" modify the line:

    Code:
    getspnam SELECT login2user.uid,'x',user.uidNumber,user.gidNumber,user.shadowLastChange,0,0,-1,-1,'A' FROM login2user,user WHERE login2user.cid=1 AND login2user.id=user.id AND login2user.uid='%1$s' LIMIT 1
    to

    Code:
    getspnam SELECT login2user.uid,'x',user.shadowLastChange,0,0,-1,-1,-1,'A' FROM login2user,user WHERE login2user.cid=1 AND login2user.id=user.id AND login2user.uid='%1$s' LIMIT 1
    That's it. After this "spamrunner.php" will do the job.
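
    For step 2, here is a small sketch of how you could check that a start value does not collide with IDs already in use. It assumes the usual colon-separated layout of /etc/passwd and /etc/group (numeric ID in the third field). conflictingIds() and the range size of 1000 are only assumptions for illustration:

```php
<?php
// Sketch: list IDs from passwd/group-style text that fall into a planned range.
// conflictingIds() is a hypothetical helper; $count is how many IDs you expect to need.
function conflictingIds(string $text, int $start, int $count): array
{
    $hits = array();
    foreach (explode("\n", $text) as $line) {
        $fields = explode(":", $line);
        // Third field (index 2) is the UID in /etc/passwd and the GID in /etc/group.
        if (count($fields) >= 3 && ctype_digit($fields[2])) {
            $id = (int) $fields[2];
            if ($id >= $start && $id < $start + $count) {
                $hits[] = $id;
            }
        }
    }
    return $hits;
}

// Made-up sample data; on a real system use file_get_contents("/etc/passwd")
// and file_get_contents("/etc/group") instead.
$sample = "root:x:0:0:root:/root:/bin/bash\n"
        . "eike:x:5003:5003::/home/eike:/bin/sh\n"
        . "nobody:x:65534:65534:nobody:/nonexistent:/bin/sh\n";

print_r(conflictingIds($sample, 5000, 1000)); // contains 5003, so 5000 is a bad start here
print_r(conflictingIds($sample, 6000, 1000)); // empty, so 6000 would be free
```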

    One last remark: "spamrunner.php" will remove the Spam/Ham files after feeding them to sa-learn.

    Best regards,
    Eike

  • #2
    hello ....

    sorry for my english, but i will try my best :-)

    I have exactly the problem Eike is talking about.

    1. If you use login2user.uid names containing dots ("."), Cyrus rewrites each dot to "^", e.g. if your login name is "firstname.lastname", Cyrus will use "firstname^lastname" in the file system for your mailbox.

    This is the first time I have read about this problem. Thank you, Eike, for the work.

    Eike has written a PHP-CLI command line script and I think it will work fine. But I am worried about future updates. Every update will overwrite spamrunner.pl, and every time I will have to fix the problem again.

    Is it possible to integrate Eike's work into the original spamrunner.pl?

    Who is responsible for the spamrunner.pl?

    And again ... thank you for the script Eike.

    Greetings
    Christoph



    • #3
      Hi Christoph,

      I've reported the problem to the official bugtracker. I think this will be fixed in a short time. If you don't want to use the PHP script you can fix this error in the perl script (/opt/open-xchange/sbin/spamrunner.pl) by adding

      Code:
      $uid =~ s/\./\^/;
      after line 134. So it should look like this:

      Code:
      while( $stmt->fetch() ) {
        #print "$uid $bits $cspam $cham\n";
        if( ($bits & $SPAM_ENABLED_BIT) == $SPAM_ENABLED_BIT ) {
          $uid =~ s/\./\^/;
          print "checking for spam and ham for $uid\n";
          my $userdir = $CYRSPOOL.substr($uid,0,1)."/user/".$uid;
          my $cspamdir = $userdir."/".$cspam;
          my $chamdir = $userdir."/".$cham;
      That problem was easy to fix, but the encoding problems (mentioned in my first post) were not that easy to fix in Perl... (for me at least).

      You can also use spamrunner.php without it being overwritten by an update. Just copy spamrunner.php to somewhere outside the OX installation, e.g. /usr/bin, and symlink it into /etc/cron.hourly under a name other than spamrunner (e.g. spamrunner_NEW). Only /opt/open-xchange/sbin/spamrunner.pl and /etc/cron.hourly/spamrunner will be overwritten by an update.

      Best regards, Eike



      • #4
        and if you are using more than one "." in the username, like firstname.surname.domain, the code would be
        $uid =~ s/\./\^/g;
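
        For comparison, the same difference in PHP (the language of spamrunner.php), as a small sketch: preg_replace() with a limit of 1 behaves like the Perl substitution without "g", while str_replace() replaces every dot, which is what spamrunner.php does.

```php
<?php
$uid = "firstname.surname.domain";

// Like Perl s/\./\^/ without the g flag: only the first dot is replaced (limit = 1).
$first = preg_replace('/\./', '^', $uid, 1);

// Like Perl s/\./\^/g: every dot is replaced.
$all = str_replace(".", "^", $uid);

echo $first, "\n"; // firstname^surname.domain
echo $all, "\n";   // firstname^surname^domain
```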

