Dear Open-Xchange Users and Developers,
At the University of Leiden (the Netherlands) we are looking for a replacement for NetMail (Novell). The two products we are considering are Zarafa and Open-Xchange. One of the requirements for our new e-mail (groupware) system is the availability of a GAL (Global Address List). Now here comes the crux: we have at least 40,000 users in one GAL. These are students, employees and others.
If we look up a user in, say, MS Exchange (yes, we use that too!) it's really snappy (response time is less than 1/10 of a second), but unfortunately the same cannot be said of Open-Xchange. The response time is ~180 seconds; furthermore, the load of the Java process shoots up to an admirable 200% CPU time. The load on the MySQL server (on a separate tier) is a mere 17% (nose picking!). The MySQL server is optimized for caching and it looks like it behaves accordingly. So where is this 200% CPU time coming from?
As you can clearly see, 200% CPU time for a single user looking up the GAL is unacceptable, let alone for 5000 (concurrent!!!) users doing this. And yes, they will do this as soon as they log in. Personally I'd like Open-Xchange to win over Zarafa, so I'm willing to spend a lot of time on this problem, but I am stuck. The settings I tried to tweak (and succeeded to some extent!) are:
/opt/open-xchange/etc/admindaemon/ox-admin-scriptconf.sh:
JAVA_XTRAOPTS="-Xms4096m -Xmx4096m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=0 \
-XX:SurvivorRatio=20000 -XX:+UseCMSInitiatingOccupancyOnly \
-XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC -Djava.net.preferIPv4Stack=true"
JAVA_OXCMD_OPTS="-Xms4096m -Xmx4096m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=0 \
-XX:SurvivorRatio=20000 -XX:+UseCMSInitiatingOccupancyOnly \
-XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC -Djava.net.preferIPv4Stack=true"
AND
/opt/open-xchange/etc/groupware/ox-scriptconf.sh:
JAVA_XTRAOPTS="-Xmx4096m -Xms4096m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=0 \
-XX:SurvivorRatio=20000 -XX:+UseCMSInitiatingOccupancyOnly \
-XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC"
These settings result in a first-time login of, say, 3 minutes. I assume it's building some sort of cache? The second time it is on the order of a minute. By login time I mean the amount of time the Java process takes to process everything it must process to let a user log in. For the user (who cannot see the Linux terminal) it is the time needed before the Global Address List is viewable. If I scroll a bit through the GAL the whole thing starts over, and it typically takes about a minute to show results.
As you can see, this will not scale to 40,000 users, let alone the 5000 concurrent users we expect, leaving Zarafa the clear winner (I don't like that).
Furthermore, if we do a bulk import (via oxldapsync, the horror!) it processes 1 user every 3 seconds?! The load of Java shoots up, and so on...
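To put that import rate in perspective, here is a back-of-the-envelope calculation. It assumes the observed rate of 1 user per 3 seconds stays constant across all 40,000 users, which may not hold in practice; the variable names are only for illustration.

```shell
#!/bin/sh
# Rough estimate only: assumes a constant import rate for the whole run.
users=40000            # size of the GAL, as stated above
seconds_per_user=3     # observed oxldapsync rate (1 user every 3 seconds)

total_seconds=$((users * seconds_per_user))
total_hours=$((total_seconds / 3600))   # integer division, rounds down

echo "full import: ${total_seconds}s (~${total_hours}h)"
```

At that rate a full initial import of the GAL would take well over a day.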
So what's to be done? Any ideas?
Kind regards,
Robert Nagtegaal.
Some info about the software and machines:
MACHINE 0: LVS (Linux Virtual Server): load balancing between the four OX servers.
MACHINE 1: OX + Apache + Dovecot + Postfix: SLES 11.1, 64-bit, 6 GB RAM, NetApp Fibre Channel disks.
MACHINE 2: MySQL cluster (performance is good)
MACHINE 3: LDAP cluster (performance is good)