  • excessive number of groupware threads

    During the last week my until now smoothly running OX 6.18.0 rev7 install ran into very weird trouble with an increasing number of groupware Java process threads. The symptoms are: webmail login hangs at 10%, no HTTP login is possible, and the following messages keep repeating in open-xchange.log after the hang starts:

    Code:
    Dec 12, 2010 12:46:03 PM com.openexchange.ajp13.najp.AJPv13TaskWatcher$TaskRunCallable handleExceededTask
    INFO: AJP Listener "AJPListener-0003027" exceeds max. running time of 60000msec -> Processing time: 1468589msec
    java.lang.Throwable
            at sun.misc.Unsafe.park(Native Method)
            at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
            at com.openexchange.mail.api.MailAccess.getInstance(MailAccess.java:262)
            at com.openexchange.mail.MailServletInterfaceImpl.initMailAccess(MailServletInterfaceImpl.java:1574)
            at com.openexchange.mail.MailServletInterfaceImpl.initConnection(MailServletInterfaceImpl.java:1558)
            at com.openexchange.mail.MailServletInterfaceImpl.getTrashFolder(MailServletInterfaceImpl.java:1525)
            at com.openexchange.groupware.settings.tree.modules.mail.folder.Trash$1.getValue(Trash.java:106)
            at com.openexchange.groupware.settings.impl.RdbSettingStorage.readSharedValue(RdbSettingStorage.java:338)
            at com.openexchange.groupware.settings.impl.RdbSettingStorage.readValues(RdbSettingStorage.java:280)
            at com.openexchange.groupware.settings.impl.RdbSettingStorage.readSubValues(RdbSettingStorage.java:389)
            at com.openexchange.groupware.settings.impl.RdbSettingStorage.readValues(RdbSettingStorage.java:276)
            at com.openexchange.groupware.settings.impl.RdbSettingStorage.readSubValues(RdbSettingStorage.java:389)
            at com.openexchange.groupware.settings.impl.RdbSettingStorage.readValues(RdbSettingStorage.java:276)
            at com.openexchange.groupware.settings.impl.RdbSettingStorage.readSubValues(RdbSettingStorage.java:389)
            at com.openexchange.groupware.settings.impl.RdbSettingStorage.readValues(RdbSettingStorage.java:276)
            at com.openexchange.ajax.ConfigMenu.doGet(ConfigMenu.java:107)
            at javax.servlet.http.HttpServlet.service(HttpServlet.java:596)
            at com.openexchange.ajax.AJAXServlet.service(AJAXServlet.java:392)
            at com.openexchange.ajax.SessionServlet.service(SessionServlet.java:174)
            at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
            at com.openexchange.ajp13.najp.AJPv13RequestHandlerImpl.doServletService(AJPv13RequestHandlerImpl.java:504)
            at com.openexchange.ajp13.AJPv13Request.response(AJPv13Request.java:129)
            at com.openexchange.ajp13.najp.AJPv13RequestHandlerImpl.createResponse(AJPv13RequestHandlerImpl.java:316)
            at com.openexchange.ajp13.najp.AJPv13ConnectionImpl.createResponse(AJPv13ConnectionImpl.java:207)
            at com.openexchange.ajp13.najp.AJPv13Task.call(AJPv13Task.java:346)
            at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
            at java.util.concurrent.FutureTask.run(FutureTask.java:138)
            at com.openexchange.threadpool.internal.CustomThreadPoolExecutor$Worker.runTask(CustomThreadPoolExecutor.java:738)
            at com.openexchange.threadpool.internal.CustomThreadPoolExecutor$Worker.run(CustomThreadPoolExecutor.java:764)
            at java.lang.Thread.run(Thread.java:619)
    Dec 12, 2010 12:46:03 PM com.openexchange.ajp13.najp.AJPv13TaskWatcher$TaskRunCallable keepAliveGetEmptyBody
    INFO: AJP KEEP-ALIVE: Flushed empty GET-BODY request to socket "/127.0.0.1:51366" to initiate a KEEP-ALIVE poll.
    Dec 12, 2010 12:46:03 PM com.openexchange.ajp13.najp.AJPv13TaskWatcher$TaskRunCallable keepAliveGetEmptyBody
    INFO: AJP KEEP-ALIVE: Swallowed empty REQUEST-BODY from socket "/127.0.0.1:51366" initiated by former KEEP-ALIVE poll.
    Dec 12, 2010 12:46:03 PM com.openexchange.ajp13.najp.AJPv13TaskWatcher$TaskRunCallable keepAliveGetEmptyBody
    INFO: AJP KEEP-ALIVE: Flushed empty GET-BODY request to socket "/127.0.0.1:51252" to initiate a KEEP-ALIVE poll.
    Dec 12, 2010 12:46:03 PM com.openexchange.ajp13.najp.AJPv13TaskWatcher$TaskRunCallable keepAliveGetEmptyBody
    INFO: AJP KEEP-ALIVE: Swallowed empty REQUEST-BODY from socket "/127.0.0.1:51252" initiated by former KEEP-ALIVE poll.
    Dec 12, 2010 12:46:33 PM com.openexchange.ajp13.najp.AJPv13TaskWatcher$TaskRunCallable handleExceededTask
    INFO: AJP Listener "AJPListener-0004973" exceeds max. running time of 60000msec -> Processing time: 1397705msec
    Well, a closer look at the running processes reveals that there are over 4000 groupware threads running:

    Code:
    java.lang:type=Threading,ThreadCount = 4640
    java.lang:type=Threading,TotalStartedThreadCount = 48162
    But there is actually nothing going on on this test system at the moment!
    Before this hang happens there are about 66-71 threads on OX, and it takes about 2 days for the thread count to explode.
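    For reference, this is roughly how I checked where those threads are stuck: a small standalone JMX client (my own sketch, not OX code; the service URL and the stack-frame index are assumptions and need adapting to your setup):

    Code:
    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;
    import javax.management.MBeanServerConnection;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;
    
    public class ThreadPeek {
        public static void main(String[] args) throws Exception {
            // Assumed JMX endpoint of the groupware JVM -- adjust host/port
            // to whatever your installation actually exposes.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:9999/server");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbsc = connector.getMBeanServerConnection();
                ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                        mbsc, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
                System.out.println("ThreadCount = " + threads.getThreadCount());
                // Print where each WAITING thread is parked; in my case nearly all
                // of them sit in com.openexchange.mail.api.MailAccess.getInstance.
                for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
                    StackTraceElement[] stack = info.getStackTrace();
                    if (info.getThreadState() == Thread.State.WAITING && stack.length > 3) {
                        // frame 3 is the first non-JDK frame in the traces above
                        System.out.println(info.getThreadName() + " waiting at " + stack[3]);
                    }
                }
            } finally {
                connector.close();
            }
        }
    }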
    The only change I made before this happened was to set up a Funambol server with ox-listener (+ sessionid patch), which seems to work. There are 2 SyncML clients on Funambol. Prior to the explosion, I can see only these messages in open-xchange.log for the 2 test users:

    Code:
    Dec 10, 2010 7:31:39 PM com.openexchange.authentication.imap.impl.IMAPAuthentication handleLoginInfo
    INFO: Imap authentication for user user1 successful on host localhost:143
    Dec 10, 2010 7:31:39 PM com.openexchange.authentication.imap.impl.IMAPAuthentication handleLoginInfo
    INFO: Imap authentication for user user2 successful on host localhost:143
    Dec 10, 2010 7:31:39 PM com.openexchange.login.internal.LoginPerformer logLoginRequest
    INFO: Login:user1 IP:192.168.100.15 AuthID:114f59454c1d456595a0f2950aa5430f Agent:Jakarta Commons-HttpClient/3.0 Client:default(null) Interface:HTTP_JSON Context:1(1,defaultcontext) User:62(user1) Session:0fb1d0399742473685ed2963659576f2
    Dec 10, 2010 7:31:39 PM com.openexchange.login.internal.LoginPerformer logLoginRequest
    INFO: Login:user2 IP:192.168.100.15 AuthID:0c200cb57e954f9eb49c9733cd6097fb Agent:Jakarta Commons-HttpClient/3.0 Client:default(null) Interface:HTTP_JSON Context:1(1,defaultcontext) User:24(user2) Session:b42b1d57856649ea9b69e16ad5dcf3c0
    Dec 10, 2010 7:31:39 PM com.openexchange.login.internal.LoginPerformer logLogout
    INFO: Logout  Context:1(1,defaultcontext) User:62(user1) Session:0fb1d0399742473685ed2963659576f2
    Dec 10, 2010 7:31:39 PM com.openexchange.login.internal.LoginPerformer logLogout
    INFO: Logout  Context:1(1,defaultcontext) User:24(user2) Session:b42b1d57856649ea9b69e16ad5dcf3c0
    which repeat every minute (seems to be a non-optimal funambol/ox-listener default ...) until the thread explosion happens. Well, I can see here just 2 logins and 2 clean logouts, nothing suspect.

    While there might be some relevant settings in threadpool.properties to play with, e.g. to limit the number of threads, this rather looks like a bug to me. If I understand the threadpool correctly, any idle/zombie threads should be terminated after 60 seconds anyway.
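    Just to illustrate what I mean by "terminated after 60 seconds": as far as I understand it, a thread pool's keep-alive only reaps idle workers. A plain java.util.concurrent example (not OX code, the numbers are arbitrary) shows that workers parked inside a task never count as idle and are therefore never reclaimed, which would explain why the count only grows:

    Code:
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ThreadPoolExecutor;
    
    public class KeepAliveDemo {
        public static void main(String[] args) throws Exception {
            // Cached pool: idle workers are terminated after 60 seconds.
            ThreadPoolExecutor pool =
                    (ThreadPoolExecutor) Executors.newCachedThreadPool();
    
            final CountDownLatch never = new CountDownLatch(1);
            for (int i = 0; i < 100; i++) {
                pool.execute(new Runnable() {
                    public void run() {
                        try {
                            // A worker parked on a lock/condition is not "idle",
                            // so the 60s keep-alive never reclaims it.
                            never.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }
                });
            }
            Thread.sleep(70000L); // well past the 60 second keep-alive
            System.out.println("Workers still alive: " + pool.getPoolSize()); // 100
            pool.shutdownNow();
        }
    }
    If MailAccess.getInstance blocks forever (see the stack trace above), every AJP listener thread that enters it stays pinned the same way, so the pool just keeps spawning new ones.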

    Restarting open-xchange-groupware always solves the problem for another 2 days.

    And yes, this is running on Sun Java 1.6.0_u22.

    Martin, any idea why this happens or where to look further?

  • #2
    Hi,

    I suggest reporting that as a bug at https://bugs.open-xchange.com/ since a developer may need more details about the configuration and log files to debug this. This can be handled best in Bugzilla.

    I'm managing an OX 6.18.1 setup with about 400 users and heavy mobility and Thunderbird OXtender usage. These are the stats after about 13 days of uptime:
    Code:
    java.lang:type=Threading,PeakThreadCount = 173
    java.lang:type=Threading,TotalStartedThreadCount = 146881
    java.lang:type=Threading,ThreadCount = 114
    Perhaps the issue is some memory, JVM or thread configuration of your machine. Are you using any kind of virtualization? Please add such environment information to the bug report.

    Thanks for reporting!
    Last edited by Martin Heiland; 12-12-2010, 05:58 PM.



    • #3
      Hi Martin,

      seems like this bad weather keeps you online just like me ;-)

      Well, your stats example above is pretty similar to what I had before I started to play with Funambol (can't say it's 100% the same, but I don't remember any other change).

      OX is running in its own Xen4 DomU, but I have used it this way for about a year, since 6.14, and never seen such things before. There is also no live migration or anything else going on while this happens, just a normal stable system with 6 months of uptime.

      I just made a copy&paste into https://bugs.open-xchange.com/show_bug.cgi?id=17803
      I did not want to be noisy on the bug tracker, but if you say it's ok...

      Thanks for the quick reply!
      Last edited by sam; 12-12-2010, 06:20 PM.



      • #4
        So, now it seems clear that the LoginPerformer is to blame for this thread explosion. I have disabled the Funambol listener; let's see if OX becomes stable again.



        • #5
          Summary

          I did longer testing in the background. Let me summarize the results:

          - if Funambol with ox-listener (v8.7.0 & 1.1.0 used) is connected to the OX server (6.18.0 up to rev. 20 tested), the number of threads starts to explode after a few hours

          - after some more hours, no login is possible

          - there are plenty of LoginPerformer threads

          - before this happens, both OX & Funambol work perfectly

          - there is nothing suspect in the log, only repeating logins/logouts every minute due to Funambol polling

          - well, while this is clearly related to Funambol client activity, it actually seems to be a nasty security issue; whatever the client is doing, this should never happen

          - unfortunately I did not have time to repeat the tests with 6.18.1 or 6.18.2, probably in a few days...



          • #6
            Well, to give an answer here for other potential victims of this trap:

            do not set maxNumOfConnections in mail.properties to anything but the default 0; it is broken!
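            The stack traces above park inside MailAccess.getInstance, which would fit a counting-semaphore style cap that callers wait on without any timeout. A rough sketch of that failure mode (my own illustration of how I understand it, not the actual OX implementation):

            Code:
            import java.util.concurrent.Semaphore;
            
            // Hypothetical illustration of a bounded "mail access" cap. If a permit
            // is ever leaked (a connection not returned on some code path), every
            // later caller parks forever in acquire() -- the Unsafe.park/await
            // pattern from the stack trace above -- and its AJP thread is pinned.
            public class BoundedMailAccess {
                private final Semaphore permits;
            
                public BoundedMailAccess(int maxNumOfConnections) {
                    this.permits = new Semaphore(maxNumOfConnections);
                }
            
                public Object getInstance() throws InterruptedException {
                    permits.acquire();   // blocks with no timeout
                    return new Object(); // stand-in for a mail connection
                }
            
                public void release(Object connection) {
                    permits.release();   // skipped once = one permit lost forever
                }
            }
            With the default maxNumOfConnections = 0 there is presumably no cap at all, which would match the observation that the default setting avoids the problem.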

