Jetty behavior with bounded execution queues


Steven Schlansker
Hello Jetty Friends,

My Awesome Jetty Based Proxy Server 2.0 has now been in production for a few months,
delivering traffic like there's no tomorrow, and mostly it runs great.

However, we occasionally suffer transient outages, where one of our backing services
suddenly becomes very slow to respond.  We do have timeouts, but even with reasonably aggressive limits, O(10 seconds),
we handle enough traffic that the fronting proxy server's queue fills quite quickly if one of its backends goes away.

Normally that's fine -- we're just a proxy server -- but we are also observing that, under periods of high load,
we "leak" both our internal tracking metrics and actual file descriptors.

Roughly, our code does the following (sketched; the real class is larger):

class AwesomeJettyProxyServer20 extends AsyncMiddleManServlet {
  private final AtomicLong inFlight = new AtomicLong();

  @Override
  protected void service(HttpServletRequest req, HttpServletResponse resp)
      throws ServletException, IOException {
    inFlight.incrementAndGet();
    super.service(req, resp);
  }

  // onProxyResponseSuccess(...) decrements the same way
  @Override
  protected void onProxyResponseFailure(HttpServletRequest req, HttpServletResponse resp,
      Response serverResp, Throwable failure) {
    inFlight.decrementAndGet();
    super.onProxyResponseFailure(req, resp, serverResp, failure);
  }
}

In normal operation this works fine, and we see all our inFlight metrics hover around 0, as they should.
However, under load, we find that eventually our queue fills with work and starts rejecting submissions.  Exactly as designed!
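For reference, the bounded pool that produces the rejections below is configured roughly like this (a sketch assuming the Jetty 9.4 constructors; the pool name and sizes are illustrative, chosen to match the 192<=192<=192, q=128 figures in our logs):

```java
import java.util.concurrent.BlockingQueue;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.util.BlockingArrayQueue;
import org.eclipse.jetty.util.thread.QueuedThreadPool;

public class BoundedPoolConfig {
    public static Server buildServer() {
        // Bounded job queue: initial capacity 128, capped at 128 (no growth).
        BlockingQueue<Runnable> queue = new BlockingArrayQueue<>(128, 128, 128);
        // 192 max threads, 192 min threads, 30s idle timeout, bounded queue.
        QueuedThreadPool pool = new QueuedThreadPool(192, 192, 30_000, queue);
        pool.setName("default-pool");
        return new Server(pool);
    }
}
```

Once that queue is full, QueuedThreadPool.execute() throws RejectedExecutionException, which is what the log below shows.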

        2018-02-12T22:23:49.385Z WARN  <ed9d2edc-8d56-40b4-a0a5-e051055b3f08> [default-pool-47] o.e.j.util.thread.QueuedThreadPool - QueuedThreadPool@default-pool{STARTED,192<=192<=192,i=163,q=128} rejected CEP:SocketChannelEndPoint@6ae596a1{mesos-slave23-qa-sf.qasql.opentable.com/10.21.7.45:31813<->/10.21.5.214:39492,OPEN,fill=FI,flush=-,to=1372/32000}{io=1/0,kio=1,kro=1}->HttpConnectionOverHTTP@304be30b(l:/10.21.5.214:39492 <-> r:mesos-slave23-qa-sf.qasql.opentable.com/10.21.7.45:31813,closed=false)=>HttpChannelOverHTTP@6be5713f(exchange=HttpExchange@190a2a2 req=TERMINATED/null@null res=PENDING/null@null)[send=HttpSenderOverHTTP@5d0ed377(req=QUEUED,snd=COMPLETED,failure=null)[HttpGenerator@1b1816e9{s=START}],recv=HttpReceiverOverHTTP@5b34f03c(rsp=IDLE,failure=null)[HttpParser{s=START,0 of -1}]]:runFillable:BLOCKING
        2018-02-12T22:23:49.386Z WARN  <ed9d2edc-8d56-40b4-a0a5-e051055b3f08> [default-pool-47] o.e.j.u.t.strategy.EatWhatYouKill -
        java.util.concurrent.RejectedExecutionException: CEP:SocketChannelEndPoint@6ae596a1{mesos-slave23-qa-sf.qasql.opentable.com/10.21.7.45:31813<->/10.21.5.214:39492,OPEN,fill=FI,flush=-,to=1373/32000}{io=1/0,kio=1,kro=1}->HttpConnectionOverHTTP@304be30b(l:/10.21.5.214:39492 <-> r:mesos-slave23-qa-sf.qasql.opentable.com/10.21.7.45:31813,closed=false)=>HttpChannelOverHTTP@6be5713f(exchange=HttpExchange@190a2a2 req=TERMINATED/null@null res=PENDING/null@null)[send=HttpSenderOverHTTP@5d0ed377(req=QUEUED,snd=COMPLETED,failure=null)[HttpGenerator@1b1816e9{s=START}],recv=HttpReceiverOverHTTP@5b34f03c(rsp=IDLE,failure=null)[HttpParser{s=START,0 of -1}]]:runFillable:BLOCKING
                at org.eclipse.jetty.util.thread.QueuedThreadPool.execute(QueuedThreadPool.java:402)
                at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:249)
                at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)
                at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
                at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:382)
                at com.opentable.metrics.JettyServerMetricsConfiguration$OTQueuedThreadPool.runJob(JettyServerMetricsConfiguration.java:83)
                at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
                at java.lang.Thread.run(Thread.java:748)

[ Stack trace is an example, in reality we get them from all over the place ]


But oh no, this seems (maybe?) to totally break the Jetty handling flow -- after these sorts of exceptions fill the logs, our in-flight count never again returns to its baseline of 0 -- and if you 'lsof' the process later, you see a number of sockets stuck in CLOSE_WAIT:

java    4729 root  688u  IPv4          263035204      0t0       TCP 10.21.5.214:31267->10.21.4.7:39887 (CLOSE_WAIT)
java    4729 root  690u  IPv4          263106017      0t0       TCP 10.21.5.214:31267->10.21.4.7:34776 (CLOSE_WAIT)
java    4729 root  716u  IPv4          263175957      0t0       TCP 10.21.5.214:31267->10.21.4.7:33373 (CLOSE_WAIT)
java    4729 root  740u  IPv4          263189945      0t0       TCP 10.21.5.214:31267->10.21.4.7:59734 (CLOSE_WAIT)

Eventually, the process runs out of file descriptors and effectively dies.


We don't have proof positive that it is this execution rejection causing the problem, but it seems like a very likely cause -- failing to execute a callback could easily leak a half shutdown socket, or cause one of the proxy servlet callbacks to not get invoked, and we've noticed a strong correlation between seeing it in our logs and finding mysteriously unhealthy instances.
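The suspected mechanism can be reproduced with plain java.util.concurrent (an analogy, not Jetty's actual internals): if the submission that would have run the cleanup is rejected by a bounded pool, the accounting never returns to baseline.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class RejectionLeakDemo {
    // Submits 3 "requests" to a pool that can only hold 2 (1 running + 1 queued).
    // Returns the final in-flight count; a nonzero result is the "leak".
    static long simulate() throws InterruptedException {
        AtomicLong inFlight = new AtomicLong();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1));
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 3; i++) {
            inFlight.incrementAndGet();            // request accepted: count it
            try {
                pool.execute(() -> {
                    try { release.await(); } catch (InterruptedException ignored) { }
                    inFlight.decrementAndGet();    // cleanup runs only if the task runs
                });
            } catch (RejectedExecutionException e) {
                // The decrement was inside the rejected task: without compensating
                // here, the counter (or a half-closed socket) is leaked forever.
            }
        }
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return inFlight.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("leaked in-flight count: " + simulate()); // prints 1
    }
}
```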

Does this sound like an explanation for our observed file descriptor and accounting "leaks"?  What's the right way to configure Jetty to reject excess load while still cleaning up after itself?  It seems that tasks related to already-existing requests should preferentially continue to execute over accepting new entries into the queue, but I don't see how to implement something like that.  Should we switch from the AbortPolicy to, e.g., CallerRunsPolicy?  I've observed (what I think were) deadlocks with a small number of threads (< 10); would we risk similar phenomena with CallerRunsPolicy?
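On the CallerRunsPolicy question, one property worth noting (shown here with a plain ThreadPoolExecutor, since as far as I can tell Jetty's QueuedThreadPool does its own rejection rather than taking a RejectedExecutionHandler -- an assumption worth verifying): the overflow task runs synchronously on the submitting thread. If that thread is a selector/I/O thread and the task blocks, the submitter stalls with it, which is presumably where the deadlock risk comes from.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class CallerRunsDemo {
    // Saturates a 1-thread/1-slot pool, then submits once more.
    // With CallerRunsPolicy the overflow task runs on the submitting thread.
    static boolean overflowRanOnCaller() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try { release.await(); } catch (InterruptedException ignored) { }
        };
        pool.execute(blocker);                     // occupies the single worker
        pool.execute(blocker);                     // fills the queue
        Thread caller = Thread.currentThread();
        AtomicBoolean ranOnCaller = new AtomicBoolean();
        // Overflow: instead of rejecting, the policy runs this right here on the
        // caller.  If this task itself blocked (like the runFillable:BLOCKING
        // job in the log above), the submitting thread would stall with it.
        pool.execute(() -> ranOnCaller.set(Thread.currentThread() == caller));
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return ranOnCaller.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("overflow ran on caller: " + overflowRanOnCaller());
    }
}
```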

Finally, we also configure the Client and Server to share the same thread pool.  Are there additional dangers lurking here if the Jetty Client gets similar RejectedExecutionExceptions from the pool?

Thanks for any advice!
Steven


_______________________________________________
jetty-users mailing list
[hidden email]
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.eclipse.org/mailman/listinfo/jetty-users
