[jetty-dev] Jetty Client Problems

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[jetty-dev] Jetty Client Problems

Susan Javurek
Hi,

I have a customer who is using a camel route inside a fuse container that has a jetty:http producer.  We've suddenly started seeing problems where all of our requests are timing out due to lack of response.

When messages are flowing smoothly, we see about 16 connections to the backend server using LSOF to query connection state. All of these are in an ESTABLISHED state.  

At some point, things go awry and we are not getting responses in a timely fashion (httpClient.timeout=15000). Once the timeouts start occurring, the number of connections spike in the hundreds - all in an ESTABLISHED state.  All of the requests are timing out as if we did not get a response. We ran a tcp trace and were able to see that the request  made it to the back end server and that the server responded in milliseconds, however, the jetty client didn't recognize the response.

Restarting our bundle (inside service mix) or running JSTACK seems to free things up.  The latter was surprising to me but we believe that suspending the process momentarily reaps some of the connections.
In the first thread dump there are some threads that look problematic:

Thread t@20457: (state = BLOCKED)
- java.nio.channels.spi.AbstractSelector.cancel(java.nio.channels.SelectionKey) @bci=6, line=71 (Compiled frame)
- java.nio.channels.spi.AbstractSelectionKey.cancel() @bci=24, line=56 (Compiled frame)
- java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel() @bci=50, line=207 (Compiled frame)
- java.nio.channels.spi.AbstractInterruptibleChannel.close() @bci=23, line=97 (Compiled frame)
- org.eclipse.jetty.io.nio.ChannelEndPoint.fill(org.eclipse.jetty.io.Buffer) @bci=199, line=258 (Compiled frame)
- org.eclipse.jetty.io.nio.SelectChannelEndPoint.fill(org.eclipse.jetty.io.Buffer) @bci=2, line=325 (Compiled frame)
- org.eclipse.jetty.http.HttpParser.fill() @bci=322, line=1035 (Compiled frame)
- org.eclipse.jetty.http.HttpParser.parseNext() @bci=84, line=280 (Compiled frame)
- org.eclipse.jetty.http.HttpParser.parseAvailable() @bci=1, line=235 (Compiled frame)
- org.eclipse.jetty.client.AsyncHttpConnection.handle() @bci=400, line=133 (Compiled frame)
- org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle() @bci=10, line=627 (Compiled frame)
- org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run() @bci=4, line=51 (Compiled frame)
- org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(java.lang.Runnable) @bci=1, line=608 (Compiled frame)
- org.eclipse.jetty.util.thread.QueuedThreadPool$3.run() @bci=47, line=543 (Compiled frame)
- java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)

We also see a fair few of these:

Thread t@20488: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=196 (Compiled frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(long) @bci=68, line=2025 (Compiled frame)
 - org.eclipse.jetty.util.BlockingArrayQueue.poll(long, java.util.concurrent.TimeUnit) @bci=53, line=342 (Compiled frame)
 - org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll() @bci=12, line=526 (Compiled frame)
 - org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(org.eclipse.jetty.util.thread.QueuedThreadPool) @bci=1, line=44 (Compiled frame)
 - org.eclipse.jetty.util.thread.QueuedThreadPool$3.run() @bci=275, line=572 (Compiled frame)
 - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)


Unfortunately, the only place this is repeatable is in my customers production environment.  We did see there are some interesting threads that are sort of in this area.

https://bugs.eclipse.org/bugs/show_bug.cgi?id=387487
https://bugs.eclipse.org/bugs/show_bug.cgi?id=416477

http://dev.eclipse.org/mhonarc/lists/jetty-users/msg02221.html
http://dev.eclipse.org/mhonarc/lists/jetty-users/msg02224.html

https://java.net/jira/browse/GRIZZLY-547
Removed the selectionkey.cancel() we did right before the socket.close().
The cancel allowed for the selector thread to trigger too early, while the
other thread was still doing its socket.close(), hence becoming stuck at the
synchronized (stateLock) in the SocketChannelImpl.

However, I have nothing definitive to tie to my customer issue. I was wondering if someone might be able to say yes, that's this bug or issue.

One of my colleagues thought he may have seen an issue that JDK HTTP connection's code cannot recover from the timeout by reading the JDK HTTP connection's code.  

this issue suddenly started occurring after months of successful processing. The only changes we've been able to identify was an upgrade to the latest Solaris kernel (https://getupdates.oracle.com/readme/150401-03) which included an upgrade to Java 1.6  65.

Any insight greatly appreciated.

Susan
_______________________________________________
jetty-dev mailing list
[hidden email]
https://dev.eclipse.org/mailman/listinfo/jetty-dev
Reply | Threaded
Open this post in threaded view
|

Re: [jetty-dev] Jetty Client Problems

Simone Bordet-2
Hi,

On Fri, Nov 22, 2013 at 3:32 AM, Susan Javurek <[hidden email]> wrote:

> Hi,
>
> I have a customer who is using a camel route inside a fuse container that has a jetty:http producer.  We've suddenly started seeing problems where all of our requests are timing out due to lack of response.
>
> When messages are flowing smoothly, we see about 16 connections to the backend server using LSOF to query connection state. All of these are in an ESTABLISHED state.
>
> At some point, things go awry and we are not getting responses in a timely fashion (httpClient.timeout=15000). Once the timeouts start occurring, the number of connections spike in the hundreds - all in an ESTABLISHED state.  All of the requests are timing out as if we did not get a response. We ran a tcp trace and were able to see that the request  made it to the back end server and that the server responded in milliseconds, however, the jetty client didn't recognize the response.
>
> Restarting our bundle (inside service mix) or running JSTACK seems to free things up.  The latter was surprising to me but we believe that suspending the process momentarily reaps some of the connections.
> In the first thread dump there are some threads that look problematic:
>
> Thread t@20457: (state = BLOCKED)
> - java.nio.channels.spi.AbstractSelector.cancel(java.nio.channels.SelectionKey) @bci=6, line=71 (Compiled frame)
> - java.nio.channels.spi.AbstractSelectionKey.cancel() @bci=24, line=56 (Compiled frame)
> - java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel() @bci=50, line=207 (Compiled frame)
> - java.nio.channels.spi.AbstractInterruptibleChannel.close() @bci=23, line=97 (Compiled frame)
> - org.eclipse.jetty.io.nio.ChannelEndPoint.fill(org.eclipse.jetty.io.Buffer) @bci=199, line=258 (Compiled frame)
> - org.eclipse.jetty.io.nio.SelectChannelEndPoint.fill(org.eclipse.jetty.io.Buffer) @bci=2, line=325 (Compiled frame)
> - org.eclipse.jetty.http.HttpParser.fill() @bci=322, line=1035 (Compiled frame)
> - org.eclipse.jetty.http.HttpParser.parseNext() @bci=84, line=280 (Compiled frame)
> - org.eclipse.jetty.http.HttpParser.parseAvailable() @bci=1, line=235 (Compiled frame)
> - org.eclipse.jetty.client.AsyncHttpConnection.handle() @bci=400, line=133 (Compiled frame)
> - org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle() @bci=10, line=627 (Compiled frame)
> - org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run() @bci=4, line=51 (Compiled frame)
> - org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(java.lang.Runnable) @bci=1, line=608 (Compiled frame)
> - org.eclipse.jetty.util.thread.QueuedThreadPool$3.run() @bci=47, line=543 (Compiled frame)
> - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)
>
> We also see a fair few of these:
>
> Thread t@20488: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
>  - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=196 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(long) @bci=68, line=2025 (Compiled frame)
>  - org.eclipse.jetty.util.BlockingArrayQueue.poll(long, java.util.concurrent.TimeUnit) @bci=53, line=342 (Compiled frame)
>  - org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll() @bci=12, line=526 (Compiled frame)
>  - org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(org.eclipse.jetty.util.thread.QueuedThreadPool) @bci=1, line=44 (Compiled frame)
>  - org.eclipse.jetty.util.thread.QueuedThreadPool$3.run() @bci=275, line=572 (Compiled frame)
>  - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)
>
>
> Unfortunately, the only place this is repeatable is in my customers production environment.  We did see there are some interesting threads that are sort of in this area.
>
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=387487
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=416477
>
> http://dev.eclipse.org/mhonarc/lists/jetty-users/msg02221.html
> http://dev.eclipse.org/mhonarc/lists/jetty-users/msg02224.html
>
> https://java.net/jira/browse/GRIZZLY-547
> Removed the selectionkey.cancel() we did right before the socket.close().
> The cancel allowed for the selector thread to trigger too early, while the
> other thread was still doing its socket.close(), hence becoming stuck at the
> synchronized (stateLock) in the SocketChannelImpl.
>
> However, I have nothing definitive to tie to my customer issue. I was wondering if someone might be able to say yes, that's this bug or issue.
>
> One of my colleagues thought he may have seen an issue that JDK HTTP connection's code cannot recover from the timeout by reading the JDK HTTP connection's code.
>
> this issue suddenly started occurring after months of successful processing. The only changes we've been able to identify was an upgrade to the latest Solaris kernel (https://getupdates.oracle.com/readme/150401-03) which included an upgrade to Java 1.6  65.

What version of Java ? "1.6 65" does not seem like a right Java version ?
What version of Jetty ?

Do you have any chance to upgrade Camel to use Jetty 9.1's HttpClient
? Should be just matter of writing a new Camel Component based on the
new HttpClient. Not sure how big a task that is, though.

--
Simone Bordet
----
http://cometd.org
http://webtide.com
http://intalio.com
Developer advice, training, services and support
from the Jetty & CometD experts.
Intalio, the modern way to build business applications.
_______________________________________________
jetty-dev mailing list
[hidden email]
https://dev.eclipse.org/mailman/listinfo/jetty-dev
Reply | Threaded
Open this post in threaded view
|

Re: [jetty-dev] Jetty Client Problems

Susan Javurek
Hi Simone, 

Yes, update 65 is apparently available. However, it seems my customer has it installed but is not using it. There's no change to java in this scenario. The versions are:

* java 1.6.0_33
* jetty-client-7.6.7.v20120910.jar (so sorry for not including that!). 

The customer is in production so I'm afraid an upgrade to latest version of Jetty isn't possible. The only change we are aware of is an update to the Kernel:


There has been no change in the load running through the route. It's un-reproducable in any other environment.  Updating the bundle or touching the process in some way, i.e jstack which suspends the process briefly seems to have a positive effect. 

Have you all seen this type of issue before? Does the stack trace lend any clues?

Any insight appreciated, 
Susan




On Nov 23, 2013, at 5:49 PM, Simone Bordet <[hidden email]> wrote:

Hi,

On Fri, Nov 22, 2013 at 3:32 AM, Susan Javurek <[hidden email]> wrote:
Hi,

I have a customer who is using a camel route inside a fuse container that has a jetty:http producer.  We've suddenly started seeing problems where all of our requests are timing out due to lack of response.

When messages are flowing smoothly, we see about 16 connections to the backend server using LSOF to query connection state. All of these are in an ESTABLISHED state.

At some point, things go awry and we are not getting responses in a timely fashion (httpClient.timeout=15000). Once the timeouts start occurring, the number of connections spike in the hundreds - all in an ESTABLISHED state.  All of the requests are timing out as if we did not get a response. We ran a tcp trace and were able to see that the request  made it to the back end server and that the server responded in milliseconds, however, the jetty client didn't recognize the response.

Restarting our bundle (inside service mix) or running JSTACK seems to free things up.  The latter was surprising to me but we believe that suspending the process momentarily reaps some of the connections.
In the first thread dump there are some threads that look problematic:

Thread t@20457: (state = BLOCKED)
- java.nio.channels.spi.AbstractSelector.cancel(java.nio.channels.SelectionKey) @bci=6, line=71 (Compiled frame)
- java.nio.channels.spi.AbstractSelectionKey.cancel() @bci=24, line=56 (Compiled frame)
- java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel() @bci=50, line=207 (Compiled frame)
- java.nio.channels.spi.AbstractInterruptibleChannel.close() @bci=23, line=97 (Compiled frame)
- org.eclipse.jetty.io.nio.ChannelEndPoint.fill(org.eclipse.jetty.io.Buffer) @bci=199, line=258 (Compiled frame)
- org.eclipse.jetty.io.nio.SelectChannelEndPoint.fill(org.eclipse.jetty.io.Buffer) @bci=2, line=325 (Compiled frame)
- org.eclipse.jetty.http.HttpParser.fill() @bci=322, line=1035 (Compiled frame)
- org.eclipse.jetty.http.HttpParser.parseNext() @bci=84, line=280 (Compiled frame)
- org.eclipse.jetty.http.HttpParser.parseAvailable() @bci=1, line=235 (Compiled frame)
- org.eclipse.jetty.client.AsyncHttpConnection.handle() @bci=400, line=133 (Compiled frame)
- org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle() @bci=10, line=627 (Compiled frame)
- org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run() @bci=4, line=51 (Compiled frame)
- org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(java.lang.Runnable) @bci=1, line=608 (Compiled frame)
- org.eclipse.jetty.util.thread.QueuedThreadPool$3.run() @bci=47, line=543 (Compiled frame)
- java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)

We also see a fair few of these:

Thread t@20488: (state = BLOCKED)
- sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
- java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=196 (Compiled frame)
- java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(long) @bci=68, line=2025 (Compiled frame)
- org.eclipse.jetty.util.BlockingArrayQueue.poll(long, java.util.concurrent.TimeUnit) @bci=53, line=342 (Compiled frame)
- org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll() @bci=12, line=526 (Compiled frame)
- org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(org.eclipse.jetty.util.thread.QueuedThreadPool) @bci=1, line=44 (Compiled frame)
- org.eclipse.jetty.util.thread.QueuedThreadPool$3.run() @bci=275, line=572 (Compiled frame)
- java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)


Unfortunately, the only place this is repeatable is in my customers production environment.  We did see there are some interesting threads that are sort of in this area.

https://bugs.eclipse.org/bugs/show_bug.cgi?id=387487
https://bugs.eclipse.org/bugs/show_bug.cgi?id=416477

http://dev.eclipse.org/mhonarc/lists/jetty-users/msg02221.html
http://dev.eclipse.org/mhonarc/lists/jetty-users/msg02224.html

https://java.net/jira/browse/GRIZZLY-547
Removed the selectionkey.cancel() we did right before the socket.close().
The cancel allowed for the selector thread to trigger too early, while the
other thread was still doing its socket.close(), hence becoming stuck at the
synchronized (stateLock) in the SocketChannelImpl.

However, I have nothing definitive to tie to my customer issue. I was wondering if someone might be able to say yes, that's this bug or issue.

One of my colleagues thought he may have seen an issue that JDK HTTP connection's code cannot recover from the timeout by reading the JDK HTTP connection's code.

this issue suddenly started occurring after months of successful processing. The only changes we've been able to identify was an upgrade to the latest Solaris kernel (https://getupdates.oracle.com/readme/150401-03) which included an upgrade to Java 1.6  65.

What version of Java ? "1.6 65" does not seem like a right Java version ?
What version of Jetty ?

Do you have any chance to upgrade Camel to use Jetty 9.1's HttpClient
? Should be just matter of writing a new Camel Component based on the
new HttpClient. Not sure how big a task that is, though.

--
Simone Bordet
----
http://cometd.org
http://webtide.com
http://intalio.com
Developer advice, training, services and support
from the Jetty & CometD experts.
Intalio, the modern way to build business applications.
_______________________________________________
jetty-dev mailing list
[hidden email]
https://dev.eclipse.org/mailman/listinfo/jetty-dev

Susan Javurek
Customer Support | Red Hat, Inc. 
FuseSource is now part of Red Hat
sjavure[hidden email] | fusesource.com redhat.com


_______________________________________________
jetty-dev mailing list
[hidden email]
https://dev.eclipse.org/mailman/listinfo/jetty-dev
Reply | Threaded
Open this post in threaded view
|

Re: [jetty-dev] Jetty Client Problems

Simone Bordet-2
Hi,

On Wed, Nov 27, 2013 at 12:01 AM, Susan Javurek <[hidden email]> wrote:

> Hi Simone,
>
> Yes, update 65 is apparently available. However, it seems my customer has it
> installed but is not using it. There's no change to java in this scenario.
> The versions are:
>
> * java 1.6.0_33
> * jetty-client-7.6.7.v20120910.jar (so sorry for not including that!).
>
> The customer is in production so I'm afraid an upgrade to latest version of
> Jetty isn't possible. The only change we are aware of is an update to the
> Kernel:
>
> https://getupdates.oracle.com/readme/150401-03
>
> There has been no change in the load running through the route. It's
> un-reproducable in any other environment.  Updating the bundle or touching
> the process in some way, i.e jstack which suspends the process briefly seems
> to have a positive effect.
>
> Have you all seen this type of issue before? Does the stack trace lend any
> clues?

No, never seen this issue before, sorry.

--
Simone Bordet
----
http://cometd.org
http://webtide.com
http://intalio.com
Developer advice, training, services and support
from the Jetty & CometD experts.
Intalio, the modern way to build business applications.
_______________________________________________
jetty-dev mailing list
[hidden email]
https://dev.eclipse.org/mailman/listinfo/jetty-dev
Reply | Threaded
Open this post in threaded view
|

Re: [jetty-dev] Jetty Client Problems with new Solaris Patch

Susan Javurek
Thanks Simone. 

Unfortunately, I still don't have a test case. As an experiment we swapped jetty component with HTTP4 and this seems to avoid the problem. In that light we've narrowed the issue to either a problem in Jetty or in the new OS patch that was recently applied (or some combination of the two). 

The system was running fine until the following OS patch was installed:


As we can only reproduce this in production, removing the patch was not a popular option as affected too many other systems.  The threads I mentioned below seem to be a moment in time for that particular lock up. Subsequent threads didn't show the same traces - just that things were blocked.  I'm afraid it was a red herring.

This particular route talks to an IIS server via F5 Load Balancer. The Load Balancer has been reconfigured to use a pass through type connection so it should be direct communications channel now.  We've also taken the load balancer out of the equation as well. We still saw the same issue. When the problem arises, the number of connections grow - all stuck in an ESTABLISHED state.   TCP traces show that data is sent back but they are ignored and the request times out. Restarting or suspending the process clears the issue. 

Would you be able to shed any light on what may be in the above patch that would affect the jetty in any way? Something isn't playing nicely there and we haven't been able to isolate it.  

Any information greatly appreciated, 

Susan



On Dec 3, 2013, at 4:27 PM, Simone Bordet <[hidden email]> wrote:

Hi,

On Wed, Nov 27, 2013 at 12:01 AM, Susan Javurek <[hidden email]> wrote:
Hi Simone,

Yes, update 65 is apparently available. However, it seems my customer has it
installed but is not using it. There's no change to java in this scenario.
The versions are:

* java 1.6.0_33
* jetty-client-7.6.7.v20120910.jar (so sorry for not including that!).

The customer is in production so I'm afraid an upgrade to latest version of
Jetty isn't possible. The only change we are aware of is an update to the
Kernel:

https://getupdates.oracle.com/readme/150401-03

There has been no change in the load running through the route. It's
un-reproducable in any other environment.  Updating the bundle or touching
the process in some way, i.e jstack which suspends the process briefly seems
to have a positive effect.

Have you all seen this type of issue before? Does the stack trace lend any
clues?

No, never seen this issue before, sorry.

--
Simone Bordet
----
http://cometd.org
http://webtide.com
http://intalio.com
Developer advice, training, services and support
from the Jetty & CometD experts.
Intalio, the modern way to build business applications.
_______________________________________________
jetty-dev mailing list
[hidden email]
https://dev.eclipse.org/mailman/listinfo/jetty-dev

Susan Javurek
Customer Support | Red Hat, Inc. 
FuseSource is now part of Red Hat
sjavure[hidden email] | fusesource.com redhat.com


_______________________________________________
jetty-dev mailing list
[hidden email]
https://dev.eclipse.org/mailman/listinfo/jetty-dev