Discussion:
[jetty-users] FW: avoiding earlyEOF
Robben, Bert
2018-06-21 05:23:52 UTC
Permalink
Hey all,

Some background: we have an app that consists of web services that talk to each other. The usage pattern is that a few components each host a web server and, at the same time, use an HTTP client to talk to the others. We use the Jetty web server and the Jetty HttpClient for communication (9.4.7.v20170914).

The problem we face is that we regularly see IOExceptions in the communication between these components (see the stack trace below). These IOExceptions always contain earlyEOF. A long time ago I posted a similar message on this list (see https://dev.eclipse.org/mhonarc/lists/jetty-users/msg07965.html). I followed the advice given there, upgraded to the latest version, and explicitly set a different idleTimeout on the server and on the client.

However, that didn't help. We still see the occasional earlyEOF. What's worse, we now have another app with more web services and more REST communication, and in this app the earlyEOF happens frequently. It happens during mild load tests (nothing is overloaded), so it's clear to me that nothing is idle for a second. To get around the problem, as a test, we tried replacing the Jetty client with Apache HttpClient. There the problem is less of an issue, but we still see REST calls failing. Our conclusion is that for some reason the server is closing the connection while the client is still reading from the input stream that contains the response body.
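One way to test that conclusion in isolation is to reproduce a premature close outside the application. The following is a minimal sketch using plain java.net sockets rather than Jetty; every class and method name in it is hypothetical. A one-shot server announces a body length via Content-Length, sends fewer bytes, and closes; the client counts what arrives and flags the shortfall.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Stand-alone reproduction with plain sockets; all names here are hypothetical, not Jetty API.
class EarlyEofDemo {

    /** Bytes announced in Content-Length versus bytes that actually arrived. */
    static final class ReadResult {
        final int promised;
        final int received;

        ReadResult(int promised, int received) {
            this.promised = promised;
            this.received = received;
        }

        boolean earlyEof() {
            return received < promised;
        }
    }

    /** Reads up to and including the blank line that ends an HTTP head. */
    static String readHead(InputStream in) throws IOException {
        StringBuilder head = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            head.append((char) b);
            int n = head.length();
            if (n >= 4 && head.substring(n - 4).equals("\r\n\r\n")) {
                break;
            }
        }
        return head.toString();
    }

    /** Starts a one-shot server that announces `promised` body bytes, sends only `sent`, then closes. Returns its port. */
    static int startTruncatingServer(int promised, int sent) {
        try {
            ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
            Thread t = new Thread(() -> {
                try (ServerSocket ss = server; Socket s = ss.accept()) {
                    readHead(s.getInputStream()); // consume the request before answering
                    OutputStream out = s.getOutputStream();
                    String head = "HTTP/1.1 200 OK\r\nContent-Length: " + promised + "\r\n\r\n";
                    out.write(head.getBytes(StandardCharsets.US_ASCII));
                    out.write(new byte[sent]); // deliberately shorter than promised
                    out.flush();
                } catch (IOException ignored) {
                    // nothing useful to do in this sketch
                }
            });
            t.setDaemon(true);
            t.start();
            return server.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sends one GET and counts body bytes until Content-Length is reached or the peer closes. */
    static ReadResult fetch(int port) {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
            OutputStream out = s.getOutputStream();
            out.write("GET / HTTP/1.1\r\nHost: localhost\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = s.getInputStream();
            int promised = 0;
            for (String line : readHead(in).split("\r\n")) {
                if (line.toLowerCase().startsWith("content-length:")) {
                    promised = Integer.parseInt(line.substring("content-length:".length()).trim());
                }
            }
            int received = 0;
            while (received < promised && in.read() != -1) {
                received++;
            }
            return new ReadResult(promised, received);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ReadResult r = fetch(startTruncatingServer(100, 10));
        System.out.println("promised=" + r.promised + " received=" + r.received + " earlyEOF=" + r.earlyEof());
    }
}
```

Pointing the same kind of counting client at the real endpoint, once directly and once through any intermediary, could help narrow down which hop closes the connection.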

Does anyone have an idea how we can further diagnose and fix this?


Here's a typical stacktrace:

Caused by: java.io.EOFException: ***@516b4fbd(l:/10.58.234.140:34446 <-> r:be-pom-node-04.clear2pay.com/10.58.234.175:28080,closed=false)=>***@125fa92a(exchange=***@71e96cb2 req=TERMINATED/***@null res=PENDING/***@null)[send=***@71a5f017(req=QUEUED,snd=COMPLETED,failure=null)[***@7a384f39{s=START}],recv=***@15c1bfad(rsp=IDLE,failure=null)[HttpParser{s=CLOSED,0 of -1}]]<-***@2f77cd5b{be-pom-node-04.clear2pay.com/10.58.234.175:28080<->/10.58.234.140:34446,ISHUT,fill=-,flush=-,to=1/0}{io=1/0,kio=1,kro=1}->***@516b4fbd(l:/10.58.234.140:34446 <-> r:be-pom-node-04.clear2pay.com/10.58.234.175:28080,closed=false)=>***@125fa92a(exchange=***@71e96cb2 req=TERMINATED/***@null res=PENDING/***@null)[send=***@71a5f017(req=QUEUED,snd=COMPLETED,failure=null)[***@7a384f39{s=START}],recv=***@15c1bfad(rsp=IDLE,failure=null)[HttpParser{s=CLOSED,0 of -1}]]
at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.earlyEOF(HttpReceiverOverHTTP.java:320)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1419)
at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.shutdown(HttpReceiverOverHTTP.java:196)
at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.process(HttpReceiverOverHTTP.java:143)
at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.receive(HttpReceiverOverHTTP.java:70)
at org.eclipse.jetty.client.http.HttpChannelOverHTTP.receive(HttpChannelOverHTTP.java:130)
at org.eclipse.jetty.client.http.HttpConnectionOverHTTP.onFillable(HttpConnectionOverHTTP.java:116)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:104)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:243)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:679)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:597)

thanks,

Bert
The information contained in this message is proprietary and/or confidential. If you are not the intended recipient, please: (i) delete the message and all copies; (ii) do not disclose, distribute or use the message in any manner; and (iii) notify the sender immediately. In addition, please be aware that any message addressed to our domain is subject to archiving and review by persons other than the intended recipient. Thank you.
Shawn Heisey
2018-06-22 16:56:47 UTC
Permalink
Post by Robben, Bert
The problem that we face is that we regularly see IOExceptions
exceptions occurring in the communication between these components. See
stacktrace below. These IOExceptions always contain the earlyEOF. A long
time ago I already posted a similar message on this forum (see
https://dev.eclipse.org/mhonarc/lists/jetty-users/msg07965.html). I
followed the advice mentioned, upgraded to the latest version and
explicitly set a different idleTimeout on both the server and the client.
I'm not an expert by any means. Starting off with that in case I say
something wrong.

I come from the Apache Solr project, which uses Jetty. We see
EOFException quite a lot on the solr-user mailing list.

In my experience with Solr, this is almost always caused by the client
disconnecting (usually due to TCP socket timeout) before the server has
completed the request, because the user has set their socket timeout too
low on the client. When the server finally does try to respond,
EOFException is logged, because the connection is gone. If the client
is the one logging the exception, then the server may have closed the
connection, followed by the client trying to send more data, and failing.

I am not seeing any direct mention of what you are setting the idle
timeout to, but I do see this in your message: "so it’s clear to me
that nothing is idle for a second". In my opinion, setting socket or
idle timeouts to low numbers is asking for problems. Going as low as
one second will be extremely likely to lead to timeout issues.

I do understand the value of having these timeouts, but the timeout
needs to be significantly longer than you expect the requests to
actually take, because there may be situations where requests do take
longer than you expect them to.

Java software is prone to experiencing noticeable GC pauses, especially
as the heap size grows. I have seen pauses of 10-15 seconds happen with
an 8GB heap unless garbage collection is extensively tuned to avoid full
GC. For Solr, an 8GB heap could actually be quite small.

If there is no GC tuning beyond setting which collector to use, even the
G1 collector, which is Oracle's best option for low-pause collection,
will not avoid full GCs effectively. It is the full GC that causes the
most problems with pauses. GC tuning is very much an art form, and
settings that work well for one application may produce awful results on
another.
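Whether pauses of this kind actually occur can be checked from inside the JVM with the standard library alone. The sketch below is one possible approach, not a tuned tool: a daemon thread sleeps for a short interval and measures how much longer than requested the sleep took, and the cumulative collection time is read from the standard GarbageCollectorMXBean. The class name, method names, and thresholds are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; names and thresholds are illustrative only.
class PauseWatch {

    /** Sleeps for sleepMillis and returns how many extra milliseconds the sleep took. */
    static long measureOvershootMillis(long sleepMillis) {
        long start = System.nanoTime();
        try {
            Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return Math.max(0, elapsedMillis - sleepMillis);
    }

    /** Total time in milliseconds that all collectors report having spent so far. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Starts a daemon thread that warns whenever a sleep overshoots by more than thresholdMillis. */
    static Thread startWatcher(long sleepMillis, long thresholdMillis) {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                long overshoot = measureOvershootMillis(sleepMillis);
                if (overshoot > thresholdMillis) {
                    System.err.println("JVM stalled for about " + overshoot
                            + " ms; total GC time so far: " + totalGcMillis() + " ms");
                }
            }
        }, "pause-watch");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        startWatcher(100, 400);
        Thread.sleep(2000); // stand-in for the real application work
        System.out.println("total GC time: " + totalGcMillis() + " ms");
    }
}
```

A large overshoot does not prove that GC was responsible (the whole machine may have stalled), but comparing it with the change in total GC time gives a first hint.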

Let's say that you expect all requests to complete in 10 milliseconds or
less. So you set your timeout to 1 second, thinking that's always going
to be plenty of time. But then your application fills up its 2GB heap
right in the middle of handling one of those requests, and the resulting
garbage collection pauses the JVM for two seconds. The entity at the
other end of the connection is going to give up and close the connection
before the program experiencing the GC pause can respond. Tuning
garbage collection to reduce GC pauses is certainly a good idea, but if
the timeout were 10 seconds instead of one second, it probably would not
have had any problem.
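The scenario above can be reproduced at small scale with plain sockets. This is a sketch under stated assumptions: the server's sleep stands in for a GC pause, the read timeout is set with Socket.setSoTimeout, the durations are scaled down to milliseconds, and all names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo; the sleep on the server side simulates a pause such as a full GC.
class TimeoutVsPauseDemo {

    /** Starts a one-shot server that stalls for pauseMillis before replying with one byte. Returns its port. */
    static int startStallingServer(long pauseMillis) {
        try {
            ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
            Thread t = new Thread(() -> {
                try (ServerSocket ss = server; Socket s = ss.accept()) {
                    Thread.sleep(pauseMillis);
                    s.getOutputStream().write('!');
                    s.getOutputStream().flush();
                } catch (IOException | InterruptedException ignored) {
                    // the client may already have given up; nothing to do in this sketch
                }
            });
            t.setDaemon(true);
            t.start();
            return server.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if a reply arrived within readTimeoutMillis, false if the client timed out first. */
    static boolean requestSucceeds(int port, int readTimeoutMillis) {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
            s.setSoTimeout(readTimeoutMillis);
            return s.getInputStream().read() != -1;
        } catch (SocketTimeoutException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The same 300 ms stall, once with a tight timeout and once with a generous one.
        System.out.println("timeout 100 ms:  " + requestSucceeds(startStallingServer(300), 100));
        System.out.println("timeout 3000 ms: " + requestSucceeds(startStallingServer(300), 3000));
    }
}
```

When the client gives up first, the stalled side only notices once it finally tries to write, which matches the pattern described earlier of an exception being logged after the peer has already gone.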

Thanks,
Shawn
Steven Schlansker
2018-06-22 17:09:42 UTC
Permalink
Post by Shawn Heisey
Post by Robben, Bert
The problem that we face is that we regularly see IOExceptions
exceptions occurring in the communication between these components.
Let's say that you expect all requests to complete in 10 milliseconds or
less. So you set your timeout to 1 second, thinking that's always going
to be plenty of time. But then your application fills up its 2GB heap
right in the middle of handling one of those requests, and the resulting
garbage collection pauses the JVM for two seconds. The entity at the
other end of the connection is going to give up and close the connection
before the program experiencing the GC pause can respond. Tuning
garbage collection to reduce GC pauses is certainly a good idea, but if
the timeout were 10 seconds instead of one second, it probably would not
have had any problem.
You can (and should!) explicitly monitor these conditions. The JVM provides interesting
diagnostics output through JMX to monitor it, or you can directly measure:

https://github.com/opentable/otj-pausedetector

I run this in *every* application -- unexpected pauses cause all sorts of troubles,
monitoring it is cheap, and you'll save hours when you have a big warning
"hey, the JVM went to lunch for 30 seconds here, that might be why all this stuff broke"
Bill Ross
2018-06-22 18:45:52 UTC
Permalink
Is there a simple way to incorporate this in a start.jar from script
startup? If not, would it be worth building into jetty?

Thanks,

Bill
Post by Steven Schlansker
https://github.com/opentable/otj-pausedetector
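public class MyCoolApp {
    public static void main(String[] args) {
        // try-with-resources needs a declared resource variable; this assumes
        // start() returns the (closeable) JvmPauseAlarm itself.
        try (JvmPauseAlarm alarm = new JvmPauseAlarm(100, 400).start()) {
            runMyCoolApp();
        }
    }
}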
Joakim Erdfelt
2018-06-22 18:50:52 UTC
Permalink
Jetty start uses Jetty XML.
Jetty XML doesn't support try-with-resources, or try-catch.
Also, Jetty XML expects an Object to be configured and returned.
Steven Schlansker
2018-06-22 19:54:14 UTC
Permalink
The particular library I included was just an example;
if there's a simple way to make it drop in nicely to Jetty, that'd be totally great by us.
We embed all our Jettys, so making it work nicely with XML-based deployments was never a goal for us.
Robben, Bert
2018-06-25 06:49:35 UTC
Permalink
Thanks guys for trying to help me.

Some more info.

(1) It is always the client logging the timeout.
(2) The server idle timeout is configured to about 34 seconds:
    // Apply the same idle timeout (in milliseconds) to every connector.
    for (Connector con : server.getConnectors()) {
        if (con instanceof AbstractConnector) {
            ((AbstractConnector) con).setIdleTimeout(34321);
        }
    }
(3) We monitor activity and GC; there are no long pauses (certainly none of more than 2-3 seconds). The app keeps processing the whole time without pausing.

The track we're currently investigating is the role of the proxy. As we're running our app in DC/OS, all traffic is routed through HAProxy: the HTTP client connects to HAProxy, and HAProxy forwards to the HTTP server. It may be that HAProxy is not properly configured in this setup. See https://stackoverflow.com/questions/44204603/marathon-lb-not-returning-keep-alive-headers and https://stackoverflow.com/questions/21550337/haproxy-netty-way-to-prevent-exceptions-on-connection-reset/40005338#40005338. As I understand it, because the proxy sits in between, an incorrect configuration could also cause connections to be closed unexpectedly, particularly given that the connections are long-lived (the clients keep sending one message after another to the same server).

Tbc,

Bert Robben
IT Architect Senior
POM
T: 000.000.0000
C: 000.000.0000
E: ***@fisglobal.com
FIS | Empowering the Financial World




