[jetty-users] how to handle invalid UTF8 chars

[jetty-users] how to handle invalid UTF8 chars

Mattia Merzi
Hi everyone,

I recently updated the Jetty libraries (I'm using embedded Jetty) to the latest
version, and I'm having trouble with this exception:

org.eclipse.jetty.util.Utf8Appendable$NotUtf8Exception: Not valid UTF8! byte 20 in state 3

Is there a way to tell Jetty to simply discard invalid characters instead of
throwing an exception?

If you need some more details, just ask.

Thanks a lot,

Greetings,

Mattia.
_______________________________________________
jetty-users mailing list
[hidden email]
https://dev.eclipse.org/mailman/listinfo/jetty-users

Re: [jetty-users] how to handle invalid UTF8 chars

Simone Bordet-2
Hi,

On Tue, Jan 24, 2012 at 15:13, Mattia Merzi <[hidden email]> wrote:

> [...]

Is this the body of a request? If so, the solution is to specify the
right Content-Type and charset.

Is this a URL? If so, it must be encoded as UTF-8.

Something else?

Stack trace?
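
To illustrate the point about URLs, here is a small sketch, assuming the client is written in Java (nothing in this thread says so). java.net.URLEncoder percent-encodes a value with whichever charset it is given, and only the UTF-8 form can be decoded by a server that expects UTF-8. The class name QueryEncodingDemo and the sample value are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; not part of Jetty or of the poster's application.
final class QueryEncodingDemo {
    private QueryEncodingDemo() {}

    /** Percent-encodes a single query parameter value using the named charset. */
    static String encodeValue(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Inverted exclamation mark (U+00A1) followed by "hola".
        String value = "\u00A1hola";
        System.out.println(encodeValue(value, "UTF-8"));      // %C2%A1hola
        System.out.println(encodeValue(value, "ISO-8859-1")); // %A1hola
    }
}
```

The ISO-8859-1 form contains a lone %A1, which is not a valid UTF-8 sequence on its own. The same byte value appears in the stack trace posted later in this thread, although that alone does not prove which encoding the clients actually use.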

Simon
--
http://cometd.org
http://intalio.com
http://bordet.blogspot.com
----
Finally, no matter how good the architecture and design are,
to deliver bug-free software with optimal performance and reliability,
the implementation technique must be flawless.   Victoria Livschitz

Re: [jetty-users] how to handle invalid UTF8 chars

Mattia Merzi
Right, a few more details:

- requests that generate this exception either do not specify a
Content-Type header or specify a correct one (UTF-8)
(we discard any request whose Content-Type is not UTF-8)
- requests *may* contain *invalid UTF-8 sequences*, but we would like
to *accept them anyway*, discarding the invalid characters or
replacing them with something else; please don't ask
why, it would be really complicated :)
- we have no control over the clients; it is impossible
for us to change the requests or to ask the clients to change their
behaviour
- filtering the data (with something like a proxy that
deletes all invalid characters) would be very expensive: we
sometimes get tens of thousands of requests per minute,
and above all it would be one more piece of software to install, configure,
and maintain :)
- requests pass through an Apache web server with mod_proxy;
if it is of interest, I can post the configuration
- various kinds of POST or GET data generate this exception, from
very short (8~10 byte payloads) to very big (8~10 MB payloads)
- Jetty is embedded in our application; if useful, I can post the most
relevant code used to start the servlet container, just let me know
- Jetty 7.3.0 works perfectly; Jetty 8.1.0.RC2 generates this exception:
org.eclipse.jetty.util.Utf8Appendable$NotUtf8Exception: Not valid UTF8! byte A1 in state 0
        at org.eclipse.jetty.util.Utf8Appendable.appendByte(Utf8Appendable.java:168)
        at org.eclipse.jetty.util.Utf8Appendable.append(Utf8Appendable.java:93)
        at org.eclipse.jetty.util.UrlEncoded.decodeUtf8To(UrlEncoded.java:482)
        at org.eclipse.jetty.util.UrlEncoded.decodeTo(UrlEncoded.java:533)
        at org.eclipse.jetty.server.Request.extractParameters(Request.java:277)
        at org.eclipse.jetty.server.Request.getParameterNames(Request.java:709)
[... our classes that extend servlet ]
- if needed, I can post a tcpdump of a few requests that generate the
exception, but sorry, I don't have one available here
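
As a rough illustration of the lenient behaviour asked for in the list above, here is a minimal sketch that percent-decodes form data and replaces (or drops) malformed UTF-8 instead of failing. It uses only the JDK. The class LenientFormDecoder and its decode method are hypothetical names; they are not part of Jetty, and this is not a description of how Jetty decodes parameters.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of Jetty.
final class LenientFormDecoder {
    private LenientFormDecoder() {}

    /**
     * Percent-decodes raw form bytes, then decodes the result as UTF-8.
     * onInvalid selects what happens to malformed sequences:
     * REPLACE substitutes U+FFFD, IGNORE drops them, REPORT fails.
     */
    static String decode(byte[] encoded, CodingErrorAction onInvalid) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream(encoded.length);
        for (int i = 0; i < encoded.length; i++) {
            int b = encoded[i] & 0xFF;
            if (b == '+') {
                raw.write(' ');                       // '+' means space in form data
            } else if (b == '%' && i + 2 < encoded.length
                    && hex(encoded[i + 1]) >= 0 && hex(encoded[i + 2]) >= 0) {
                raw.write((hex(encoded[i + 1]) << 4) | hex(encoded[i + 2]));
                i += 2;                               // skip the two hex digits
            } else {
                raw.write(b);                         // literal byte, or a broken escape kept as-is
            }
        }
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(onInvalid)
                    .onUnmappableCharacter(onInvalid)
                    .decode(ByteBuffer.wrap(raw.toByteArray()))
                    .toString();
        } catch (CharacterCodingException e) {
            // Only reachable with CodingErrorAction.REPORT.
            throw new IllegalArgumentException("Not valid UTF-8", e);
        }
    }

    /** Returns the value of an ASCII hex digit, or -1 if the byte is not one. */
    private static int hex(byte b) {
        return Character.digit((char) (b & 0xFF), 16);
    }

    public static void main(String[] args) {
        byte[] input = "a%A1b+c".getBytes(StandardCharsets.US_ASCII);
        System.out.println(decode(input, CodingErrorAction.REPLACE));
        System.out.println(decode(input, CodingErrorAction.IGNORE));
    }
}
```

With CodingErrorAction.REPLACE the stray A1 byte becomes U+FFFD, with IGNORE it is dropped, and with REPORT the call fails, which is closest to the strict behaviour shown in the stack trace.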

Thanks,

Greetings,

Mattia.



2012/1/24 Simone Bordet <[hidden email]>:

> [...]

Re: [jetty-users] how to handle invalid UTF8 chars

Thomas Becker
Hi Mattia,

There's currently a small inconsistency in the Jetty code regarding
this. I've opened a Bugzilla issue and will provide a patch for it shortly.
We still have to discuss whether to put it into the 7.6.0 and 8.1.0 releases,
as we're already in a kind of code freeze.

Here's the issue: https://bugs.eclipse.org/bugs/show_bug.cgi?id=369602

Cheers,
Thomas

On 1/24/12 10:25 PM, Mattia Merzi wrote:

> [...]

--
thomas becker
[hidden email]

http://webtide.com / http://intalio.com
(the folks behind jetty and cometd)


Re: [jetty-users] how to handle invalid UTF8 chars

Mattia Merzi
Great, I will wait until the next release before upgrading our
production servers.

I can't promise, but I will do my best to try the current development
branch and let you know whether the patch fixes my problem (I'm pretty
sure it will, but ... :)

Thanks a lot for the fast support, Thomas.

Greetings,

Mattia.



2012/1/24 Thomas Becker <[hidden email]>:

> [...]

Re: [jetty-users] how to handle invalid UTF8 chars

Thomas Becker
You're welcome. The fix made it into 7.6.0/8.1.0, which will be released
shortly.

On 1/25/12 9:16 AM, Mattia Merzi wrote:

> [...]

Re: [jetty-users] how to handle invalid UTF8 chars

cgw0827
This post has NOT been accepted by the mailing list yet.
In reply to this post by Mattia Merzi
Hello Mattia Merzi,

When I receive messages from the remote server, my program reports this error:

[18:10:41] qtp1543148593-13 WARN  [] [] [org.eclipse.jetty.util.UrlEncoded] - org.eclipse.jetty.util.Utf8Appendable$NotUtf8Exception: Not valid UTF8! byte 9d in state 0
[18:10:41] qtp1543148593-13 WARN  [] [] [org.eclipse.jetty.util.UrlEncoded] - org.eclipse.jetty.util.Utf8Appendable$NotUtf8Exception: Not valid UTF8! byte 4d in state 3
[18:10:41] qtp1543148593-13 WARN  [] [] [org.eclipse.jetty.util.UrlEncoded] - org.eclipse.jetty.util.Utf8Appendable$NotUtf8Exception: Not valid UTF8! byte D3 in state 8
[18:10:41] qtp1543148593-13 INFO  [] [] [com.woo.gameplus.log.Log] - get msgType::3
[18:10:41] qtp1543148593-13 ERROR  [] [] [com.djly.billing.launch.code.BillingGameRpcServlet] - Error handle GET request
org.eclipse.jetty.util.Utf8Appendable$NotUtf8Exception: Not valid UTF8! byte E6 in state 2
        at org.eclipse.jetty.util.Utf8Appendable.appendByte(Utf8Appendable.java:168)
        at org.eclipse.jetty.util.Utf8Appendable.append(Utf8Appendable.java:107)
        at org.eclipse.jetty.http.HttpURI.toUtf8String(HttpURI.java:490)
        at org.eclipse.jetty.http.HttpURI.getQuery(HttpURI.java:610)
        at org.eclipse.jetty.server.Request.getQueryString(Request.java:773)
        at com.djly.billing.launch.code.BillingGameRpcServlet.getMsgBytesFromGet(BillingGameRpcServlet.java:79)

My Jetty version is jetty-all-7.6.5.v20120716.jar.

The strange thing is that if I access it directly from a browser, this error does not occur.
I'm very sad now; what can I do?
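
One possible explanation, offered as an assumption that nothing in this thread confirms, is that the remote server sends the query string as raw bytes in some other encoding, whereas a browser percent-encodes it as UTF-8. A quick way to check captured bytes is a strict decode with the JDK, sketched below. The class name Utf8Check is hypothetical, and the GBK bytes are only an example of a non-UTF-8 encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper; not part of Jetty.
final class Utf8Check {
    private Utf8Check() {}

    /** Returns true only if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xE4, (byte) 0xB8, (byte) 0xAD}; // U+4E2D encoded as UTF-8
        byte[] gbk = {(byte) 0xD6, (byte) 0xD0};               // the same character encoded as GBK
        System.out.println(isValidUtf8(utf8)); // true
        System.out.println(isValidUtf8(gbk));  // false
    }
}
```

If the bytes received from the remote server fail this check, the sender would need to encode the query as UTF-8 (or percent-encode it) before Jetty can decode it strictly.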