[jira] (JETTY-1532) HTTP headers decoded with platform's default encoding

JIRA jira@codehaus.org
Issue Type: Bug
Affects Versions: 8.1.2
Assignee: Unassigned
Components: HTTP
Created: 06/Jul/12 5:21 AM
Description:

Affects 8.1.2 and possibly later versions; I didn't check.

Set the platform encoding with -Dfile.encoding=UTF-16 (I'm dodging the question of whether this is a valid configuration for now). This reveals bugs caused by dependencies on the default platform encoding. For example, the charset in HTTP's Content-Type header is decoded via a call to MimeTypes#getCharsetFromContentType(Buffer), which traverses the buffer assuming it is US-ASCII (which makes sense), but then does:

return CACHE.lookup(value.peek(start,i-start)).toString();

This is repeated in two places. Buffer's default toString() uses the platform encoding, so the result will be garbled if that encoding does not map US-ASCII bytes to the same characters (EBCDIC, UTF-16, etc.).

There is no workaround. A fix is to decode with US-ASCII, as HTTP headers should be in this encoding.
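
The effect described above can be reproduced without Jetty. The following is only a minimal sketch, assuming Java 7 or later (for StandardCharsets); the class and method names are hypothetical and are not part of Jetty. It decodes the same header bytes once with an explicit US-ASCII charset and once with UTF-16 standing in for the platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use Jetty's Buffer or MimeTypes.
final class HeaderCharsetDemo {

    // Decodes a slice of header bytes with an explicit charset,
    // so the result does not depend on -Dfile.encoding.
    static String decodeHeader(byte[] raw, int offset, int length) {
        return new String(raw, offset, length, StandardCharsets.US_ASCII);
    }

    // Simulates a decode that relies on the platform default,
    // with the default passed in so the effect is reproducible.
    static String decodeWith(byte[] raw, Charset assumedDefault) {
        return new String(raw, assumedDefault);
    }

    public static void main(String[] args) {
        byte[] raw = "utf-8".getBytes(StandardCharsets.US_ASCII);
        System.out.println(decodeHeader(raw, 0, raw.length));
        // Under a UTF-16 default the same bytes no longer read as "utf-8".
        System.out.println("utf-8".equals(decodeWith(raw, StandardCharsets.UTF_16)));
    }
}
```

Passing the charset explicitly at the point of decoding is what makes the result independent of how the JVM was started.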

Project: Jetty
Priority: Trivial
Reporter: Dawid Weiss
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
--------------------------------------------------------------------- To unsubscribe from this list, please visit: http://xircles.codehaus.org/manage_email

[jira] (JETTY-1532) HTTP headers decoded with platform's default encoding

JIRA jira@codehaus.org
Change By: Jan Bartel (02/Aug/12 1:35 AM)
Assignee: Jan Bartel

[jira] (JETTY-1532) HTTP headers decoded with platform's default encoding

JIRA jira@codehaus.org
In reply to this post by JIRA jira@codehaus.org
Jan Bartel commented on Bug JETTY-1532

Hi Dawid,

I've fixed those, thanks. You mention it occurs in other places; do you have a list of them?

thanks
Jan

[jira] (JETTY-1532) HTTP headers decoded with platform's default encoding

JIRA jira@codehaus.org
In reply to this post by JIRA jira@codehaus.org
Dawid Weiss commented on Bug JETTY-1532

I did a simple scan for Buffer.toString() and it's called in various places. Since this method is fundamentally broken (it uses the platform encoding to convert a buffer to a string), I'd assume any call that relies on it will break sooner or later given a default codepage that is not ASCII-compatible. What I would do is replace all calls to Buffer.toString(), or even make it return something that is not a direct representation of the buffer ("Buffer: ASCII contents=XXX").
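
One way to read this suggestion is sketched below. It is an illustration only: ByteSpan is a hypothetical stand-in and not Jetty's Buffer class, and the code assumes Java 7 or later. The no-argument toString() returns a labelled diagnostic string decoded as US-ASCII, while program logic has to go through an overload that names the charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a byte buffer; not Jetty's Buffer class.
final class ByteSpan {
    private final byte[] bytes;

    ByteSpan(byte[] bytes) {
        // Defensive copy so later changes to the caller's array are not visible here.
        this.bytes = bytes.clone();
    }

    // For program logic: the caller must state which charset the bytes are in.
    String toString(Charset charset) {
        return new String(bytes, charset);
    }

    // For diagnostics only: clearly labelled and never tied to file.encoding.
    @Override
    public String toString() {
        return "Buffer: ASCII contents=" + new String(bytes, StandardCharsets.US_ASCII);
    }
}
```

Because the diagnostic form carries a prefix, code that accidentally uses it for comparisons or lookups fails visibly instead of working only on some platforms.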

[jira] (JETTY-1532) HTTP headers decoded with platform's default encoding

JIRA jira@codehaus.org
In reply to this post by JIRA jira@codehaus.org
Jan Bartel commented on Bug JETTY-1532

Dawid,

I'm not sure Buffer.toString() is fundamentally broken. The toString(charset) method is there to be called when you know the charset of the bytes represented by the buffer. If you don't know what the bytes represent, then guessing the platform encoding is as good as anything.

[jira] (JETTY-1532) HTTP headers decoded with platform's default encoding

JIRA jira@codehaus.org
In reply to this post by JIRA jira@codehaus.org
Dawid Weiss commented on Bug JETTY-1532

By "broken" I meant exactly that: the bytes in a Buffer will rarely match the platform's default encoding. The Buffer is (from my brief analysis) constructed from fragments of the incoming HTTP request (headers, for example). The implicit assumption that the codepage of headers or other data matches the platform's is just plain wrong, so any use of Buffer's toString() method is, from my perspective, a source of potential problems. From what I remember, toString() is used not only for diagnostics (debugging) but also for other program logic. I'd say toString() should be allowed only when assertions are enabled (so that debugging can be done), and a specialized toString(Charset) should be exposed for program logic. Or anything else, as long as it verifies that the byte buffer matches the request codepage.

Obviously you may disagree with this, but it is really easy to verify in practice: run your Jetty instance with -Dfile.encoding=UTF-32. Does it work and pass all the tests? If so, it's fine and doesn't depend on the default codepage. If it doesn't, it's broken to me.

[jira] (JETTY-1532) HTTP headers decoded with platform's default encoding

JIRA jira@codehaus.org
In reply to this post by JIRA jira@codehaus.org
Jan Bartel resolved Bug JETTY-1532 as Fixed

Hi Dawid,

Setting file.encoding does not work because the JVM does not allow it.

In any case, my point still stands: we want to use the encoding when it is known, and use the platform's when it is not known.

I'm going to close this issue for now as I believe we've fixed the instance where we had a known encoding type but were failing to use it. If you have any other specific instances, then I'd appreciate it if you open a new issue over at Jetty's bugzilla at Eclipse, which is Jetty's home: https://bugs.eclipse.org/bugs/buglist.cgi?cmdtype=runnamed&namedcmd=jetty-bugs&list_id=3859265

cheers
Jan

Change By: Jan Bartel (13/Dec/12 8:55 PM)
Resolution: Fixed
Fix Version/s: 8.1.6
Status: Open -> Resolved

[jira] (JETTY-1532) HTTP headers decoded with platform's default encoding

JIRA jira@codehaus.org
In reply to this post by JIRA jira@codehaus.org
Dawid Weiss commented on Bug JETTY-1532

"Setting file.encoding does not work because the JVM does not allow it."

You're wrong in thinking so. All JVMs I know of allow the default file encoding to be overridden. More importantly, platform-encoding-sensitive methods are a legacy that should be avoided because they make your software (and bug reports) reliant on an unknown context. Plain and simple. I don't think there is any reason for software like Jetty to rely on the default platform encoding, in particular when handling HTTP requests.

Anyway, up to you of course.

[jira] (JETTY-1532) HTTP headers decoded with platform's default encoding

JIRA jira@codehaus.org
In reply to this post by JIRA jira@codehaus.org
Uwe Schindler commented on Bug JETTY-1532

Hi,
you may find these interesting to read: http://blog.thetaphi.de/2012/07/default-locales-default-charsets-and.html or http://blog.joda.org/2012/12/annotating-jdk-default-data.html

Jetty is server software, and the character encoding used by the client is completely unrelated to the default encoding of the server platform. In the case of HTTP, the HTTP/1.1 standard specifies the encoding to be used for constructing URLs and parsing headers (US-ASCII or UTF-8, depending on the context).

For Jetty, another big issue may be locale-specific defaults, used e.g. when lowercasing strings. I hope you have tested your server software on a system with the tr_TR default locale (at Lucene we choose a random locale and charset for every test run)! If your tests pass, then you seem to handle case-insensitivity correctly, but if they fail in that locale you should really think about passing Locale.ROOT to things like String.toLowerCase() when they are not language-specific! I am just thinking of matching HTTP headers in a case-insensitive way against an internal map.
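
The locale problem can be shown with a few lines of plain Java. This is a minimal sketch with hypothetical class and method names, assuming Java 7 or later (for Locale.forLanguageTag); it is not taken from Jetty. Under a Turkish locale an uppercase "I" lowercases to the dotless "ı" (U+0131), so a header name normalized that way no longer matches a map keyed by ASCII lowercase names.

```java
import java.util.Locale;

// Hypothetical helper; not part of Jetty.
final class HeaderNameCase {

    // Locale-independent lowercasing, suitable for protocol tokens such as header names.
    static String normalize(String headerName) {
        return headerName.toLowerCase(Locale.ROOT);
    }

    // Lowercasing under a given locale, as would happen implicitly with
    // String.toLowerCase() when that locale is the JVM default.
    static String lowerIn(String headerName, Locale locale) {
        return headerName.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        System.out.println(normalize("IF-MODIFIED-SINCE"));
        // The Turkish result contains U+0131 instead of 'i'.
        System.out.println(lowerIn("IF-MODIFIED-SINCE", turkish));
    }
}
```

Normalizing with Locale.ROOT at the point where header names enter the map keeps lookups stable whatever default locale the server runs under.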
