[jetty-users] How to escape Unicode with JSON.toString()?
Alexander Farber
2018-03-28 15:01:42 UTC
Hello fellow Jetty users and developers,

is it possible to escape non-ASCII characters when using the

org.eclipse.jetty.util.ajax.JSON.toString() method?

I understand that it might be an internal library, but until now it
works well for me in a servlet which among other tasks sends push
notifications via FCM (Firebase Cloud Messaging) and ADM (Amazon
Device Messaging).

However my problem with the latter is that ADM does not accept any
UTF-8 chars (in my case Cyrillic) and reproducibly fails with the
cryptic error message:

<SerializationException>
<Message>Could not parse XML</Message>
</SerializationException>

java.lang.IllegalStateException:
unknown char '<'(60) in |||<SerializationException>| <Message>Could
not parse XML</Message>|</SerializationException>||

So is there maybe some possibility in Jetty 9.4.8.v20171121 to encode the chars?

Here is my Java code:

    // this string is POSTed to ADM server
    public String toAdmBody() {
        Map<String, Object> root = new HashMap<>();
        Map<String, String> data = new HashMap<>();
        root.put(KEY_DATA, data);
        data.put(KEY_BODY, mBody);
        // ADM does not accept integers
        data.put(KEY_GID, String.valueOf(mGid));
        // TODO encode utf8 chars
        return JSON.toString(root);
    }

Thank you
Alex
Joakim Erdfelt
2018-03-28 15:19:31 UTC
org.eclipse.jetty.util.ajax.JSON.toString() produces a JSON formatted
string.

The error you are getting back is XML?
XML encoding is different than JSON encoding.

org.eclipse.jetty.util.ajax.JSON.toString() tries to follow the guidance at
https://tools.ietf.org/html/rfc8259#section-8

Perhaps you have some oddball charset declaration getting in your way.
I don't know how ADM works, but if you are submitting the JSON to them in
an HttpClient, make sure your `Content-Type` request header says something
like "application/json; charset=utf-8"
If ADM is issuing requests to your server, then make sure your
`Content-Type` response header has "application/json; charset=utf-8"
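
To illustrate why the declared charset has to match the bytes actually sent, here is a minimal sketch. The class name CharsetCheck and the helper utf8Body are hypothetical, and the JSON text is a made-up stand-in for a real ADM payload. It only shows how the same string turns into different bytes depending on the charset used for encoding.

```java
import java.nio.charset.StandardCharsets;

class CharsetCheck {
    // Hypothetical helper: encodes the JSON text into the bytes that would be
    // written to the request body when the header declares charset=utf-8.
    static byte[] utf8Body(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A made-up payload with a six-letter Cyrillic word as the body value.
        String json = "{\"body\":\"\u041f\u0440\u0438\u0432\u0435\u0442\"}";

        // UTF-8 uses two bytes for each Cyrillic letter.
        byte[] utf8 = utf8Body(json);
        System.out.println(utf8.length);

        // ISO-8859-1 cannot represent Cyrillic, so each letter becomes '?'.
        byte[] latin1 = json.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

If the body is encoded with a charset other than the one named in the header, the receiver sees either replacement characters or bytes it cannot decode.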


Joakim Erdfelt / ***@webtide.com

_______________________________________________
jetty-users mailing list
To change your delivery options, retrieve your password, or unsubscribe
from this list, visit
https://dev.eclipse.org/mailman/listinfo/jetty-users
Joakim Erdfelt
2018-03-28 15:27:01 UTC
Section 7 does describe a "\u####" notation as optional behavior:
https://tools.ietf.org/html/rfc8259#section-7

But that is discouraged in the same spec (Section 8).

It's obvious that section 7 is old, as it limits the "\u" encoding to 3
bytes, even though the UTF-8 / Unicode spec has passed that 3 byte upper
limit a while ago and is now at 4 bytes.

The "\u####" behavior would also have no correlation to encoding for XML as
indicated by your initial question.

Guess we need more details on what is happening, and what ADM expects, in
order to help.
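
For reference, RFC 8259 section 7 writes a character outside the Basic Multilingual Plane as two consecutive "\u####" escapes, one for each half of its UTF-16 surrogate pair. A minimal sketch of that mapping follows. The class UnicodeEscape and the method escapeCodePoint are hypothetical names, not part of jetty-util-ajax.

```java
class UnicodeEscape {
    // Hypothetical helper: returns the JSON escape form of one code point.
    // A BMP code point yields one escape; a code point above U+FFFF yields
    // two escapes, one per UTF-16 surrogate.
    static String escapeCodePoint(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeCodePoint(0x0416));   // Cyrillic capital Zhe, inside the BMP
        System.out.println(escapeCodePoint(0x1F600));  // an emoji, outside the BMP
    }
}
```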
Alexander Farber
2018-03-28 15:50:28 UTC
Hi Joakim,

no, I am already sending "application/json; charset=utf-8" - so that
is not a problem.

Also, the ADM error mentioning XML unfortunately means nothing: they
mention XML in all the error messages I have seen so far while
developing (while their API expects JSON to be POSTed).

I understand that encoding UTF-8 chars to \u#### is optional.

But is that possible with jetty-util-ajax or should I switch to some
other lib for JSON encoding?

When I look at
https://github.com/eclipse/jetty.project/blob/jetty-9.4.x/jetty-util-ajax/src/main/java/org/eclipse/jetty/util/ajax/JSON.java
it can decode \u####, but can jetty-util-ajax do the reverse too?

I have also posted my question at
https://stackoverflow.com/questions/49538806/how-to-escape-unicode-with-json-tostring-method-in-jetty-util-ajax

Thank you
Alex
Greg Wilkins
2018-03-29 07:47:07 UTC
Alex,

Which characters do you want us to use \u#### encoding for? Everything
above US_ASCII? Above ISO_8859? Or just chars that would encode to 3-byte UTF-8?
Maybe we could... we'll discuss.
The other option is to do that encoding yourself when you convert the JSON
string into bytes to send?
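
A minimal sketch of that second option, assuming the threshold is "everything above US_ASCII". The class JsonAsciiEscaper and the method escapeNonAscii are hypothetical names, and the input is a made-up stand-in for the output of JSON.toString(root). It relies on the fact that in well-formed JSON, non-ASCII characters can only appear inside strings, so replacing them in the finished text does not change the document's meaning.

```java
class JsonAsciiEscaper {
    // Hypothetical helper: replaces every char above US-ASCII with its
    // four-hex-digit JSON escape. Java strings are UTF-16, so a character
    // outside the BMP is escaped as two surrogates, which is the form
    // JSON expects.
    static String escapeNonAscii(String json) {
        StringBuilder sb = new StringBuilder(json.length());
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c > 0x7F) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by JSON.toString(root).
        String json = "{\"data\":{\"body\":\"\u041f\u0440\u0438\"}}";
        System.out.println(escapeNonAscii(json));
    }
}
```

The result is pure ASCII, so it encodes to the same bytes under US-ASCII, ISO-8859-1 and UTF-8.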

cheers
Bill Ross
2018-03-29 08:13:45 UTC
The only way to escape Unicode is the grave.

Please excuse the faulty calendar.

Bill
Alexander Farber
2018-03-29 09:28:31 UTC
Hello Greg,
I apologize - it has turned out to be an ADM backend problem, and now
suddenly (after a few weeks) all the characters I send to them just work.

But maybe my question is still valid and useful for someone: would it be
possible to add optional \u#### encoding to jetty-util-ajax?

However, I can no longer tell exactly which characters that would be.

Regards
Alex
