The importAsNode(InputStream, boolean) method of the XmlDomNodeImporterImpl class parses the provided InputStream into an XML document using a DocumentBuilder. Under the hood, the InputStream is wrapped in an InputSource whose encoding is unknown: we do not create the InputSource ourselves, and no explicit encoding is set on it through the setEncoding method. When the input stream contains umlauts encoded in ISO-8859-1, the parser (the built-in Xerces of the Oracle/Sun JRE) incorrectly attempts to read them as UTF-8; see the bug report for the parser behavior.
According to the API documentation for the InputSource class, the solution is either to use an InputSource backed by a character stream or to specify the encoding of the byte stream.
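To illustrate the two options, here is a minimal, self-contained sketch that uses only the JDK's JAXP API. It assumes a document with no XML declaration whose single umlaut is encoded as ISO-8859-1. The class and method names are illustrative only and do not exist in ShrinkWrap. The last method reproduces the unconfigured case from the bug, where the parser falls back to UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class InputSourceEncodingDemo {

    // "<name>Müller</name>" as ISO-8859-1 bytes, with no XML declaration
    public static byte[] latin1Xml() {
        return "<name>M\u00fcller</name>".getBytes(StandardCharsets.ISO_8859_1);
    }

    // Parses the source and returns the root element's text content
    static String parse(InputSource source) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(source)
                .getDocumentElement()
                .getTextContent();
    }

    // Option 1: keep the byte stream, but tell the parser its encoding
    public static String parseWithDeclaredEncoding(byte[] xml) {
        try {
            InputSource source = new InputSource(new ByteArrayInputStream(xml));
            source.setEncoding("ISO-8859-1");
            return parse(source);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Option 2: decode the bytes ourselves and hand the parser a character stream
    public static String parseAsCharacterStream(byte[] xml) {
        try {
            return parse(new InputSource(
                    new InputStreamReader(new ByteArrayInputStream(xml), StandardCharsets.ISO_8859_1)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Neither option: with no declaration the parser assumes UTF-8 and rejects byte 0xFC
    public static boolean defaultParseFails(byte[] xml) {
        try {
            parse(new InputSource(new ByteArrayInputStream(xml)));
            return false;
        } catch (Exception e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] xml = latin1Xml();
        System.out.println(parseWithDeclaredEncoding(xml));
        System.out.println(parseAsCharacterStream(xml));
        System.out.println(defaultParseFails(xml));
    }
}
```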
This would require introducing another method, importAsNode(Reader reader, boolean close), in the XmlDomNodeImporterImpl class; the method would also be a member of the NodeImporter interface. However, the method would be useful only if the XmlDomNodeImporter class exposes it, and eventually the NodeDescriptorImporterBase, DescriptorImporterBase and DescriptorImporter types do as well. This would allow downstream projects of ShrinkWrap (like Arquillian) to use the from(String string) method of the DescriptorImporter class safely.
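A minimal sketch of how the InputStream and Reader overloads could share a single parsing path. ReaderAwareImporter is a hypothetical, simplified stand-in for XmlDomNodeImporterImpl: it returns an org.w3c.dom.Document instead of ShrinkWrap's own Node type and leaves out the conversion between the two. Note that the JDK's built-in parser may close the source itself once parsing finishes, so passing close == false is not a hard guarantee in this sketch.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical stand-in for XmlDomNodeImporterImpl (returns a DOM Document, not ShrinkWrap's Node)
public class ReaderAwareImporter {

    // Existing style of entry point: the parser has to work out the byte encoding
    public Document importAsNode(InputStream stream, boolean close) {
        return importFrom(new InputSource(stream), close ? stream : null);
    }

    // Proposed overload: characters are already decoded, so no encoding guess is involved
    public Document importAsNode(Reader reader, boolean close) {
        return importFrom(new InputSource(reader), close ? reader : null);
    }

    // Shared parsing logic; closes 'toClose' (if non-null) after parsing, even on failure
    private static Document importFrom(InputSource source, Closeable toClose) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not import XML", e);
        } finally {
            if (toClose != null) {
                try {
                    toClose.close();
                } catch (IOException ignored) {
                    // Nothing useful to do if closing fails
                }
            }
        }
    }

    public static void main(String[] args) {
        Document doc = new ReaderAwareImporter()
                .importAsNode(new StringReader("<name>M\u00fcller</name>"), true);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```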
In the context of SHRINKDESC-97, Arquillian's ConfigurationSysPropResolver class invokes the from(String string) method of DescriptorImporterBase. This method converts the string into bytes using the platform default encoding and feeds those bytes to the parser through an InputStream. When the platform encoding differs from the encoding the XML parser assumes, an exception is thrown. Introducing the Reader-based counterpart of importAsNode(Reader reader, boolean close) in DescriptorImporter, and hence in DescriptorImporterBase, would allow the line:
return this.from(new ByteArrayInputStream(string.getBytes()));
to be replaced by:
return this.from(new StringReader(string));
and therefore rid us of the encoding issue.
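The following sketch contrasts the two variants of from(String). Because Charset.defaultCharset() is fixed when the JVM starts, the sketch passes the "platform" charset in explicitly to emulate string.getBytes() on differently configured machines. FromStringDemo and its methods are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class FromStringDemo {

    // Mimics the current from(String): string.getBytes() with whatever the
    // platform default charset happens to be (passed in explicitly here)
    public static String fromViaBytes(String xml, Charset platformDefault) {
        return parse(new InputSource(new ByteArrayInputStream(xml.getBytes(platformDefault))));
    }

    // Mimics the proposed from(String): no bytes are involved, so no charset can be wrong
    public static String fromViaReader(String xml) {
        return parse(new InputSource(new StringReader(xml)));
    }

    // Parses the source and returns the root element's text content
    private static String parse(InputSource source) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(source).getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<name>M\u00fcller</name>";
        System.out.println(fromViaReader(xml));
        System.out.println(fromViaBytes(xml, StandardCharsets.UTF_8));
        try {
            fromViaBytes(xml, StandardCharsets.ISO_8859_1);
        } catch (IllegalStateException e) {
            System.out.println("Latin-1 platform default: " + e.getCause());
        }
    }
}
```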
Introducing the Reader-accepting method in the DescriptorImporter interface results in a tricky scenario for the ManifestDescriptorImporter class. It uses the ManifestModel class, which extends java.util.jar.Manifest. The JRE's Manifest class reads a manifest only from an InputStream and does not accept a Reader. No encoding problems are expected here, since the JAR File Specification requires manifests to be UTF-8 encoded. However, we would need to convert the Reader to an InputStream when constructing a ManifestModel instance, either by rolling our own ReaderInputStream or by using the one from the Commons IO library.
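One simple way to bridge the gap, assuming manifests are small enough to buffer in memory: read the Reader fully, re-encode the characters as UTF-8 and pass the bytes to the java.util.jar.Manifest(InputStream) constructor. ManifestFromReader is a hypothetical helper, not part of ShrinkWrap, and it uses the plain Manifest class rather than ManifestModel.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

public class ManifestFromReader {

    // Hypothetical helper: buffers the Reader, re-encodes it as UTF-8 (the manifest
    // encoding required by the JAR spec) and hands the bytes to java.util.jar.Manifest
    public static Manifest read(Reader reader) {
        try {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
            byte[] utf8 = text.toString().getBytes(StandardCharsets.UTF_8);
            return new Manifest(new ByteArrayInputStream(utf8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String mf = "Manifest-Version: 1.0\nCreated-By: M\u00fcller\n";
        Manifest manifest = read(new StringReader(mf));
        System.out.println(manifest.getMainAttributes().getValue("Created-By"));
    }
}
```

A streaming adapter such as the ReaderInputStream from Commons IO avoids the intermediate buffer, at the cost of an extra dependency.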
We may also want to consider adding a from(InputStream in, String charset, boolean close) method (or equivalents using the JRE's Charset class), so that clients of this API can specify the encoding of the input stream. Unlike the scenario above, where a StringReader was needed for a concrete use case, there is no use case for this at the moment (at least none that I know of).
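If such an overload were added, one option is to decode the bytes with an InputStreamReader and delegate to the Reader-based path, so the parser never has to guess. The sketch below is hypothetical: CharsetAwareImporter is not a ShrinkWrap type, and it returns a DOM Document. Charset.forName throws an unchecked exception for unknown or illegal charset names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical importer: charset-aware overloads layered on a Reader-based one
public class CharsetAwareImporter {

    // String-named variant, as in from(InputStream in, String charset, boolean close)
    public Document from(InputStream in, String charset, boolean close) {
        return from(in, Charset.forName(charset), close);
    }

    // Charset variant: decode the bytes ourselves, then reuse the Reader path
    public Document from(InputStream in, Charset charset, boolean close) {
        return from(new InputStreamReader(in, charset), close);
    }

    // Reader-based path; closing an InputStreamReader also closes the wrapped stream
    public Document from(Reader reader, boolean close) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(reader));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not import XML", e);
        } finally {
            if (close) {
                try {
                    reader.close();
                } catch (IOException ignored) {
                    // Nothing useful to do if closing fails
                }
            }
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = "<name>M\u00fcller</name>".getBytes(StandardCharsets.ISO_8859_1);
        Document doc = new CharsetAwareImporter()
                .from(new ByteArrayInputStream(latin1), "ISO-8859-1", true);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Because the parser only sees characters in this approach, any encoding declared in the document's XML prolog is disregarded in favour of the caller's charset; keeping the byte stream and calling setEncoding on the InputSource is the alternative.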