Problem description
I am marshalling objects to an XML file using the encoding "UTF-8". It generates the file successfully, but when I try to unmarshal it back, there is an error:
An invalid XML character (Unicode: 0x{2}) was found in the value of attribute "{1}" and element is "{0}".
The character is 0x1A (U+001A), which is valid in UTF-8 but illegal in XML. The JAXB marshaller allows writing this character into the XML file, but the unmarshaller cannot parse it back. I tried other encodings (UTF-16, ASCII, etc.) but still get the error.
The common solution is to remove or replace this invalid character before XML parsing. But if we need this character, how do we get the original character back after unmarshalling?
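One way to get the original character back (an application-level workaround, not something the answer below proposes) is to make the substitution reversible: before marshalling, replace every character XML 1.0 forbids with a plain-text escape, and undo it after unmarshalling. The sketch below assumes a hypothetical XmlSafeEscaper class and an escape format chosen for illustration (a backslash followed by 'u' and four hex digits). Backslashes already in the data are doubled so decoding stays unambiguous, and both the writing and the reading side have to agree on the scheme.

```java
/**
 * Hypothetical helper: makes strings safe for XML 1.0 and restores them afterwards.
 * Escape format (an assumption): a backslash becomes two backslashes, and every code point
 * outside the XML 1.0 Char production becomes a backslash, 'u' and four hex digits.
 */
final class XmlSafeEscaper {
    private XmlSafeEscaper() {}

    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Call before marshalling. Forbidden code points are all <= 0xFFFF, so four hex digits suffice. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (cp == '\\') {
                out.append("\\\\");                        // double existing backslashes
            } else if (isXml10Char(cp)) {
                out.appendCodePoint(cp);                   // legal characters pass through
            } else {
                out.append(String.format("\\u%04X", cp));  // e.g. 0x1A becomes the six chars \ u 0 0 1 A
            }
        });
        return out.toString();
    }

    /** Call after unmarshalling; assumes the input was produced by escape(). */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.append(c);
            } else if (s.charAt(i + 1) == '\\') {
                out.append('\\');
                i += 1;                                    // skip the second backslash
            } else if (s.charAt(i + 1) == 'u') {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 5;                                    // skip 'u' and the four hex digits
            } else {
                throw new IllegalArgumentException("unknown escape at index " + i);
            }
        }
        return out.toString();
    }
}
```

With this, a JAXB property would hold escape(value) when marshalled, and the application would call unescape on it after unmarshalling. The trade-off is that the XML no longer contains the character itself, only a convention that other consumers of the file would also need to know.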
While looking for a solution, I want to replace the invalid characters with a substitute character (for example a dot, ".") before unmarshalling.
I have created this class:
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

public class InvalidXmlCharacterFilterReader extends FilterReader {

    public static final char SUBSTITUTE = '.';

    public InvalidXmlCharacterFilterReader(Reader in) {
        super(in);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int read = super.read(cbuf, off, len);
        if (read == -1) return -1;

        int readPos;
        for (readPos = off; readPos < off + read; readPos++) {
            if (!isValid(cbuf[readPos])) {
                cbuf[readPos] = SUBSTITUTE;
            }
        }
        return readPos - off + 1;
    }

    public boolean isValid(char c) {
        if ((c == 0x9) || (c == 0xA) || (c == 0xD)
                || ((c >= 0x20) && (c <= 0xD7FF))
                || ((c >= 0xE000) && (c <= 0xFFFD))
                || ((c >= 0x10000) && (c <= 0x10FFFF))) {
            return true;
        } else return false;
    }
}
Then this is how I read and unmarshal the file:
FileReader fileReader = new FileReader(this.getFile());
Reader reader = new InvalidXmlCharacterFilterReader(fileReader);
Object o = (Object) um.unmarshal(reader);
Somehow the reader does not replace invalid characters with the character I want. It results in wrong XML data that can't be unmarshalled. Is there something wrong with my InvalidXmlCharacterFilterReader class?
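Reading the class above, three things look likely to cause this (this is a reading of the code, not something the answer below confirms). First, after the loop readPos equals off + read, so `return readPos - off + 1` reports one more character than was actually filled in, and the caller consumes a stale buffer slot. Second, the no-argument FilterReader.read() is not overridden, and the inherited version reads straight from the wrapped reader without any filtering. Third, isValid takes a single UTF-16 char, so the 0x10000 to 0x10FFFF branch can never match, and both halves of a surrogate pair (an emoji, say) get replaced. Separately, if the file holds the character as a reference such as &#x1A; rather than as the raw character, no character-level filter will see it. A minimal corrected sketch follows; it keeps the question's class name and dot substitute and, as an assumption, passes surrogate code units through unchecked.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

class InvalidXmlCharacterFilterReader extends FilterReader {
    static final char SUBSTITUTE = '.';

    InvalidXmlCharacterFilterReader(Reader in) {
        super(in);
    }

    /** FilterReader.read() goes straight to the wrapped reader, so it needs filtering too. */
    @Override
    public int read() throws IOException {
        int c = super.read();
        return (c == -1 || isValid((char) c)) ? c : SUBSTITUTE;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int read = super.read(cbuf, off, len);
        for (int i = off; i < off + read; i++) {   // does nothing when read is -1 or 0
            if (!isValid(cbuf[i])) {
                cbuf[i] = SUBSTITUTE;
            }
        }
        return read;                                // exactly the number of chars placed in cbuf
    }

    /**
     * XML 1.0 Char ranges applied to one UTF-16 code unit.
     * Assumption: surrogate halves are passed through and left for the XML parser to judge,
     * so a lone (unpaired) surrogate is not caught here.
     */
    static boolean isValid(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xD800 && c <= 0xDFFF)
                || (c >= 0xE000 && c <= 0xFFFD);
    }
}
```

The substitution is lossy, so this only helps when the original character is not needed afterwards. Also, FileReader uses the platform default charset (UTF-8 by default only from Java 18 on), so wrapping new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8) is a safer way to read a file that was written as UTF-8.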
Recommended answer
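The Unicode character U+001A is illegal in XML 1.0, whose Char production is:
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]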
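The XML 1.0 Char production (section 2.2 of the XML 1.0 specification) can be written as a check on a Unicode code point. Taking an int code point rather than a Java char matters: a char only reaches 0xFFFF, so a char-based test can never see the 0x10000 to 0x10FFFF range. A minimal sketch, with hypothetical class and method names:

```java
/**
 * The XML 1.0 Char production (XML 1.0, section 2.2):
 *   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 * Hypothetical helper; works on int code points, not on chars.
 */
final class Xml10Chars {
    private Xml10Chars() {}

    static boolean isXml10Char(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    /** Index of the first code point in s that XML 1.0 forbids, or -1 if there is none. */
    static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);   // combines a valid surrogate pair into one code point
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

U+001A falls below 0x20 and is not one of the three allowed control characters, so isXml10Char(0x1A) is false whatever encoding the document uses.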
The encoding used to represent it does not matter in this case; it is simply not allowed in XML content.
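To see that the encoding does not matter, the sketch below (hypothetical class name EncodingCheck) builds the same tiny document with a raw U+001A in an attribute value, encodes it as UTF-8, UTF-16 and US-ASCII, and hands each to the JDK's built-in DOM parser from javax.xml.parsers. It also tries the character reference &#x1A; in an XML 1.0 document. It uses plain DOM rather than JAXB, on the assumption that JAXB's unmarshaller sits on the same kind of XML 1.0 parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class EncodingCheck {
    private static final String SUB = String.valueOf((char) 0x1A); // U+001A

    /** Builds <a x="SUB"/> with an XML 1.0 declaration naming cs, then encodes it with cs. */
    static byte[] document(Charset cs) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + cs.name() + "\"?><a x=\"" + SUB + "\"/>";
        return xml.getBytes(cs);
    }

    /** True if the JDK's built-in DOM parser accepts the bytes, false on a parse error. */
    static boolean parses(byte[] xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors instead of printing them
        try {
            builder.parse(new ByteArrayInputStream(xml));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Charset[] charsets = {StandardCharsets.UTF_8, StandardCharsets.UTF_16, StandardCharsets.US_ASCII};
        for (Charset cs : charsets) {
            System.out.println(cs.name() + ": parses = " + parses(document(cs)));
        }
        byte[] charRef = "<?xml version=\"1.0\"?><a x=\"&#x1A;\"/>".getBytes(StandardCharsets.UTF_8);
        System.out.println("&#x1A; in XML 1.0: parses = " + parses(charRef));
    }
}
```

Every variant is rejected, while the same document with an ordinary attribute value parses fine.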
XML 1.1 allows some of the restricted characters (including U+001A) to be included, but they must be present as numeric character references (e.g. &#x1A;).
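A sketch of the XML 1.1 route using the JDK's built-in DOM parser (class and method names are hypothetical): with version="1.1" in the declaration, the reference &#x1A; in an attribute should come back as the single character U+001A, while the same document declared as 1.0 is rejected. This only exercises the parser; whether a particular JAXB marshaller can be made to emit an XML 1.1 declaration and references for these characters is not shown here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class Xml11CharRefDemo {
    /** Parses xml with the JDK's built-in DOM parser and returns attribute "x" of the root element. */
    static String rootAttributeX(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler()); // fatal errors are thrown, not printed
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getAttribute("x");
    }

    public static void main(String[] args) throws Exception {
        // The reference is the six characters & # x 1 A ; in the text, never the raw character.
        String xml11 = "<?xml version=\"1.1\"?><a x=\"&#x1A;\"/>";
        String value = rootAttributeX(xml11);
        System.out.printf("length=%d, code point=0x%X%n", value.length(), value.codePointAt(0));
    }
}
```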
Wikipedia has a good summary of the situation.