Bugs in Exchange's XML handling

Illegal characters

The XML 1.0 Specification defines a character as:

        Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

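There is no way to encode any other character into an XML 1.0 document, not even as a numeric character reference: the "Legal Character" well-formedness constraint requires that the character a reference points to also match Char.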

In some cases, Exchange's XML will include entities like . (One example of this came from syncing corrupt data from a Palm device.) This makes the XML not well-formed, so we have to use the "recovery" mode of the libxml parser to guarantee that it produces any output at all.

(To be charitable, we could say that Exchange was just ahead of the curve, because XML 1.1 does allow control characters other than NUL to appear as character references.)


Illegal tag names

The grammar for a tag name is:

	Name ::= (Letter | '_' | ':') (NameChar)*

But in the namespaces under http://schemas.microsoft.com/mapi/id/, Exchange uses tag names that start with "0x". A Name cannot begin with a digit, so these elements are not well-formed, and there is no way to force libxml to parse them. Instead, e2k-result.c mangles the raw response to remove the leading "0"s (in sanitize_bad_multistatus()), which leaves legal names starting with "x", and then puts the "0"s back into the property names in the parsed result (in prop_parse()).