XML isn't as fashionable as it once was, but there's still plenty of XML-based configuration and data floating around today. Just today I was working with a conversion routine that must generate XML formatted templates, and one thing I needed was an easy way to generate a properly encoded XML string.
In most cases you'll want to use a proper XML processor, whether it's an XmlDocument, XmlWriter or LINQ to XML, to generate your XML. When you use those features, the data conversion from string (and most other types) is built in and mostly automatic.
However, in this case I have a huge block of mostly static XML text, and creating the entire document using structured XML documents seems like overkill when really I just need to inject a few simple values.
So in this case I'm looking for a way to format values as XML, for which the XmlConvert static class works well.
Should be easy, right? Well...
The XmlConvert class works well, except for string conversions, which it doesn't support. XmlConvert.ToString() works with just about any of the common base types except string to produce properly XML formatted content.
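To illustrate what XmlConvert does cover, here is a minimal sketch of a few non-string conversions. The class name XmlConvertDemo and the sample values are made up for illustration; only the XmlConvert.ToString() overloads themselves come from the framework.

```csharp
using System;
using System.Xml;

public static class XmlConvertDemo
{
    // Hypothetical helper: formats a few common base types as XML text.
    public static string[] Samples()
    {
        return new[]
        {
            // XML booleans are lowercase: "true"
            XmlConvert.ToString(true),
            // Culture-invariant number formatting: "12.5"
            XmlConvert.ToString(12.5m),
            // ISO 8601 style date/time, written as UTC
            XmlConvert.ToString(new DateTime(2018, 11, 30, 0, 0, 0, DateTimeKind.Utc),
                                XmlDateTimeSerializationMode.Utc)
        };
    }
}
```

There is no XmlConvert.ToString(string) overload to do the same for plain text, which is the gap the rest of this post works around.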
Reading an encoded XML Value
There are a number of different ways that you can generate XML output and all of them basically involve creating some sort of XML structure and reading the value out of the 'rendered' document.
The most concise way I've found (on StackOverflow from Jon Skeet with modifications to return just the content) is the following:
public static string XmlString(string text)
{
    return new XElement("t", text).LastNode.ToString();
}
The XElement returns the entire XML fragment, while LastNode is the text node which contains the actual node's content.
You can call XmlString() with:
void Main()
{
    XmlString("Brackets & stuff <doc> and \"quotes\" and more 'quotes'.").Dump();
}
which produces:
Brackets &amp; stuff &lt;doc&gt; and "quotes" and more 'quotes'.
But hold on - this doesn't take into account attributes which require some additional encoding for quotes and control characters. So a little more work is required for the wrapper:
public static string XmlString(string text, bool isAttribute = false)
{
    if (string.IsNullOrEmpty(text))
        return text;
    if (!isAttribute)
        return new XElement("t", text).LastNode.ToString();
    // XAttribute renders as __n="value": skip the 5 leading characters and the closing quote
    return new XAttribute("__n", text)
        .ToString().Substring(5).TrimEnd('\"');
}
If you don't want to use LINQ to XML you can use an XML Document instead.
private static XmlDocument _xmlDoc;
public static string XmlString(string text)
{
    // reuse a single cached document instance for creating elements
    _xmlDoc = _xmlDoc ?? new XmlDocument();
    var el = _xmlDoc.CreateElement("t");
    el.InnerText = text;
    return el.InnerXml;
}
Note that using XmlDocument is considerably slower than XElement even with the document caching used above.
System.Security.SecurityElement.Escape()?
SecurityElement.Escape() is a built-in CLR function that performs XML encoding. It's a single function so it's easy to call, but it always encodes all quotes without options. This is OK, but can result in extra characters if you're encoding for XML elements. Only attributes need quotes encoded.
The function is also considerably slower than the other mechanisms mentioned here.
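As a quick illustration of the "always encodes quotes" behavior, here is a minimal sketch. The wrapper class SecurityElementDemo is a made-up name; SecurityElement.Escape() is the only framework call involved.

```csharp
using System.Security;

public static class SecurityElementDemo
{
    // Thin wrapper for illustration: Escape() replaces <, >, &, " and '
    // in one call, with no switch for element vs. attribute content.
    public static string Encode(string text)
    {
        return SecurityElement.Escape(text);
    }
}
```

Calling Encode("a < b & \"c\" 'd'") returns a &lt; b &amp; &quot;c&quot; &apos;d&apos;, so both kinds of quotes are encoded even if the value is headed for element content.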
Just Code
If you don't want to deal with adding a reference to LINQ to XML or even System.Xml you can also create a simple code routine. XML strings really just escape 5 characters (3 if you're encoding for elements), plus the routine throws for illegal characters below CHR(32), with the exception of tabs, carriage returns and line feeds.
The simple code to do this looks like this:
/// <summary>
/// Turns a string into a properly XML Encoded string.
/// Uses simple string replacement.
///
/// Also see XmlUtils.XmlString() which uses XElement
/// to handle additional extended characters.
/// </summary>
/// <param name="text">Plain text to convert to XML Encoded string</param>
/// <param name="isAttribute">
/// If true encodes single and double quotes, CRLF and tabs.
/// When embedding element values quotes don't need to be encoded.
/// When embedding attributes quotes need to be encoded.
/// </param>
/// <returns>XML encoded string</returns>
/// <exception cref="InvalidOperationException">Invalid character in XML string</exception>
public static string XmlString(string text, bool isAttribute = false)
{
    var sb = new StringBuilder(text.Length);
    foreach (var chr in text)
    {
        if (chr == '<')
            sb.Append("&lt;");
        else if (chr == '>')
            sb.Append("&gt;");
        else if (chr == '&')
            sb.Append("&amp;");
        // special handling for quotes
        else if (isAttribute && chr == '\"')
            sb.Append("&quot;");
        else if (isAttribute && chr == '\'')
            sb.Append("&apos;");
        // Legal sub-chr32 characters
        else if (chr == '\n')
            sb.Append(isAttribute ? "&#xA;" : "\n");
        else if (chr == '\r')
            sb.Append(isAttribute ? "&#xD;" : "\r");
        else if (chr == '\t')
            sb.Append(isAttribute ? "&#x9;" : "\t");
        else
        {
            if (chr < 32)
                throw new InvalidOperationException("Invalid character in Xml String. Chr " +
                                                    Convert.ToInt16(chr) + " is illegal.");
            sb.Append(chr);
        }
    }
    return sb.ToString();
}
Attributes vs. Elements
Notice that the functions above optionally support attribute encoding. Attributes need to be encoded differently than elements.
That's because XML Elements are not required to have quotes encoded because there are no string delimiters to worry about in an XML element. This is legal XML:
<doc>This a "quoted" string. So is 'this'!</doc>
However, if you are generating content for an XML attribute, you do need to encode quotes because the quotes are the delimiter for the attribute. Makes sense, right?
<doc note="This a &quot;quoted&quot; string. So is &apos;this&apos;!" />
Actually, the &apos; is not required in this example because the attribute delimiter is ". So this is actually more correct:
<doc note="This a &quot;quoted&quot; string. So is 'this'!" />
However, both are valid XML. The string function above will encode single and double quotes when the isAttribute parameter is set to true to handle setting attribute values.
In addition, attributes can't hold literal carriage returns, line feeds or tabs, because attribute values are treated as a single line (a parser normalizes that literal whitespace to plain spaces), so those characters need to be encoded as:
CR: &#xD; LF: &#xA; Tab: &#x9;
The following LINQPad code demonstrates what XML is generated for values by Elements and Attributes:
void Main()
{
    var doc = new XmlDocument();
    doc.LoadXml("<d><t>This is &amp; a \"test\" and a 'tested' test</t></d>");
    doc.OuterXml.Dump();

    var s = "This is & a \"test\" and a 'tested' test</t></d> with breaks \r\n and \t tabs</root>";
    var node = doc.CreateElement("d2");
    node.InnerText = s;
    doc.DocumentElement.AppendChild(node);

    // Note: the second argument of CreateAttribute() is the namespace URI, so the
    // string ends up in the generated xmlns declaration, which is still written
    // with attribute encoding.
    var attr = doc.CreateAttribute("note", s);
    node.Attributes.Append(attr);
    doc.OuterXml.Dump();
}
The document looks like this:
<d>
    <t>This is &amp; a "test" and a 'tested' test</t>
    <d2 d2p1:note=""
        xmlns:d2p1="This is &amp; a &quot;test&quot; and a 'tested' test&lt;/t&gt;&lt;/d&gt;
                    with breaks &#xD;&#xA; and tabs&lt;/root&gt;">This is &amp; a "test" and a 'tested'
                    test&lt;/t&gt;&lt;/d&gt; with breaks
                    and tabs&lt;/root&gt;</d2>
</d>
(attribute is a single line - linebreaks added for readability)
Bottom line: Elements don't require quotes, line breaks and tabs to be encoded, but attributes do.
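To see why line breaks matter in attributes, here is a small sketch that reads an attribute back with LINQ to XML. The class and method names are hypothetical, and it assumes the default XElement.Parse() reader behavior, which applies standard XML attribute value normalization.

```csharp
using System.Xml.Linq;

public static class AttributeWhitespaceDemo
{
    // Parses a tiny document and returns the value of its "note" attribute.
    public static string ReadNote(string xml)
    {
        return XElement.Parse(xml).Attribute("note").Value;
    }

    // A literal line feed inside the attribute is turned into a space on parsing...
    public static string Literal()
    {
        return ReadNote("<d note=\"line1\nline2\" />");
    }

    // ...while the encoded character reference comes back as a real line feed.
    public static string Encoded()
    {
        return ReadNote("<d note=\"line1&#xA;line2\" />");
    }
}
```

Literal() should come back as "line1 line2", while Encoded() preserves the line break, which is why the attribute variants of XmlString() emit character references for these characters.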
Performance
This falls into the premature optimization bucket, but I was curious how well each of these mechanisms would perform relative to each other. It would seem that XElement and especially XmlDocument would be very slow, as they process the element as an XML document/fragment that has to be loaded and parsed.
I was very surprised to find that the fastest and most consistent solution across various sizes of text was XElement, which was faster than my bare bones string implementation, especially for larger strings. For small amounts of text (under a few hundred characters) the string and XElement implementations were roughly the same, but as strings got larger XElement became considerably faster.
As an aside, the custom string version also runs considerably faster in Release mode (in LINQPad, run with Optimizations On) than in Debug mode. In Debug mode performance was about 3-4x slower. Yikes.
Not surprisingly, XmlDocument, even the cached version, was one of the slower solutions: roughly 50% slower with small strings, many times slower with larger strings, and incrementally slower as the string size grows.
Surprisingly, slowest of them all was SecurityElement.Escape(), which was nearly twice as slow as the XmlDocument approach.
Whatever XElement is doing to parse the element, it's very efficient, and it's built into the framework and maintained by Microsoft, so I would recommend that solution, unless you want to avoid the XML assembly references, in which case the custom string solution works as well with smaller strings and reasonably close with large strings.
Take all of these numbers with a grain of salt: all of them are pretty fast for one-off parsing, and unless you're manually encoding XML strings in loops or large batches, the perf difference is not of concern here.
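If you want to run a rough comparison yourself, a timing harness along these lines is one way to do it. This is only a sketch and not the benchmark used for the numbers above: the Time() helper, the class name and the iteration count are all made up for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Security;
using System.Xml.Linq;

public static class XmlStringTiming
{
    // Same XElement-based encoder as shown earlier in the post.
    public static string XmlString(string text)
    {
        if (string.IsNullOrEmpty(text))
            return text;
        return new XElement("t", text).LastNode.ToString();
    }

    // Hypothetical helper: times `iterations` calls of an encoder on one input.
    public static TimeSpan Time(Func<string, string> encoder, string input, int iterations)
    {
        encoder(input); // warm-up call so JIT compilation is not measured
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            encoder(input);
        sw.Stop();
        return sw.Elapsed;
    }

    // Prints elapsed milliseconds for two of the approaches discussed above.
    public static void Report(string input, int iterations)
    {
        Console.WriteLine("XElement: " +
            Time(XmlString, input, iterations).TotalMilliseconds + " ms");
        Console.WriteLine("SecurityElement.Escape: " +
            Time(SecurityElement.Escape, input, iterations).TotalMilliseconds + " ms");
    }
}
```

Run it with optimizations enabled and with both short and long inputs, since the relative results shift with string size.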
If you want to play around with the different approaches, here's a Gist that you can load into LINQPad that you can just run:
Summary
XML string encoding is something you hopefully won't have to do much of, but it's one thing I've tripped over enough times to take the time to write it up here. Again, in most cases my recommendation is to write strings using some sort of official XML parser (XmlDocument or XDocument/XElement), but in the few cases where you just need to jam a couple of values into a large document, nothing beats simple string replacement in the document for simplicity and easy maintenance, and that's the one edge use-case where a function like XmlString() makes sense.
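For completeness, the string replacement scenario might look something like the sketch below. The template text and the {{Title}} placeholder syntax are invented for illustration; any placeholder convention works as long as the injected value goes through XmlString() first.

```csharp
using System.Xml.Linq;

public static class TemplateDemo
{
    // Same XElement-based encoder as shown earlier in the post.
    public static string XmlString(string text)
    {
        if (string.IsNullOrEmpty(text))
            return text;
        return new XElement("t", text).LastNode.ToString();
    }

    // Hypothetical, mostly static XML template with a single placeholder.
    private const string Template = "<config><title>{{Title}}</title></config>";

    // Injects the encoded value into the template with plain string replacement.
    public static string Render(string title)
    {
        return Template.Replace("{{Title}}", XmlString(title));
    }
}
```

Render("Tom & Jerry") produces <config><title>Tom &amp; Jerry</title></config>, which loads back into an XML parser with the original value intact.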