Trapping Errors with simplexml for Not Well-Formed XML
I discovered the hard way that PHP 5 offers no obvious way to check whether some XML is well-formed, especially if you want to deploy on both Unix and Windows and don’t want to call out to the shell.
To make matters worse, I also discovered that the loading functions of the DOM and simplexml extensions report parse errors as PHP warnings rather than exceptions, so PHP 5 exception handling cannot trap them. When you run simplexml or the DOM extension against XML that is not well-formed, the errors they generate are not trapped and are displayed immediately.
It’s possible to load XML that is not well-formed with the DOM or Tidy extensions and then repair it on the fly. But what if you need to detect that the XML is not well-formed and provide a message stating the error?
Fortunately, after some research, I found that you can use the libxml functions (PHP 5.1 and later) to test XML well-formedness and trap XML errors. So I whipped up this little function called get_xml_object (see here for the inspiration) that allows me to trap errors when simplexml is used to parse XML. The function is quite simple: by default, you provide a path to an XML file. If you want to use a string, just add a second argument after the first (it can be anything other than “file”, but here I chose “string” for clarity’s sake). You can also replace the simplexml extension with the DOM extension if you prefer it for parsing XML.
The function get_xml_object returns an array that contains two keys, errors and xml. In this example, $result = get_xml_object($s, "string"), so $result is an array. If there are no errors, $result['errors'] is null and $result['xml'] contains a simplexml object that you can then manipulate with the simplexml extension. If parsing fails, $result['errors'] holds the error messages and $result['xml'] stays null.
// Deliberately not well-formed: the opening "<" is missing.
$s = "tag>hello world</tag>";
// $s = "<tag>hello world</tag>";

function get_xml_object ($xml, $xmlFormat="file") {
    $result = array ("errors" => null, "xml" => null);
    // Collect libxml errors internally instead of displaying them.
    libxml_use_internal_errors (true);
    $xml_object = ($xmlFormat == "file")
        ? simplexml_load_file ($xml)
        : simplexml_load_string ($xml);
    // Compare with false explicitly: a valid but empty element such as
    // <tag/> also evaluates to false in a boolean test.
    if ($xml_object === false) {
        $error_msg = "";
        foreach (libxml_get_errors() as $error) {
            // Append every error rather than keeping only the last one.
            $error_msg .= "Error: line: " . $error->line
                . ": column: " . $error->column . ": "
                . trim ($error->message) . "\n";
        }
        libxml_clear_errors();
        $result["errors"] = $error_msg;
    } else {
        $result["xml"] = $xml_object;
    }
    return $result;
}

$result = get_xml_object ($s, "string");
if ($result['errors'] !== null) {
    var_dump ($result['errors']);
} else {
    var_dump ($result['xml']);
}
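The text above mentions that the DOM extension can stand in for simplexml. Here is a minimal sketch of what that swap might look like, assuming the same two-key result array. The name get_dom_object is hypothetical and only used for illustration. It also restores the previous libxml error setting before returning, which is a design choice of this sketch and not something the original function does.

```php
<?php
// Hypothetical DOM-based counterpart of get_xml_object(); the function
// name is an assumption made for this example.
function get_dom_object ($xml, $xmlFormat = "file") {
    $result = array ("errors" => null, "xml" => null);
    // Collect libxml errors internally and remember the previous setting.
    $previous = libxml_use_internal_errors (true);
    $doc = new DOMDocument ();
    // load() reads from a path, loadXML() parses a string; both return
    // false when the document cannot be parsed.
    $loaded = ($xmlFormat == "file") ? $doc->load ($xml) : $doc->loadXML ($xml);
    if ($loaded === false) {
        $error_msg = "";
        foreach (libxml_get_errors () as $error) {
            $error_msg .= "Error: line: " . $error->line
                . ": column: " . $error->column . ": "
                . trim ($error->message) . "\n";
        }
        $result["errors"] = $error_msg;
    } else {
        $result["xml"] = $doc;
    }
    libxml_clear_errors ();
    // Put the error handling back the way the caller had it.
    libxml_use_internal_errors ($previous);
    return $result;
}

$result = get_dom_object ("tag>hello world</tag>", "string");
if ($result['errors'] !== null) {
    var_dump ($result['errors']);
} else {
    var_dump ($result['xml']->saveXML ());
}
```

On success, $result['xml'] holds a DOMDocument instead of a simplexml object, so you would continue with DOM methods from there.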