Trapping Errors with simplexml for Not Well-Formed XML

I discovered the hard way that PHP 5 offers no obvious way to check whether some XML is well-formed, especially if you want to deploy on both Unix and Windows and don’t want to call out to the shell.

Adding to this problem, I also discovered that you can’t use PHP 5 exception handling to trap the errors from the DOM and simplexml extensions when the XML is not well-formed. When the loading functions of either extension parse not well-formed XML, the errors are emitted as PHP warnings rather than thrown as exceptions, so they are not trapped and are displayed immediately.
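A minimal sketch of what that looks like, assuming default error settings (in particular, no custom error handler that converts warnings into exceptions). The variable names are only for illustration:

```php
<?php
// Not well-formed: the opening "<" of the start tag is missing.
$s = "tag>hello world</tag>";

$caught = false;
try {
    // Emits one or more PHP warnings and returns false; nothing is thrown.
    $xml = simplexml_load_string($s);
} catch (Exception $e) {
    // Not reached under the assumptions above.
    $caught = true;
}

var_dump($xml === false);  // the parse failed
var_dump($caught);         // the catch block did not run
```

The warnings go straight to the output (or the log, depending on display_errors), which is exactly the behaviour I wanted to get under control.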

It’s possible to load not well-formed XML with the DOM or the Tidy extensions and then repair it on the fly. But what if you need to detect not well-formed XML and provide a message stating the error?
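For reference, the repair route with the DOM extension looks roughly like this. It is only a sketch: it relies on the recover property of DOMDocument, the sample input is made up, and how libxml rebuilds a given broken document is up to the parser.

```php
<?php
// Made-up broken input: <item> is never closed.
$broken = "<root><item>hello</root>";

// Keep parser messages off the page while we experiment.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->recover = true;        // ask libxml to try to parse not well-formed input
$dom->loadXML($broken);
$repaired = $dom->saveXML(); // whatever the parser managed to rebuild

// The parser still records what it had to work around.
$problems = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

echo $repaired;
```

Repairing quietly patches the document, but on its own it does not tell the user what was wrong, which is the problem the rest of this post deals with.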

Fortunately, after some research, I found that you can use the libxml functions (PHP 5.1 and later) to test XML well-formedness and trap XML errors. So I whipped up this little function called get_xml_object (see here for the inspiration) that allows me to trap errors when simplexml is used to parse XML. The function is quite simple: by default, you provide a path to an XML file. If you want to parse a string instead, just add a second argument after the first parameter (it can be anything other than "file"; I chose "string" for clarity’s sake). You can also replace the simplexml extension with the DOM extension if you prefer it for parsing XML.

The function get_xml_object returns an array that contains two keys, errors and xml. In this example, $result = get_xml_object($s, "string"), so $result is an array. If there are no errors, $result['errors'] is set to null and $result['xml'] contains a simplexml object that you can then manipulate with the simplexml extension. If parsing fails, $result['errors'] holds the error text and $result['xml'] stays null.

$s = "tag>hello world</tag>";      // not well-formed: the opening "<" is missing
// $s = "<tag>hello world</tag>";  // well-formed version, for comparison

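// Parse XML with simplexml and collect libxml errors instead of displaying them.
// $xml is a file path by default; pass any other $xmlFormat value (e.g. "string")
// to have $xml parsed as an XML string.
function get_xml_object ($xml, $xmlFormat="file") {

  $result = array ("errors" => null, "xml" => null);

  // Collect parse errors internally instead of emitting PHP warnings
  $previous = libxml_use_internal_errors (true);
  libxml_clear_errors();

  $xml_object = ($xmlFormat == "file") ? simplexml_load_file ($xml)
                                       : simplexml_load_string ($xml);

  // Compare with === false: a valid but empty element such as <tag/> is falsy
  if ($xml_object === false) {
     $error_msg = "";
     foreach (libxml_get_errors() as $error) {
         // Append (.=) so that every error is kept, not only the last one
         $error_msg .= "Error: line: " . $error->line
                     . ": column: " . $error->column . ": "
                     . trim ($error->message) . "\n";
     }
     // Never report a failure with an empty message
     $result["errors"] = ($error_msg !== "") ? $error_msg
                                             : "Error: the XML could not be parsed\n";
  } else {
    $result["xml"] = $xml_object;
  }
  libxml_clear_errors();
  libxml_use_internal_errors ($previous);  // restore the caller's setting
  return $result;
}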

$result = get_xml_object ($s, "string");

if ($result['errors']) {
  var_dump ($result['errors']);
} else {
  var_dump ($result['xml']);
}
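If you prefer the DOM extension, the same idea carries over. The following is a sketch rather than a drop-in replacement: get_dom_object is a hypothetical name I made up for the DOM counterpart, and it assumes the same return shape as get_xml_object (an array with errors and xml keys).

```php
<?php
// Hypothetical DOM counterpart of get_xml_object(); same return shape.
function get_dom_object($xml, $xmlFormat = "file") {
    $result = array("errors" => null, "xml" => null);

    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument();
    // load() reads a file, loadXML() parses a string; both return false on failure.
    $loaded = ($xmlFormat == "file") ? $dom->load($xml) : $dom->loadXML($xml);

    if ($loaded === false) {
        $error_msg = "";
        foreach (libxml_get_errors() as $error) {
            $error_msg .= "Error: line: " . $error->line
                        . ": column: " . $error->column . ": "
                        . trim($error->message) . "\n";
        }
        $result["errors"] = $error_msg;
    } else {
        $result["xml"] = $dom;
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return $result;
}

$bad  = get_dom_object("tag>hello world</tag>", "string");
$good = get_dom_object("<tag>hello world</tag>", "string");

var_dump($bad["errors"]);                          // error text from libxml
var_dump($good["xml"]->documentElement->nodeName); // name of the root element
```

On success, $result['xml'] holds a DOMDocument instead of a simplexml object, so the code that consumes it has to use the DOM API.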
