PHP5: SimpleXML and xpath: What You See Is Not What You Get

This took me a while to figure out. The problem is related to the way SimpleXML handles a second xpath query made against the result of a first xpath query. Simply put, if you keep the result of a first xpath query and then run another xpath query such as //country against it, the second query is not evaluated against this subset, but against the whole XML tree.

Here’s a simple example to illustrate what I’m trying to convey. Please treat it as an illustration of the problem, not as the best way to use an xpath query.

The XML Snippet

<?xml version="1.0" encoding="UTF-8"?>
<world>
  <continent name='Europe'>
    <country name='France'>Paris</country>
    <country name='United Kingdom'>London</country>
  </continent>
  <continent name='North America'>
    <country name='Canada'>Ottawa</country>
    <country name='United States'>Washington</country>
  </continent>
</world>

First, I know that there are more continents than Europe and North America :) .

What I want to do is:

1. Make an xpath query to get a result set of continents,

2. And then loop over this result set to get the result set of countries for each continent.

A first attempt at the code goes like this:

$xml          = simplexml_load_string ($xmlstring);
$continents   = $xml->xpath ('//continent');

foreach ($continents as $continent) {
  $countries = $continent->xpath ('//country');
  var_dump ($countries);
}

Written this way, the second xpath query does not run against the current $continent element, but against the whole XML tree, because an expression beginning with // always starts at the root of the document. Instead of a listing of countries per continent, you get all four countries on each of the two loop iterations.
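A quick way to see this without reading a full var_dump is to count the matches. The sketch below assumes the same XML document as above, inlined so that it runs on its own; the $counts array is a name made up for this illustration.

```php
<?php
$xmlstring = <<<END
<?xml version="1.0" encoding="UTF-8"?>
<world>
  <continent name='Europe'>
    <country name='France'>Paris</country>
    <country name='United Kingdom'>London</country>
  </continent>
  <continent name='North America'>
    <country name='Canada'>Ottawa</country>
    <country name='United States'>Washington</country>
  </continent>
</world>
END;

$xml    = simplexml_load_string($xmlstring);
$counts = array();

foreach ($xml->xpath('//continent') as $continent) {
    // '//country' is an absolute path: it is evaluated from the document
    // root, whichever element xpath() is called on.
    $counts[(string) $continent['name']] = count($continent->xpath('//country'));
}

// Each continent reports 4 countries instead of 2.
print_r($counts);
```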

To get only the countries of each continent, you have to dereference each element of $continents again: serialize it with asXML() and load the result as a new, standalone document.

$xml          = simplexml_load_string ($xmlstring);
$continents   = $xml->xpath ('//continent');

foreach ($continents as $continent) {
  $continent_deferenced = simplexml_load_string ($continent->asXML ());
  $countries = $continent_deferenced->xpath ('//country');
  var_dump ($countries);
}

Now you will get what you are looking for.
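Reparsing is not the only option. Because the leading // is what sends the query back to the document root, a relative expression such as .//country (or simply country for direct children) should keep the query inside the current element. Here is a minimal sketch of that alternative, assuming the same document as above; $byContinent and $names are names invented for the example.

```php
<?php
$xmlstring = <<<END
<?xml version="1.0" encoding="UTF-8"?>
<world>
  <continent name='Europe'>
    <country name='France'>Paris</country>
    <country name='United Kingdom'>London</country>
  </continent>
  <continent name='North America'>
    <country name='Canada'>Ottawa</country>
    <country name='United States'>Washington</country>
  </continent>
</world>
END;

$xml         = simplexml_load_string($xmlstring);
$byContinent = array();

foreach ($xml->xpath('//continent') as $continent) {
    $names = array();
    // The leading dot anchors the query at $continent instead of the root,
    // so only the descendants of this continent are matched.
    foreach ($continent->xpath('.//country') as $country) {
        $names[] = (string) $country['name'];
    }
    $byContinent[(string) $continent['name']] = $names;
}

print_r($byContinent);
```

This avoids serializing and reparsing every continent, which may matter on large documents.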

Here’s the complete example, with a flag to switch between the two behaviours.


$xmlstring = <<<END
<?xml version="1.0" encoding="UTF-8"?>
<world>
  <continent name='Europe'>
    <country name='France'>Paris</country>
    <country name='United Kingdom'>London</country>
  </continent>
  <continent name='North America'>
    <country name='Canada'>Ottawa</country>
    <country name='United States'>Washington</country>
  </continent>
</world>
END;

$deference_xml  = 0;

$xml          = simplexml_load_string ($xmlstring);
$continents   = $xml->xpath ('//continent');

$counter        = 0;
foreach ($continents as $continent) {
  $counter++;
  echo "--------------------------------\n";
  echo "Loop# $counter: " . $continent['name'] . "\n";
  echo "--------------------------------\n";
  echo "\n";
  
  if ($deference_xml) {
    echo "--------------------------------\n";
    echo "\$continent is deferenced\n";
    echo "--------------------------------\n";
    
    $continent_deferenced = simplexml_load_string ($continent->asXML ());
    $countries = $continent_deferenced->xpath ('//country');
  } else {
    echo "--------------------------------\n";
    echo "\$continent is NOT deferenced\n";
    echo "--------------------------------\n";
    
    $countries = $continent->xpath ('//country');
  }
  
  echo "--------------------------------\n";
  echo "\$continent->xpath('//country')\n";
  echo "--------------------------------\n";
  var_dump ($countries);
  
}

If you set the variable $deference_xml to 0, $continent is not dereferenced before the second xpath query, and you get this result.

--------------------------------
Loop# 1: Europe
--------------------------------

--------------------------------
$continent is NOT deferenced
--------------------------------
--------------------------------
$continent->xpath('//country')
--------------------------------
array(4) {
  [0]=>
  object(SimpleXMLElement)#4 (2) {
    ["@attributes"]=>
    array(1) {
      ["name"]=>
      string(6) "France"
    }
    [0]=>
    string(5) "Paris"
  }
  [1]=>
  object(SimpleXMLElement)#5 (2) {
    ["@attributes"]=>
    array(1) {
      ["name"]=>
      string(14) "United Kingdom"
    }
    [0]=>
    string(6) "London"
  }
  [2]=>
  object(SimpleXMLElement)#6 (2) {
    ["@attributes"]=>
    array(1) {
      ["name"]=>
      string(6) "Canada"
    }
    [0]=>
    string(6) "Ottawa"
  }
  [3]=>
  object(SimpleXMLElement)#7 (2) {
    ["@attributes"]=>
    array(1) {
      ["name"]=>
      string(13) "United States"
    }
    [0]=>
    string(10) "Washington"
  }
}
--------------------------------
Loop# 2: North America
--------------------------------

--------------------------------
$continent is NOT deferenced
--------------------------------
--------------------------------
$continent->xpath('//country')
--------------------------------
array(4) {
  [0]=>
  object(SimpleXMLElement)#8 (2) {
    ["@attributes"]=>
    array(1) {
      ["name"]=>
      string(6) "France"
    }
    [0]=>
    string(5) "Paris"
  }
  [1]=>
  object(SimpleXMLElement)#9 (2) {
    ["@attributes"]=>
    array(1) {
      ["name"]=>
      string(14) "United Kingdom"
    }
    [0]=>
    string(6) "London"
  }
  [2]=>
  object(SimpleXMLElement)#10 (2) {
    ["@attributes"]=>
    array(1) {
      ["name"]=>
      string(6) "Canada"
    }
    [0]=>
    string(6) "Ottawa"
  }
  [3]=>
  object(SimpleXMLElement)#11 (2) {
    ["@attributes"]=>
    array(1) {
      ["name"]=>
      string(13) "United States"
    }
    [0]=>
    string(10) "Washington"
  }
}

If you now set the variable $deference_xml to 1, you get what you expected.

--------------------------------
Loop# 1: Europe
--------------------------------

--------------------------------
$continent is deferenced
--------------------------------
--------------------------------
$continent->xpath('//country')
--------------------------------
array(2) {
  [0]=>
  object(SimpleXMLElement)#5 (2) {
    ["@attributes"]=>
    array(1) {
      ["name"]=>
      string(6) "France"
    }
    [0]=>
    string(5) "Paris"
  }
  [1]=>
  object(SimpleXMLElement)#6 (2) {
    ["@attributes"]=>
    array(1) {
      ["name"]=>
      string(14) "United Kingdom"
    }
    [0]=>
    string(6) "London"
  }
}
--------------------------------
Loop# 2: North America
--------------------------------

--------------------------------
$continent is deferenced
--------------------------------
--------------------------------
$continent->xpath('//country')
--------------------------------
array(2) {
  [0]=>
  object(SimpleXMLElement)#4 (2) {
    ["@attributes"]=>
    array(1) {
      ["name"]=>
      string(6) "Canada"
    }
    [0]=>
    string(6) "Ottawa"
  }
  [1]=>
  object(SimpleXMLElement)#8 (2) {
    ["@attributes"]=>
    array(1) {
      ["name"]=>
      string(13) "United States"
    }
    [0]=>
    string(10) "Washington"
  }
}
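One consequence of the reparsing approach is worth keeping in mind: simplexml_load_string() builds a new, independent document, so the elements found in it are copies, and changing them does not touch the original tree. The sketch below illustrates this on a cut-down version of the document; the replacement value 'Lyon' and the variable names are made up for the example.

```php
<?php
$xml = simplexml_load_string(
    "<world><continent name='Europe'>"
    . "<country name='France'>Paris</country>"
    . "</continent></world>"
);

$continents = $xml->xpath('//continent');
$continent  = $continents[0];

// Reparsed copy: the edit stays inside the copy.
$copy         = simplexml_load_string($continent->asXML());
$inCopy       = $copy->xpath('//country');
$inCopy[0][0] = 'Lyon';
$afterCopyEdit = (string) $xml->continent[0]->country[0]; // still "Paris"

// Relative query on the original element: the edit reaches $xml.
$inOriginal       = $continent->xpath('.//country');
$inOriginal[0][0] = 'Lyon';
$afterLiveEdit = (string) $xml->continent[0]->country[0]; // now "Lyon"

echo $afterCopyEdit, "\n", $afterLiveEdit, "\n";
```

If you only read from the result set, the difference does not matter; if you need to modify the document, it decides which approach you can use.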
