Reading XML
sabre/xml has a reader class called Sabre\Xml\Reader. This class extends PHP's built-in XMLReader class.
The reader is extended quite a bit. So while you can find the exact same API methods as in PHP's class, the way you interact with the reader will likely look different.
Often you can just parse your XML documents by using the reader like this:
$reader = new Sabre\Xml\Reader();
$reader->xml($xml);
$result = $reader->parse();
However, we recommend using the Service object instead. It's optional, but it adds a few nice features. All of the following examples will be using this boilerplate instead to parse XML:
$service = new Sabre\Xml\Service();
$result = $service->parse($xml);
Converting XML to a PHP array
Let's take the following XML as our primary example.
<?xml version="1.0" encoding="utf-8"?>
<books xmlns="http://example.org/books">
    <book>
        <title>Snow Crash</title>
        <author>Neil Stephenson</author>
    </book>
    <book>
        <title>Dune</title>
        <author>Frank Herbert</author>
    </book>
</books>
To convert this XML to a PHP array, we can just run this:
$service = new Sabre\Xml\Service();
print_r($service->parse($xml));
The output for this is quite big:
Array
(
[0] => Array
(
[name] => {http://example.org/books}book
[value] => Array
(
[0] => Array
(
[name] => {http://example.org/books}title
[value] => Snow Crash
[attributes] => Array
(
)
)
[1] => Array
(
[name] => {http://example.org/books}author
[value] => Neil Stephenson
[attributes] => Array
(
)
)
)
[attributes] => Array
(
)
)
[1] => Array
(
[name] => {http://example.org/books}book
[value] => Array
(
[0] => Array
(
[name] => {http://example.org/books}title
[value] => Dune
[attributes] => Array
(
)
)
[1] => Array
(
[name] => {http://example.org/books}author
[value] => Frank Herbert
[attributes] => Array
(
)
)
)
[attributes] => Array
(
)
)
)
Key-Value XML structures
However, we can simplify this quite a bit. Our book element pretty much looks like a key-value structure, so we can tell the parser to treat it as such:
$service = new Sabre\Xml\Service();
$service->elementMap = [
    '{http://example.org/books}book' => 'Sabre\Xml\Deserializer\keyValue',
];
print_r($service->parse($xml));
This creates the new output:
Array
(
[0] => Array
(
[name] => {http://example.org/books}book
[value] => Array
(
[{http://example.org/books}title] => Snow Crash
[{http://example.org/books}author] => Neil Stephenson
)
[attributes] => Array
(
)
)
[1] => Array
(
[name] => {http://example.org/books}book
[value] => Array
(
[{http://example.org/books}title] => Dune
[{http://example.org/books}author] => Frank Herbert
)
[attributes] => Array
(
)
)
)
Stripping the XML namespaces
We added keyValue in our last example. keyValue is simply a function that gets called automatically. We can give that function a default namespace, which causes it to strip the namespace from every element name that belongs to that specific namespace.
Our new code looks like this:
$service = new Sabre\Xml\Service();
$service->elementMap = [
    '{http://example.org/books}book' => function(Sabre\Xml\Reader $reader) {
        return Sabre\Xml\Deserializer\keyValue($reader, 'http://example.org/books');
    }
];
print_r($service->parse($xml));
The new output:
Array
(
[0] => Array
(
[name] => {http://example.org/books}book
[value] => Array
(
[title] => Snow Crash
[author] => Neil Stephenson
)
[attributes] => Array
(
)
)
[1] => Array
(
[name] => {http://example.org/books}book
[value] => Array
(
[title] => Dune
[author] => Frank Herbert
)
[attributes] => Array
(
)
)
)
Parsing the books element as a collection of book items
Lastly, in our XML document we have a root element books and a repeating child element book. This too is a very common pattern in XML structures. We can also instruct the parser to treat books as a collection of book elements and return an even simpler array:
$service = new Sabre\Xml\Service();
$service->elementMap = [
    '{http://example.org/books}book' => function(Sabre\Xml\Reader $reader) {
        return Sabre\Xml\Deserializer\keyValue($reader, 'http://example.org/books');
    },
    '{http://example.org/books}books' => function(Sabre\Xml\Reader $reader) {
        return Sabre\Xml\Deserializer\repeatingElements($reader, '{http://example.org/books}book');
    },
];
print_r($service->parse($xml));
This last example will output:
Array
(
[0] => Array
(
[title] => Snow Crash
[author] => Neil Stephenson
)
[1] => Array
(
[title] => Dune
[author] => Frank Herbert
)
)
Other standard XML parsers
A number of standard deserializers are included. Here's the list:
keyValue
Sabre\Xml\Deserializer\keyValue(Reader $reader, $namespace = null);
Example further up in this document.
enum
Sabre\Xml\Deserializer\enum(Reader $reader, $namespace = null);
This deserializer turns a list of empty XML elements into a flat PHP array of element names. Specifically, it's intended for structures such as this:
<fruit xmlns="urn:fruit">
    <apple/>
    <banana/>
    <orange/>
</fruit>
Parsing this:
$service = new Sabre\Xml\Service();
$service->elementMap['{urn:fruit}fruit'] = 'Sabre\Xml\Deserializer\enum';
$result = $service->parse($xml);
print_r($result);
This would yield:
Array
(
[0] => {urn:fruit}apple
[1] => {urn:fruit}banana
[2] => {urn:fruit}orange
)
You can also specify a default namespace, which will cause that namespace to be stripped out. Example:
$service = new Sabre\Xml\Service();
$service->elementMap['{urn:fruit}fruit'] = function(Sabre\Xml\Reader $reader) {
    return Sabre\Xml\Deserializer\enum($reader, 'urn:fruit');
};
$result = $service->parse($xml);
print_r($result);
This would yield:
Array
(
[0] => apple
[1] => banana
[2] => orange
)
repeatingElements
Sabre\Xml\Deserializer\repeatingElements(Sabre\Xml\Reader $reader, $childElementName);
repeatingElements is specifically tailored for XML structures that look like this:
<collection xmlns="urn:fruit">
<item>...</item>
<item>...</item>
<item>...</item>
<item>...</item>
</collection>
It lets you state that collection always contains a list of item elements, and asks the parser to return the values of those item elements as an array.
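As a minimal sketch of how this could be wired up: the snippet below assumes sabre/xml was installed with Composer (hence the vendor/autoload.php path), and the item values are made up for illustration. Only Service, Reader and repeatingElements come from the library.

```php
<?php
// Assumption: sabre/xml is installed through Composer in the current directory.
require 'vendor/autoload.php';

use Sabre\Xml\Reader;
use Sabre\Xml\Service;
use function Sabre\Xml\Deserializer\repeatingElements;

// Illustrative input; the item values are invented for this sketch.
$xml = <<<XML
<collection xmlns="urn:fruit">
  <item>apple</item>
  <item>banana</item>
</collection>
XML;

$service = new Service();

// Treat <collection> as a plain list of the values of its <item> children.
$service->elementMap['{urn:fruit}collection'] = function (Reader $reader) {
    return repeatingElements($reader, '{urn:fruit}item');
};

$result = $service->parse($xml);
print_r($result);
```

Child elements with any other name are skipped, so the resulting array only ever holds item values.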
valueObject
Sabre\Xml\Deserializer\valueObject(Reader $reader, $className, $namespace);
The valueObject deserializer function allows you to turn an XML element into a PHP object of a specific class, mapping sub-elements to properties in the class.
It's used internally by Sabre\Xml\Service::mapValueObject. Read more in the Value Objects documentation.
XmlFragment
In some cases you might want to allow 'free form XML', for example in an API. Atom, for instance, allows XHTML to be embedded, and WebDAV allows users to store custom properties consisting of complex XML structures with their own namespaces.
XmlFragment extracts an entire XML subtree and creates an object that can be stored in a database and later embedded in an XML document again:
$service = new Sabre\Xml\Service();
$service->elementMap = [
    '{http://example.org/}root' => 'Sabre\Xml\Element\XmlFragment',
];
print_r($service->parse($xml));
Output:
Sabre\Xml\Element\XmlFragment Object
(
[xml:protected] =>
<elem1 xmlns="http://example.org/">value1</elem1>
<elem2 xmlns="http://example.org/" att="attvalue">value2</elem2>
)
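To get at the stored markup, the fragment object can be asked for its XML string. The sketch below is illustrative only: it assumes a Composer install of sabre/xml, and the one-element input document is invented for the example.

```php
<?php
// Assumption: sabre/xml is installed through Composer in the current directory.
require 'vendor/autoload.php';

use Sabre\Xml\Element\XmlFragment;
use Sabre\Xml\Service;

// Invented sample document for this sketch.
$xml = '<root xmlns="http://example.org/"><elem1>value1</elem1></root>';

$service = new Service();
$service->elementMap = [
    '{http://example.org/}root' => XmlFragment::class,
];

// The root element is mapped, so parse() returns the fragment object itself.
$fragment = $service->parse($xml);

// getXml() returns the inner XML of <root> as a string, ready to be stored.
$stored = $fragment->getXml();
echo $stored, "\n";
```

The string in $stored is what you would typically persist, for instance in a database column.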
Custom element parsers
Let's take this one step further. We have a simple class that represents the books document:
class Books {
    // A list of books.
    public $books = [];
}
And we have a class for every book:
class Book {
    public $title;
    public $author;
}
By refactoring our parser a bit, we can automatically map these classes to their respective XML elements:
$service = new Sabre\Xml\Service();
$service->elementMap = [
    // handle a collection of books
    '{http://example.org/books}books' => function(Sabre\Xml\Reader $reader) {
        $books = new Books();
        $children = $reader->parseInnerTree();
        foreach($children as $child) {
            if ($child['value'] instanceof Book) {
                $books->books[] = $child['value'];
            }
        }
        return $books;
    },
    // handle a single book
    '{http://example.org/books}book' => function(Sabre\Xml\Reader $reader) {
        $book = new Book();
        // Borrowing the keyValue deserializer function.
        $keyValue = Sabre\Xml\Deserializer\keyValue($reader, 'http://example.org/books');
        if (isset($keyValue['title'])) {
            $book->title = $keyValue['title'];
        }
        if (isset($keyValue['author'])) {
            $book->author = $keyValue['author'];
        }
        return $book;
    },
];
print_r($service->parse($xml));
This gives us the following output:
Books Object
(
[books] => Array
(
[0] => Book Object
(
[title] => Snow Crash
[author] => Neil Stephenson
)
[1] => Book Object
(
[title] => Dune
[author] => Frank Herbert
)
)
)
Using the XmlDeserializable interface
The last example had two custom deserializers. We can also integrate them straight into the classes they are supposed to deserialize.
The following two classes rebuild the Book and Books classes so they can parse themselves from an XML document:
class Books implements Sabre\Xml\XmlDeserializable {
    // A list of books.
    public $books = [];
    static function xmlDeserialize(Sabre\Xml\Reader $reader) {
        $books = new self();
        $children = $reader->parseInnerTree();
        foreach($children as $child) {
            if ($child['value'] instanceof Book) {
                $books->books[] = $child['value'];
            }
        }
        return $books;
    }
}
class Book implements Sabre\Xml\XmlDeserializable {
    public $title;
    public $author;
    static function xmlDeserialize(Sabre\Xml\Reader $reader) {
        $book = new self();
        // Borrowing a parser from the KeyValue class.
        $keyValue = Sabre\Xml\Element\KeyValue::xmlDeserialize($reader);
        if (isset($keyValue['{http://example.org/books}title'])) {
            $book->title = $keyValue['{http://example.org/books}title'];
        }
        if (isset($keyValue['{http://example.org/books}author'])) {
            $book->author = $keyValue['{http://example.org/books}author'];
        }
        return $book;
    }
}
To use this:
$service = new Sabre\Xml\Service();
$service->elementMap = [
    '{http://example.org/books}books' => 'Books',
    '{http://example.org/books}book' => 'Book',
];
print_r($service->parse($xml));
Using ValueObjects for this instead
While the former class-mapping example is a good way to learn how to write custom deserializers, it could have been simplified even further with the 'value objects' feature.
The exact same result could have been achieved by registering the PHP classes as value objects:
$service = new Sabre\Xml\Service();
$service->mapValueObject('{http://example.org/books}books', 'Books');
$service->mapValueObject('{http://example.org/books}book', 'Book');
print_r($service->parse($xml));
For more information about this feature, read Value Objects. Value Objects are a great way to map very simple XML structures to simple PHP classes. As soon as either the XML or PHP classes are no longer 'simple', you will likely still need to write your own (de-)serializers.
Summary of parsing XML objects
- By default sabre/xml will always parse an XML document into an array.
- It is possible to map certain XML elements using elementMap.
- In the elementMap you can specify a custom deserializer strategy for a specific element.
- Often this is simply a PHP callback. The PHP callback receives the Xml\Reader object as its only argument.
- Default readers are provided to aid with parsing common XML structures. Examples are keyValue and elements.
- Instead of a callback, you can also specify a class. If this class implements Sabre\Xml\XmlDeserializable, then its xmlDeserialize method will be called to deserialize the element.
Tips for custom deserializing functions
If you either specify custom callbacks in $elementMap, or you are using Sabre\Xml\XmlDeserializable, you will end up with a function that receives an instance of Sabre\Xml\Reader, such as this:
function myDeserializer(Sabre\Xml\Reader $reader) {
}
The reader extends PHP's XMLReader object. You must make absolutely sure that you read the entire XML element, and do not stop halfway. The simplest possible deserializer function looks like this:
function myDeserializer(Sabre\Xml\Reader $reader) {
    $reader->next();
    return 'foo';
}
The next() method instructs PHP's XMLReader to skip the current element and everything inside of it.
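The skipping behaviour can be seen with PHP's plain XMLReader, without sabre/xml involved. The document and element names below are invented for this sketch.

```php
<?php
// Plain XMLReader from PHP's xmlreader extension; no sabre/xml needed here.
$reader = new XMLReader();
$reader->XML('<root><skipped><inner>ignored</inner></skipped><kept>hello</kept></root>');

$reader->read(); // cursor is now on <root>
$reader->read(); // cursor is now on <skipped>

// next() jumps over <skipped> and its whole subtree,
// landing on the following sibling.
$reader->next();

$name = $reader->name;         // name of the element the cursor is on
$text = $reader->readString(); // text content of that element

echo $name, ': ', $text, "\n";
```

This is why a deserializer that ignores its element still has to call next(): otherwise the cursor would remain at the start of the element it was supposed to consume.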
From within your deserializer function you can also easily call upon other deserializer functions to do the parsing for you:
function myDeserializer(Sabre\Xml\Reader $reader) {
    $keyValue = Sabre\Xml\Deserializer\keyValue($reader);
    return 'foo';
}
Because any deserializer function (such as keyValue) is responsible for reading the entire node, we no longer need to call next() here.
sabre/xml also adds a parseAttributes() method to the Reader to easily get a list of attributes. Here's a deserializer function that just returns the attributes and ignores sub-elements:
function myDeserializer(Sabre\Xml\Reader $reader) {
    $attributes = $reader->parseAttributes();
    $reader->next();
    return $attributes;
}
Note that you must read attributes before anything else.
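A small sketch of that ordering, where attributes are read first and the children afterwards. It assumes a Composer install of sabre/xml; the urn:example namespace and the link, title, href and rel names are hypothetical.

```php
<?php
// Assumption: sabre/xml is installed through Composer in the current directory.
require 'vendor/autoload.php';

use Sabre\Xml\Reader;
use Sabre\Xml\Service;

// Hypothetical document: one element with two attributes and one child.
$xml = '<link xmlns="urn:example" href="/a" rel="self"><title>Home</title></link>';

$service = new Service();
$service->elementMap['{urn:example}link'] = function (Reader $reader) {
    // Attributes first, while the cursor is still on the opening tag.
    $attributes = $reader->parseAttributes();

    // Then the children; parseInnerTree() also consumes the rest of the element,
    // so no extra next() call is needed.
    $children = $reader->parseInnerTree();

    return ['attributes' => $attributes, 'children' => $children];
};

$result = $service->parse($xml);
print_r($result);
```

Attributes without a namespace are keyed by their local name, so href and rel appear as plain keys.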
You can also ask the reader to parse the entire sub-tree for you:
function myDeserializer(Sabre\Xml\Reader $reader) {
    $subTree = $reader->parseInnerTree();
    return 'foo';
}
Right now we are not doing anything with $subTree, but for every child element $subTree contains an array with the following keys:
* name
* value
* attributes
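As an illustration of walking that structure, the sketch below collects one "name=value" string per child. It assumes a Composer install of sabre/xml; the urn:zoo namespace and element names are invented.

```php
<?php
// Assumption: sabre/xml is installed through Composer in the current directory.
require 'vendor/autoload.php';

use Sabre\Xml\Reader;
use Sabre\Xml\Service;

// Invented sample document for this sketch.
$xml = '<animals xmlns="urn:zoo"><cat>Tom</cat><mouse>Jerry</mouse></animals>';

$service = new Service();
$service->elementMap['{urn:zoo}animals'] = function (Reader $reader) {
    $subTree = $reader->parseInnerTree();

    $pairs = [];
    // parseInnerTree() only returns an array when there are child elements,
    // so guard against text-only or empty elements.
    if (is_array($subTree)) {
        foreach ($subTree as $child) {
            // Each entry carries 'name' (Clark notation), 'value' and 'attributes'.
            $pairs[] = $child['name'] . '=' . $child['value'];
        }
    }
    return $pairs;
};

$result = $service->parse($xml);
print_r($result);
```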
parseInnerTree takes your $elementMap into consideration for this. So any child elements that are mapped to a deserializer will also correctly be parsed into this new structure.
However, it is also possible to temporarily override the entire $elementMap for the subtree. This lets you disable all custom deserializers or specify your own, but only while the subtree is being parsed.
function myDeserializer(Sabre\Xml\Reader $reader) {
    // Start from the current map and override one element for this subtree only.
    $elementMap = $reader->elementMap;
    $elementMap['{urn:foo}some-child-element'] = 'NewDeserializer';
    $subTree = $reader->parseInnerTree($elementMap);
    return 'foo';
}
Validate XML against an XSD
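To validate XML content against a schema, use setSchema(), inherited from XMLReader. The schema has to be attached after the XML has been loaded and before parsing starts:
$service = new Sabre\Xml\Service();
$reader = $service->getReader();
// Load the document first; XMLReader can only attach a schema to an opened document.
$reader->xml($xml);
// setSchema() returns true when the schema itself could be loaded.
// The document is validated while it is being parsed.
if ($reader->setSchema('myschema.xsd')) {
    print_r($reader->parse());
}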