If you are either a client or a server developer, and you have to create paths for things I would highly recommend using characters from this list.
A-Z // uppercase a-z // lowerchase 0-9 // numbers - // dash _ // underscore . // full stop
This is the list of unreserved characters, with one exception: it doesn't
include the tilde (
These characters are pretty much guaranteed to work correctly across the board. Everything else may be percent encoded, depending on the client.
Problems with percent encoding
The following three urls are technically equivalent:
http://example.org/~evert http://example.org/%7eevert http://example.org/%7Eevert
Bad clients may just look at those urls and consider them all as different urls. This causes really weird bugs.
A second problem is characters that are not part of ascii, such as ü.
The ü (u-umlaut) may be encoded in different ways, usually either UTF-8 or ISO-8859-1 (or CP-1252 or latin1, etc).
HTTP urls don't specify any encodings, they are just lists of bytes. As a
result a client may encode the url
http://example.org/üvert as one of
http://example.org/%FCvrt // latin-1 http://example.org/%C3%BCvrt // utf-8 normalization form C http://example.org/u%CC%88vrt // utf-8 normalization form D
The last three urls are 'identical' for a user, but from a HTTP perspective they are all distinct urls.
SabreDAV will accept either form and internally encode everything to UTF-8, but we cannot fix bad-behaving clients.
An imcomplete list of clients that have issues with this:
Windows encodes files as ISO-8859-1 in request-urls. Starting SabreDAV 1.2.0alpha1 these will be automatically converted to their UTF-8 equivalents.
This only covers request-url's though, urls returned in PROPFIND bodies will still be UTF-8. Some of these are reported to work, while others are not.
Read more on the Windows documentation page.
OS/X finder uses UTF-8. In most cases this works without any issues. More information can be found on the Finder page.
- BitKinex is a known offender.
All urls passed within SabreDAV are UTF-8 characters. Before 1.2.0alpha1 character sets were completely ignored, but now attempts are made to convert to UTF-8 where possible.
In general this is only done to urls, specifically when urls are converted to
local paths and the reverse. This is all handled in
If you are using a database or other datastore to store your data (and url's), you likely don't have to worry about this. Just make sure everything is UTF-8.
If you are using the filesystem to store files, this could be a little bit problematic. Linux and OS/X treat files as binary strings. When displayed in an application, the character encoding that will be used is based on the current locale. This is actually great, because there's no need to worry about these.
Windows is different though. NTFS uses UTF-16 internally to store filenames, but PHP's filesystem wrappers appear to use ISO-8859-1. This is problematic, because there's no clean mapping possible in many cases.
It might be useful in those cases to simply urlencode every filename before storing. This will give a cross-platform consistent result and the files will remain legible while checking it out with the console, etc.
More information on this topic