Converting New URI Schemes or URN
Sub-Schemes to HTTP
Views expressed herein are my own
and do not necessarily reflect
those of HP. Comments are invited.
New URI schemes or URN sub-schemes are sometimes
proposed for resource identification in applications where the HTTP
protocol is deemed
unsuitable. This paper argues that URIs based on specialized HTTP URI prefixes would
be a better choice in virtually all cases, even if the resource resolution
or data transfer properties of HTTP are insufficient for these
applications. A simple recipe is presented for converting
proposed URI schemes or URN sub-schemes to HTTP using specialized
HTTP URI prefixes. This technique cleanly separates the use of the URI as an identifier
(to establish resource identity) from the use of the URI as a locator
(to retrieve representations). The
resulting capabilities of the HTTP URIs are virtually a direct superset
of those of URIs based on new URI schemes or URN sub-schemes.
Introduction: A Scenario
A (fictitious) organization, XyzConsortium, representing a group of
individuals working in a particular field, xyz, wishes to define a kind
of persistent identifier that has resolution characteristics different
from HTTP. It defines a new URI scheme or URN sub-scheme,
xyzscheme, and publishes a specification, XyzSpec, that defines:
- Syntactic conventions for constructing such URIs (over and above
the syntactic requirements of URIs in general), perhaps
incorporating such niceties as version numbers or checksums into the
URI;
- Conventions for locating data, d,
associated with such a URI; and
- Conventions for locating metadata, md1, associated with such a URI.
The resulting URIs are of the form xyzscheme:foo, where xyzscheme:foo identifies a resource r, and foo obeys the syntactic
conventions dictated by XyzSpec in addition to obeying the syntactic
conventions for URIs in general. Users of xyzscheme URIs
are expected to know and follow the conventions published in XyzSpec,
and specialized software is made available for resolving xyzscheme URIs
to their associated data d or
metadata md1. Metadata md1 is expressed in a widely
understood format that is not specific to the xyz field. However,
data d may be in a format
that is specific to the xyz field.
Heavy users of xyzscheme URIs are happy with xyzscheme URIs and do not
mind the fact that they must have special software installed to
retrieve data d and/or
metadata md1. They
often need both the data d
and the metadata md1. However, more casual
users (or users in more peripheral fields) who do not have the
xyzscheme resolution software installed are not so happy. They
are unable to make much use of these URIs without the xyzscheme
software, in spite of the fact that many of their applications only
need metadata md1 and do not
need data d at all. A
few complain to
XyzConsortium, but many others
quietly forgo the benefit that such metadata could have
provided to their applications.
To facilitate use by more casual users (and users in related fields),
XyzConsortium decides to offer HTTP URIs as synonyms for its existing
xyzscheme URIs using the following recipe.
Recipe for Converting to HTTP URIs
Step 1. XyzConsortium
creates a web site, http://xyzpurl.org, for
forwarding HTTP requests. In setting up xyzpurl.org, XyzConsortium uses
all of the institutional
and legal safeguards at its disposal to ensure that the site will
continue to exist and faithfully implement its intended purpose for as
long as possible.
Step 2. XyzConsortium
publishes a specification, XyzHttpSpec, for interpreting the
specialized HTTP URI
prefix, "http://xyzpurl.org?". In particular, it declares that
for any URI of the form http://xyzpurl.org?foo:
- Syntactic conventions for http://xyzpurl.org?foo must conform to the XyzSpec
conventions for xyzscheme:foo;
- Data associated with http://xyzpurl.org?foo may be located using the
XyzSpec conventions for locating data associated with xyzscheme:foo;
- Metadata associated with http://xyzpurl.org?foo may be located using the
XyzSpec conventions for locating metadata associated with xyzscheme:foo.
(This technique of defining an HTTP URI prefix is the same technique
used by thing-described-by.org.)
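As a minimal sketch of the correspondence that XyzHttpSpec declares, the following Python functions convert between the two synonymous forms. The function names are hypothetical, and the sketch assumes that the prefix is compared by exact string match and that foo is copied verbatim; the syntactic checks that XyzSpec would impose on foo are not modeled.

```python
HTTP_PREFIX = "http://xyzpurl.org?"  # specialized HTTP URI prefix from XyzHttpSpec
SCHEME_PREFIX = "xyzscheme:"         # prefix of the original scheme


def http_to_xyzscheme(uri: str) -> str:
    """Map http://xyzpurl.org?foo to its synonym xyzscheme:foo."""
    if not uri.startswith(HTTP_PREFIX):
        raise ValueError(f"not an xyzpurl.org URI: {uri!r}")
    # Everything after the prefix is "foo" and is carried over unchanged.
    return SCHEME_PREFIX + uri[len(HTTP_PREFIX):]


def xyzscheme_to_http(uri: str) -> str:
    """Map xyzscheme:foo to its synonym http://xyzpurl.org?foo."""
    if not uri.startswith(SCHEME_PREFIX):
        raise ValueError(f"not an xyzscheme URI: {uri!r}")
    return HTTP_PREFIX + uri[len(SCHEME_PREFIX):]
```

Because the mapping is a pure prefix swap, software that already implements XyzSpec only needs this one extra step to handle the HTTP form.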
Step 3. XyzConsortium
configures the xyzpurl.org web server such that an HTTP GET on any URI
of the form http://xyzpurl.org?foo
will be redirected (using an HTTP 303
See Other status code) to another HTTP URI, p, where metadata, md2, may be obtained. (Think of p as a caching proxy for accessing
metadata md1 of xyzscheme:foo.)
Metadata md2 should describe the
resource r that http://xyzpurl.org?foo (and
xyzscheme:foo) names, i.e., it
should provide sufficient information for users to distinguish resource
r from all other resources.
It should be
expressed in a widely understood format that does not require
xyz-specific software to interpret. And of course, md2 must be consistent with md1 (and d). Normally, md1 would be
sufficient to meet these requirements. Furthermore, ideally md2 should include:
- as much of metadata md1 as
possible (preferably all of md1),
which p could supply by using
the XyzSpec conventions to retrieve (and perhaps cache) md1;
- (if needed) pointers to metadata md1 and data d, preferably via protocols that
are not xyz-specific (such as HTTP); and
- a pointer to the XyzHttpSpec, so that users discovering
http://xyzpurl.org?foo URIs
are encouraged to learn how they may be resolved more efficiently using
the XyzSpec conventions.
Step 4. XyzConsortium
furthermore publishes a declaration stating that: (a) for any URI
conforming to XyzHttpSpec, http://xyzpurl.org?foo identifies the same resource r that xyzscheme:foo identifies; and (b) if for any
reason the xyzpurl.org web site no longer functions, or if it fails to
faithfully implement the intended purpose of XyzHttpSpec, then any
information it serves should be ignored and the meaning of
http://xyzpurl.org?foo should be regarded as identical
to the meaning of xyzscheme:foo.
Many variations of this basic recipe are possible, of course. Some
applications for which new URI schemes or URN sub-schemes are proposed
may only have metadata md1 and
no data d; other applications
may mix metadata md1 and data d.
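The redirect described in Step 3 could be implemented along the following lines. This is only a sketch using Python's standard http.server module: the address of the metadata proxy p (METADATA_PROXY) is a hypothetical placeholder that the recipe does not specify, and a production site would normally sit behind a hardened web server rather than this class.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.parse import quote, urlsplit

# Hypothetical location of the metadata proxy p; not named in the recipe.
METADATA_PROXY = "http://metadata.xyzpurl.example/describe?id="


def redirect_target(request_path: str) -> Optional[str]:
    """Return the URI p for a request such as '/?foo', or None if foo is absent."""
    foo = urlsplit(request_path).query
    if not foo:
        return None
    # Percent-encode foo so it can be carried as a single query parameter value.
    return METADATA_PROXY + quote(foo, safe="")


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        target = redirect_target(self.path)
        if target is None:
            self.send_error(404, "no identifier after '?'")
            return
        # 303 See Other: the metadata lives at p, not at the requested URI.
        self.send_response(303)
        self.send_header("Location", target)
        self.end_headers()


def make_server(port: int = 8080) -> HTTPServer:
    """Create (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), RedirectHandler)
```

Calling make_server().serve_forever() would start the redirector locally. Keeping redirect_target separate from the handler makes the mapping from foo to p easy to check in isolation.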
Conflicts in Metadata
If both http://xyzpurl.org?foo and
xyzscheme:foo URIs are used,
and they both identify the same resource r, then there would be two paths
for obtaining authoritative metadata about r, and hence the metadata retrieved
via the two paths could potentially conflict. If such conflicts
are due to p providing stale
data due to caching, then metadata md2
should indicate the time(s) when the data is/was known to be
valid. If conflicts are not due to caching or latency, then p is not faithfully implementing
XyzHttpSpec, and under the declaration in Step 4 the information it serves should be ignored.
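One way a client might apply this rule is sketched below. The sketch assumes that metadata is available as simple key/value mappings, that md2 carries a timestamp saying when it was known to be valid, and that the client can learn when md1 last changed; none of these representations is prescribed by the recipe, and the function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Mapping


def classify_conflict(
    md1: Mapping[str, str],
    md1_changed_at: datetime,
    md2: Mapping[str, str],
    md2_valid_at: datetime,
) -> str:
    """Classify md2 relative to md1 as 'consistent', 'stale' or 'unfaithful'."""
    # md2 may be a subset or a superset of md1, so compare only the shared keys.
    shared = md1.keys() & md2.keys()
    if all(md1[key] == md2[key] for key in shared):
        return "consistent"
    # md2 was last known valid before md1 changed: a caching delay explains it.
    if md2_valid_at < md1_changed_at:
        return "stale"
    # Otherwise p claims validity for data that disagrees with md1.
    return "unfaithful"
```

A result of "unfaithful" corresponds to the case covered by part (b) of the Step 4 declaration, in which the information served by the site is to be ignored.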
Example: LSID
Suppose a resource owner wishes to mint http URIs but also wants to
offer the URI resolution functionality of LSIDs (Life Science Identifiers). To do this,
the resource owner can create a special-purpose http URI prefix, such
as http://entrez.example/2007/lsid: , and declare that prefix
as indicating that such URIs could be accessed using the LSID
protocol. A naive client dereferencing a URI that begins with this prefix
would thus use HTTP, but an
LSID-aware client might access the data using an LSID-aware proxy,
which would:
- recognize the http://entrez.example/2007/lsid: prefix;
- convert it to urn:lsid: and
- resolve the result using LSID resolution.
Of course, the proxy would not need to be hard-coded to recognize the
prefix. It could merely read some string pattern matching rules
(or an ontology) to map http://entrez.example/2007/lsid: URIs to
urn:lsid: URIs.
Furthermore, the resource metadata returned when the http URI is
naively dereferenced using HTTP could include a pointer to the URI
pattern matching rules (or an ontology), so that an LSID-aware proxy
that did not previously recognize the
http://entrez.example/2007/lsid: prefix could be automatically
bootstrapped to learn of its special meaning.
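The rule-driven mapping could look roughly like the following sketch. The rule table stands in for pattern matching rules that a proxy might load from published metadata; its format, the function name and the sample identifier used in the usage note are assumptions made for illustration, and actual LSID resolution is not shown.

```python
import re
from typing import List, Optional, Tuple

# (pattern, replacement) pairs. In practice these would be read from data
# published by the prefix owner rather than written into the proxy.
RULES: List[Tuple[str, str]] = [
    (r"^http://entrez\.example/2007/lsid:(.+)$", r"urn:lsid:\1"),
]


def to_lsid(uri: str, rules: List[Tuple[str, str]] = RULES) -> Optional[str]:
    """Return the urn:lsid: form of uri, or None if no rule recognizes it."""
    for pattern, replacement in rules:
        match = re.match(pattern, uri)
        if match:
            # Substitute the captured remainder into the urn:lsid: template.
            return match.expand(replacement)
    return None
```

For a hypothetical URI such as http://entrez.example/2007/lsid:ncbi.nlm.nih.gov:pubmed:12571434, the function yields urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434, which an LSID-aware proxy could then resolve; a URI that matches no rule is left to ordinary HTTP dereferencing.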
Comparing Capabilities of xyzscheme URIs and http://xyzpurl.org URIs
Because the http://xyzpurl.org URIs cleanly separate resource
identification from resource resolution or data transfer issues,
deferring to xyzscheme conventions for those tasks, the capabilities of
http://xyzpurl.org URIs are virtually a direct superset of the
capabilities of xyzscheme URIs, as the following table illustrates.
| | xyzscheme:foo URIs | http://xyzpurl.org?foo URIs |
|---|---|---|
| Heavy users willing to install special xyzscheme software | Software can recognize the "xyzscheme:" prefix on xyzscheme:foo URIs and apply the conventions defined in XyzSpec to retrieve the data or metadata associated with resource r. | Software can recognize the "http://xyzpurl.org?" prefix on http://xyzpurl.org?foo URIs and apply the conventions defined in XyzSpec to retrieve the data or metadata associated with resource r. |
| Casual users without special xyzscheme software | Software cannot access data or metadata. | Software may be able to access metadata, md2, which may include a subset of md1 or a superset of md1. |
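The comparison can also be read as a small decision procedure for client software. The sketch below is an illustration only; the Access enum and access_path function are hypothetical names, and prefix recognition is assumed to be an exact string match.

```python
from enum import Enum


class Access(Enum):
    XYZSPEC = "resolve data d or metadata md1 using XyzSpec conventions"
    HTTP_303 = "HTTP GET, then follow the 303 redirect to metadata md2"
    NONE = "no access to data or metadata"


def access_path(uri: str, has_xyz_software: bool) -> Access:
    """Pick the access path available to a client for the given URI."""
    is_scheme = uri.startswith("xyzscheme:")
    is_http = uri.startswith("http://xyzpurl.org?")
    if not (is_scheme or is_http):
        raise ValueError(f"neither an xyzscheme nor an xyzpurl.org URI: {uri!r}")
    if has_xyz_software:
        # Heavy users: both forms are handled by the XyzSpec conventions.
        return Access.XYZSPEC
    # Casual users: only the HTTP form is usable with generic software.
    return Access.HTTP_303 if is_http else Access.NONE
```

Only the combination of a casual user and an xyzscheme URI yields Access.NONE, which is the gap the HTTP form is meant to close.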
Bootstrapping Protocol Adoption
This section was added 12-Oct-2009.
Another major benefit of HTTP URIs is that they can be used to
bootstrap the adoption of a new protocol by resolving to a download of
a browser extension or other software that implements the new protocol,
as suggested by Graham Klyne. In contrast, if a new protocol
is based on a new URI scheme, a user who wishes to enjoy the features
of a new protocol has no choice but to manually download and install
the software that implements that protocol. Since users are far
more likely to accept a browser extension download than to manually
locate, download and install new software, the use of HTTP URIs could
dramatically improve the adoption rate of a new protocol.
Inherent Differences
This section was added 8-Aug-2006.
Although the above has illustrated how the capabilities of HTTP URIs
can generally be a direct superset of the capabilities of URIs based on
new schemes or URN sub-schemes, there are some inherent differences for
which new URI
schemes or URN sub-schemes could still be seen as advantageous, such as:
- URI Length. HTTP
URIs will generally be longer.
- Governing Authority.
New URI schemes must be registered with IANA, whereas specialized HTTP
prefixes may be defined by any URI owner. This may be a concern, both
because IANA may be perceived as being more reputable than other
organizations, and because IANA provides a single place to look for
scheme definitions. However, if this concern is important enough,
a registry of specialized HTTP prefixes could be created by a reputable
organization -- potentially even IANA.
- Expectations. Users
discovering an xyzscheme URI expect it to be governed by a separate
specification, whereas users discovering an HTTP URI with a specialized
prefix may not realize that there is a separate specification governing
it (over and above the http scheme specification). This can be
mitigated by educating users, and one good way to do so is to serve
useful metadata (indirectly) via the URI, as described above.
Are these differences important enough in practice to warrant creating
a new URI
scheme or URN sub-scheme? In my opinion, no. However, this
may depend on the application. Please email me if you know of
applications where you think these differences are important enough to
justify the creation of new URI schemes or URN sub-schemes, or if you
know of other inherent differences that I have missed.
Conclusion
HTTP URIs with specialized prefixes provide greater capability than
URIs based on new
URI schemes or URN sub-schemes in virtually all cases.
Furthermore, such HTTP URIs seem better equipped to survive the test of
time than URIs based on new URI schemes or URN sub-schemes:
- HTTP URIs can be dereferenced
by anyone, using GET -- not just by those who are the primary intended
users. Therefore, HTTP URIs are likely to more widely disperse
knowledge of their intended meaning
and conventions for use, thus increasing the likelihood of their
survival over time.
- HTTP URIs offer a lower barrier to use: applications without
special software can still
do a follow-your-nose GET on an HTTP URI to potentially retrieve
useful metadata about it. Therefore they are likely to achieve
greater uptake, particularly in
applications beyond their primary intended use.
- HTTP URIs allow a new protocol adoption to be readily
bootstrapped by dereferencing to browser extension downloads.
Addendum 2006-08-02: See also Kunze
and Rodgers' excellent work on
Archival Resource Keys (ARKs). They provide a much more thorough
discussion of how to achieve persistence.
Frequently Asked Questions (FAQ)
Q: Why does
http://xyzpurl.org not violate URI opacity?
A: The principle of URI
opacity is intended to prevent agents from incorrectly guessing properties of the
associated resource or representation. However, in this case,
software that obeys the XyzHttpSpec is not guessing; it is following
the explicit declaration of the URI owner (XyzConsortium).
Q: Why does
http://xyzpurl.org?foo do a
303-redirect instead of returning a representation? After all,
xyzscheme:foo is supposed to identify an information resource!
A: An HTTP URI that returns a 303 status in response to an HTTP
GET may be any kind of resource -- including an information
resource. (See the W3C
TAG's httpRange-14 decision.) This recipe suggests using a
303 redirect:
- to facilitate cases where the named resource r is not an information resource,
i.e., where it does not have a representation (data d);
- to facilitate cases where representations (data d) would be inefficient or
inappropriate to retrieve;
- to enable data and metadata to be separately associated with the
URI; and
- to permit the resource to have properties that are not intrinsic
to information resources, such as immutability.
Q: Can
specialized HTTP prefixes be used for transient URIs -- URIs that are
not intended to persist?
A: Sure. The prefix owner can associate any desired properties
with the prefix. The prefix could indicate that the URI is
transient.
Q: URI schemes or URN sub-schemes allow different URI owners to mint URIs
independently, while a user discovering a URI will know that the URI
has the property defined by that scheme, without having to know the
conventions defined by each URI owner. For example,
xyzscheme:foo.com/fum and xyzscheme:bar.com/boo can be syntactically
recognized as obeying the conventions for xyzscheme even though they
were minted by different organizations, foo.com and bar.com. How
can this be done with HTTP URIs?
A: Here are two techniques:
- The owner of the specialized HTTP prefix can use the rest of the
URI to delegate minting authority to other URI owners.
- In effect, a class of specialized HTTP prefixes can be defined,
and individually owned prefixes can declare themselves to be members of
that class. For example, if the term
http://xyzconsortium.org/terms/xyzprefix is defined to indicate that
something is a specialized xyz HTTP prefix, then metadata served
(indirectly) via http://foo.com?fum can indicate that "http://foo.com?"
is a http://xyzconsortium.org/terms/xyzprefix , and metadata served
(indirectly) via http://bar.com?bee can also indicate that
"http://bar.com?" is a http://xyzconsortium.org/terms/xyzprefix .
1. HTTP 303 See Other status code:
2. Describing Versus Identifying:
3. URI Opacity:
4. W3C TAG's httpRange-14 decision:
5. The ARK Persistent Identifier Scheme, J. Kunze and R. P. C. Rodgers,
6. LSID specification:
7. Graham Klyne suggestion of using HTTP URIs to retrieve protocol
12-Oct-2009: Added section
on Bootstrapping Protocol Adoption.
19-May-2009: Updated email
1-Mar-2007: Added LSID
8-Aug-2006: Added ARK reference
2-Aug-2006: Initial publication