This document defines APIs for a database of records holding simple values and hierarchical objects. Each record consists of a key and some value. Moreover, the database maintains indexes over records it stores. An application developer directly uses an API to locate records either by their key or by using an index. A query language can be layered on this API. An indexed database can be implemented using a persistent B-tree data structure.
A third-party host (or any object capable of getting content distributed to multiple sites) could use a unique identifier stored in its client-side database to track a user across multiple sessions, building a profile of the user's activities. In conjunction with a site that is aware of the user's real id object (for example an e-commerce site that requires authenticated credentials), this could allow oppressive groups to target individuals with greater accuracy than in a world with purely anonymous Web usage.
There are a number of techniques that can be used to mitigate the risk of user tracking:
iframe
s.
User agents MAY automatically delete stored data after a period of time.
This can restrict the ability of a site to track a user, as the site would then only be able to track the user across multiple sessions when he authenticates with the site itself (e.g. by making a purchase or logging in to a service).
However, this also puts the user's data at risk.
User agents should present the database feature to the user in a way that associates them strongly with HTTP session cookies. [[COOKIES]]
This might encourage users to view such storage with healthy suspicion.
User agents MAY require the user to authorize access to databases before a site can use the feature.
User agents MAY record the origins of sites that contained content from third-party origins that caused data to be stored.
If this information is then used to present the view of data currently in persistent storage, it would allow the user to make informed decisions about which parts of the persistent storage to prune. Combined with a blacklist ("delete this data and prevent this domain from ever storing data again"), the user can restrict the use of persistent storage to sites that he trusts.
User agents MAY allow users to share their persistent storage domain blacklists.
This would allow communities to act together to protect their privacy.
While these suggestions prevent trivial use of this API for user tracking, they do not block it altogether. Within a single domain, a site can continue to track the user during a session, and can then pass all this information to the third party along with any identifying information (names, credit card numbers, addresses) obtained by the site. If a third party cooperates with multiple sites to obtain such information, a profile can still be created.
However, user tracking is to some extent possible even with no cooperation from the user agent whatsoever, for instance by using session identifiers in URLs, a technique already commonly used for innocuous purposes but easily repurposed for user tracking (even retroactively). This information can then be shared with other sites, using using visitors' IP addresses and other user-specific data (e.g. user-agent headers and configuration settings) to combine separate sessions into coherent user profiles.
User agents should treat persistently stored data as potentially sensitive; it's quite possible for e-mails, calendar appointments, health records, or other confidential documents to be stored in this mechanism.
To this end, user agents should ensure that when deleting data, it is promptly deleted from the underlying storage.
Special thanks and great appreciation to Nikunj Mehta, the original author of this specification, who was employed by Oracle Corp when he wrote the early drafts.
Garret Swart was extremely influential in the design of this specification.
Special thanks to Chris Anderson, Pablo Castro, Dana Florescu, Israel Hilerio, Arun Ranganathan, Margo Seltzer, Ben Turner, Shawn Wilsher, and Kris Zyp, all of whose feedback and suggestions have led to improvements to this specification.