This document defines APIs for a database of records holding simple values and hierarchical objects. Each record consists of a key and some value. Moreover, the database maintains indexes over records it stores. An application developer directly uses an API to locate records either by their key or by using an index. A query language can be layered on this API. An indexed database can be implemented using a persistent B-tree data structure.

Privacy

User tracking

A third-party host (or any object capable of getting content distributed to multiple sites) could use a unique identifier stored in its client-side database to track a user across multiple sessions, building a profile of the user's activities. In conjunction with a site that is aware of the user's real id object (for example an e-commerce site that requires authenticated credentials), this could allow oppressive groups to target individuals with greater accuracy than in a world with purely anonymous Web usage.

There are a number of techniques that can be used to mitigate the risk of user tracking:

Blocking third-party storage
User agents MAY restrict access to the database objects to scripts originating at the domain of the top-level document of the browsing context, for instance denying access to the API for pages from other domains running in iframes.
Expiring stored data

User agents MAY automatically delete stored data after a period of time.

This can restrict the ability of a site to track a user, as the site would then only be able to track the user across multiple sessions when he authenticates with the site itself (e.g. by making a purchase or logging in to a service).

However, this also puts the user's data at risk.

Treating persistent storage as cookies

User agents should present the database feature to the user in a way that associates them strongly with HTTP session cookies. [[COOKIES]]

This might encourage users to view such storage with healthy suspicion.

Site-specific white-listing of access to databases

User agents MAY require the user to authorize access to databases before a site can use the feature.

Origin-tracking of stored data

User agents MAY record the origins of sites that contained content from third-party origins that caused data to be stored.

If this information is then used to present the view of data currently in persistent storage, it would allow the user to make informed decisions about which parts of the persistent storage to prune. Combined with a blacklist ("delete this data and prevent this domain from ever storing data again"), the user can restrict the use of persistent storage to sites that he trusts.

Shared blacklists

User agents MAY allow users to share their persistent storage domain blacklists.

This would allow communities to act together to protect their privacy.

While these suggestions prevent trivial use of this API for user tracking, they do not block it altogether. Within a single domain, a site can continue to track the user during a session, and can then pass all this information to the third party along with any identifying information (names, credit card numbers, addresses) obtained by the site. If a third party cooperates with multiple sites to obtain such information, a profile can still be created.

However, user tracking is to some extent possible even with no cooperation from the user agent whatsoever, for instance by using session identifiers in URLs, a technique already commonly used for innocuous purposes but easily repurposed for user tracking (even retroactively). This information can then be shared with other sites, using using visitors' IP addresses and other user-specific data (e.g. user-agent headers and configuration settings) to combine separate sessions into coherent user profiles.

Sensitivity of data

User agents should treat persistently stored data as potentially sensitive; it's quite possible for e-mails, calendar appointments, health records, or other confidential documents to be stored in this mechanism.

To this end, user agents should ensure that when deleting data, it is promptly deleted from the underlying storage.

Authorization

DNS spoofing attacks

Because of the potential for DNS spoofing attacks, one cannot guarantee that a host claiming to be in a certain domain really is from that domain. To mitigate this, pages can use SSL. Pages using SSL can be sure that only pages using SSL that have certificates identifying them as being from the same domain can access their databases.

Cross-directory attacks

Different authors sharing one host name, for example users hosting content on geocities.com, all share one set of databases.

There is no feature to restrict the access by pathname. Authors on shared hosts are therefore recommended to avoid using these features, as it would be trivial for other authors to read the data and overwrite it.

Even if a path-restriction feature was made available, the usual DOM scripting security model would make it trivial to bypass this protection and access the data from any path.

Implementation risks

The two primary risks when implementing these persistent storage features are letting hostile sites read information from other domains, and letting hostile sites write information that is then read from other domains.

Letting third-party sites read data that is not supposed to be read from their domain causes information leakage, For example, a user's shopping wish list on one domain could be used by another domain for targeted advertising; or a user's work-in-progress confidential documents stored by a word-processing site could be examined by the site of a competing company.

Letting third-party sites write data to the persistent storage of other domains can result in information spoofing, which is equally dangerous. For example, a hostile site could add records to a user's wish list; or a hostile site could set a user's session identifier to a known ID that the hostile site can then use to track the user's actions on the victim site.

Thus, strictly following the origin model described in this specification is important for user security.

Requirements

Requirements will be written with an aim to verify that these were actually met by the API specified in this document.

Acknowledgements

Special thanks and great appreciation to Nikunj Mehta, the original author of this specification, who was employed by Oracle Corp when he wrote the early drafts.

Garret Swart was extremely influential in the design of this specification.

Special thanks to Chris Anderson, Pablo Castro, Dana Florescu, Israel Hilerio, Arun Ranganathan, Margo Seltzer, Ben Turner, Shawn Wilsher, and Kris Zyp, all of whose feedback and suggestions have led to improvements to this specification.