Negative DatabasesA striking feature of the natural immune system is its use of negative detection in which "self" is represented (approximately) by the set of circulating lymphocytes that fail to match self. This suggests the idea of a negative representation, in which a set of data elements is represented by its complement set. That is, all the elements not in the original set are represented (a potentially huge number), and the data itself are not explicitly stored. This representation has interesting information-hiding properties when privacy is a concern and it has implications for intrusion detection. To make these ideas concrete, we consider the case of a negative database.
In a negative database, the negative image of a set of data records is represented rather than the records themselves. For now we assume a universe U of finite-length records (or strings), all of the same length l, and defined over a binary alphabet. We logically divide the space of possible strings into two disjoint sets: DB representing the set positive records (holding the information of interest), and U - DB denoting the set of all strings not in DB. We assume that DB is uncompressed (each record is represented explicitly), but we allow U - DB to be stored in a compressed form called NDB. We refer to DB as the positive database and NDB as the negative database.
Negative databases have the potential to help prevent inappropriate queries and inferences, while supporting allowable operations. A motivating scenario for our work involves a large database of personal records, which an outside entity might need to search, for example, to identify suspicious activites or to conduct epidemiological studies. Under this scenario, it is desirable that the database support only the allowable queries while protecting the privacy of individual records, say from inspection by an insider. A second goal involves distributed data, where we would like to determine privately the intersection of sets owned by different parties. For example, two or more entities might wish to determine which of a set of possible ''items'' (transactions) they have in common without reveling the totality of the contents of their database or its cardinality.
Negative information can be represented efficiently, even though the negative image will typically be much larger than the positive image. In particular, a database consisting of n, l-bit records can be represented negatively using only O(ln) records. We have also shown that membership queries for DB can be processed against the negative representation in time no worse than linear in its size and that reconstructing the database DB represented by a negative database NDB given as input is an NP-hard problem when time complexity is measured as a function of the size of NDB.
Current technologies of encryption (for the data itself) and query restriction (for controlling access to the data) help ensure confidentiality, but neither solution is appropriate for all applications. In the case of encryption, the ability to search data records is hindered, while in the case of query restriction, individual records are vulnerable to insider attacks. Negative databases potentially address both of these concerns.
Negative representations of data have been explored in the past, especially in art where artists like Magritte and Escher have taken advantage of the so called figure-ground relationship. In Hindu philosophy, in saying "I am not this; no, nor am I this, nor this," that which then remains is a pure awareness of I. Examples can also be found in mathematics and statistics where sometimes it is easier to obtain an answer by looking at the complement of the problem we intend to solve and complementing the solution.
Negative Databases in the NewsIn an opinion piece, the Journal of Biosciences 32:2 (2007) discusses the analogy between negative databases and immunology (pdf).
The Economist (Sept.2, 2006) provides a brief introduction to negative databases and some potential applications.
A readily accessible overview of negative selection
and immunology as it relates to computing is online at DDJ's AI Expert Newletter (March 2006).
This project is partially supported by the NSF through ITR grant CCR-0331580 Sensitive Information in a Wired World.
For more information about Negative Databases, please contact
Fernando Esponda at firstname.lastname@example.org