Outsourced Bits
How to Search on Encrypted Data: Deterministic Encryption (Part 2)
http://senykam.github.io/2013/10/14/how-to-search-on-encrypted-data-deterministic-encryption-part-2
Mon, 14 Oct 2013 23:34:09 -0300
<p><em>This is the second part of a series on searching on encrypted data. See parts <a href="http://outsourcedbits.org/2013/10/06/how-to-search-on-encrypted-data-part-1/">1</a>, <a href="https://outsourcedbits.org/2013/10/30/how-to-search-on-encrypted-data-part-3/">3</a>, <a href="https://outsourcedbits.org/2013/12/20/how-to-search-on-encrypted-data-part-4-oblivious-rams/">4</a> and <a href="https://outsourcedbits.org/2014/08/21/how-to-search-on-encrypted-data-searchable-symmetric-encryption-part-5/">5</a>.</em></p>
<p><img src="http://senykam.github.io/img/search.jpg" class="alignright" width="250">
In this post we'll cover the simplest way to search on encrypted data. This is
usually the solution people come up with when they first think of the problem of
encrypted search and, as we'll see, this approach has some nice properties
but also some limitations.</p>
<p>To make this work we'll need a special type of encryption scheme called a
<em>property-preserving encryption</em> (PPE) scheme. PPE schemes encrypt
messages in a way that leaks certain properties of the underlying message.</p>
<p>There are different types of PPE schemes that each leak different properties.
The simplest form is <em>deterministic</em> encryption which
always encrypts the same message to the same ciphertext <sup class="footnote-ref" id="fnref:1"><a class="footnote" href="#fn:1">1</a></sup>. The property
preserved by deterministic encryption is <em>equality</em> since, given two
encryptions</p>
<p><span class="math">\[
c_1 = {\sf Enc}_K(m_1) \textrm{ and } c_2 = {\sf Enc}_K(m_2),
\]</span></p>
<p>one can test if the underlying messages are equal by just checking if <span class="math">\(c_1 =
c_2\)</span> <sup class="footnote-ref" id="fnref:2"><a class="footnote" href="#fn:2">2</a></sup>. More complex PPE schemes include <a href="http://www.cc.gatech.edu/~aboldyre/papers/bclo.pdf">order-preserving
encryption</a> (OPE) and
<a href="http://www.cs.utexas.edu/~jrous/documents/191.pdf">orthogonality-preserving
encryption</a>.</p>
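<p>To make the equality property concrete, here is a minimal sketch in Python. It assumes we stand in for the deterministic scheme with HMAC-SHA256, i.e., a pseudo-random function, which is enough when we never need to decrypt. The function name <code>det_tag</code> and the sample keywords are hypothetical and chosen only for illustration.</p>

```python
import hashlib
import hmac
import os


def det_tag(key: bytes, message: str) -> bytes:
    """Deterministic tag: the same key and message always give the same output."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


key = os.urandom(32)  # hypothetical client secret key

c1 = det_tag(key, "crypto")
c2 = det_tag(key, "crypto")
c3 = det_tag(key, "cloud")

# Anyone holding the tags (but not the key) can test equality of the messages.
same = hmac.compare_digest(c1, c2)
different = not hmac.compare_digest(c1, c3)
print(same, different)
```

<p>Note that the comparison needs no key at all, which is exactly what makes the property useful for search and, as discussed later, what makes it leak.</p>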
<p>The approach we'll describe here works with any PPE scheme but is easier to
explain with deterministic encryption so that's what we'll use. PPE-based
encrypted search was first proposed in the Database community and later studied
more formally in the Cryptography community. The first paper to provide a
cryptographic treatment of this approach was a 2006 paper by Bellare, Boldyreva
and O'Neill [<a href="http://eprint.iacr.org/2006/186.pdf">BBO06</a>].</p>
<p>The authors formally study the notion of deterministic encryption and show how
to apply it to the problem of encrypted search. While Bellare et al. were
motivated in part by searching on encrypted data, the notion of deterministic
encryption is interesting to cryptographers in its own right and the formal
study of deterministic encryption that was initiated by this paper has led to
other applications and deepened our understanding of encryption.</p>
<p>As mentioned above, another
important type of PPE is OPE, which was introduced by Agrawal, Kiernan,
Srikant and Xu in [<a href="http://rsrikant.com/papers/sigmod04.pdf">AKSX04</a>]
and studied more formally by Boldyreva, Chenette, Lee and O'Neill in
[<a href="http://www.cc.gatech.edu/~aboldyre/papers/bclo.pdf">BCLO09</a>] and
[<a href="http://www.cc.gatech.edu/~aboldyre/papers/operev.pdf">BCO11</a>].</p>
<h2 id="the-highlevel-idea">The High-Level Idea</h2>
<p>Suppose we have both a deterministic encryption scheme <span class="math">\({\sf Enc}^D\)</span> and a standard
<a href="http://en.wikipedia.org/wiki/Ciphertext_indistinguishability">CPA-secure</a>
(i.e., randomized) encryption scheme <span class="math">\({\sf Enc}^R\)</span>.</p>
<p>We can then create an encrypted database <span class="math">\({\sf EDB}\)</span> as follows. For each document
<span class="math">\(D_i\)</span> in the collection <span class="math">\((D_1, \dots, D_n)\)</span>, the client computes deterministic
encryptions of each keyword of <span class="math">\(D_i\)</span>. Assuming each document <span class="math">\(D_i\)</span> has <span class="math">\(m\)</span>
keywords <span class="math">\((w_{i,1}, \dots, w_{i,m})\)</span>, <span class="math">\({\sf EDB}\)</span> then simply consists of <span class="math">\(n\)</span> tuples</p>
<p><span class="math">\[
r_i = (d_{i,1}, \dots, d_{i,m}, {\sf ptr}(c_i)),
\]</span></p>
<p>where <span class="math">\(d_{i,j} = {\sf Enc}^D_{K_2}(w_{i,j})\)</span>, <span class="math">\(c_i = {\sf Enc}^R_{K_1}(D_i)\)</span> and <span class="math">\({\sf ptr}(c_i)\)</span> is a
pointer to ciphertext <span class="math">\(c_i\)</span>.</p>
<p>Recall that in our setting, the client sends the encrypted database</p>
<p><span class="math">\[
{\sf EDB} = (r_1, \dots, r_n)
\]</span></p>
<p>to the server along with the randomized encryptions of the
documents <span class="math">\((c_1, \dots, c_n)\)</span>.</p>
<p>To search for keyword <span class="math">\(w\)</span>, the client just sends a
deterministic encryption of <span class="math">\(w\)</span> to the server. This encryption of the keyword,
<span class="math">\(d_w = {\sf Enc}^D_{K_2}(w)\)</span>, will serve as the token. Now all the server has
to do is compare <span class="math">\(d_w\)</span> to all the deterministic encryptions in <span class="math">\({\sf EDB}\)</span>. If <span class="math">\(d_w\)</span>
is equal to any of them, the server follows the corresponding pointer and
returns the encrypted document. In other words, for all <span class="math">\(1 \leq i \leq n\)</span> and <span class="math">\(1
\leq j \leq m\)</span>, the server tests if <span class="math">\(d_w = d_{i,j}\)</span> and if they are equal it
follows <span class="math">\({\sf ptr}(c_i)\)</span> and returns <span class="math">\(c_i\)</span>.</p>
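<p>The construction and the search procedure above can be sketched as follows. This is an illustration under stated assumptions, not a vetted implementation: HMAC-SHA256 stands in for the deterministic scheme, a toy nonce-based stream cipher built from HMAC stands in for the randomized scheme, pointers are simply list indices, and the token is computed under the same key as the keyword ciphertexts stored in the database. All function names and sample documents are hypothetical.</p>

```python
import hashlib
import hmac
import os


def enc_det(key: bytes, keyword: str) -> bytes:
    # Stand-in for Enc^D: a PRF (HMAC-SHA256) applied to the keyword.
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # HMAC in counter mode; for illustration only, not a vetted cipher.
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def enc_rand(key: bytes, plaintext: bytes) -> bytes:
    # Stand-in for Enc^R: a fresh random nonce makes each ciphertext different.
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def dec_rand(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def build_edb(k1: bytes, k2: bytes, docs):
    """Client side: docs is a list of (text, keywords) pairs."""
    ciphertexts = [enc_rand(k1, text.encode("utf-8")) for text, _ in docs]
    # Tuple r_i = (d_{i,1}, ..., d_{i,m}, ptr(c_i)); the pointer is the index i.
    edb = [([enc_det(k2, w) for w in kws], i) for i, (_, kws) in enumerate(docs)]
    return edb, ciphertexts


def search(edb, ciphertexts, token: bytes):
    """Server side: compare the token with every d_{i,j}, O(nm) comparisons."""
    return [ciphertexts[ptr] for tags, ptr in edb if token in tags]


k1, k2 = os.urandom(32), os.urandom(32)
docs = [
    ("meeting notes on cloud storage", ["cloud", "storage"]),
    ("draft paper on encrypted search", ["search", "encryption"]),
    ("cloud search benchmark", ["cloud", "search"]),
]
edb, cts = build_edb(k1, k2, docs)

token = enc_det(k2, "cloud")           # client computes the token
matches = search(edb, cts, token)      # server never sees k1 or k2
results = [dec_rand(k1, c).decode("utf-8") for c in matches]
print(results)
```

<p>The server only ever handles <code>edb</code>, <code>cts</code> and <code>token</code>; decryption of the returned ciphertexts happens back at the client.</p>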
<p>This clearly works but there is a limitation in the way the
scheme is described: the search time for the server is <span class="math">\(O(nm)\)</span>, i.e., linear in
the number of documents (let's just assume <span class="math">\(m\)</span> is very small here).
Linear-time search is too slow in practice, but this is easy to fix:
we can just store the deterministic encryptions of
<span class="math">\({\sf EDB}\)</span> in a data structure that supports fast search (e.g., a balanced binary search
tree) so that search can be performed very quickly (e.g., in time <span class="math">\(O(\log(n))\)</span>).</p>
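<p>One way to get sub-linear search is sketched below. Instead of a binary search tree it uses a sorted list with binary search from Python's <code>bisect</code> module, which gives the same logarithmic lookup; a hash table would work as well. The helper names <code>build_index</code> and <code>lookup</code>, the fixed demo key and the sample keywords are assumptions made for the example.</p>

```python
import bisect
import hashlib
import hmac


def enc_det(key: bytes, keyword: str) -> bytes:
    # Stand-in for the deterministic scheme (HMAC-SHA256 as a PRF).
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).digest()


def build_index(edb):
    """Flatten tuples (tags, ptr) into a sorted list of (tag, ptr) pairs."""
    return sorted((tag, ptr) for tags, ptr in edb for tag in tags)


def lookup(index, token: bytes):
    """Binary search for the first entry with this tag, then walk the matches."""
    i = bisect.bisect_left(index, (token,))
    ptrs = []
    while i < len(index) and index[i][0] == token:
        ptrs.append(index[i][1])
        i += 1
    return ptrs


key = b"k" * 32  # fixed demo key, for illustration only
edb = [
    ([enc_det(key, "cloud"), enc_det(key, "storage")], 0),
    ([enc_det(key, "search"), enc_det(key, "encryption")], 1),
    ([enc_det(key, "cloud"), enc_det(key, "search")], 2),
]
index = build_index(edb)
hits = lookup(index, enc_det(key, "cloud"))
print(hits)
```

<p>Sorting is possible precisely because equal keywords produce equal ciphertexts; with a randomized scheme there would be nothing meaningful to sort or hash on.</p>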
<h2 id="is-this-secure">Is This Secure?</h2>
<p>As we've seen, PPE-based solutions can achieve fast (i.e., sub-linear)
server-side encrypted search. They do, however, have some limitations with
respect to security as discussed and formalized by Bellare et al.</p>
<p>The first problem is that the encrypted database <span class="math">\({\sf EDB}\)</span> leaks quite a bit of
information to the server about the data collection---even before the client
has performed any searches. In particular, recall that the keywords in <span class="math">\({\sf EDB}\)</span> are
encrypted using a deterministic encryption scheme so the same keyword <span class="math">\(w\)</span> will
always encrypt to the same ciphertext.</p>
<p>This means that if the server sees two or
more equal ciphertexts in <span class="math">\({\sf EDB}\)</span> it knows that the corresponding encrypted
documents contain a keyword in common. In addition, the server learns the
frequency with which keywords appear which makes the encrypted database vulnerable to
<a href="http://en.wikipedia.org/wiki/Frequency_analysis">frequency analysis</a>.</p>
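<p>The following sketch shows what the server can compute from the encrypted database alone, before any query is made. It again assumes HMAC-SHA256 as a stand-in for the deterministic scheme, and the keyword lists are made up for the example. The server code never touches the key.</p>

```python
import hashlib
import hmac
import os
from collections import Counter


def enc_det(key: bytes, keyword: str) -> bytes:
    # Stand-in for the deterministic scheme (HMAC-SHA256 as a PRF).
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).digest()


# Client side: hypothetical keyword lists, one per document.
key = os.urandom(32)
keyword_lists = [
    ["cloud", "storage"],
    ["cloud", "search"],
    ["cloud", "billing"],
    ["search", "index"],
]
edb = [[enc_det(key, w) for w in kws] for kws in keyword_lists]

# Server side: count how often each keyword ciphertext occurs.
histogram = Counter(tag for row in edb for tag in row)
counts = sorted(histogram.values(), reverse=True)

# Documents that share the most frequent (still unknown) keyword.
top_tag, _ = histogram.most_common(1)[0]
sharing = [i for i, row in enumerate(edb) if top_tag in row]

print(counts, sharing)
```

<p>Matching this histogram against known keyword frequencies for the kind of data being stored is the starting point of a frequency analysis.</p>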
<p>Another issue is that since tokens are deterministic encryptions of the
search terms, the server will always know whether the client is
repeating a search or not.</p>
<p>A third issue occurs when the deterministic encryption scheme (or any form of
PPE scheme) is public-key. In this case, all the deterministic encryptions (both
in <span class="math">\({\sf EDB}\)</span> and in the tokens) are encrypted under the client's public key which
is, obviously, public and available to the server. The server can then mount a
dictionary attack on the encrypted database by encrypting a list of possible
keywords and comparing them to the ones found in <span class="math">\({\sf EDB}\)</span> and in the tokens. If it
gets a match then it knows the keyword.</p>
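<p>Here is a toy version of the dictionary attack. It assumes unpadded ("textbook") RSA as the deterministic public-key scheme, with a deliberately tiny modulus built from two known Mersenne primes; this is not the construction of Bellare et al. and is insecure in several other ways, but it is deterministic and anyone can run it with the public key, which is all the attack needs. The keyword lists are hypothetical.</p>

```python
# Toy public key: product of two known Mersenne primes. Far too small and
# structured for real use; it only serves to make encryption deterministic.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537


def enc_pk(keyword: str) -> int:
    """Unpadded RSA encryption: deterministic and computable by anyone."""
    m = int.from_bytes(keyword.encode("utf-8"), "big")
    if m >= N:
        raise ValueError("keyword too long for this toy modulus")
    return pow(m, E, N)


# Client side: keyword ciphertexts stored in EDB (or sent as tokens).
observed = [enc_pk("salary"), enc_pk("merger"), enc_pk("x9!kq#2v")]

# Server side: encrypt a list of guesses under the public key and compare.
dictionary = ["invoice", "merger", "salary", "vacation"]
table = {enc_pk(w): w for w in dictionary}
recovered = [table.get(c) for c in observed]

print(recovered)
```

<p>The first two keywords are recovered because they appear in the server's dictionary. The third, which the server cannot guess, stays hidden.</p>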
<p>This attack clearly shows that public-key PPE-based solutions should probably
not be used for "normal" data (e.g., text, emails, PII etc.). But does it mean
we can't use it at all? Not quite. As Bellare et al. observe in their paper, such a
solution can be used when the data has high min-entropy, which is a way of
saying that the data looks random to the server.</p>
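<p>For reference, the standard definition of the min-entropy of a random variable <span class="math">\(X\)</span> (the symbol <span class="math">\(X\)</span> is our notation here, standing for a keyword as seen by the server) is</p>
<p><span class="math">\[
H_\infty(X) = -\log_2 \max_{x} \Pr[X = x].
\]</span></p>
<p>So a uniformly random 128-bit value has min-entropy 128, whereas a keyword whose most likely value occurs with probability <span class="math">\(0.05\)</span> has min-entropy <span class="math">\(\log_2 20 \approx 4.3\)</span> bits, which means the server's best single guess is right one time in twenty.</p>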
<p>The problem of course is that it's not clear when this applies in practice.
Also, note that even if the keywords do have high min-entropy, the other two
issues still remain.</p>
<h2 id="conclusions">Conclusions</h2>
<p>So we've seen one approach to searching on encrypted data based on
property-preserving encryption. It results in a solution that supports fast
search on encrypted data but, unfortunately, leaks quite a bit of information to
the server. In the next post we'll go over another solution that provides a
different tradeoff: it leaks less information to the server but achieves slower
search time.</p>
<div class="footnotes">
<hr>
<ol>
<li id="fn:1">In general it is a terrible idea to encrypt with deterministic encryption: we've known since [<a href="http://theory.lcs.mit.edu/~cis/pubs/shafi/1984-jcss.pdf">GM84</a>] that any <a href="http://en.wikipedia.org/wiki/Semantic_security">secure</a> encryption scheme has to be randomized. The point here, however, is that we may be willing to use weaker primitives in order to design more functional encryption schemes.
<a class="footnote-return" href="#fnref:1">↩</a></li>
<li id="fn:2">Technically, we do not need a deterministic <em>encryption</em> scheme since we'll never need to decrypt so (in the symmetric setting) one could use a <a href="http://en.wikipedia.org/wiki/Pseudorandom_function_family">pseudo-random function</a>.
<a class="footnote-return" href="#fnref:2">↩</a></li>
</ol>
</div>