Functional Encryption on Outsourced Bits
http://senykam.github.io/tags/functional-encryption/
How to Search on Encrypted Data: Functional Encryption (Part 3)
http://senykam.github.io/2013/10/30/how-to-search-on-encrypted-data-functional-encryption-part-3
Wed, 30 Oct 2013 11:02:40 -0300
<p><em>This is the third part of a series on searching on encrypted data. See parts <a href="http://outsourcedbits.org/2013/10/06/how-to-search-on-encrypted-data-part-1/">1</a>, <a href="https://outsourcedbits.org/2013/10/30/how-to-search-on-encrypted-data-part-2/">2</a>, <a href="https://outsourcedbits.org/2013/12/20/how-to-search-on-encrypted-data-part-4-oblivious-rams/">4</a> and <a href="https://outsourcedbits.org/2014/08/21/how-to-search-on-encrypted-data-searchable-symmetric-encryption-part-5/">5</a>.</em></p>
<p><img src="http://senykam.github.io/img/search.jpg" class="alignright" width="250">
Previously, we covered the simplest solution for encrypted search which
consisted of using a deterministic encryption scheme (more generally, a
property-preserving encryption, or PPE, scheme) to encrypt keywords. This resulted in an
encrypted search solution with sub-linear (in <span class="math">\(n\)</span>) search time that nevertheless leaked
quite a bit of information to the server.</p>
<p>We'll now describe a different approach that provides the opposite properties:
slow search but better security. At a high level, one can view this approach
as simply replacing the PPE scheme in the previous solution with a <em>functional
encryption</em> (FE) scheme.</p>
<h2 id="functional-and-identitybased-encryption">Functional and Identity-Based Encryption</h2>
<p>The notion of FE was first described by Sahai and Waters in a talk
[<a href="http://www.cs.utexas.edu/~bwaters/presentations/files/functional.ppt">SW09</a>]
and later formalized by Boneh, Sahai and Waters
[<a href="http://eprint.iacr.org/2010/543.pdf">BSW10</a>] and by O'Neill
[<a href="http://eprint.iacr.org/2010/556">O10</a>]. Starting with the work of Boneh
and Franklin on <a href="http://en.wikipedia.org/wiki/ID-based_encryption">identity-based
encryption</a>, there was a slew
of new encryption schemes achieving various properties (e.g., attribute-based
encryption, hidden vector encryption, predicate encryption). Many of these
constructions felt loosely related so the idea behind FE was to capture all
these schemes under a single framework.</p>
<p>Though everything we'll cover can be done with FE, for concreteness, we'll
consider the special case of identity-based encryption (IBE), which was first suggested by Shamir
[<a href="http://discovery.csc.ncsu.edu/Courses/csc774-S07/shamir84.pdf">Shamir84</a>]
and realized by Boneh and Franklin
[<a href="http://crypto.stanford.edu/~dabo/pubs/papers/bfibe.pdf">BF01</a>].</p>
<p>A public-key IBE scheme consists of four algorithms:</p>
<ul>
<li>A setup algorithm <span class="math">\({\sf Setup}\)</span> used to generate a master secret and public key
pair <span class="math">\((msk, mpk)\)</span>.</li>
<li>An encryption algorithm <span class="math">\({\sf Enc}\)</span> that takes as input the master public key
<span class="math">\(mpk\)</span>, an identity <span class="math">\(id\)</span> and a message <span class="math">\(m\)</span> and returns a ciphertext <span class="math">\(c\)</span>.</li>
<li>A key generation algorithm <span class="math">\({\sf Keygen}\)</span> that takes as input the master secret
key <span class="math">\(msk\)</span> and an identity <span class="math">\(id\)</span> and returns a secret key <span class="math">\(sk_{id}\)</span>.</li>
<li>And finally a decryption algorithm <span class="math">\({\sf Dec}\)</span> that takes as input a secret key
<span class="math">\(sk_{id}\)</span> and a ciphertext <span class="math">\(c\)</span> and returns a message <span class="math">\(m\)</span> or a failure symbol <span class="math">\(\bot\)</span>.</li>
</ul>
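<p>One way to tie the four algorithms together is the usual correctness
condition, which is not spelled out in the list above but is written here in the
same notation: for every pair <span class="math">\((msk, mpk)\)</span> output by <span class="math">\({\sf Setup}\)</span>, every identity <span class="math">\(id\)</span>
and every message <span class="math">\(m\)</span>,</p>
<p><span class="math">\[
{\sf Dec}\big({\sf Keygen}(msk, id), {\sf Enc}(mpk, id, m)\big) = m.
\]</span></p>
<p>In other words, a key issued for <span class="math">\(id\)</span> always opens ciphertexts created under that same <span class="math">\(id\)</span>.</p>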
<p>The motivation behind IBE is key distribution. In particular, using an IBE
scheme should be easier than using a standard (public-key) encryption scheme
where public keys have to be certified, revoked and verified.</p>
<p>Let's consider a concrete example. Suppose Alice wants to send an encrypted
message to Bob who works at Microsoft. The idea is that Microsoft would first
generate a pair of master keys <span class="math">\((msk, mpk)\)</span> and distribute <span class="math">\(mpk\)</span> together with a
certificate. To send her message <span class="math">\(m\)</span>, Alice would retrieve Microsoft's master
public key <span class="math">\(mpk\)</span>, verify its certificate and then encrypt <span class="math">\(m\)</span> under Bob's
identity by computing:</p>
<p><span class="math">\[
c = {\sf Enc}(mpk, "\texttt{bob@microsoft.com}", m).
\]</span></p>
<p>To decrypt the ciphertext <span class="math">\(c\)</span>, Bob needs to hold a secret key for his identity
under Microsoft's master key:</p>
<p><span class="math">\[
sk = {\sf Keygen}(msk, "\texttt{bob@microsoft.com}").
\]</span></p>
<p>Given this key, he can then recover the message by computing <span class="math">\(m = {\sf Dec}(sk, c)\)</span>.</p>
<p>Notice that Alice never needed to know what Bob's public key was or to verify
any certificate for his key. The only certificate she had to verify was for
Microsoft's master public key but once that key is authenticated she can send
email to anyone at Microsoft without any additional work.</p>
<h2 id="publickey-encrypted-search">Public-Key Encrypted Search</h2>
<p>We are now ready to see how (anonymous) IBE can be used to search over
encrypted data. This idea was first proposed by Boneh, Di Crescenzo, Ostrovsky
and Persiano
[<a href="http://crypto.stanford.edu/~dabo/pubs/papers/encsearch.pdf">BCOP04</a>] and is
best explained by considering the following email scenario where Alice wants to
send an encrypted email to Bob.</p>
<p>Bob first generates a master secret and public key pair for the IBE scheme
<span class="math">\((msk, mpk)\)</span> and a secret and public key pair for a standard public-key
encryption scheme <span class="math">\((sk, pk)\)</span>. He then makes the public keys <span class="math">\((mpk, pk)\)</span> public
and keeps the secret keys <span class="math">\((msk, sk)\)</span> private. Alice encrypts her message under
<span class="math">\(pk\)</span> using the standard public-key encryption scheme, resulting in a ciphertext
<span class="math">\(c\)</span>. She then attaches IBE encryptions of "1" under Bob's master public key
<span class="math">\(mpk\)</span> with the keywords as the identity. This results in a set of IBE encryptions
<span class="math">\((e_1, \dots, e_m)\)</span> where each <span class="math">\(e_j\)</span> (for <span class="math">\(1 \leq j \leq m\)</span>) is defined as</p>
<p><span class="math">\[
e_j = {\sf Enc}(mpk, w_j, 1),
\]</span></p>
<p>where <span class="math">\((w_1, \dots, w_m)\)</span> are the keywords.</p>
<p>Let's suppose Bob's email server has received <span class="math">\(n\)</span> emails of this form, so that
it now holds a set of encrypted emails <span class="math">\((c_1, \dots, c_n)\)</span> and an encrypted
database</p>
<p><span class="math">\[
{\sf EDB} = \bigg(\big(e_{1, 1}, \dots, e_{1, m}, {\sf ptr}(c_1)\big), \dots,
\big(e_{n,1}, \dots, e_{n,m}, {\sf ptr}(c_n)\big)\bigg).
\]</span></p>
<p>Now, if Bob wants to retrieve the emails with keyword <span class="math">\(w\)</span>, he just needs to
generate a secret IBE key as <span class="math">\(sk_w = {\sf Keygen}(msk, w)\)</span> and send it as the token
to the server. The server then tries to decrypt each IBE ciphertext in
<span class="math">\({\sf EDB}\)</span> with <span class="math">\(sk_w\)</span> and, whenever a decryption succeeds (i.e., returns "1"), follows the
associated pointer to return the corresponding encrypted email.</p>
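<p>The flow above can be sketched in a few lines of Python. This is only a
sketch under loud assumptions: <code>setup</code>, <code>keygen</code>, <code>enc</code> and <code>dec</code> are
hypothetical toy stand-ins built from HMAC, in which the "public" key is simply
equal to the master secret, so they model the interface of an anonymous IBE
scheme but provide none of its security. The helper names <code>build_edb</code> and
<code>search</code> are likewise made up for illustration.</p>

```python
import hashlib
import hmac
import os

# Toy stand-in for an anonymous IBE scheme. NOT secure: the "public" key equals
# the master secret, so this only models the Setup/Enc/Keygen/Dec interface.
def setup():
    msk = os.urandom(32)
    return msk, msk  # (msk, mpk); identical only in this mock

def keygen(msk, identity):
    return hmac.new(msk, identity.encode(), hashlib.sha256).digest()

def enc(mpk, identity, message):
    key = keygen(mpk, identity)  # possible only because mpk == msk in the mock
    nonce = os.urandom(16)       # fresh randomness per ciphertext
    pad = hashlib.sha256(key + nonce).digest()[:len(message)]
    body = bytes(a ^ b for a, b in zip(message, pad))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag

def dec(sk, ciphertext):
    nonce, body, tag = ciphertext[:16], ciphertext[16:-32], ciphertext[-32:]
    expected = hmac.new(sk, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # plays the role of the failure symbol
    pad = hashlib.sha256(sk + nonce).digest()[:len(body)]
    return bytes(a ^ b for a, b in zip(body, pad))

def build_edb(mpk, emails):
    """emails: list of (keywords, pointer) pairs; returns one EDB row per email."""
    return [([enc(mpk, w, b"1") for w in keywords], ptr) for keywords, ptr in emails]

def search(edb, token):
    """Server side: try the token on every ciphertext and collect matching pointers."""
    return [ptr for tags, ptr in edb if any(dec(token, e) == b"1" for e in tags)]

# Senders build the EDB from public information; Bob derives the token from msk.
msk, mpk = setup()
edb = build_edb(mpk, [(["budget", "urgent"], 0), (["lunch"], 1), (["urgent"], 2)])
token = keygen(msk, "urgent")
results = search(edb, token)  # pointers to the emails tagged "urgent"
```

<p>Note that <code>search</code> touches every ciphertext in every row, which is where the
linear search time discussed next comes from.</p>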
<p>An important observation is that a standard IBE scheme here will not be enough.
The problem is that the notion of IBE does not necessarily guarantee that a
ciphertext hides information about the identity used to create it.
This means that if we were to use a standard IBE scheme, <span class="math">\({\sf EDB}\)</span> could leak the
keywords to the server. To address this, Boneh et al. observe that what you
actually need is an <em>anonymous</em> IBE scheme which essentially means that the
ciphertexts do not reveal information about the identities. Fortunately, we
know how to construct such schemes efficiently so this is not a major concern
from a practical point of view (e.g., the Boneh-Franklin IBE scheme is
anonymous).</p>
<p><strong>Efficiency.</strong>
Search time for the server is <span class="math">\(O(nm)\)</span> since it has to try to decrypt each
ciphertext in the <span class="math">\({\sf EDB}\)</span>. Assuming <span class="math">\(m \ll n\)</span>, this is <span class="math">\(O(n)\)</span> which is <em>a
lot</em> slower than the solution based on deterministic encryption described in the
<a href="http://outsourcedbits.org/2013/10/14/how-to-search-on-encrypted-data-part-2/">previous post</a>
which required time <span class="math">\(o(n)\)</span> (i.e., sub-linear in <span class="math">\(n\)</span>).</p>
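<p>As a rough worked example, with illustrative numbers that are assumptions
rather than measurements: if the server stores <span class="math">\(n = 10^5\)</span> emails with
<span class="math">\(m = 10\)</span> keywords each, a single query costs</p>
<p><span class="math">\[
n \cdot m = 10^5 \cdot 10 = 10^6
\]</span></p>
<p>trial decryptions, each of which is a public-key operation, whereas the
deterministic-encryption approach needs only <span class="math">\(o(n)\)</span> work for the same query.</p>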
<h2 id="is-this-secure">Is this Secure?</h2>
<p>While this approach is slower than the PPE-based approach, it has better security
properties. First, the encrypted database by itself does not reveal much useful
information to the server since, unlike in the deterministic approach, keywords
are encrypted using a <em>randomized</em> (identity-based) encryption scheme. So
even if two documents have keywords in common, the encrypted keywords in <span class="math">\({\sf EDB}\)</span>
will be different. This means that we don't have to make unnatural assumptions
about the data (e.g., that it has high entropy) to use it safely.</p>
<p>There is an issue, however, with this approach: <em>it
does not protect the search terms</em>. In particular, the server could mount the
following attack to figure out which keyword the client is searching for.</p>
<p>Suppose the server has some dictionary <span class="math">\(W\)</span> of <span class="math">\(d\)</span> words. For each keyword <span class="math">\(w \in
W\)</span> it encrypts "1" with key <span class="math">\(mpk\)</span> and identity <span class="math">\(w\)</span>.
This results in a set of <span class="math">\(d\)</span> (identity-based) encryptions <span class="math">\((e'_1, \dots, e'_d)\)</span>.
Now, given some token <span class="math">\(sk_w\)</span>, the server can learn <span class="math">\(w\)</span> by simply trying to
decrypt each of the ciphertexts <span class="math">\(e'_i\)</span> with <span class="math">\(sk_w\)</span>. If the decryption works for
some <span class="math">\(e'_i\)</span>, then the server knows that <span class="math">\(sk_w\)</span> is for the identity used to
generate <span class="math">\(e'_i\)</span>.</p>
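<p>The attack can be sketched as follows. As before, this rests on assumptions:
the four IBE functions are hypothetical toy stand-ins (the "public" key equals
the master secret, so they offer no security and only mimic the interface), and
<code>recover_keyword</code> is a made-up name for the server's procedure.</p>

```python
import hashlib
import hmac
import os

# Toy stand-in for an anonymous IBE scheme. NOT secure: the "public" key equals
# the master secret, so this only models the Setup/Enc/Keygen/Dec interface.
def setup():
    msk = os.urandom(32)
    return msk, msk  # (msk, mpk); identical only in this mock

def keygen(msk, identity):
    return hmac.new(msk, identity.encode(), hashlib.sha256).digest()

def enc(mpk, identity, message):
    key = keygen(mpk, identity)  # possible only because mpk == msk in the mock
    nonce = os.urandom(16)
    pad = hashlib.sha256(key + nonce).digest()[:len(message)]
    body = bytes(a ^ b for a, b in zip(message, pad))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag

def dec(sk, ciphertext):
    nonce, body, tag = ciphertext[:16], ciphertext[16:-32], ciphertext[-32:]
    expected = hmac.new(sk, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # plays the role of the failure symbol
    pad = hashlib.sha256(sk + nonce).digest()[:len(body)]
    return bytes(a ^ b for a, b in zip(body, pad))

def recover_keyword(mpk, token, dictionary):
    """Server-side dictionary attack: encrypt "1" under each candidate word and
    report the word whose ciphertext the token decrypts."""
    for w in dictionary:
        probe = enc(mpk, w, b"1")  # the server can do this with public information
        if dec(token, probe) == b"1":
            return w
    return None  # the searched keyword is not in the dictionary

msk, mpk = setup()
dictionary = ["budget", "salary", "lunch"]
token = keygen(msk, "salary")  # the token Bob sends to search for "salary"
recovered = recover_keyword(mpk, token, dictionary)
```

<p>The attack costs the server one encryption and one trial decryption per
dictionary word, so it is practical whenever the search terms come from a small
or guessable set.</p>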
<p>Notice that the attack does not result from a deficiency of any particular IBE
scheme but that it applies to <em>any</em> public-key encrypted search solution.
The fundamental problem is that the server has both the ability to create EDBs
(since it has the public key) and to search over them. So what this tells us is
that, as defined, the notion of search on publicly-encrypted data cannot
protect search terms.</p>
<p>So what can we do about this? Recently, Boneh, Raghunathan and Segev
[<a href="http://eprint.iacr.org/2013/283.pdf">BRS13</a>] and Arriaga and Tang
[<a href="http://eprint.iacr.org/2013/330.pdf">AT13</a>] set out to design
public-key encrypted search solutions that achieved the best possible level of
confidentiality for search terms. Roughly speaking, what this means is that if
the search terms are hard enough to guess, then the schemes proposed will
protect them.</p>
<p>But what do we do if (as in most cases) our search terms are not hard to guess?
Well, we don't really have a good answer, except to note that this problem does
not occur in the symmetric setting, where only the client can generate EDBs. So,
depending on the application, a symmetric solution might be preferable.</p>
<h2 id="conclusions">Conclusions</h2>
<p>So far, we've seen two approaches to searching on encrypted data. The first,
the
<a href="http://outsourcedbits.org/2013/10/14/how-to-search-on-encrypted-data-part-2/">PPE-based
approach</a>,
resulted in schemes with fast search (sub-linear in <span class="math">\(n\)</span>) but with relatively
weak security guarantees. The second, the FE-based approach, resulted in
schemes with slow search (linear in <span class="math">\(n\)</span>) but with better security guarantees.</p>
<p>In the next post, we'll go over solutions that are even slower, but that achieve
the strongest possible levels of security!</p>