Mpc on Outsourced Bits
http://senykam.github.io/tags/mpc/
Restructuring the NSA Metadata Program
http://senykam.github.io/2014/03/10/restructuring-the-nsa-metadata-program
Mon, 10 Mar 2014 12:05:03 -0300
<p><img src="http://senykam.github.io/img/design.jpg" class="alignright" width="280">
I just got back from Barbados where I attended the <a href="http://fc14.ifca.ai/">Financial Cryptography and
Data Security</a> conference. It was a great event overall with many interesting
talks and two great workshops.</p>
<p>One workshop was on <a href="http://fc14.ifca.ai/bitcoin/index.html">Bitcoin</a> and was the most successful Financial Crypto
workshop in history! Though I haven't personally worked on Bitcoin, one of the
things I enjoyed most about the conference and workshops was the presence of
the Bitcoin community. The interaction between the academic and Bitcoin
communities led to some very interesting discussions and ideas. I really hope
the two communities keep interacting.</p>
<p>The other workshop was on <a href="https://www.dcsec.uni-hannover.de/4556.html">applied homomorphic
cryptography</a>. Homomorphic in the
context of this workshop is to be understood broadly and is meant to include
all cryptographic technologies that allow for some form of computation on
encrypted data. As such this includes secure multi-party computation and
encrypted search.</p>
<p>I was invited to give the keynote at this workshop and I chose to talk about
how to restructure the NSA metadata program. My slides are <a href="http://research.microsoft.com/en-us/um/people/senyk/slides/metacrypt.pdf">here</a>. They
describe---at a very high level---a new design I refer to as MetaCrypt whose
goal is to enable some of the functionality the current NSA metadata program
supports but in a privacy-preserving manner. I first started thinking about
this problem in July 2013 when I wrote <a href="http://outsourcedbits.org/2013/07/23/are-compliance-and-privacy-always-at-odds/">this blog post</a>.</p>
<p>Since I only had one hour, there are many details missing in the talk. Also,
since this was a talk aimed at a general crypto audience, <sup class="footnote-ref" id="fnref:1"><a class="footnote" href="#fn:1">1</a></sup> I included a
variant of the protocol that is easy to describe as opposed to variants that
are perhaps more efficient and/or provide stronger security guarantees. The
details and alternative designs will appear later in an accompanying paper but
I hope that even this high-level description is interesting.</p>
<p><strong>Update:</strong> my talk and the MetaCrypt project was recently covered by MIT Tech
Review. See
<a href="http://www.technologyreview.com/news/526121/cryptography-could-add-privacy-protections-to-nsa-phone-surveillance/">here</a>
for the article.</p>
<div class="footnotes">
<hr>
<ol>
<li id="fn:1">The audience included people who focus on number-theoretic primitives, hardware crypto implementations, lattice-based cryptography etc. The ideas I described in the talk, however, required mostly background on secure multi-party computation, <a href="http://eprint.iacr.org/2006/210.pdf">searchable symmetric encryption</a> and especially <a href="http://eprint.iacr.org/2011/010.pdf">structured encryption</a>, which are more recent and not as well-known.<br>
<a class="footnote-return" href="#fnref:1">↩</a></li>
</ol>
</div>
Are Compliance and Privacy Always at Odds?
http://senykam.github.io/2013/07/23/are-compliance-and-privacy-always-at-odds
Tue, 23 Jul 2013 22:23:59 -0300
<p><img src="http://senykam.github.io/img/obey.jpg" class="alignright" width="250">
Chris Soghoian
<a href="https://twitter.com/csoghoian/status/358613839094362112">points</a> to an
interesting
<a href="http://online.wsj.com/article/SB10001424127887324448104578615881436052760.html">article</a>
in the Wall Street Journal. It describes mounting pressure on the NSA to
re-design its phone-data program---the program under which it compels
telecommunications companies (telcos) like Verizon to turn over their phone
record data.</p>
<p>In the article, Timothy Edgar, a former privacy lawyer who served in the Bush
and Obama administrations is quoted as saying:</p>
<blockquote>
<p>Privacy technology under development would allow for anonymous searches of
databases, keeping data out of government hands but also preventing phone
companies from learning the purpose of NSA searches. Overhauling the
surveillance program would provide a reason to speed up the technology's
deployment.</p>
</blockquote>
<p>So this motivates the following interesting technical question:
<em>how would one design such a privacy-preserving phone-data program exactly?</em></p>
<p>The first thing we need is that the telcos keep their data, as opposed to
sending it all to the NSA. The issue with such an approach, of course, is that
the NSA would have to disclose its queries to the telco in order to retrieve
any information---which for obvious reasons is not going to happen.</p>
<p>So what we need is a mechanism with which the telcos can keep their data and
the NSA can access it without disclosing its queries. This might sound
impossible, but it turns out we've known how to do this (in theory at least)
for over <em>15</em> years!</p>
<h2 id="private-information-retrieval">Private Information Retrieval</h2>
<p>One answer to this problem could be to use something called <a href="http://en.wikipedia.org/wiki/Private_information_retrieval">private
information
retrieval</a> (PIR).
With PIR, a client can retrieve information from a server <em>without the server
learning anything about which item is being retrieved</em>. Standard PIR protocols
only allow the client to retrieve information by memory location but there are
more sophisticated variants that also support retrieval based on
<a href="http://eprint.iacr.org/1998/003">keywords</a>.</p>
<p>PIR was first introduced in 1995 in a
<a href="http://people.csail.mit.edu/madhu/papers/1995/pir-journ.pdf">paper</a> by Chor,
Kushilevitz, Goldreich and Sudan. Initially, PIR only worked if the data could
be stored on two (or more) servers that could not collude. In a breakthrough
paper, Kushilevitz and Ostrovsky showed in 1997 that PIR could be achieved even
with a single server. Since then, there has been a lot of work and many
advances on PIR and, recently, Ian Goldberg from the University of Waterloo and
his students have been trying to make PIR practical (improving both efficiency
and functionality). If you are interested in this topic (especially in the
practical aspects) I highly recommend the thesis of
<a href="http://uwspace.uwaterloo.ca/bitstream/10012/6142/1/Olumofin_Femi.pdf">Olumofin</a>.</p>
<p>So a simple idea to solve our problem is to have the telco keep its data and to
have the NSA query it through a PIR protocol. While this might seem like a good
solution, there are two important problems.</p>
<p>The first is that while PIR will protect the query of the NSA (i.e., the telco will not learn anything about the query) it will not necessarily protect the telco's dataset from the NSA; that is, the NSA could learn information about individuals that are not included in its query.</p>
<p>The second problem is that the telco has no way of knowing if the NSA's query is legitimate. What if the NSA keeps submitting queries indiscriminately and eventually just learns the entire database? How does the telco know whether a particular query is even legal?</p>
<p>Fortunately, both problems can be addressed!</p>
<h2 id="oblivious-transfer">Oblivious Transfer</h2>
<p>To handle the first problem, we need a stronger form of PIR called <a href="http://en.wikipedia.org/wiki/Oblivious_transfer">oblivious
transfer</a> (OT). With an OT
protocol, a client can select an item from a server's dataset while maintaining
the following guarantees: (1) the server learns
nothing about the client's query; and (2) the
client learns nothing about the items it does not query. So unlike PIR, OT
protects both parties, which is why it is sometimes called symmetric PIR.</p>
<p>Like PIR, standard OT protocols only allow clients to retrieve items by their
location in memory so, in practice, we would prefer to use a keyword-based OT;
that is, an OT protocol where items can be labeled with keywords and where the
clients can retrieve them based on search terms. Fortunately, we already know
how to design such protocols. The first keyword OT is due to Ogata and Kurosawa
(see this <a href="http://seculab.cis.ibaraki.ac.jp/~kurosawa/2004/OKS.pdf">paper</a>) but
their scheme does not scale very well (each query would require the NSA to do
work that is linear in the size of the dataset). A more efficient approach is
due to Freedman, Ishai, Pinkas and Reingold and is described in this
<a href="https://www.cs.princeton.edu/~mfreed/docs/FIPR05-ks.pdf">paper</a>.</p>
<h2 id="keyword-ot">Keyword OT</h2>
<p>The high-level idea of Freedman et al.'s keyword OT is as follows. As before,
the server is the telco and the client is the NSA. Suppose the telco's dataset
consists of <span class="math">\(n\)</span> pairs <span class="math">\((w_1, d_1), \dots, (w_n, d_n)\)</span>, where <span class="math">\(w_i\)</span> is a keyword
and <span class="math">\(d_i\)</span> is some data associated to <span class="math">\(w_i\)</span>. In practice, the keywords could be
names and the data could be phone numbers, addresses, etc. The telco starts by encrypting
this dataset by replacing each pair <span class="math">\((w_i, d_i)\)</span> by a label/ciphertext pair
<span class="math">\((\ell_i, d_i \oplus p_i)\)</span>, where the label <span class="math">\(\ell_i\)</span> and the pad <span class="math">\(p_i\)</span> are
(pseudo-)random strings generated from <span class="math">\(w_i\)</span> using a pseudo-random function
with a secret key <span class="math">\(K\)</span>. More formally, we would write that for all <span class="math">\(i\)</span>,</p>
<p><span class="math">\[
F_K(w_i) = (\ell_i, p_i),
\]</span></p>
<p>where <span class="math">\(F\)</span> is the PRF. A PRF is sort of like a keyed
hash. <sup class="footnote-ref" id="fnref:1"><a class="footnote" href="#fn:1">1</a></sup> The main property of PRFs is that if we evaluate them with a random
key <span class="math">\(K\)</span> on any input, they output a random looking
string.</p>
<p>Note that this new encrypted dataset reveals no information about the real
dataset since the <span class="math">\(\ell_i\)</span> values are pseudo-random
(and therefore effectively independent of the
<span class="math">\(w_i\)</span>'s) and because the ciphertexts <span class="math">\(d_i\oplus
p_i\)</span> are effectively one-time pad (OTP) encryptions of the
<span class="math">\(d_i\)</span>'s. <sup class="footnote-ref" id="fnref:2"><a class="footnote" href="#fn:2">2</a></sup> The telco now sends this encrypted
dataset to the NSA who stores it. Remember: it reveals no information
whatsoever about the real dataset so this is OK!</p>
<p>Now suppose the NSA needs to look up information related to some keyword <span class="math">\(w\)</span> and
remember that the encrypted dataset it holds consists of labels <span class="math">\(\ell_i\)</span> and
ciphertexts <span class="math">\(d_i \oplus p_i\)</span>. To extract the information it needs from the
encrypted dataset, it therefore needs to figure out: (1) the label for keyword
<span class="math">\(w\)</span> (so it can look up the appropriate OTP ciphertext); and (2) the pad <span class="math">\(p_i\)</span>
used in the associated ciphertext.</p>
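<p>To make the data flow concrete, here is a minimal Python sketch of the encrypt-and-lookup steps (everything except the 2PC itself). HMAC-SHA256 with domain separation stands in for the PRF <span class="math">\(F\)</span>, and records are space-padded to a fixed 32-byte length; both are illustrative assumptions, not the construction from the Freedman et al. paper.</p>

```python
import hmac
import hashlib

def F(K: bytes, w: str):
    """PRF stand-in: derive a label and a pad from keyword w under key K.
    HMAC-SHA256 with domain separation is an assumption for illustration."""
    label = hmac.new(K, b"label|" + w.encode(), hashlib.sha256).digest()
    pad = hmac.new(K, b"pad|" + w.encode(), hashlib.sha256).digest()
    return label, pad

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt_dataset(K: bytes, pairs):
    """Telco side: replace each (w_i, d_i) with (label_i, d_i XOR pad_i).
    Records are space-padded to 32 bytes to match the pad length."""
    return {label: xor(d.ljust(32).encode(), pad)
            for (w, d) in pairs
            for (label, pad) in [F(K, w)]}

def lookup(table, label: bytes, pad: bytes) -> str:
    """NSA side: given (label, pad) = F_K(w), obtained via the 2PC in the
    real protocol, find the ciphertext and strip the one-time pad."""
    return xor(table[label], pad).decode().rstrip()

K = b"telco-secret-prf-key"
table = encrypt_dataset(K, [("alice", "555-1234"), ("bob", "555-9876")])
label, pad = F(K, "alice")  # in the real protocol, computed obliviously
assert lookup(table, label, pad) == "555-1234"
```

<p>Note that the table the NSA stores contains only pseudo-random labels and padded ciphertexts, which is exactly why handing the encrypted dataset over reveals nothing.</p>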
<p>Of course the NSA cannot do this on its own because it does not know the
telco's secret key <span class="math">\(K\)</span> for the PRF used to generate these items. But we have a
problem. If the NSA sends its keyword <span class="math">\(w\)</span> to the telco so that the latter
computes and returns <span class="math">\(F_K(w)\)</span>, the telco will learn the keyword. And if the
telco sends its key <span class="math">\(K\)</span> to the NSA so that it computes <span class="math">\(F_K(w)\)</span> on its own, the
NSA will be able to decrypt the entire dataset.</p>
<p>The solution here is to use another amazing cryptographic technology called
<a href="http://en.wikipedia.org/wiki/Secure_multi-party_computation#Two-party_computation">secure two-party
computation</a>
(2PC). I won't try to explain how 2PC works but if you are interested a good
place to start is the <a href="http://mpclounge.au.dk/">MPC Lounge</a>. The important
thing to know about 2PC is that we can use it to solve our problem. In other
words, the telco and the NSA can execute a 2PC protocol that will result in the
NSA learning <span class="math">\(F_K(w)\)</span> and therefore the label and the pad for <span class="math">\(w\)</span>, without it
learning anything about the telco's key and without the telco learning anything
about <span class="math">\(w\)</span> <sup class="footnote-ref" id="fnref:3"><a class="footnote" href="#fn:3">3</a></sup>.</p>
<h2 id="authorized-queries">Authorized Queries</h2>
<p>Now on to the second problem: how does the telco know if the NSA's query is
legitimate? To address this we first need to incorporate an extra party into
our model that has the power to decide if an NSA query is legitimate or not. In
practice, this would be the <a href="http://en.wikipedia.org/wiki/United_States_Foreign_Intelligence_Surveillance_Court">FISA
court</a>
<sup class="footnote-ref" id="fnref:4"><a class="footnote" href="#fn:4">4</a></sup> and we'll assume this court can digitally sign, i.e., it has a secret
signing key and a public verification key that is known to the telco.</p>
<p>Now suppose the NSA wants to retrieve information about a user Alice from the
telco. It first sends its query to the court. If the court approves the query,
it signs it and returns the signature to the NSA. At this point, we only need
to make a small change to the protocol described above. Instead of executing a
2PC that evaluates the PRF so as to generate a label and pad for the NSA's
query; the parties will execute a 2PC that first verifies the court's signature
and then (if the signature checks out) evaluates the PRF (i.e., generates the
label and pad for the keyword). The properties of the 2PC will hide the
signature and the keyword from the telco, and the secret key
<span class="math">\(K\)</span> from the NSA. <sup class="footnote-ref" id="fnref:5"><a class="footnote" href="#fn:5">5</a></sup></p>
<h2 id="is-this-really-possible">Is this really possible?</h2>
<p>The design described above is possible in theory. But of course the interesting
question is whether something like this could be used in practice.</p>
<p>I don't really know how large telco datasets are but I would guess on the order
of hundreds of millions of users. Encrypting such a dataset and sending it to
the NSA would be expensive but definitely possible as the encryption process
here would consist of relatively cheap operations like PRF evaluations and
XORs. The query stage, however, would be very inefficient due to the execution
of the 2PC protocol. But if we look at things carefully, the bottlenecks would
likely be (1) the verification of the signature (due to the complexity of
signature verification); and (2) the generation of the pads (since they have to
be as long as the data they will be XORed with).</p>
<p>Fortunately there are a few things we can do to mitigate these problems.
Instead of using a signature scheme, we could use a message authentication code
(MAC). This would require the court to share a secret key with the telco but
this doesn't seem like such a severe requirement. MACs are much simpler
computationally than signatures so the 2PC verification would be much faster
<sup class="footnote-ref" id="fnref:6"><a class="footnote" href="#fn:6">6</a></sup>.</p>
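<p>As a sketch of what the MAC-based variant computes, the authorize-and-verify logic is just the following; the key name and query encoding are hypothetical, and in the actual design the verification would run <em>inside</em> the 2PC so the telco never sees the query or the tag.</p>

```python
import hmac
import hashlib

def authorize(court_key: bytes, query: str) -> bytes:
    """Court side: MAC an approved query under the key shared with the telco."""
    return hmac.new(court_key, query.encode(), hashlib.sha256).digest()

def verify(court_key: bytes, query: str, tag: bytes) -> bool:
    """Telco side (inside the 2PC in the real protocol): recompute and compare."""
    return hmac.compare_digest(authorize(court_key, query), tag)

shared_key = b"court-telco-shared-key"  # hypothetical shared MAC key
tag = authorize(shared_key, "alice")
assert verify(shared_key, "alice", tag)
assert not verify(shared_key, "bob", tag)
```

<p>Recomputing an HMAC costs two hash evaluations, versus the modular exponentiations of a typical signature verification, which is why the MAC-based 2PC circuit is so much smaller.</p>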
<p>With respect to the length of the pads, we could use the PRF to generate a
short string instead (say 128 bits long) and use
that as a seed to a pseudo-random generator to generate a larger pad. This
would change how the telco and NSA encrypt and decrypt items of the dataset but
it is a minor change that would not affect the efficiency of encryption and
decryption much.</p>
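<p>A minimal sketch of the seed-then-expand idea, using hash-of-seed-and-counter as the PRG (a stand-in assumption; a deployment would use a vetted stream cipher such as AES in counter mode):</p>

```python
import hashlib

def expand_pad(seed: bytes, length: int) -> bytes:
    """Expand a short (e.g., 128-bit) PRF output into a pad of arbitrary
    length by hashing seed || counter, counter-mode style."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

seed = b"\x01" * 16            # 128-bit seed, as suggested in the text
record = b"x" * 1000           # a long record to encrypt
pad = expand_pad(seed, len(record))
ciphertext = bytes(a ^ b for a, b in zip(record, pad))
assert bytes(a ^ b for a, b in zip(ciphertext, pad)) == record
```

<p>The 2PC now only has to produce the 128-bit seed rather than a pad as long as the record, which is what shrinks the circuit.</p>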
<p>With these changes, the 2PC would only have to compute two PRF evaluations and
one equality check which is definitely within practical reach.</p>
<p><strong>Update:</strong> For a high-level description of the protocol I designed in this
post see
<a href="http://boingboing.net/2014/03/01/trustycon-how-to-redesign-nsa.html">this</a>
great talk by Ed Felten.</p>
<p><em>Thanks to Matt Green and Payman Mohassel for comments on a draft of this post
and to Chris Soghoian for motivating me to think about this problem.</em></p>
<div class="footnotes">
<hr>
<ol>
<li id="fn:1">PRFs are like keyed hash functions only in idealized models like the random oracle model.
<a class="footnote-return" href="#fnref:1">↩</a></li>
<li id="fn:2">Technically, since the labels and pads are pseudo-random (as opposed to random), <span class="math">\(\ell_i\)</span> is not independent of <span class="math">\(w_i\)</span> and <span class="math">\(d_i \oplus p_i\)</span> is not a one-time pad. More precisely, <span class="math">\(\ell_i\)</span> and <span class="math">\(d_i \oplus p_i\)</span> reveal no partial information about <span class="math">\(w_i\)</span> and <span class="math">\(d_i\)</span> to a computationally-bounded adversary.
<a class="footnote-return" href="#fnref:2">↩</a></li>
<li id="fn:3">Protocols that evaluate PRFs in this manner are usually called oblivious PRF (OPRF) protocols. The 2PC-based OPRF protocol is the simplest to understand conceptually but we know of more efficient OPRF protocols not based on 2PC (e.g., the Freedman et al. paper describes one such construction).
<a class="footnote-return" href="#fnref:3">↩</a></li>
<li id="fn:4">There is debate as to whether the FISA court exercises proper oversight over the NSA or not (for example see <a href="http://www.nytimes.com/2013/07/26/us/politics/robertss-picks-reshaping-secret-surveillance-court.html?_r=0">this article</a> from the New York Times), but for the purpose of this exercise we'll just assume that it does.
<a class="footnote-return" href="#fnref:4">↩</a></li>
<li id="fn:5">The reason we also need to hide the signature from the telco is that signatures can leak information about their message.
<a class="footnote-return" href="#fnref:5">↩</a></li>
<li id="fn:6">Here we also assume the data is hashed with a collision-resistant hash function before being MACed.
<a class="footnote-return" href="#fnref:6">↩</a></li>
</ol>
</div>
Applying Fully Homomorphic Encryption (Part 2)
http://senykam.github.io/2012/09/29/applying-fully-homomorphic-encryption-part-2
Sat, 29 Sep 2012 16:59:14 -0300
<p><em>This is the second part of a series on applying fully-homomorphic encryption.
In the
<a href="http://outsourcedbits.org/2012/06/26/applying-fully-homomorphic-encryption-part-1/">first</a>
post we went over what fully-homomorphic encryption (FHE) and
somewhat-homomorphic encryption (SHE) were and how they relate. In this post
we'll discuss actual applications.</em></p>
<p><img src="http://senykam.github.io/img/grail.jpg" class="alignright" width="250">
To structure the discussion, I'll refer to some applications as direct and
others as indirect. Indirect applications will refer to applications where FHE
is used as a building block---usually with other components---to construct
something else of interest. Direct applications, on the other hand, will refer
to applications where FHE is used (almost) "as-is". These are coarse and
imprecise distinctions and are not particularly meaningful but they will be
useful for organizational purposes. Roughly speaking, you can think about an
indirect application as something mostly cryptographers would be excited about
and a direct application as something everyone else might be excited about.</p>
<p>Quoting
<a href="http://windowsontheory.org/2012/05/02/building-the-swiss-army-knife/">Barak and Brakerski</a>, FHE is viewed by cryptographers as a swiss-army knife. This is
because FHE has all kinds of indirect applications and can be used to construct and
improve many cryptographic systems ranging from relatively simple
things like encryption schemes to more complex things like
<a href="http://en.wikipedia.org/wiki/Secure_multi-party_computation">secure multi-party computation</a>
(MPC) protocols <sup class="footnote-ref" id="fnref:1"><a class="footnote" href="#fn:1">1</a></sup>. Gentry's thesis provides a good
overview of the indirect applications of FHE, including to the design of
<a href="http://www.mit.edu/~rothblum/papers/otp.pdf">one-time programs</a>,
<a href="http://www.cs.ucla.edu/~rafail/PUBLIC/Ostrovsky-Skeith.html">public-key obfuscation</a>,
<a href="http://en.wikipedia.org/wiki/Proxy_re-encryption">proxy re-encryption</a>,
<a href="http://www.cs.ucla.edu/~rafail/PUBLIC/09.pdf">software protection</a>
and secure multi-party computation.</p>
<p>Here I'll only cover a few possible applications of FHE. The first one is an
indirect application of FHE to MPC while the rest will be direct applications.</p>
<h2 id="secure-multiparty-computation">Secure Multi-Party Computation</h2>
<p>An MPC protocol allows mutually distrustful parties to cooperate securely. By
this I mean that the parties---each with private data---can use an MPC protocol
to evaluate any function <span class="math">\(f\)</span> over their <em>joint</em> data without having to share
the data with each other. Note that MPC can achieve this without hardware and
without trusted third parties. MPC is a very general cryptographic technology
and you can frame a huge number of security and privacy problems as instances
of MPC. When trying to convey how general and useful it is I usually say that,
roughly speaking, MPC is useful in any situation where you might think of using
an NDA. Of course it is <em>much</em> more useful than NDAs because the parties will
never see any of the data!</p>
<p>MPC is a very active area of research that started in the 80's and it is still
going strong. In fact, recent work suggests that it could soon be practical.
I would even venture to say that over the last few years, MPC has steadily
moved from an area of theoretical cryptography research to an area of
<em>applied</em> cryptography research.</p>
<p>So what does this have to do with FHE? It turns out that using FHE one can
design MPC protocols that are very efficient (asymptotically): they require
little communication and little interaction (i.e., few rounds of communication
between the parties). But to see why this is the case, we first have to understand a little
bit of how MPC protocols are typically designed.</p>
<p>Most protocols are based on (abstract)
<a href="http://en.wikipedia.org/wiki/Circuit_(computer_theory)">circuits</a> as opposed
to the more familiar computational models like <a href="http://en.wikipedia.org/wiki/Random-access_machine">random-access
machines</a> (this is similar
to how FHE works as discussed in the previous post). To use these protocols,
the parties first construct a circuit that evaluates the function they wish to
compute. They then use various techniques to jointly evaluate this circuit on
their joint inputs. So, for example, if we consider the case of two-party
computation where Alice and Bob wish to compute a function <span class="math">\(f\)</span> on their
respective inputs <span class="math">\(x\)</span> and <span class="math">\(y\)</span>, they first construct a circuit <span class="math">\(C\)</span> that
evaluates <span class="math">\(f\)</span> on two inputs and then run the MPC protocol in order to securely
compute <span class="math">\(f(x, y)\)</span>.</p>
<p>By secure here what I mean is that Alice will not learn any information about
<span class="math">\(y\)</span> and Bob will not learn any information about <span class="math">\(x\)</span> <sup class="footnote-ref" id="fnref:2"><a class="footnote" href="#fn:2">2</a></sup>. Now, there are many
ways of performing this secure computation but they all have the following
characteristics: either (1) they require Alice and Bob to send <span class="math">\(\Omega(|C|)\)</span>
bits to each other, where <span class="math">\(|C|\)</span> denotes the size of <span class="math">\(C\)</span>, i.e., its total number
of gates; or (2) they require <span class="math">\(\Omega(|C|)\)</span> rounds of communication.
This matters because when we're working with circuits, the size of a circuit
reflects the complexity of the function it computes. In other words, for
"complicated" functions, Alice and Bob will have to exchange <em>a lot</em> of
data and/or interact a large number of times.</p>
<p>With FHE, on the other hand, it is very easy to construct an MPC protocol without
these limitations. To do this, Alice just needs to generate a public/private
key pair for the FHE scheme. She then encrypts her input <span class="math">\(x\)</span> and sends the
resulting ciphertext <span class="math">\(c_x = E_{pk}(x)\)</span> to Bob. Bob encrypts his input <span class="math">\(y\)</span>,
resulting in a ciphertext
<span class="math">\(
c_y = E_{pk}(y)
\)</span>
and evaluates the circuit <span class="math">\(C\)</span> on
the encryptions <span class="math">\(c_x\)</span> and <span class="math">\(c_y\)</span>, resulting in an encryption <span class="math">\(c^\star\)</span> of
<span class="math">\(f(x, y)\)</span>. Bob then sends <span class="math">\(c^\star\)</span> back to Alice who decrypts it and sends the
result back to Bob <sup class="footnote-ref" id="fnref:3"><a class="footnote" href="#fn:3">3</a></sup>.</p>
<p>The total amount of data exchanged between Alice and Bob in this protocol is
<span class="math">\(
|c_x| + |c^\star| + |f(x,y)|,
\)</span>
where <span class="math">\(|c_x|\)</span>, <span class="math">\(|c^\star|\)</span> and <span class="math">\(|f(x,y)|\)</span> refer to the bit length of <span class="math">\(c_x\)</span>,
<span class="math">\(c^\star\)</span> and <span class="math">\(f(x,y)\)</span>, respectively. What's important to note here is that
this is <em>independent</em> of the size/complexity of the circuit <span class="math">\(C\)</span>! Also,
the total number of rounds needed is <span class="math">\(1.5\)</span>, i.e., one round of communication
plus one message. So, in other words, the FHE-based MPC protocol will have the
same efficiency in terms of data exchanged and rounds, no matter how
complicated the function is.</p>
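<p>The 1.5-round message pattern can be sketched with a toy additively homomorphic scheme (textbook Paillier) standing in for FHE; Paillier only supports addition, so we take <span class="math">\(f(x, y) = x + y\)</span>, but the traffic pattern (one ciphertext over, one ciphertext back, one result) is the same. The tiny primes are for illustration only.</p>

```python
import math
import random

def keygen(p: int = 10007, q: int = 10009):
    """Textbook Paillier with toy primes (illustration only, not secure)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    return (n,), (n, lam)

def enc(pk, m: int) -> int:
    (n,) = pk
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(1 + n, m, n2) * pow(r, n, n2)) % n2

def dec(sk, c: int) -> int:
    n, lam = sk
    n2 = n * n
    u = (pow(c, lam, n2) - 1) // n          # L(c^lambda mod n^2)
    mu = pow(lam % n, -1, n)                # inverse of L(g^lambda mod n^2)
    return (u * mu) % n

pk, sk = keygen()
n2 = pk[0] ** 2
c_x = enc(pk, 41)                 # round 1: Alice sends c_x to Bob
c_star = (c_x * enc(pk, 1)) % n2  # Bob homomorphically adds y = 1, replies
assert dec(sk, c_star) == 42      # half round: Alice decrypts f(x, y), sends it back
```

<p>Whatever <span class="math">\(f\)</span> Bob evaluates, only <span class="math">\(c_x\)</span>, <span class="math">\(c^\star\)</span> and the result cross the wire, which is the circuit-independence claimed above.</p>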
<p>Another nice application of FHE in the context of MPC is that one can use it to
design server-aided (or cloud-assisted) MPC protocols. A server-aided MPC
protocol is like a regular MPC protocol except that the parties can make use of
an untrusted party to outsource some of their computations. The point here is to
decrease the computational burden of the parties at the expense of the server
but, of course, without having to trust it.</p>
<p>This can greatly improve the efficiency of MPC as explored recently in a
<a href="http://eprint.iacr.org/2012/542">paper</a> I co-authored with Mohassel and Riva.
But as shown by Asharov et al in this
<a href="http://delta-apache-vm.cs.tau.ac.il/~tromer/papers/tfhe-mpc.pdf">paper</a>, if
one uses FHE (plus some additional machinery) it is possible to design
server-aided MPC protocols that are even more efficient from an asymptotic
point of view (but unfortunately not from a concrete/practical point of view).</p>
<h2 id="delegated-computation">Delegated Computation</h2>
<p>Of course, the most direct application of FHE is to the problem of outsourced
computation. Here, Alice wants to evaluate a function <span class="math">\(f\)</span> over her input <span class="math">\(x\)</span> but
she doesn't have enough resources to do the evaluation herself. Because of this
she wishes to outsource the computation to another party but she doesn't trust
that party with her data.</p>
<p>FHE provides a perfect solution to this problem. Alice encrypts <span class="math">\(x\)</span> and sends
the ciphertext to the server who evaluates <span class="math">\(f\)</span> on it and returns an encryption
of <span class="math">\(f(x)\)</span>. Alice can then decrypt it to recover <span class="math">\(f(x)\)</span>. Note that similarly to
the case of MPC this approach is only secure in the semi-honest model, where we
assume the server will indeed evaluate the function <span class="math">\(f\)</span>. What happens if the
server evaluates some other function <span class="math">\(f' \neq f\)</span>? Fortunately, there are ways of
handling this problem.</p>
<h2 id="search-on-encrypted-data">Search on Encrypted Data</h2>
<p>Another application of FHE that is often cited is to the problem of searching on
encrypted data. This is a special case of the delegated computation problem
mentioned above where the client just wants to search through an encrypted file
collection stored at the server. The idea is that Alice stores an encryption of
her dataset (e.g., a collection of emails) on the server. Whenever she wants to
search over her data, she sends an encryption of her keyword and the server
homomorphically evaluates a search algorithm on the encrypted data and keyword.</p>
<p>While this obviously works, as far as I know, all the ways of using FHE to
search on encrypted data require linear time in the length of the data. In other
words, the keyword will have to be checked (homomorphically) against every word
in the dataset. In practice, of course, linear-time search is untenable in
many scenarios since more often than not search is a latency-sensitive
operation. Just imagine if a search engine or a desktop search application used
linear-time search.
In future posts, we'll see how one can achieve <em>sub-linear</em> and even
<a href="http://en.wikipedia.org/wiki/Output-sensitive_algorithm">output sensitive</a>
time search over encrypted data (albeit with a weaker security guarantee).</p>
<p>Since Gentry proposed his construction and blueprint in 2009, there has been a
huge effort to make FHE more practical. While a lot of progress has been made,
unfortunately, we're still some way from truly practical FHE. So the natural
question is: "where are the bottlenecks?".</p>
<p>As we discussed in the last post, most FHE schemes are based on Gentry's
blueprint which consists of first constructing a SHE and then using Gentry's
bootstrapping technique to turn it into a FHE scheme. It turns out that
bootstrapping is a major bottleneck and that SHE is actually reasonably
efficient. So if we care about practical applications, then it may be worthwhile
to explore what exactly we can do with SHE instead.</p>
<h2 id="private-healthcare-and-online-ads">Private Healthcare and Online Ads</h2>
<p>This question was explored recently by Lauter, Naehrig and Vaikuntanathan in a
<a href="http://eprint.iacr.org/2011/405">paper</a> where they consider various
classes of applications and argue that SHE is enough for many of them. The first
class is a multi-source and single-reader scenario where: (1) encrypted data is
uploaded to the cloud by many different entities; (2) the cloud does some
computation over the data; and (3) the encrypted answer is returned to the data
owner/reader. One example of such a scenario is in healthcare where medical data
pertaining to a given patient is sent to the cloud by many sources. These
sources could be, for example, doctors or medical devices owned by the patient.
All the data is sent encrypted and the cloud processes it homomorphically (e.g.,
it could run some statistical algorithm or classifier) and returns the encrypted
answer to the patient.</p>
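<p>A toy sketch of this multi-source, single-reader flow, reusing a DGHV-style scheme with a larger plaintext modulus so it can carry integer readings rather than bits (hypothetical, insecure parameters; the scheme and numbers are mine, not the paper's). Several sources encrypt readings under the patient's key, the cloud adds the ciphertexts without ever seeing the plaintexts, and only the patient decrypts the aggregate:</p>

```python
import random

# Toy additively-used DGHV-style scheme with message space mod 1000
# (insecure, illustrative parameters only).

P = 982451653            # patient's secret key (toy value)
T = 1000                 # plaintext modulus: readings are integers < 1000

def encrypt(m):
    q = random.randrange(1, 10**6)
    r = random.randrange(1, 50)
    return q * P + T * r + m        # noise T*r + m must stay below P

def decrypt(c):
    return (c % P) % T

# Three sources (doctors / devices) each send one encrypted reading:
readings = [120, 135, 128]          # e.g. blood-pressure measurements
cts = [encrypt(m) for m in readings]

# The cloud computes on ciphertexts only -- it never sees the readings:
encrypted_sum = sum(cts)

# The patient decrypts the aggregate and finishes the (non-polynomial)
# division step locally:
total = decrypt(encrypted_sum)
print(total / len(readings))        # mean reading, computed by the patient
```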
<p>Another example explored in the paper is online ads. The idea here is that your
mobile device could use SHE to encrypt and send private information to the cloud
about your location, browsing history, emails, etc. The cloud would store a set
of ads encrypted under your key and then homomorphically run an ad-targeting
algorithm on your data and the ads to figure out which ads to display to you.
The result would be an encrypted ad that it could then return to you. Because
your data is encrypted, the cloud will obviously never see your emails, location,
etc.</p>
<h2 id="machine-learning-on-encrypted-data">Machine Learning on Encrypted Data</h2>
<p>In a more recent <a href="http://eprint.iacr.org/2012/323">paper</a>, Graepel, Lauter and
Naehrig study the more specific problem of machine learning over encrypted data.
The setting here is the same as the healthcare scenario described above (i.e.,
the multi-source single-reader one) where we have doctors or perhaps medical
devices sending encrypted data pertaining to a patient, say Alice, to the
cloud. Here, Graepel et al. consider the setting of
<a href="http://en.wikipedia.org/wiki/Supervised_learning">supervised</a> learning where,
given a set of labeled training data, we want to derive a function that will
accurately label future data, i.e., a classifier. As one can imagine,
supervised learning has many applications, including bioinformatics, spam
detection and speech recognition, to name just a few.</p>
<p>So in this setting, the data sources (i.e., doctors or medical devices) send
labeled data, encrypted under Alice's public key, to the cloud. The cloud then
runs a machine learning algorithm homomorphically over all the data. This
results in an encrypted classifier which Alice can now use in the following way.
Whenever Alice wants to classify some data, she encrypts it under her public key
and sends it to the cloud which evaluates the classifier on her data
homomorphically, resulting in an encrypted labeling of her data. The cloud
returns this to Alice who can then decrypt it using her secret key.</p>
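<p>Here is a toy sketch of the classification step, again with a DGHV-style scheme (hypothetical, insecure parameters; not the paper's actual scheme). Since both the classifier's weights and Alice's new data point are ciphertexts, computing a linear score takes one level of ciphertext-by-ciphertext multiplication, which a depth-limited SHE scheme can afford:</p>

```python
import random

# Toy DGHV-style scheme with message space mod 1000, with enough room
# for one level of ciphertext multiplication (insecure toy parameters).

P = 2**61 - 1            # Alice's secret key (toy value)
T = 1000                 # plaintext modulus

def encrypt(m):
    q = random.randrange(1, 10**6)
    r = random.randrange(1, 50)
    return q * P + T * r + m

def decrypt(c):
    return (c % P) % T

w_enc = [encrypt(wi) for wi in [2, 5, 1]]    # encrypted classifier weights
x_enc = [encrypt(xi) for xi in [3, 1, 7]]    # Alice's encrypted features

# The cloud computes the score homomorphically: one ciphertext
# multiplication per feature (multiplicative depth 1), then additions.
score_enc = sum(wi * xi for wi, xi in zip(w_enc, x_enc))

# Alice decrypts the score and applies the (non-polynomial) threshold
# herself:
score = decrypt(score_enc)          # 2*3 + 5*1 + 1*7 = 18
print("positive" if score >= 10 else "negative")
```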
<p>The high-level idea is sound, but remember that in this work the authors are
only interested in using SHE since it is more practical than FHE. Unfortunately,
this turns out to be a serious limitation, as it requires the algorithms to be
expressible as low-degree polynomials, and many are not. More precisely, if one
views the learning and classifying algorithms as a single function, then that
function has to be a low-degree polynomial.</p>
<p>This is pretty restrictive, but Graepel et al. still manage to show how one can
homomorphically evaluate interesting and useful machine learning algorithms
under this constraint. One example is <a href="http://en.wikipedia.org/wiki/Linear_discriminant_analysis">Fisher's linear
discriminant</a>
classifier <sup class="footnote-ref" id="fnref:4"><a class="footnote" href="#fn:4">4</a></sup>. As suggested in the paper, this poses an interesting
challenge for machine learning research: "Is it possible to design good machine
learning algorithms that can be computed using low-degree polynomials?"</p>
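<p>To see why the low-degree constraint is workable at all, here is a plaintext sketch of the denominator-clearing idea mentioned in footnote 4, using a hypothetical nearest-centroid classifier rather than the paper's exact construction. Centroids are means, i.e., sums divided by counts, but because the counts are positive, multiplying the decision score through by them preserves its sign, turning the whole computation into a low-degree polynomial in the (integer) training data with no division:</p>

```python
# Toy training data for two classes (plaintext here, for clarity;
# under SHE every value below would be a ciphertext).
class_a = [[1, 2], [2, 3], [1, 3]]
class_b = [[8, 8], [9, 7], [8, 9]]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def score_with_division(x):
    # classify by nearer centroid: ||x - mean_a||^2 - ||x - mean_b||^2,
    # which requires dividing by the class sizes to form the means.
    na, nb = len(class_a), len(class_b)
    mean_a = [sum(col) / na for col in zip(*class_a)]
    mean_b = [sum(col) / nb for col in zip(*class_b)]
    return (-2 * dot(x, mean_a) + dot(mean_a, mean_a)) \
         - (-2 * dot(x, mean_b) + dot(mean_b, mean_b))

def score_cleared(x):
    # the same decision multiplied through by na^2 * nb^2 > 0: now a
    # low-degree polynomial in the training data, so it could be
    # evaluated under SHE without any division.
    na, nb = len(class_a), len(class_b)
    sa = [sum(col) for col in zip(*class_a)]   # na * mean_a (a plain sum)
    sb = [sum(col) for col in zip(*class_b)]   # nb * mean_b
    left  = nb * nb * (-2 * na * dot(x, sa) + dot(sa, sa))
    right = na * na * (-2 * nb * dot(x, sb) + dot(sb, sb))
    return left - right

x = [2, 2]
# Both scores have the same sign, so the classification is unchanged:
print("class A" if score_cleared(x) < 0 else "class B")
```

The reader who decrypts the score only needs its sign, so the scaling by the (positive) counts costs nothing in accuracy.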
<p>That's it for applications. In the next and final post in the series we'll
discuss the limitations of FHE and SHE and some alternatives.</p>
<div class="footnotes">
<hr>
<ol>
<li id="fn:1">By simple and complex, here I am not referring to the complexity of constructing the objects (e.g., many encryption schemes, FHE included, are very difficult to construct) but to the complexity of the object's functionality or "behavior".
<a class="footnote-return" href="#fnref:1">↩</a></li>
<li id="fn:2">Technically this is not true because the fact that Alice and Bob learn <span class="math">\(f(x,y)\)</span> means that they learn something about each other's input---namely they learn <span class="math">\(f(x,y)\)</span>. But for the purposes of MPC this is not a problem because it is the best we can do. Indeed, if the goal is to construct a protocol where Alice and Bob will learn <span class="math">\(f(x,y)\)</span> then we cannot do so without them learning <span class="math">\(f(x,y)\)</span>. So when we talk about the security of MPC, we usually say that Alice and Bob do not learn anything about each other's input that they cannot learn from <span class="math">\(f(x,y)\)</span>.
<a class="footnote-return" href="#fnref:2">↩</a></li>
<li id="fn:3">I should stress that, as described, the protocol is only secure in a very restricted adversarial model known as the <em>semi-honest</em> model, where it is assumed that all the parties will follow the protocol but will try to learn whatever they can from the execution. This may seem like a weak and strange adversarial model at first, but it is reasonable to consider for the following reasons. First, a protocol that is secure in this model can be safely used in any situation where the adversary is passive, i.e., where it only sees transcripts of the messages sent between parties. One could imagine an adversary that tries to recover information about the parties' inputs after the protocol was executed, for example, from a log of the messages. Second, there are general techniques that can transform any protocol that is secure in this model into a protocol that is secure in stronger and more natural models (e.g., where the adversary does not have to follow the protocol).
<a class="footnote-return" href="#fnref:3">↩</a></li>
<li id="fn:4">Note that SHE cannot handle division, which may seem necessary for the classifiers considered in this work. Graepel et al., however, get around this by observing that, technically, the classifiers work just as well if the values are multiplied through by the denominator. <strong>Note</strong>: in a previous version of this post I erroneously claimed that the client had to do the division.<br>
<a class="footnote-return" href="#fnref:4">↩</a></li>
</ol>
</div>