I am a faculty member at the School of Computer Science
at Ben-Gurion University.

My main research interests are Error Correcting Codes, Interactive Communication and Computational
Complexity.

Phone: +972-3-6407996

Email: [first] [at] bgu.ac.il

Mail: Klim Efremenko, Ben-Gurion University P.O. Box 653 Beer-Sheva 84105, Israel

2015-2017: I was a postdoc at Tel-Aviv University, where I worked with Amir Shpilka.

I was a fellow at the Simons Institute at Berkeley during 2015.

Before that I was a Simons Fellow at the University of Chicago

And a member at the Institute for Advanced Study in the group of Avi Wigderson.

I graduated with a Ph.D. in Computer Science in 2012 from Tel-Aviv University, where I was advised by Prof. Amnon Ta-Shma and Prof. Oded Regev.

[2018-2022] Israel Science Foundation (ISF) -- Individual research grant.

## Klim Efremenko, Gillat Kol and Raghuvansh Saxena

Interactive coding over the noisy broadcast channel

Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC 2018)

[Abstract] [Paper: PDF]

## Klim Efremenko, Ankit Garg, Rafael Oliveira and Avi Wigderson

Barriers for rank methods in arithmetic complexity

Innovations in Theoretical Computer Science Conference (ITCS 2018)

[Abstract] [Paper: PDF]

## Klim Efremenko, Elad Haramaty and Yael Kalai

Interactive Coding with Nearly Optimal Round and Communication Blowup

[Abstract] [Paper: PDF]

## Ankit Singh Rawat, Itzhak Tamo, Venkatesan Guruswami and Klim Efremenko

MDS Code Constructions With Small Sub-Packetization and Near-Optimal Repair Bandwidth

IEEE Transactions on Information Theory 2018

## Mark Braverman, Klim Efremenko, Ran Gelles and Bernhard Haeupler

Constant-rate coding for multiparty interactive communication is impossible.

Journal of the ACM 2017; preliminary version in the Symposium on Theory of Computing (STOC) 2016.

[Abstract] [Paper: PDF]

## Noga Alon, Mark Braverman, Klim Efremenko, Ran Gelles and Bernhard Haeupler

Reliable Communication over Highly Connected Noisy Networks

Invited to the Distributed Computing (DIST) special issue for Principles of Distributed Computing (PODC) 2016

[Abstract] [Paper: PDF]

## Noga Alon, Klim Efremenko and Benny Sudakov

Testing Equality in Communication Graphs.

IEEE Transactions on Information Theory 2017

[Abstract] [Paper: PDF]

## Klim Efremenko, Joseph Landsberg, Hal Schenck and Jerzy Weyman

The method of shifted partial derivatives cannot separate the permanent from the determinant

Mathematics of Computation 2018

[Abstract] [Paper: PDF]

## Klim Efremenko, Joseph Landsberg, Hal Schenck and Jerzy Weyman

On minimal free resolutions and the method of shifted partial derivatives in complexity theory.

Journal of Algebra 2018

[Abstract] [Paper: PDF]

## Mark Braverman and Klim Efremenko

List and Unique Coding for Interactive Communication in the Presence of Adversarial Noise

Special issue of the SIAM Journal on Computing for FOCS 2014.

[Abstract] [Paper: PDF]

## Klim Efremenko, Ran Gelles and Bernhard Haeupler

Maximal Noise in Interactive Communication over Erasure Channels and Channels with Feedback

IEEE Transactions on Information Theory 2016, (appeared in ITCS 2015)

[Abstract] [Paper: PDF]

## Klim Efremenko

From Irreducible Representations to Locally Decodable Codes.

The 44th ACM Symposium on Theory of Computing, STOC 2012

[Abstract] [BiBTeX] [Paper: PDF]

## Avraham Ben-Aroya, Klim Efremenko and Amnon Ta-Shma

Local List Decoding with a Constant Number of Queries.

51st Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010

[Abstract] [BiBTeX] [Paper: PDF]

## Avraham Ben-Aroya, Klim Efremenko and Amnon Ta-Shma

A Note on Amplifying the Error-Tolerance of Locally Decodable Codes.

Electronic Colloquium on Computational Complexity (ECCC) 17: 134 (2010)

[Abstract] [BiBTeX] [Paper: PDF]

## Klim Efremenko

3-Query Locally Decodable Codes of Subexponential Length

SIAM Journal on Computing, STOC 2009 special issue

[Abstract] [BiBTeX] [Paper: PDF]

## Klim Efremenko and Omer Reingold

How Well Do Random Walks Parallelize?

APPROX-RANDOM, 2009

[Abstract] [BiBTeX] [Paper: PDF]

## Raphaël Clifford, Klim Efremenko, Ely Porat and Amir Rothschild

From Coding Theory to Efficient Pattern Matching

20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2009

[Abstract] [BiBTeX] [Paper: PDF]

## Klim Efremenko and Ely Porat

Approximating General Metric Distances Between a Pattern and a Text

19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2008

[Abstract] [BiBTeX] [Paper: PDF]

## Raphaël Clifford, Klim Efremenko, Ely Porat and Amir Rothschild

Pattern Matching with Don't Cares and Few Errors

Journal of Computer and System Sciences (JCSS), 2010

[Abstract] [BiBTeX] [Paper: PDF]

## Raphaël Clifford, Klim Efremenko, Benny Porat and Ely Porat

A Black Box for Online Approximate Pattern Matching

Information and Computation, 2011

[Abstract] [BiBTeX] [Paper: PDF]

## Raphaël Clifford, Klim Efremenko, Benny Porat, Ely Porat and Amir Rothschild

Mismatch sampling

Information and Computation, 2012

[Abstract] [BiBTeX] [Paper: PDF]

In this paper we show the first constant-rate coding scheme for the noisy broadcast model
defined by El Gamal in 1984. In this model, a set of $n$ players, each holding a private input bit, communicate over a noisy broadcast channel. Their mutual goal is for all players to learn all inputs. At each round one of the players broadcasts a bit to all the other players, and the bit received by each player is flipped with a fixed constant probability (independently for each recipient). How many rounds are needed?
The best known protocol before our work was given in 1988 by Gallager, who gave an elegant noise-resistant protocol
requiring only $\mathcal{O}(n \log \log n)$ rounds. We show that $O(n)$ rounds suffice. Moreover, we generalize this result and initiate the study of interactive coding over the noisy broadcast channel.
We show that any interactive protocol that works over the noiseless broadcast channel can be simulated over our restrictive noisy broadcast model with only a constant blowup of the communication (independent of the number of players).
Goyal, Kindler, and Saks~\cite{GKS08} showed in 2008 that Gallager's algorithm is essentially tight for a restrictive (non-adaptive) model of communication\footnote{They do not state this explicitly, but their proofs rely on it.}. Before our work this restriction was not considered significant.
In this work we show that this is not the case, and that adaptivity can significantly improve the rate of communication.
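As a toy illustration of the model only (not this paper's scheme), the naive strategy has each player broadcast its bit $O(\log n)$ times while listeners majority-vote, giving the trivial $O(n \log n)$-round protocol. The function name and parameters below are our own sketch:

```python
import random

def naive_broadcast_protocol(inputs, flip_p, reps, rng):
    """Each player broadcasts its bit `reps` times; every listener
    majority-votes over its own noisy copies.  Uses n * reps rounds,
    so reps = O(log n) yields the naive O(n log n)-round protocol."""
    n = len(inputs)
    learned = [[None] * n for _ in range(n)]
    for speaker in range(n):
        for listener in range(n):
            # Each received copy is flipped independently with prob. flip_p.
            votes = sum(inputs[speaker] ^ (rng.random() < flip_p)
                        for _ in range(reps))
            learned[listener][speaker] = int(votes > reps / 2)
    return learned

rng = random.Random(1)
inputs = [rng.randrange(2) for _ in range(20)]
learned = naive_broadcast_protocol(inputs, flip_p=0.1, reps=15, rng=rng)
print("all players learned all inputs:", all(row == inputs for row in learned))
```

With flip probability 0.1 and 15 repetitions, each majority vote fails with probability roughly $10^{-5}$, so all $n^2$ votes are typically correct; the paper's point is that $O(n)$ rounds already suffice.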

{\em Arithmetic complexity}, the study of the cost of computing polynomials via additions and multiplications, is considered (for many good reasons) simpler to understand than {\em Boolean complexity}, namely computing Boolean functions via logical gates. And indeed, we seem to have significantly more lower bound techniques and results in arithmetic complexity than in Boolean complexity. Despite many successes and rapid progress, however, foundational challenges, like proving super-polynomial lower bounds on circuit or formula size for explicit polynomials, or super-linear lower bounds on explicit 3-dimensional tensors, remain elusive.
At the same time (and possibly for similar reasons), we have plenty more excuses, in the form of ``barrier results'' for failing to prove basic lower bounds in Boolean complexity than in arithmetic complexity. Efforts to find barriers to arithmetic lower bound techniques seem harder, and despite some attempts we have no excuses of similar quality for these failures in arithmetic complexity. This paper aims to add to this study.
In this paper we address {\em rank methods}, which were long recognized as encompassing and abstracting almost all known arithmetic lower bounds to-date, including the most recent impressive successes. Rank methods (under the name of {\em flattenings}) are also in wide use in algebraic geometry for proving tensor rank and symmetric tensor rank lower bounds. Our main results are barriers to these methods. In particular,
\begin{itemize}
\item Rank methods {\em cannot} prove better than $\Omega_d (n^{\lfloor d/2 \rfloor})$ lower bound on the tensor rank of {\em any} $d$-dimensional tensor of side $n$. (In particular, they cannot prove super-linear, indeed even $>8n$ tensor rank lower bounds for {\em any} 3-dimensional tensors.)
\item Rank methods {\em cannot} prove an $\Omega_d (n^{\lfloor d/2 \rfloor})$ lower bound on the {\em Waring rank}\footnote{A very restricted form of depth-3 circuits.} of any $n$-variate polynomial of degree $d$. (In particular, they cannot prove such lower bounds on stronger models, including depth-3 circuits.)
\end{itemize}
The proofs of these bounds use simple linear-algebraic arguments, leveraging connections between the {\em symbolic} rank of matrix polynomials and the usual rank of their evaluations. These techniques can perhaps be extended to barriers for other arithmetic models on which progress has halted.
To see how these barrier results directly inform the state-of-art in arithmetic complexity we note the following.
First, the bounds above nearly match the best explicit bounds we know for these models, hence offer an explanation of why the rank methods got stuck there. Second, the bounds above are a far cry (quadratically away) from the true complexity (e.g., of random polynomials) in these models, which, {\em if} achieved (by any methods), is known to imply super-polynomial formula lower bounds.
We also explain the relation of our barrier results to other attempts, and in particular how they significantly differ from the recent attempts to find analogues of ``natural proofs'' for arithmetic complexity. Finally, we discuss the few arithmetic lower bound approaches which fall outside rank methods, and some natural directions our barriers suggest.

The problem of constructing error-resilient interactive protocols was introduced in the seminal works of Schulman (FOCS 1992, STOC 1993). These works show how to convert any two-party interactive protocol into one that is resilient to constant-fraction of error, while blowing up the communication by only a constant factor. Since these seminal works, there have been many followup works which improve the error rate, the communication rate, and the computational efficiency.
All these works assume that in the underlying protocol in each round each party sends a single bit. Thus, in this model, any protocol with $2T$ bits of communication requires $T$ rounds of back-and-forth communication. Moreover, all these works assume that the communication complexity of the underlying protocol is {\em fixed} (and a priori known).
In this work, we consider the model where in each round each party may send a message of {\em arbitrary length}, where the length of the messages and the length of the protocol may be {\em adaptive}, and may depend on the private inputs of the parties and on previous communication. In particular, we do not assume any bound on the communication complexity of the underlying protocol.
This model is known as the (synchronous) {\em message passing model}, and is commonly used in distributed computing, and is the most common model used in cryptography.
We consider the adversarial error model, where an $\epsilon$-fraction of the communication may be corrupted. Our error model not only allows the adversary to toggle the corrupted bits, but also allows the adversary to insert and delete bits. We also assume a bound $\epsilon'$ on the fraction of rounds that can be (fully) corrupted, and argue that such a bound is necessary in order to obtain an error-resilient version with small blowup in the round complexity.
We show how to convert any protocol $\Pi$ into another protocol $\Pi'$ with comparable efficiency guarantees, that is resilient to such adversarial error (for some fixed constants $\epsilon,\epsilon'>0$).
We construct such $\Pi'$ for various ranges of blowup parameters.
In particular, we construct such $\Pi'$ with $(1+\tilde{O}\left(\epsilon^{1/4}\right))$ blowup in communication and $O(1)$ blowup in rounds. We also show how to reduce the blowup in rounds at the expense of increasing the blowup in communication, and construct $\Pi'$ where both the blowup in rounds and communication, approaches one (i.e., no blowup) as $\epsilon$ and $\epsilon'$ approach zero.
We also give evidence that our parameters are optimal.
Finally, we emphasize that our transformation is quite simple, and preserves the computational efficiency of the protocol. Namely, if in the underlying protocol~$\Pi$ the parties run in time~$T$, then their runtime in $\Pi'$ is $\poly(T)$.

We study coding schemes for multiparty interactive communication
over synchronous networks that suffer from stochastic noise, where each bit is independently flipped with probability~$\eps$. We analyze the minimal overhead that must be added by the coding scheme in order to succeed in performing the computation despite the noise.
Our main result is a lower bound on the communication of any noise-resilient protocol over a synchronous star network with $n$-parties (where all parties communicate in every round).
Specifically, we show a task that can be solved by communicating
$T$~bits over the noise-free network, but for which any protocol with success probability of~${1-o(1)}$ must communicate at least $\Omega(T \frac{\log n}{\log\log n})$ bits when the channels are noisy.
By a 1994 result of Rajagopalan and Schulman, the slowdown we prove is the highest one can obtain on any topology, up to a $\log \log n$ factor.
We complete our lower bound with a matching coding scheme that achieves the same overhead; thus, the capacity of (synchronous) star networks is $\Theta(\log\log n /\log n)$. Our bounds prove that, despite several previous coding schemes with rate $\Omega(1)$ for certain topologies, no coding scheme with constant rate $\Omega(1)$ exists for arbitrary $n$-party noisy networks.

We consider the task of multiparty computation performed over networks in
the presence of random noise. Given an $n$-party protocol that takes $R$
rounds assuming noiseless communication, the goal is to find a coding
scheme that takes $R'$ rounds and computes the same function with high
probability even when the communication is noisy, while maintaining a
constant asymptotic \emph{rate}, i.e., while keeping $\liminf_{n,R\to
\infty} R/R'$ positive.
Rajagopalan and Schulman (STOC '94) were the first to consider this
question, and provided a coding scheme with rate~$O(1/\log (d+1))$, where
$d$ is the maximal degree in the network. While that
scheme provides a constant rate coding for many practical situations,
in the worst case, e.g., when the network is a complete graph, the rate
is~$O(1/\log n)$, which tends to $0$ as $n$ tends to infinity.
We revisit this question and provide an efficient coding scheme with
a constant rate for the interesting case of fully connected networks.
We furthermore extend the result and show that if a ($d$-regular) network has
mixing time~$m$, then there exists an efficient coding scheme with
rate $O(1/m^3\log m)$. This implies a constant rate coding scheme for
any $n$-party protocol over a $d$-regular network with a constant mixing time,
and in particular for random graphs with $n$ vertices and
degrees $n^{\Omega(1)}$.

Let $G=(V,E)$ be a connected undirected graph with $k$ vertices. Suppose
that on each vertex of the graph there is a player having an $n$-bit
string. Each player is allowed to communicate with its neighbors according
to an agreed communication protocol, and the players must decide,
deterministically, if their inputs are all equal. What is the minimum
possible total number of bits transmitted in a protocol solving
this problem ? We determine this minimum up to a lower order
additive term in many cases (but not for all graphs).
In particular, we show that it is $kn/2+o(n)$ for any
Hamiltonian $k$-vertex graph, and that for any $2$-edge connected
graph with $m$ edges containing no
two adjacent vertices of degree exceeding $2$ it is $mn/2+o(n)$.
The proofs combine graph theoretic ideas with
tools from additive number theory.

The method of shifted partial
derivatives
introduced in
\cite{DBLP:journals/eccc/Kayal12,gupta4} alone cannot prove that the padded permanent
$\ell^{n-m}\tperm_m$ cannot
be realized inside the $GL_{n^2}$-orbit closure of the determinant $\tdet_n$
when $n>2m^2+2m$.

The minimal free resolution of the Jacobian ideals of the determinant polynomial
were computed by Lascoux \cite{MR520233}, and it is an active area of
research to understand the Jacobian ideals of
the permanent, see e.g., \cite{MR1777172,MR2386244}. As a step in this direction we compute several new cases
and completely determine the linear strands of the minimal free resolutions
of the ideals generated by sub-permanents.
Our motivation is an exploration of the utility and limits of the method of shifted partial
derivatives
introduced in
\cite{DBLP:journals/eccc/Kayal12,gupta4}. The method of
shifted partial derivatives amounts to computing Hilbert functions
of Jacobian ideals, and the Hilbert functions
are in turn the Euler characteristics of the minimal free resolutions of the
Jacobian ideals. We compute several such Hilbert functions
relevant for complexity theory. We show that the method of shifted partial derivatives alone cannot prove the padded permanent
$\ell^{n-m}\tperm_m$ cannot
be realized inside the $GL_{n^2}$-orbit closure of the determinant $\tdet_n$
when $m< 1.5 n^{2}$.

In this paper we extend the notion of list-decoding to the setting of interactive communication and study its limits. In particular, we show that any protocol can be encoded, with a constant rate, into a list-decodable protocol which is resilient
to a noise rate of up to $\frac{1}{2}-\varepsilon$, and that this is tight.
Using our list-decodable construction, we study a more nuanced model of noise where the adversary can corrupt up to an $\alpha$ fraction of
Alice's communication and up to a $\beta$ fraction of Bob's communication. We use list-decoding in order to fully characterize the region $\mathcal{R}_U$ of pairs $(\alpha, \beta)$ for which unique decoding with a constant rate is possible. The region $\mathcal{R}_U$ turns out to be quite unusual in its shape. In particular, it is bounded by a piecewise-differentiable curve with infinitely many pieces.
We show that outside this region, the blowup in communication must be exponential. This suggests that in some error regimes, list-decoding is necessary for optimal unique decoding.
We also consider the setting where only one party of the communication must output the correct answer. We precisely characterize the region of all pairs $(\alpha,\beta)$ for which one-sided unique decoding is possible in a way that Alice will output the correct answer.

We provide tight upper and lower bounds on the noise resilience of interactive communication
over noisy channels with \emph{feedback}.
In this setting, we show that the maximal fraction of noise that any robust protocol can resist is~$1/3$.
Additionally, we provide a simple and efficient robust protocol that succeeds as long as the fraction of noise is at most~$1/3-\eps$.
Surprisingly, both bounds hold regardless of whether the parties communicate via a binary or an arbitrarily large alphabet.
This is contrasted with the bounds of~$1/4$ and $1/8$ shown by Braverman and Rao (STOC~'11) for the case of robust protocols over noisy channels without feedback, assuming a large (constant size) alphabet or a binary alphabet respectively.
We also consider interactive communication over \emph{erasure} channels. We provide a protocol that matches the optimal tolerable erasure rate of $1/2 - \eps$ of previous protocols (Franklin et~al., CRYPTO~'13) but operates in a much simpler and more efficient way. Our protocol works with an alphabet of size~$6$, in contrast to prior protocols in which the alphabet size grows as~$\eps\to0$.
Building on the above algorithm with \emph{fixed} alphabet we are able to devise
a protocol that works for \emph{binary} erasure channels,
improving the best previously known bound on the tolerable erasure rate from $1/4-\eps$ to~$1/3-\eps$.

A Locally Decodable Code (LDC) is a code that encodes a message in
such a way that one can decode any particular symbol of the message by reading only a constant number of locations,
even if a constant fraction of the encoded message is adversarially corrupted.
In this paper we present a new approach for the construction of LDCs. We show that if there exist an irreducible representation $(\rho, V)$ of a group $G$ and $q$ elements $g_1,g_2,\ldots, g_q$
in $G$ such that some linear combination of the matrices $\rho(g_i)$ has rank one,
then we can construct a $q$-query Locally Decodable Code
$C: V \to \mathbb{F}^G$.
We show the potential of this approach by constructing constant-query LDCs of sub-exponential length matching the parameters of the best known constructions.

@article{Efremenko11,

author = { Klim Efremenko },

title = { From Irreducible Representations to Locally Decodable Codes.},

journal = {Electronic Colloquium on Computational Complexity (ECCC)},

year = {2011}

}


Recently, Efremenko showed locally decodable codes of sub-exponential
length that can handle up to a $\frac{1}{3}$ fraction of errors. In this paper we show that the
same codes can be locally unique-decoded from error rate
$\half-\alpha$ for any $\alpha>0$ and locally list-decoded from
error rate $1-\alpha$ for any $\alpha>0$, with only a constant
number of queries and a constant alphabet size. This gives the first
sub-exponential codes that can be locally list-decoded with a
constant number of queries.

@inproceedings{BET10,

author = {Avraham Ben-Aroya and Klim Efremenko and Amnon Ta-Shma},

title = {Local List Decoding with a Constant Number of Queries},

booktitle = {FOCS},

year = {2010},

pages = {715-722},

ee = {http://dx.doi.org/10.1109/FOCS.2010.88}

}


Trevisan [Tre03] suggested a transformation that allows amplifying the error rate a
code can handle. We observe that this transformation, that was suggested in the non-local
setting, works also in the local setting and thus gives a generic, simple way to amplify
the error-tolerance of locally decodable codes. Specifically, this shows how to transform a
locally decodable code that can tolerate a constant fraction of errors to a locally decodable
code that can recover from a much higher error-rate, and how to transform such locally
decodable codes to locally list-decodable codes.
The transformation of [Tre03] involves a simple composition with an approximately
locally (list) decodable code. Using a construction of such codes by Impagliazzo et
al. [IJKW10], the transformation incurs only a negligible growth in the length of the code
and in the query complexity.

@article{DBLP:journals/eccc/Ben-AroyaET10a,

author = {Avraham Ben-Aroya and Klim Efremenko and Amnon Ta-Shma},

title = {A Note on Amplifying the Error-Tolerance of Locally Decodable Codes},

journal = {Electronic Colloquium on Computational Complexity (ECCC)},

volume = {17},

year = {2010},

pages = {134},

ee = {http://eccc.hpi-web.de/report/2010/134},

bibsource = {DBLP, http://dblp.uni-trier.de} }


Locally Decodable Codes (LDC) allow one to decode any particular symbol of the input message by making a constant number of queries to a codeword, even if a constant fraction of the codeword is damaged. In a recent work~\cite{Yekhanin08}, Yekhanin constructs a 3-query LDC with sub-exponential length. However, this construction relies on the conjecture that there are infinitely many Mersenne primes. In this paper, we give the first unconditional constant-query LDC construction with sub-exponential codeword length. In addition, our construction improves on the codeword length of the previous construction.

@inproceedings{Efremenko09,

author = {Klim Efremenko},

title = {3-Query Locally Decodable Codes of Subexponential Length },

year = {2009},

booktitle= {The 41st ACM Symposium on Theory of Computing} }


A random walk on a graph is a process that explores the graph in a random way: at each step the walk is at a vertex of the graph, and at each step it moves to a uniformly selected neighbor of this vertex. Random walks are extremely useful in computer science and in other fields. A very natural problem that was recently raised by Alon, Avin, Koucky, Kozma, Lotker, and Tuttle (though it was implicit in several previous papers) is to analyze the behavior of $k$ independent walks in comparison with the behavior of a single walk. In particular, Alon et al. showed that in various settings (e.g., for expander graphs), $k$ random walks cover the graph (i.e., visit all its nodes), $\Omega(k)$-times faster (in expectation) than a single walk. In other words, in such cases $k$ random walks efficiently ``parallelize" a single random walk. Alon et al.\ also demonstrated that, depending on the specific setting, this ``speedup" can vary from logarithmic to exponential in $k$.
In this paper we initiate a more systematic study of multiple random walks. We give lower and upper bounds both on
the cover time {\em and on the hitting time} (the time it takes to hit one specific node) of multiple random walks. Our study revolves around three alternatives for the starting vertices of the random walks: the worst starting vertices (those that maximize the hitting/cover time), the best starting vertices, and starting vertices selected from the stationary distribution. Among our results, we show that the speedup when starting the walks at the worst vertices cannot be too large: the hitting time cannot improve by more than an $O(k)$ factor, and the cover time cannot improve by more than $\min\{k \log n,k^2\}$ (where $n$ is the number of vertices). These results should be contrasted with the fact that there was no previously known upper bound on the speedup and that the speedup can even be {\em exponential} in $k$ for random starting vertices. We further show that for $k$ that is not too large (as a function of various parameters of the graph), the speedup in cover time is $O(k)$ {\em even for walks that start from the best vertices} (those that minimize the cover time). As a rather surprising corollary of our theorems, we obtain a new bound which relates
the cover time $C$ and the mixing time $\mix$ of a graph. Specifically, we show that $C=O(m \sqrt{\mix}\log^2 n)$ (where $m$ is the number of edges).
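A quick simulation (our own toy sketch, not from the paper) of the speedup question on a cycle, whose single-walk cover time is $\Theta(n^2)$:

```python
import random

def cover_time(adj, starts):
    """Rounds until the union of walks started at `starts` has visited
    every vertex; all walks take one step per round."""
    positions = list(starts)
    visited = set(positions)
    steps = 0
    while len(visited) < len(adj):
        steps += 1
        positions = [random.choice(adj[v]) for v in positions]
        visited.update(positions)
    return steps

def cycle(n):
    # Undirected n-cycle as an adjacency dict.
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}

random.seed(0)
n, k, trials = 64, 8, 20
adj = cycle(n)
single = sum(cover_time(adj, [0]) for _ in range(trials)) / trials
multi = sum(cover_time(adj, [0] * k) for _ in range(trials)) / trials
print(f"1 walk: {single:.0f} rounds; {k} walks: {multi:.0f} rounds; "
      f"speedup {single / multi:.1f}x")
```

Here all $k$ walks start at the same (worst-case) vertex, so by the bounds above the observed cover-time speedup cannot exceed $\min\{k \log n, k^2\}$.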

@inproceedings{DBLP:conf/approx/EfrReingold09,

author = {Klim Efremenko and Omer Reingold},

title = {How Well Do Random Walks Parallelize?},

booktitle = {APPROX-RANDOM},

year = {2009},

pages = {476-489},

crossref = {DBLP:conf/approx/2009},

bibsource = {DBLP, http://dblp.uni-trier.de}

}


We consider the classic problem of pattern matching with few mismatches in the presence of promiscuously matching wildcard symbols. Given a text $t$ of length $n$ and a pattern $p$ of length $m$ with optional wildcard symbols and a bound $k$, our algorithm finds all the alignments for which the pattern matches the text with Hamming distance at most $k$ and also returns the location and identity of each mismatch. The algorithm we present is deterministic and runs in $\tilde{O}(kn)$ time, matching the best known randomised time complexity to within logarithmic factors. The solutions we develop borrow from the tool set of algebraic coding theory and provide a new framework in which to tackle approximate pattern matching problems.
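A naive $O(nm)$ reference implementation of the problem statement (ours, for illustration; the paper's algorithm achieves $\tilde{O}(kn)$), assuming `?` marks a wildcard in the pattern:

```python
def k_mismatch_with_wildcards(text, pattern, k, wildcard="?"):
    """Naive O(nm) reference: for each alignment, report
    (offset, [(position, text_char, pattern_char), ...]) whenever the
    pattern matches with Hamming distance <= k; wildcard positions in
    the pattern match any text symbol."""
    n, m = len(text), len(pattern)
    results = []
    for i in range(n - m + 1):
        mismatches = [(i + j, text[i + j], pattern[j])
                      for j in range(m)
                      if pattern[j] != wildcard and text[i + j] != pattern[j]]
        if len(mismatches) <= k:
            results.append((i, mismatches))
    return results

print(k_mismatch_with_wildcards("abcabd", "a?c", 1))
# -> [(0, []), (3, [(5, 'd', 'c')])]
```

Note that, as in the paper, the location and identity of each mismatch is reported, not just its count.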

@inproceedings{DBLP:conf/soda/CliffordEPR09,

author = {Rapha{\"e}l Clifford and Klim Efremenko and Ely Porat and Amir Rothschild},

title = {From coding theory to efficient pattern matching},

booktitle = {SODA},

year = {2009},

pages = {778-784},

ee = {http://doi.acm.org/10.1145/1496770.1496855},

crossref = {DBLP:conf/soda/2009},

bibsource = {DBLP, http://dblp.uni-trier.de}

}


Let $T=t_0 \ldots t_{n-1}$ be a text and $P = p_0 \ldots p_{m-1}$
a pattern taken from some finite alphabet set $\Sigma$, and let
$\dist$ be a metric on $\Sigma$. We consider the problem of
calculating the sum of distances between the symbols of $P$ and the
symbols of substrings of $T$ of length $m$ for all possible offsets.
We present an $\varepsilon$-approximation algorithm for this problem
which runs in time $O(\frac{1}{\varepsilon^2}n\cdot \mathrm{
polylog}(n,\abs{\Sigma}))$. This algorithm is based on a low
distortion embedding of metric spaces into normed spaces (especially, into $\ell_{\infty}$), which is done as a preprocessing
stage. The algorithm is also based on a technique of sampling.
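For concreteness, an exact (non-approximate) $O(nm)$ baseline for the quantity being approximated; the names below are ours:

```python
def distance_profile(text, pattern, dist):
    """Exact O(nm) baseline: at each offset i, the sum of
    dist(text[i+j], pattern[j]) over all j."""
    n, m = len(text), len(pattern)
    return [sum(dist(text[i + j], pattern[j]) for j in range(m))
            for i in range(n - m + 1)]

# The Hamming metric as a toy example of a metric on the alphabet.
hamming = lambda a, b: 0 if a == b else 1
print(distance_profile("abab", "ab", hamming))
# -> [0, 2, 0]
```

The paper's contribution is computing a $(1+\varepsilon)$-approximation of this profile in near-linear time for an arbitrary metric, rather than this quadratic exact scan.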

@inproceedings{PoratE08,

author = {Ely Porat and Klim Efremenko},

title = {Approximating general metric distances between a pattern and a text},

booktitle = {SODA},

year = {2008},

pages = {419-427}

}


We present solutions for the $k$-mismatch pattern matching problem with don't cares. Given a text $t$ of length $n$ and a pattern of length $m$ with don't care symbols and a bound $k$, our algorithms find all the places where the pattern matches the text with at most $k$ mismatches. We first give an $O(n(k+\log m \log k)\log n)$ time randomised algorithm which finds the correct answer with high probability. We then present a new deterministic $O(nk^2\log^2 m)$ time solution that uses tools originally developed for group testing. Taking our derandomisation approach further, we develop an approach based on $k$-selectors that runs in $\tilde{O}(nk\,\mathrm{polylog}\,m)$ time. Further, in each case the location of the mismatches at each alignment is also given at no extra cost.

@article{DBLP:journals/jcss/CliffordEPR10,

author = {Rapha{\"e}l Clifford and Klim Efremenko and Ely Porat and Amir Rothschild},

title = {Pattern matching with don't cares and few errors},

journal = {J. Comput. Syst. Sci.}, volume = {76},

number = {2},

year = {2010},

pages = {115-124},

ee = {http://dx.doi.org/10.1016/j.jcss.2009.06.002},

bibsource = {DBLP, http://dblp.uni-trier.de}

}


We present a deterministic black box solution for online approximate matching.
Given a pattern of length $m$ and a streaming text of length $n$ that
arrives one character at a time, the task is to report the distance
between the pattern and a sliding window of the text as soon as the
new character arrives. Our solution requires $O(\sum_{j=1}^{\log_2{m}} T(n,2^{j-1})/n)$ time for each input character, where $T(n,m)$ is the total running time of the best offline
algorithm. The types of approximation that are supported include exact matching with wildcards, matching under the Hamming norm, approximating the Hamming norm, $k$-mismatch and numerical measures such as the $L_2$ and $L_1$ norms. For these examples, the resulting online algorithms take $O(\log^2{m})$, $O(\sqrt{m\log{m}})$, $O(\log^2{m}/{\epsilon}^2)$, $O(\sqrt{k \log k} \log{m})$, $O(\log^2{m})$ and $O(\sqrt{m\log{m}})$ time per character respectively. The space overhead is $O(m)$ which we show is optimal.

@inproceedings{DBLP:conf/cpm/CliffordEPP08,

author = {Rapha{\"e}l Clifford and Klim Efremenko and Benny Porat and Ely Porat},

title = {A Black Box for Online Approximate Pattern Matching},

booktitle = {CPM},

year = {2008},

pages = {143-151},

ee = {http://dx.doi.org/10.1007/978-3-540-69068-9_15},

crossref = {DBLP:conf/cpm/2008},

bibsource = {DBLP, http://dblp.uni-trier.de}

}


We consider the well known problem of pattern matching under the
Hamming distance. Previous approaches have shown how to count the
number of mismatches efficiently, especially when a bound is known
for the maximum Hamming distance. Our interest is different in
that we wish to collect a random sample of mismatches of fixed size
at each position in the text. Given a pattern $p$ of length $m$
and a text $t$ of length $n$,
we show how to sample, with high probability, $c$ mismatches where possible from every alignment of $p$ and $t$.
Further, we guarantee that the mismatches are sampled uniformly and that they can therefore be seen as representative
of the types of mismatches that occur.
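A naive $O(nm)$ sketch of the task (ours, not the paper's algorithm), using reservoir sampling to draw a uniform sample of up to $c$ mismatch positions per alignment:

```python
import random

def sample_mismatches(text, pattern, c, seed=0):
    """For every alignment of `pattern` against `text`, return a uniform
    random sample of up to c mismatch positions, drawn by reservoir
    sampling over a naive O(nm) scan."""
    rng = random.Random(seed)
    n, m = len(text), len(pattern)
    samples = []
    for i in range(n - m + 1):
        reservoir, seen = [], 0
        for j in range(m):
            if text[i + j] != pattern[j]:
                seen += 1
                if len(reservoir) < c:
                    reservoir.append(i + j)
                else:
                    # Keep each of the `seen` mismatches with prob. c/seen.
                    r = rng.randrange(seen)
                    if r < c:
                        reservoir[r] = i + j
        samples.append(reservoir)
    return samples

print(sample_mismatches("aabbab", "ab", 1))
```

Each reservoir is a uniform sample of that alignment's mismatch set, which is exactly the representativeness guarantee described above; the point of the paper is to achieve this far faster than the quadratic scan.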

@article{DBLP:conf/spire/CliffordEPPR08,

author = {Rapha{\"e}l Clifford and Klim Efremenko and Benny Porat and Ely Porat and Amir Rothschild},

title = {Mismatch Sampling},

journal = {Information and Computation},

year = {2012},

pages = {99-108},

ee = {http://dx.doi.org/10.1007/978-3-540-89097-3_11},

bibsource = {DBLP, http://dblp.uni-trier.de}

}
