Report Number: CS-TR-77-629
Institution: Stanford University, Department of Computer Science
Title: The complexity of pattern matching for a random string
Author: Yao, Andrew Chi-Chih
Date: October 1977
Abstract: We study the average-case complexity of finding all
occurrences of a given pattern $\alpha$ in an input text
string. Over an alphabet of q symbols, let c($\alpha$,n) be
the minimum average number of characters that need to be
examined in a random text string of length n. We prove that,
for large m, almost all patterns $\alpha$ of length m satisfy
c($\alpha$,n) = $\Theta (\lceil \log_q (${n-m}/{ln m} +
2)\rceil )$ if $m \leq n \leq 2m$, and c($\alpha$,n) =
$\Theta ({\lceil \log_q m\rceil}/m n)$ if n > 2m. This in
particular confirms a conjecture raised in a recent paper by
Knuth, Morris, and Pratt [1977].
http://i.stanford.edu/pub/cstr/reports/cs/tr/77/629/CS-TR-77-629.pdf