Report Number: CS-TR-77-629
Institution: Stanford University, Department of Computer Science
Title: The complexity of pattern matching for a random string
Author: Yao, Andrew Chi-Chih
Date: October 1977
Abstract: We study the average-case complexity of finding all occurrences of a given pattern $\alpha$ in an input text string. Over an alphabet of q symbols, let c($\alpha$,n) be the minimum average number of characters that need to be examined in a random text string of length n. We prove that, for large m, almost all patterns $\alpha$ of length m satisfy c($\alpha$,n) = $\Theta (\lceil \log_q (${n-m}/{ln m} + 2)\rceil )$ if $m \leq n \leq 2m$, and c($\alpha$,n) = $\Theta ({\lceil \log_q m\rceil}/m n)$ if n > 2m. This in particular confirms a conjecture raised in a recent paper by Knuth, Morris, and Pratt [1977].
http://i.stanford.edu/pub/cstr/reports/cs/tr/77/629/CS-TR-77-629.pdf