Copyright | Justin Bailey Chris Kuklewicz Daniel Fischer |
---|---|

License | BSD3 |

Maintainer | Daniel Fischer <daniel.is.fischer@googlemail.com> |

Stability | Provisional |

Portability | non-portable (BangPatterns) |

Safe Haskell | None |

Language | Haskell98 |

Fast search of lazy `ByteString`

values using the
Knuth-Morris-Pratt algorithm.

A description of the algorithm can be found at http://en.wikipedia.org/wiki/Knuth-Morris-Pratt_algorithm.

Original authors: Justin Bailey (jgbailey at gmail.com) and Chris Kuklewicz (haskell at list.mightyreason.com).

- indices :: ByteString -> ByteString -> [Int64]
- nonOverlappingIndices :: ByteString -> ByteString -> [Int64]
- strictify :: ByteString -> ByteString

# Overview

This module provides two functions for finding the occurrences of a
pattern in a target string using the Knuth-Morris-Pratt algorithm.
It exists mostly for systematic reasons, the functions from
Data.ByteString.Lazy.Search are much faster, except for very short
patterns or long patterns with a short period if overlap is allowed.
In the latter case, `indices`

from this module may be the best choice
since the Boyer-Moore function's performance degrades if there are many
matches and the DFA function's automaton needs much space for long
patterns.
In the former case, for some pattern/target combinations DFA has better
performance, for others KMP, usually the difference is small.

## Complexity and Performance

The preprocessing of the pattern is *O*(`patternLength`

) in time and space.
The time complexity of the searching phase is *O*(`targetLength`

) for both
functions.

In most cases, these functions are considerably slower than the Boyer-Moore variants, performance is close to that of those from Data.ByteString.Search.DFA.

## Partial application

Both functions can be usefully partially applied. Given only a pattern, the auxiliary data will be computed only once, allowing for efficient re-use.

# Functions

:: ByteString | Strict pattern to find |

-> ByteString | Lazy string to search |

-> [Int64] | Offsets of matches |

:: ByteString | Strict pattern to find |

-> ByteString | Lazy string to search |

-> [Int64] | Offsets of matches |

finds the starting indices of all
non-overlapping occurrences of the pattern in the target string.
It is more efficient than removing indices from the list produced
by `nonOverlappingIndices`

`indices`

.

## Convenience

strictify :: ByteString -> ByteString #

transforms a lazy `strictify`

`ByteString`

into a strict
`ByteString`

, to make it a suitable pattern for the searching
functions.