MPS: improving exact string matching through pattern character frequency
Journal article
Sahota, V., Maozhen, L. and Bayford, R. 2013. MPS: improving exact string matching through pattern character frequency. Journal of Data Processing. 3 (3), pp. 127-137.
Authors | Sahota, V., Maozhen, L. and Bayford, R. |
---|---|
Abstract | One of the initial hurdles in taking advantage of big data is the ability to quickly analyze and establish its relevance. Intrinsic to this big data analysis problem is the need for tools to scale well in proportion to the growth of data and have the flexibility to operate across disparate datasets. Exact string matching is a fundamental tool used to search through data, though many existing algorithms do not scale well since their computational cost occurs and grows during reading of data. A proof of concept is presented in this paper which transfers all the required calculations to a pre-processing stage, removing all calculations during the reading stage; creating a search trie which incorporates the statistical distribution of characters in the search pattern to reduce overall calculation and lookups in a search. The resulting algorithm produces a total calculation costs which is independent of data (source) size. Preliminary results shows the new algorithm out performing existing general algorithms, as the search pattern becomes large for natural English text and when searching a small alphabet source (DNA). |
Year | 2013 |
Journal | Journal of Data Processing |
Journal citation | 3 (3), pp. 127-137 |
Publisher | DLINE |
ISSN | 2278-6481 |
Related URL | http://www.dline.info/jdp/v3n3.php |
Publication dates | |
Sep 2013 | |
Publication process dates | |
Deposited | 07 Jul 2015 |
Accepted | Sep 2013 |
Output status | Published |
https://repository.canterbury.ac.uk/item/875z9/mps-improving-exact-string-matching-through-pattern-character-frequency
32
total views0
total downloads0
views this month0
downloads this month