MPS: improving exact string matching through pattern character frequency

Journal article


Sahota, V., Maozhen, L. and Bayford, R. 2013. MPS: improving exact string matching through pattern character frequency. Journal of Data Processing. 3 (3), pp. 127-137.
AuthorsSahota, V., Maozhen, L. and Bayford, R.
Abstract

One of the initial hurdles in taking advantage of big data is the ability to quickly analyze and establish its relevance. Intrinsic to this big data analysis problem is the need for tools to scale well in proportion to the growth of data and have the flexibility to operate across disparate datasets. Exact string matching is a fundamental tool used to search through data, though many existing algorithms do not scale well since their computational cost occurs and grows during reading of data. A proof of concept is presented in this paper which transfers all the required calculations to a pre-processing stage, removing all calculations during the reading stage; creating a search trie which incorporates the statistical distribution of characters in the search pattern to reduce overall calculation and lookups in a search. The resulting algorithm produces a total calculation costs which is independent of data (source) size. Preliminary results shows the new algorithm out performing existing general algorithms, as the search pattern becomes large for natural English text and when searching a small alphabet source (DNA).

Year2013
JournalJournal of Data Processing
Journal citation3 (3), pp. 127-137
PublisherDLINE
ISSN2278-6481
Related URLhttp://www.dline.info/jdp/v3n3.php
Publication dates
PrintSep 2013
Publication process dates
Deposited07 Jul 2015
AcceptedSep 2013
Output statusPublished
Permalink -

https://repository.canterbury.ac.uk/item/875z9/mps-improving-exact-string-matching-through-pattern-character-frequency

  • 27
    total views
  • 0
    total downloads
  • 1
    views this month
  • 0
    downloads this month

Export as