MPS: improving exact string matching through pattern character frequency

Journal article


Sahota, V., Maozhen, L. and Bayford, R. 2013. MPS: improving exact string matching through pattern character frequency. Journal of Data Processing. 3 (3), pp. 127-137.
AuthorsSahota, V., Maozhen, L. and Bayford, R.
Abstract

One of the initial hurdles in taking advantage of big data is the ability to quickly analyze and establish its relevance. Intrinsic to this big data analysis problem is the need for tools to scale well in proportion to the growth of data and have the flexibility to operate across disparate datasets. Exact string matching is a fundamental tool used to search through data, though many existing algorithms do not scale well since their computational cost occurs and grows during reading of data. A proof of concept is presented in this paper which transfers all the required calculations to a pre-processing stage, removing all calculations during the reading stage; creating a search trie which incorporates the statistical distribution of characters in the search pattern to reduce overall calculation and lookups in a search. The resulting algorithm produces a total calculation costs which is independent of data (source) size. Preliminary results shows the new algorithm out performing existing general algorithms, as the search pattern becomes large for natural English text and when searching a small alphabet source (DNA).

Year2013
JournalJournal of Data Processing
Journal citation3 (3), pp. 127-137
PublisherDLINE
ISSN2278-6481
Related URLhttp://www.dline.info/jdp/v3n3.php
Publication dates
PrintSep 2013
Publication process dates
Deposited07 Jul 2015
AcceptedSep 2013
Output statusPublished
Permalink -

https://repository.canterbury.ac.uk/item/875z9/mps-improving-exact-string-matching-through-pattern-character-frequency

  • 34
    total views
  • 0
    total downloads
  • 1
    views this month
  • 0
    downloads this month

Export as

Related outputs

Unveiling pollution peaks: Comparing swarm intelligence with Drone Hill Climber
Prior, Oliver J., Hannan Bin Azhar, M. A., Sahota, Vijay and Turner, Scott 2024. Unveiling pollution peaks: Comparing swarm intelligence with Drone Hill Climber. in: 2024 IEEE 22nd Jubilee International Symposium on Intelligent Systems and Informatics (SISY) IEEE. pp. 399-404
Modeling P2P grid information services with colored Petri nets
Sahota, V. and Li, M. 2009. Modeling P2P grid information services with colored Petri nets. in: Wang, L., Jie, W. and Chen, J. (ed.) Grid Computing: Technology, Services, and Applications CRC Press.
GREMO: a GT4 based Resource Monitor
Sahota, V. and Li, M. 2006. GREMO: a GT4 based Resource Monitor. in: Cox, S. (ed.) Proceedings of the UK e-Science All Hands Meeting 2006 National e-Science Centre.
Resource monitoring with Globus Toolkit 4
Sahota, V., Li, M. and Guo, W. 2006. Resource monitoring with Globus Toolkit 4. in: Second International Conference on Semantics, Knowledge and Grid, 2006. SKG '06 IEEE. pp. 1-3
Tracking conductivity variations in the absence of accurate stat evolution models in electrical impedance tomography
Hashemzadeh, P., Sahota, V., Callaghan, M., Dib, H., Tizzard, A., Svensson, L. and Bayford, R. 2010. Tracking conductivity variations in the absence of accurate stat evolution models in electrical impedance tomography. in: 2010 4th International Conference on Bioinformatics and Biomedical Engineering (iCBBE) : June 18-20, 2010 Chengdu, China IEEE. pp. 1-6
Jinv: a parallel method for distributed matrix inversion
Sahota, V. and Bayford, R. 2010. Jinv: a parallel method for distributed matrix inversion. in: Developments in E-systems Engineering (DESE 2010) IEEE. pp. 163-167
A novel algorithm for online exact string matching
Sahota, V., Li, M. and Bayford, R. 2013. A novel algorithm for online exact string matching. in: 2013 Third International Conference on Innovative Computing Technology (INTECH) Picastaway, New Jersey IEEE. pp. 291-295
A grouped P2P network for scalable grid information services
Sahota, V., Li, M., Baker, M. and Antonopoulos, N. 2009. A grouped P2P network for scalable grid information services. Peer-to-Peer Networking and Applications. 2 (1), pp. 3-12. https://doi.org/10.1007/s12083-008-0016-4
Web services discovery with rough sets
Li, M., Yu, B., Sahota, V. and Qi, M. 2009. Web services discovery with rough sets. International Journal of Web Services Research. 6 (1), pp. 69-86.
Modelling scalable grid information services with colored Petri nets
Sahota, V., Li, M. and Hadjinicolaou, M. 2010. Modelling scalable grid information services with colored Petri nets. International Journal of Grid and High Performance Computing. 2 (1), pp. 51-68. https://doi.org/10.4018/978-1-4666-0879-5.ch309