<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="client.xsl" type="text/xsl"?>
<article article-type="other">
<front>
<journal-meta>
<journal-id/>
<issn/>
<banner>
<href>banner.jpg</href>
<size width="100%"/>
</banner>
</journal-meta>
<doi>0047-cd</doi>
<article-meta>
<title-group>
<article-title>Real-Time Queries on Large Volumes of Safety Text</article-title>
</title-group>

<author>Matthew Newall<sup>1</sup> and Coen van Gulijk<sup>2</sup></author>

<aff>Institute of Railway Research, University of Huddersfield, UK</aff>
<email><a href="mailto:M.D.Newall@hud.ac.uk"><sup>a</sup>M.D.Newall@hud.ac.uk</a></email>

<email><a href="mailto:C.VanGulijk@hud.ac.uk"><sup>b</sup>C.VanGulijk@hud.ac.uk</a></email>
</article-meta></front>
<body>
<abstract>
<title>ABSTRACT</title>
<p>Safety and risk management work often requires parsing large volumes of text. One example is the Close Call system, operated in the UK to log safety-related incidents on the GB railways, to which approximately 300,000 unstructured text reports are added each year. Traditionally, locating and categorizing potential risk indicators in Close Call text (and in other systems like it) has been a human task. Although steps have been taken towards augmenting this with computer-based analysis, real-time feedback has not been possible. <br/> This paper discusses a platform that allows real-time queries on large volumes of text. Integer-based hashing is applied, in a novel way, to n-grams of the text. Using this method in combination with search optimizations such as binary search (which would be cumbersome or impossible to perform on unmodified text), it can be shown that pattern-matching performance improves by several orders of magnitude compared with brute-force matching, or even with more developed methods such as the Boyer-Moore algorithm.</p>
<p><italic>Keywords: </italic>Text, Close Calls, Binary Search, Real-Time.</p>
</abstract>
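<p>As an illustration only, the approach summarized in the abstract (integer hashing of n-grams followed by binary search) might be sketched in Python as follows. This is not the authors' implementation: the word-level n-gram length of three, the use of CRC32 as the integer hash, the sample reports, and the function names <italic>build_index</italic> and <italic>search</italic> are all assumptions made for the example.</p>
<p><![CDATA[
```python
import bisect
import zlib

N = 3  # assumed n-gram length, in words


def ngram_hash(words):
    """Hash a word sequence to an unsigned 32-bit integer (CRC32 is an assumed choice)."""
    return zlib.crc32(" ".join(words).encode("utf-8"))


def tokenize(text):
    """Lower-case the text and split it on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_index(reports, n=N):
    """Return (tokens, index); index is a sorted list of (hash, report_id, position)."""
    tokens = [tokenize(report) for report in reports]
    index = []
    for report_id, words in enumerate(tokens):
        for pos in range(len(words) - n + 1):
            index.append((ngram_hash(words[pos:pos + n]), report_id, pos))
    index.sort()  # sorting by integer hash is what makes binary search possible
    return tokens, index


def search(tokens, index, phrase, n=N):
    """Return the sorted ids of reports that contain the n-word phrase."""
    query = tokenize(phrase)
    if len(query) != n:
        raise ValueError(f"phrase must contain exactly {n} words")
    target = ngram_hash(query)
    # Binary search for the first entry whose hash is not smaller than the target.
    i = bisect.bisect_left(index, (target,))
    hits = set()
    while i < len(index) and index[i][0] == target:
        _, report_id, pos = index[i]
        # Re-check the words themselves, since distinct n-grams can share a hash.
        if tokens[report_id][pos:pos + n] == query:
            hits.add(report_id)
        i += 1
    return sorted(hits)


# Hypothetical sample reports, invented for this sketch.
reports = [
    "Worker seen walking on track without lookout",
    "Loose cable found near platform edge",
    "Trespasser walking on track near signal",
]
tokens, index = build_index(reports)
print(search(tokens, index, "walking on track"))  # prints [0, 2]
```
]]></p>
<p>Because the index is sorted by integer hash, each lookup needs only a logarithmic number of integer comparisons rather than a scan of the raw text. Candidate hits are verified against the stored tokens so that a hash collision cannot produce a false match.</p>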
<fpdf>
<href>pdflogo.jpg</href>
<hpdf>0047</hpdf>
</fpdf>
</body>
</article>