Gridifying IBM's Generic Log Adapter to Speed-Up the Processing of Log Data
Abstract
Problem determination in today's computing environments consumes between 30% and 70% of an organization's IT resources and accounts for between one third and one half of its total cost of ownership. The first step towards cutting costs in this area, and towards enabling autonomic computing systems, is to have all parts of the system report their status in a common log data format with common semantics, so that the status information of the system as a whole can be exploited. The Generic Log Adapter (GLA) is a generic parsing engine, shipped with IBM's Autonomic Computing Toolkit, that was designed to convert proprietary log data into a standard event-based log data format in real time. However, to provide generic support for parsing the majority of today's unstructured log data formats, the GLA makes heavy use of regular expressions, which limits its performance. Until now, all the approaches proposed to increase the GLA's performance have revolved around either fine-tuning the set of regular expressions used to configure the GLA for a particular log data format or writing format-specific parsing code. In this work we propose a new approach: parallelizing the GLA by taking advantage of its internal architecture and of the fact that structuring log data is a task that lends itself very well to parallelization. We present a master-worker strategy that “gridifies” the GLA efficiently and in a way that is completely transparent to the user. (http://doi.ieeecomputersociety.org/10.1109/CISIS.2007.31)
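The master-worker idea described above can be illustrated with a minimal Python sketch. It is not the GLA's code or API: the log format, the regular expression, and the names `LINE_PATTERN`, `parse_chunk`, `split_into_chunks` and `parse_log` are all hypothetical. The sketch also assumes that every log line is an independent record, and it uses a local thread pool as a stand-in for grid worker nodes.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical log format (an assumption, not a GLA configuration), e.g.
# "2007-04-10 12:00:01 ERROR disk full"
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<severity>[A-Z]+) (?P<message>.*)$"
)


def parse_chunk(lines):
    """Worker: convert raw log lines into structured events (dicts)."""
    events = []
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match:  # lines that do not match the format are skipped
            events.append(match.groupdict())
    return events


def split_into_chunks(lines, chunk_size):
    """Master helper: cut the log into consecutive, independent chunks."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def parse_log(lines, workers=4, chunk_size=2):
    """Master: split the log, dispatch chunks to workers, merge in order."""
    chunks = split_into_chunks(lines, chunk_size)
    # A thread pool stands in for remote grid nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(parse_chunk, chunks)  # map preserves chunk order
        return [event for chunk_events in results for event in chunk_events]
```

Because each chunk is parsed independently, the master only has to split the input and concatenate the results in their original order, which is what makes the task easy to distribute. In CPython a thread pool does not speed up regular-expression matching; a real deployment would hand the chunks to separate processes or machines.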