Strathmore University 
SU+ @ Strathmore 
University Library  
  
 
Electronic Theses and Dissertations 
 
2018 
A Platform for analyzing log files using temporal 
logic approach: a test case with web server logs 
 
Peris N. Muema 
Faculty of Information Technology (FIT) 
Strathmore University 
 
 
 
Follow this and additional works at https://su-plus.strathmore.edu/handle/11071/5990 
 
 
 
Recommended Citation 
Muema, P. N. (2018). A Platform for analyzing log files using temporal logic approach: a test 
case with web server logs (Thesis). Strathmore University. Retrieved from https://su-
plus.strathmore.edu/handle/11071/5990 
 
 
 
This Thesis - Open Access is brought to you for free and open access by DSpace @Strathmore  University. It has been accepted for 
inclusion in Electronic Theses and Dissertations by an authorized administrator of DSpace @Strathmore University. For more 
information, please contact librarian@strathmore.edu
 
 A Platform for Analysing Log Files Using Temporal Logic 
Approach: A Test Case with Web Server Logs 
 
Muema, Peris Ndululu 
066275  
 
A Dissertation Submitted in partial fulfilment of the requirements for the Degree of 
Master of Science in Information Systems Security at Strathmore University 
 
 
Faculty of Information Technology 
Strathmore University 
Nairobi, Kenya 
 
April, 2018
  
i 
 
DECLARATION 
I declare that this work has never been previously submitted and approved for the award of a 
degree by Strathmore University or any other university. To the best of my knowledge and belief, 
this dissertation contains no material previously published or written by another person except 
where due reference is made in the dissertation itself. 
© No part of this dissertation may be reproduced without permission of the author and 
Strathmore University 
 
Student Name        Muema, Peris Ndululu 
Student Number     066275 
Signature                …………………………… 
Date                        …………………………… 
 
Approval 
This dissertation of Muema Peris Ndululu was reviewed and approved by the following: 
Supervisor Name     Dr. Petr Matousek 
Lecturer, Faculty of Information Technology 
Brno University 
Signature                   ………………………………… 
Date                           ………………………………… 
 
  
ii 
 
ACKNOWLEDGEMENT 
I thank God Almighty for giving me the opportunity, strength and guidance to study Masters of 
Science in Information System Security at Strathmore University.  My sincere thanks to my 
supervisor, Dr. Petr Matousek for his continual willingness to guide, understand and provide 
constructive feedback; to Dr. Joseph Sevilla, for his guidance and support as well throughout my 
course work. 
I also acknowledge my family for their encouragement and prayers and my classmates Rachael, 
Collins and David for the team work and encouragement we have had during our coursework 
period. 
 
 
 
 
 
  
 
 
 
 
 
 
 
 
 
 
 
  
iii 
 
DEDICATION 
I dedicate this dissertation work to God Almighty for strength, fit mind and good health throughout 
my studies. To my loving parents Mr. Benson K. Muema and Mrs. Anne W. Muema for their 
continual prayers, support and good will throughout my studies.  To those who stood by me; my 
brothers David Kilonzo, Emmanuel Mbiuki and Simeon Ngugi, May God richly bless you. To my 
supervisors, Dr. Joseph Sevilla and Dr. Petr Matousek, who greatly assisted and guided me. I give 
special thanks to my dear friends and colleagues for their motivation and encouragement. 
 
 
 
  
ii 
 
ABSTRACT 
Web logs are a set of recorded events between clients and web servers. Information provided by 
these events is valuable to computer system administrators, digital forensic investigators and 
system security personnel during digital investigations. It is important for these entities to 
understand when certain system events were initiated and by whom. To achieve this, it is 
fundamental to gather related evidence to the crime from log files. These forensic procedures 
however pose a major challenge due to large sizes of the web log files, difficulty in understanding 
and correlating to attack patterns associated to digital crimes. The connections of events that are 
remotely positioned in the large log files require extensive computational manpower. 
This dissertation proposes the design, implementation and evaluation of a web log analysis system 
based on temporal logic and reconstruction. The case study will be on web server misuse. 
Temporal Logic operators represent system changes over time. The reconstruction of records in 
web server log files as streams will enable the implementation of temporal logic on the streaming 
data. The web server attack patterns established will be described by a special subset of temporal 
logic known as MSFOMTL (Many Sorted First Order Metric Temporal Logic). The attack patterns 
will be written in a special EPL (Event Processing Language) as queries and be parsed through 
Esper, a Complex Event Processing (CEP) engine. To ensure the proposed system increases the 
quality of log analysis process, log analysis will be performed based on a time window mechanism 
on sorted log files. 
Keywords: web server log, log analysis, web server misuse, misuse patterns, complex event 
processing, Esper 
 
 
 
 
 
 
  
ii 
 
TABLE OF CONTENTS 
DECLARATION ......................................................................................................................i 
ACKNOWLEDGEMENT ....................................................................................................... ii 
DEDICATION ....................................................................................................................... iii 
ABSTRACT ........................................................................................................................... ii 
LIST OF FIGURES ................................................................... Error! Bookmark not defined. 
LIST OF TABLES ................................................................................................................viii 
LIST OF ABBREVIATIONS .................................................................................................. ix 
DEFINITION OF TERMS ....................................................................................................... x 
CHAPTER 1: INTRODUCTION .............................................................................................. 1 
1.1 Introduction .................................................................................................................... 1 
1.2 Background of the Study ................................................................................................. 1 
1.2 Problem Statement .......................................................................................................... 3 
1.3 Research Objectives ........................................................................................................ 3 
1.4 Research Questions ......................................................................................................... 4 
1.5 Scope and Limitation of the Study ................................................................................... 4 
1.6 Research Relevance......................................................................................................... 4 
CHAPTER TWO: LITERATURE REVIEW ............................................................................. 5 
2.1 Introduction .................................................................................................................... 5 
2.2 Apache Web Server Log Files ......................................................................................... 5 
2.2.1 Error Logs ................................................................................................................ 5 
2.2.3 Access Logs.............................................................................................................. 6 
2.3 Attacks on Apache .......................................................................................................... 6 
2.3.1 SQL Injection Attacks ............................................................................................... 7 
2.3.2 Brute Force Attack .................................................................................................... 7 
  
iii 
 
2.3.3 Command Injection Attack ........................................................................................ 8 
2.3.4 Apache Vulnerabilities .............................................................................................. 8 
2.4 Log Analysis ................................................................................................................... 8 
2.4.1 AWStats ................................................................................................................... 9 
2.4.2 Web Log Expert........................................................................................................ 9 
2.4.3 Sawmill .................................................................................................................. 10 
2.4.4 PyFlag .................................................................................................................... 10 
2.4.5 Webalizer ............................................................................................................... 11 
2.5 Temporal Logic Overview ............................................................................................. 12 
2.6 Related Work ................................................................................................................ 13 
2.6.1 Focus on Analysis of Large Log Files ...................................................................... 13 
2.6.2 Addressing a Variety of Log Formats ...................................................................... 14 
2.6.3 Correlation of Events through Log Entries ............................................................... 15 
2.7 Conclusion .................................................................................................................... 16 
CHAPTER THREE: METHODOLOGY................................................................................. 17 
3.1 Introduction .................................................................................................................. 17 
3.2 System Development Methodology ............................................................................... 17 
3.3 System Analysis ............................................................................................................ 19 
3.3.1 Feasibility Study ..................................................................................................... 19 
3.3.2 Research Design ..................................................................................................... 19 
3.4 System Design .............................................................................................................. 19 
3.5 System Implementation ................................................................................................. 20 
3.6 System Testing .............................................................................................................. 21 
3.7 System Evaluation......................................................................................................... 22 
3.8 Conclusion .................................................................................................................... 22 
  
iv 
 
CHAPTER FOUR: SYSTEM DESIGN AND ARCHITECTURE ............................................ 23 
4.1 Introduction .................................................................................................................. 23 
4.2 System Architecture ...................................................................................................... 23 
4.2.1 Web Server ............................................................................................................. 24 
4.2.2 Web server Logs ..................................................................................................... 24 
4.2.3 Web Attack Patterns................................................................................................ 26 
4.3 Requirement Analysis ................................................................................................... 28 
4.3.1 Functional Requirements ......................................................................................... 28 
4.3.2 Non-functional Requirements .................................................................................. 29 
4.4 System Design .............................................................................................................. 29 
4.4.1 Context Diagram ..................................................................................................... 30 
4. 5 System Model .............................................................................................................. 30 
4.5.1 Use Case Diagram .................................................................................................. 30 
4.5.2 Sequence Diagrams ................................................................................................. 34 
CHAPTER FIVE: SYSTEM IMPLEMENTATION AND TESTING ...................................... 36 
5.1 Introduction .................................................................................................................. 36 
5.2 System Specification ..................................................................................................... 36 
5.3 System Implementation and Testing............................................................................... 36 
5.3.1 Web Server Configuration ....................................................................................... 36 
5.3.2 Web Log Analysis Configuration ............................................................................ 37 
5.4 System Features ............................................................................................................ 40 
5.4.1 Viewing Apache Access Logs ................................................................................. 40 
5.4.2 Creating Web Server Logs Datasets ......................................................................... 40 
5.4.3 Loading Datasets to Web Log Analysis System ....................................................... 42 
5.4.4. Creating EPL Requests ........................................................................................... 42 
  
v 
 
5.4.5 Viewing EPL Responses ......................................................................................... 43 
5.4.6 Identifying Web Attacks ......................................................................................... 44 
5.5 System Testing .............................................................................................................. 45 
5.5.1 User Acceptance Testing ......................................................................................... 45 
5.5.2   Unit and Integration Testing .................................................................................. 49 
CHAPTER SIX: DISCUSSION OF RESULTS ....................................................................... 51 
6.1 Introduction .................................................................................................................. 51 
6.2 Findings and Achievements ........................................................................................... 51 
6.3 Review of Research Objectives ...................................................................................... 52 
CHAPTER SEVEN: CONCLUSIONS AND RECOMMENDATIONS ................................... 53 
7.1 Introduction .................................................................................................................. 53 
7.2 Conclusions .................................................................................................................. 53 
7.3 Recommendations ......................................................................................................... 53 
7.4 Future Work.................................................................................................................. 54 
REFERENCES ...................................................................................................................... 55 
APPENDICES ....................................................................................................................... 59 
Appendix A: Apache2 Web Server Installation .................................................................... 59 
Appendix B: Esper Configuration in Eclipse IDE ................................................................. 60 
Appendix C: Log.java class ................................................................................................. 61 
Appendix D: Apache Log Input Adapter Configuration ........................................................ 64 
 
 
 
 
 
  
vi 
 
LIST OF FIGURES 
Figure 1. 1: Web Server Usage 2016 - 2017 (Web Technology Survey, 2016) ............................ 1 
Figure 1. 2: Cyber Attacks Types 2016 (Calyptix, 2016)............................................................ 2 
Figure 1. 3: Web Application Vulnerabilities (Gordey, 2010) .................................................... 2 
 
Figure 2. 1: Web Application Attack Vector over HTTP 2016 (Stockley, 2016) ......................... 7 
Figure 2. 2: PyFlag Overview Architecture (Cohen, 2008) ....................................................... 11 
Figure 2. 3: MSFOMTL Syntax (Gunestas & Bilgin, 2016) ..................................................... 12 
 
Figure 3. 1: Agile Software Development Methodology (Harvin, 2016) ................................... 17 
Figure 3. 2: Usage of Content Management Systems (World Wide Web Technology Survey, 
2016) ..................................................................................................................................... 20 
Figure 3. 3: V-Model Testing Methodology (Borba & Cavalcanti, 2007) ................................. 22 
 
Figure 4. 1: Web Log Analysis System Architecture ................................................................ 23 
Figure 4. 2: SQL Injection Attack using GET method Pattern .................................................. 27 
Figure 4. 3: Brute Force Attack Pattern ................................................................................... 28 
Figure 4. 4: Web Log Analysis Context Diagram .................................................................... 30 
Figure 4. 5: Use Case Diagram for Web Server Log Analysis Platform .................................... 31 
Figure 4. 6: Sequence Diagram for Data Collection ................................................................. 35 
Figure 4. 7:  Sequence Diagram for Log Analysis .................................................................... 35 
 
Figure 5. 1: Apache2 Start Service .......................................................................................... 37 
Figure 5. 2: Web Server Logs Directory Location .................................................................... 37 
Figure 5. 3: Web Server Logs CSV Format ............................................................................. 37 
Figure 5. 4: Log Parser Script ................................................................................................. 38 
Figure 5. 5: Input Adapter Event Generation ........................................................................... 38 
Figure 5. 6: Input Adapter for- loop ........................................................................................ 39 
Figure 5. 7: Input Adapter CSV Configuration ........................................................................ 39 
Figure 5. 8 : Web Log Analysis System Network Topology ..................................................... 39 
Figure 5. 9: Apache Web Server Access Logs ......................................................................... 40 
  
vii 
 
Figure 5. 10: Apache Logs in Raw Format .............................................................................. 41 
Figure 5. 11: Log Data Conversion to CSV Command ............................................................. 41 
Figure 5. 12: Apache Log CSV File ........................................................................................ 42 
Figure 5. 13: Log CSV Loaded to Analysis System ................................................................. 42 
Figure 5. 14: SQL Injection Attack Using GET method EPL Query ......................................... 43 
Figure 5. 15: SQL Injection Attack Using POST method EPL Query ....................................... 43 
Figure 5. 16: EPL Response .................................................................................................... 44 
Figure 5. 17: SQL Injection using GET Method Traces in Apache Log .................................... 44 
Figure 5. 18: SQL Injection Attack using GET method Log Output ......................................... 45 
Figure 5. 19 : WordPress Successful Login ............................................................................. 46 
Figure 5. 20: WordPress Login Traces in Apache Log ............................................................. 46 
Figure 5. 21: WordPress Failed Login ..................................................................................... 47 
Figure 5. 22: WordPress Failed Login Traces in Apache Logs ................................................. 47 
Figure 5. 23 : Failed Login Attempt One ................................................................................. 48 
Figure 5. 24 : Failed Login Attempt Two ................................................................................ 48 
Figure 5. 25: Brute Force Login Attack Traces in Apache Logs ............................................... 48 
Figure 5. 26 : Brute Force EPL Query ..................................................................................... 48 
Figure 5. 27 : Brute Force Log Output..................................................................................... 49 
Figure 5. 28: Web Server Activities Captured in Apache Logs ................................................. 50 
Figure 5. 29: All Log Events Output ....................................................................................... 50 
 
 
 
 
 
 
 
 
 
 
 
  
viii 
 
LIST OF TABLES 
Table 4. 1: Log Format Directive Percentages ......................................................................... 25 
Table 4. 2: View Apache Logs Use Case ................................................................................. 32 
Table 4. 3: Create Apache Logs Dataset Use Case ................................................................... 32 
Table 4. 4: Load Datasets Use Case ........................................................................................ 33 
Table 4. 5: Create EPL Requests Use Case .............................................................................. 33 
Table 4. 6: View EPL Responses Use Case ............................................................................. 34 
Table 4. 7: Identify Web Attack Use Case ............................................................................... 34 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
  
ix 
 
LIST OF ABBREVIATIONS 
AWS         - Apache Web Service 
CEP          - Complex Event Processing 
CLF           - Common Log Format 
ELF           - Extended Log Format 
EPL            - Event Processing Language 
HTTP         - Hyper Text Transfer Protocol 
IIS               - Internet Information Services 
IPLoM         -Iterative Partitioning Log Mining 
JSON           - JavaScript Object Notation 
TL                - Temporal Logic 
MSFOMTL - Many Sorted First Order Metric Temporal Logic 
NCSA           - National Centre for Supercomputing Applications 
SIEM           - Security Information and Event Management   
URL             - Uniform Resource Identifier 
W3C            - World Wide Web Consortium     
 
 
 
 
 
  
x 
 
DEFINITION OF TERMS 
Log Files – These are records of computer systems and applications events. Records are emitted 
by network devices, operating systems, programmable and application devices. This information 
is valuable to system and network administrators, digital investigators and other information 
security professionals. 
Log Analysis -The process of examining log records. This may be done as procedures of system 
troubleshooting, security incident response, compliance to security and audit policies as well as 
understanding system and user interaction. 
Web Server –A computer system that processes requests and provides responses via HTTP, a 
network protocol for distributing information over the World Wide Web. 
Web Sever Misuse – This covers a range of activities such as hacking and intrusion which lead to 
server damage. 
Temporal Logic – A system of rules for representing and reasoning about propositions qualified 
in terms of time. It is an extension of classical logic which includes operators dealing with time 
allowing formal specification of temporal events 
Misuse Patterns – Comprise of signatures representing evidence indicating misuse. They 
correspond to activities being investigated. 
Complex Event Processing (CEP) - Comprises of techniques for analysing streams of data 
regarding events in order to derive a conclusion. CEP analyses information to infer events or 
patterns to identify threats and respond to them. 
 
 
  
1 
 
CHAPTER 1: INTRODUCTION 
1.1 Introduction 
This chapter gives a background study on web servers, web server logs, attacks targeting web 
servers as well as challenges experienced during web log analysis. The problem statement, 
specific research objectives of the study, related research questions, scope and limitations of 
the study including the research relevance of this study are discussed in this chapter. 
1.2 Background of the Study 
Apache is the most widely used Web server for commercial Web sites (Aulds, 2000). Figure 
1.1 below indicates the percentages of websites using various web servers (Web Technology 
Survey, 2016). 
 
Figure 1. 1: Web Server Usage 2016 - 2017 (Web Technology Survey, 2016) 
With the continual advances in technology, web servers have become an integral part of the 
fast developing enterprise environment today as they run and manage critical applications and 
provide valuable content to clients. Despite the tremendous progress in computer information 
security, complete secure computer systems and servers are still a challenge. In the year 2016, 
web application attacks composed of 24% of cyber-attacks (Calyptix, 2016). 
  
2 
 
 
Figure 1. 2: Cyber Attacks Types 2016 (Calyptix, 2016) 
Web servers are still prone to attacks through Cross Site scripts, information leakages and 
injection attacks. This is due to weak codes in programming and lack of web application 
structure sanitization (Kumar, 2016). Interest in web application security has risen dramatically 
relative to the number of vulnerable applications. The Open Web Application Security Project 
(OWASP) is an entity dedicated to improving the security of web application related software 
(OWASP Application Security Project, 2017). 
 
Figure 1. 3: Web Application Vulnerabilities (Gordey, 2010) 
 
  
3 
 
Log files are text like files that record users’ activities on web servers. They reside inside the 
web server (Sharma & Gupta, 2013). Information about web server users such as user name, 
IP Address, timestamps, access request, referred URL, number of bytes and result status are 
contained in log files (Grace, 2011). These logs are useful in performing postmortem inspection 
of web attack incidents. 
Web logs serve as essential sources of information describing web server traffic.  The analysis 
of web log files containing crucial information is however a tedious task for forensic 
investigators due to challenges such as large size of the web log files, difficulty in 
understanding misuse patterns as well as complexities in correlating these patterns to the actual 
crime committed as this requires much effort in computation (Calzarossa & Massari, 2011).  
Current web log analysis tools such as AWStats, PyFlag and Sawmill are integrated with 
forensic analysis toolkits (Fry, 2011). In digital forensic investigations, real time analysis and 
monitoring of web logs is not possible. Log files are first extracted then analysed after (Singer 
& Bird, 2004). 
1.2 Problem Statement 
With the increasing rate of crimes associated with digital systems, implementing digital 
procedures to obtain evidence has become vital. Web servers’ events are stored in log files. 
These log files serve as important sources of evidence during web server misuse or crime 
incidents (Gordey, 2010). Existing systems used for the extraction and analysis of these 
evidence face major challenges due to the enormous sizes of web log files and complexities in 
understanding the attack patterns connected to the crime. This leads to slow log analysis which 
is time consuming. Currently, system network and security professional lack efficient standard 
platforms to define and share attack patterns in regards to log analysis. 
1.3 Research Objectives 
This study is based on the following research objectives. 
1. To study, understand and examine typical attacks on web servers focusing on HTTP. 
2. To identify and generate log files of these attacks for analysis. 
3. To review the existing systems available for web server log analysis. 
4. To develop a web server log analysis platform used to define attack patterns using 
temporal logic. 
  
4 
 
5. To test the proposed solutions’ capability in increasing the quality of log analysis to 
detect web attacks. 
1.4 Research Questions 
This study is based on the following research questions. 
1. Which attacks target web servers focusing on HTTP? 
2. How are web server log files of these attacks identified and generated? 
3. What are the existing systems available for web server log analysis? 
4. How will the proposed web server log analysis platform be developed? 
5. How will the platform be tested to proof its capability in increasing the quality of log 
analysis to detect web attacks? 
1.5 Scope and Limitation of the Study 
This dissertation will only focus on Apache web server log data as the case study. Due to their 
publicity, these servers are mostly prone to attacks by hackers and malicious Internet users. In 
addition, WordPress Content Management System hosted on an Apache HTTP Server has been 
selected to aid in define misuse patterns. Log analysis will be limited to analysis of logs in the 
access.log files to detect SQL injection and brute force attacks specifically. 
1.6 Research Relevance 
The collection of evidence in log files is not an easy task due to the large size of log files. 
Temporal logic capability of detecting complex patterns over streams in real time will increase 
the quality of the log analysis process. This system will highly benefit digital forensic 
investigators as it provides a library to share and store attack patterns previously detected and 
identified using a standard language formats while providing a fast accurate mechanism for 
large log file analysis. 
 
 
 
  
5 
 
CHAPTER TWO: LITERATURE REVIEW 
2.1 Introduction 
This chapter focuses on the current web log analysis mechanisms studied and discussed by 
various researchers, their limitations and major challenges experienced. A review of different 
web server log files and evidence they contain as well as current web attacks and their impact 
is discussed. 
2.2 Apache Web Server Log Files 
There exists two types of Apache log files namely error and access logs. Access logs contain 
information related to client requests to web server and are used for analysing to traffic to the 
web server (Fry, 2011). Viewing the normal operation of the AWS within the access logs and 
error logs should determine if a problem exists through anomaly detection. 
2.2.1 Error Logs 
Error log files are where the Apache HTTPD sends diagnostic data and records any errors that 
are encountered during request processing. Error logs are written to error_log in UNIX systems 
and error.log in Windows. Diagnostic information logged in these logs include process startup 
and shutdown messages, critical event data, errors in request serving by the server having status 
codes between 400 and 503, and standards informational messages (Wainwright, 2008). 
The format of events generated for error logs appear as: 
[ Day of Week / Month / Day / HH:MM:SS / Year ] [ LogLevel ] [ Hostname : Location 
(Originating Ip Address) ] [ Error Message ] 
 
Information contained in error logs has the correlation advantage. This greatly assists digital 
forensic investigators due to the fact that error events corresponding to request contain more 
information than those contained in access logs. In forensically determining an attack based on 
a specific module or the AWS, correlation of the version numbers within the error logs along 
with the attack string would be useful to an investigator. The more verbose the error logs 
directive is set to, the more likely development debug suggestions may offer additional 
information to an investigator (Fry, 2011). 
  
6 
 
2.2.3 Access Logs 
These logs record all client requests sent to the web server. The Custom Log directive controls 
the location and contents of access logs. The log format directive is used to simplify the 
selection of log contents. The log format is specified and the logging can optionally be made 
conditional on request characteristics using environment variables. These access logs are 
formatted to three standards: 
1.  Common Log Format 
The World Wide Web Consortium, W3C, defines the Common Log Format, CLF, as the 
default log format used by UNIX web servers. A new log file is set up by the Custom Log 
directive using the defined common name. The configuration will write log entries in a format 
known as Common Log. Although this format is advantageous to system administrators, it 
poses a challenge to digital forensic system developers as tools used for development have to 
conform to a variety of user generated log files. A system administrator can add additional 
details as events are being recorded. An example to modify CLF is the Extended Log Format, 
ELF, also referred to as combined log file format (Aulds, 2000).  
2. Combined Log Format 
The Combined Log Format is another frequently used format string. This format resembles the 
Common Log Format with an additional two fields (Fry, 2011). 
3. Multiple Access Logs 
These logs are created by specifying multiple Custom Log directives in configuration file 
(Grace et al., 2011). 
2.3 Attacks on Apache 
Web servers are essential components of web-based applications. The Apache Web Server is 
often located at the border of a computer network hence making it vulnerable to attacks 
(Kumar, 2016). 
  
7 
 
 
Figure 2. 1: Web Application Attack Vector over HTTP 2016 (Stockley, 2016) 
 
2.3.1 SQL Injection Attacks 
These attacks are carried out by inputting malicious inputs especially to data driven 
applications as interactive web sites. It consists of insertion of a SQL query via input data from 
a client to a web application. A successful attack may access sensitive data in databases, modify 
data as well as perform administration operations on the databases hence threatening the 
confidentiality, integrity and availability of these systems. An example, instead of submitting 
correct credentials in a web site authentication form, an attacker inputs the [‘OR 1=1 --] and [ 
] as username and password respectively (Halfold & Orso, 2006). 
SELECT * FROM user WHERE login='' OR 1=1 –'AND pwd=' '  
Using the single quote the attacker closes the SQL Where clause on login field enabling SQL 
injection of SQL control code into the query. The attacker then enters OR 1=1 which will 
evaluate to true –. The -- (double dash) operator marks the start of comments, prompting the 
SQL parser to ignore the Where clause on the password field (Janot & Zavarsky, 2008). 
2.3.2 Brute Force Attack 
This attack is an attempt to uncover passwords by continuously and systematically trying every 
combination of letters, symbols and numbers. The attacker sends these combinations to the 
server and analyses the responses until the correct combination that works is discovered 
(OWASP Brute Force Attack, 2016). Attackers use tools employed with wordlists and smart 
  
8 
 
rule sets to intelligently guess authentication passwords (Sowmya & Kumar, 2013). This attack 
majorly targets web sites which require user authentication. This causes a risk to user accounts 
as well as unnecessary traffic to the web server. Brute-force attacks also try to discover hidden 
web pages with the web application. Various HTTP brute force tools relay requests via a list 
of open proxy servers, since each request comes from different IP addresses, it may be difficult 
to prevent this attack by blocking the IP address. 
These attacks are sent via the GET and POST methods as requests to the web server. An 
attacker my take a wordlist of known web pages and request each known web page then analyse 
the HTTP response code in order to determine of the web page exists on the targeted web 
server.  
2.3.3 Command Injection Attack 
The goal of this attack is to inject and execute arbitrary system commands specified by attacker 
in a vulnerable web application. These injection attacks are possible as the vulnerable web 
application executes improper user supplied data to a system shell. User supplied data may 
include forms, cookies and HTTP headers. Arbitrary system commands are executed with the 
same privileges and environment as the vulnerable web application. Such attacks are possible 
due to lack of sufficient input data validation (OWASP Command Injection, 2016).  
2.3.4 Apache Vulnerabilities 
Programming errors have security implications, the errors can be exploited to misuse system 
resources hence pose as vulnerability. In 1998, a programming error in Apache, small sized 
requests caused Apache to allocate large memory. Vulnerabilities such as non-exploitable 
buffer overflows may cause the Apache server to crash when attacked. Since Apache runs in 
pre-fork mode, many instances of the server run in parallel too. If the child crashes, the parent 
process creates a new child. An attacker will send multiple requests to disrupt the server’s 
operations. In a multithreaded mode of operation, there exist one server process hence crashing 
will result to shut down of the whole server making it unavailable (Ristic, 2005). 
2.4 Log Analysis 
The advancement of digital evidence extraction from digital devices is continually increasing 
in complexity due to new systems developed over time. With the large storage capacity in 
devices, the partitioning of memory for evidence analysis enable current forensic tools provides 
  
9 
 
relevant information to forensic investigators. This greatly minimises the amount of data 
analysed. Most software applications, operating systems and system services are able to store 
log data about events occurring during normal operations due to their development process. 
The aggregation of log events is crucial to a forensic investigator during reconstruction of the 
crime incident. The analysis of data in the log files is accurately reconstructed to display user 
activities. Although log files are not specifically for the purpose of forensic investigations, they 
contain important evidentiary information and provide hints to other sources that may contain 
vital information on an incident as well (Fry, 2011). This section provides an overview of some 
of the current tools which exits and aid in performing enhancing the log analysis process. 
2.4.1 AWStats 
Advanced Web Statistics, AWStats is a free software distributed under the GNU General 
Public License v3 (Destailleur, 2015). It is a Perl-based open source log analyser which creates 
advanced web, mail, ftp and streaming server statistical reports based on information contained 
in server logs. The resulting data is presented in clearly visible web pages. This tool can be run 
via a web browser’s common gateway interface or from an operating system command line 
directly. AWStats is capable of processing large log files from Apache, Microsoft IIS and 
WebStar through the use of intermediary data base files. Examples of analysed data presented 
in the reports include most visited sites, authenticated user visits, domains, countries of visitor’s 
hosts, latest visits and unresolved IP address lists, HTTP errors and referrer search engine. 
AWStats analyses the AWS log files in CLF, ELF, MS IIS W3C, other FTP, mail server, proxy 
and streaming media. A user may also define their log formats for analysis; however data 
gathered is limited to data in the log files themselves. This tool is neither designed as a log file 
correlation engine nor a forensic tool but for the purpose of analysis of web related log files. 
This data presented in the reports greatly assist in investigations. Addition of correlation 
capabilities and support for other log files will provide necessary characteristics for forensic 
web analysis .The AWStats software is available on the website link: http://www.awstats.org/ 
(Last accessed 7th February 2017). 
2.4.2 Web Log Expert 
It is a software application that analyses individual web server log files such as Apache and 
IIS, and generates reports corresponding to specific web pages (Tyagi & Choudhary, 2015). It 
is Windows-based and provides detailed information on site’s visitors. The results include 
general statistics, accessed files, and statistics about paths taken through the site, information 
  
10 
 
about referring pages, search engines, browsers, operating systems and errors. The built-in 
wizard aid in profile creation of a specific site in order to analyse it, this is mandatory in 
performing log analysis. After analysis, a HTML file will be present precise information on 
total hits, average page views per day, graphs displaying daily visitors, top hosts and daily 
referring sites. The Weblog Expert application is available on the website link 
https://www.weblogexpert.com/ (Last accessed 7th February 2017). 
2.4.3 Sawmill 
The main advantage of this tool is, it includes plugins that automatically detect over 800 
different types of log files and provides methods for plugin definition for nonstandard log file 
types (Fry, 2011). Sawmill architecture includes the log importer, sawmill database, web 
server, reporting interfaces, command line interfaces, scheduler and data manipulation 
languages for log analysis. Salang, sawmill language is used in displaying pages defining log 
filters in terms of r regular expressions and conditional logic. This tool provides the access 
necessary to perform correlation between log data sources. The Sawmill tool is available on 
the website link https://www.sawmill.net/ (Last accessed 7th February 2017). 
2.4.4 PyFlag 
It is an open source web based application that performs log analysis through an extensive 
Graphical User Interface. It is capable of analysing large volumes of log files from disks, 
images and network traffic such as tcpdump. The data is added to MySQL database for faster 
querying but log types are specified by user since this tool only views log files hence an analyst 
must have prior knowledge and experience to perform the required log analysis. Although 
regular expression can be considered as inputs, this tool does not contain prebuilt analysis 
capabilities (Cohen, 2008). The PyFlag application is available on the website link 
http://pyflag.sourceforge.net/Downloads/ (Last accessed 7th February 2017). 
  
11 
 
 
Figure 2. 2: PyFlag Overview Architecture (Cohen, 2008) 
 
2.4.5 Webalizer 
This is a fast and free web server log file analysis program which produces detailed usage 
statistics in HTML format for viewing with a standard web browser (Barret, 2014). The code 
is written in C language hence portable in Linux, Solaris and UNIX operating systems. Log 
formats that are analysed include CLF, variations of NCSA combine log format, xferlog CLF 
logs and W3C ELF. In addition, this tool has the capability of decompressing bzip2 and gzip 
compressed logs without the need for uncompressing hence saving on memory space thus 
providing analysis compatibility for larger log files. The log analysis carried out by this tool 
provides summary statistical information on the web server by sites or time based reports 
presented in graphical or tabular formats that are configurable form command line. This tool 
does not run in real time, provides nor correlation capabilities and does not automatically detect 
log file types. The Webalizer program is available on the website link 
http://www.webalizer.org/ (Last accessed 7th February 2017). 
  
12 
 
2.5 Temporal Logic Overview 
The development of formal methods for automatic verification and specification of real time 
computer systems has been steadily increasing Ahmed & Lisitsa (2011). System satisfaction 
based on its specifications is proofed through these verification techniques. Verification 
techniques aid in ensuring that a system execution satisfies a specific property. Logical -based 
formal methods applied in runtime verification techniques provide concise procedures to 
formally represent system specifications and provide the necessary mechanisms to reason about 
given properties  
The main objective of logic in computer science is to create languages modeling situations 
encountered by computer system professionals in ways they can be formally reasoned. 
Temporal logic is an extension of classical logic which includes operators dealing with time 
allowing formal specification of temporal events. Quantitative temporal properties are vital in 
dealing with real-time systems. Koymans (1990) introduced the Metric Temporal Logic where 
the qualitative temporal operators are converted to metric temporal operator (quantitative).The 
conversion from qualitative to quantitative temporal operators is carried out by Metric 
Temporal Logic through constraining the temporal operator with bounded or unbounded 
interval. The metric extension to temporal logic is useful in relating events in real- time 
systems.  
A special subset of temporal logic is Many Sorted First Order Metric Temporal Logic 
(MSFOMTL) which aids in modeling attack patterns for analysing log records that change 
overtime. MSFOMTL syntax is: 
 
Figure 2. 3: MSFOMTL Syntax (Gunestas & Bilgin, 2016) 
 
Where p is any propositional atom from some atoms and the symbols  are MSFOMTL 
formulas. Symbols are grouped into logical and non-logical symbols. Logical symbols are 
  
13 
 
quantifiers ∀ and ∃, the logical connectives ∧, ∨ and ¬, the logical binary predicate symbols =, 
not =, <, ≤, >, and ≥. The bounded future temporal operators (“eventually”), and ] 
(“always”), the past bounded temporal operators (“sometimes in the past”), and 
(“always in the past”) and  means  “repeats n times”. The subscripts [t1, t2] in the 
operators refer to their scope (between the moments t1 and t2 from now) (Gunestas & Bilgin, 
2016).  
2.6 Related Work 
Log analysis has been an active field of study over time, it involves the processes of inspecting 
computer system records in order to identify attacks and mitigate risks. Various researchers 
have developed and implemented different approaches for the purposes of log file analysis. 
Studies have shown that various log analysis approaches exist with different levels of success. 
This research can be grouped into three main categories:  
1. Focus on analysis of large log files 
2. Addressing a wide variety of log formats 
3. Correlation of events through log entries. 
2.6.1 Focus on Analysis of Large Log Files 
Vernekar and Buchade (2013) proposed a system based on Map Reduce algorithm for log 
analysis which provides appropriate security alerts. Map Reduce is a distributed system 
algorithm which uses clusters of computer as resources. The Hadoop framework is the most 
popular implementation of Map Reduce algorithm. This algorithm consists of Map phases and 
Reduce phases. The log file to be processed is provided as an input to the Map phase where it 
is divided into 16Mb to 128Mb chunks. These chunks are then distributed to several Map 
functions residing on the nodes of the Hadoop clusters allowing parallel processing of various 
file chunks generating the intermediate key value pair output faster. Each Reduce function is 
associated with a key. The intermediate output produced by the each Map phase is assigned to 
the Reduce phase, where Reduce functions are given the values which belong to the particular 
key. The Reduce function will then provide the final result or log report.  
The Iterative Partitioning Log Mining approach proposed by Makanju & Milios (2009) is 
another system towards log analysis of large log files. It is an algorithm for dividing event logs 
  
14 
 
into clusters and mining the patterns for alert generation based on the patterns.  It works by 
partitioning a set of log events through four iterative steps. 
i. Partition by token count. 
ii. Partition by token position. 
iii. Partition by search for bijection. 
iv. Discover cluster descriptions or line formats 
At each step the resulting partition get closer to containing log messages only produced by the 
same line format. The fourth steps attempt to identify the line formats that produced the lines 
in each partition, these discovered partitions and line formats are the output of the algorithm 
Havens and Lunt (2012) focuses on the efficiency of three off the shelf Bayesian spam filters 
through classification of log entries. The Filter Effectiveness Scale is used to compare the 
filters. The filters are first tested with Spam Assassin corpus, then they are tested for their 
ability to differentiate two types of log entries taken from actual production systems and finally 
the filters are trained on log entries from actual system outages and then tested on effectiveness 
for finding similar outages via the log files. 
Kalamatianos and Matthews (2012) proposed a log analysis technique aimed to aid digital 
analysts compute smaller collection of events related to their analysis objective rather than the 
entire original large log file. This technique is based on computing a similarity score between 
logged events and a group of significant events known as beacons. The beacon events are 
selected through an automated process which searches for unusual events either by operators 
auditing system logs or hypothesis generation process. Being a domain independent approach, 
domain knowledge is not utilized. Different types of events are not treated in a preferential 
manner while computing the similarity score, therefore collection of log data conforming to 
different formats and schemas is possible. A pre-existing training data set is not required as 
this limits most approaches based on machine learning techniques. 
2.6.2 Addressing a Variety of Log Formats 
Software log analysis heavily contributes to software testing and troubleshooting.  The first 
step in automating log analysis is extraction of data. Jayathilake (2011) introduced a new log 
  
15 
 
data extraction generic scheme that provides advanced features to hasten log file analysis 
procedures. It has the capability of handling various types of log files such as xml, tabular, line 
and binary logs with complex structures and syntax. Log entries are differentiated using 
attributes such as length, minimum length, maximum length, delimiters and possible values. 
The output consists of a tree structure resembling a mind map containing information related 
to a specific case. 
The information in computer system logs is crucial for gathering forensic evidence when 
investigating system attacks. Arasteh and Debbabi (2007) proposed a model checking approach 
for addressing the issue of formal analysis of logs. The logs are modeled as a tree with labeled 
log events.  Each event is expressed as an algebraic term providing structure to these 
events.  Signatures of the algebraic term are selected to include relevant information required 
to conduct the analysis process. 
2.6.3 Correlation of Events through Log Entries 
In today’s technology Security Information and Event Management, SIEM systems aid in 
gathering information from network devices, software and hardware security systems and 
applications. Event and activity monitoring and reporting capabilities of SIEM systems can be 
used in real time correlation of events with other conceptual information. The widely used 
SIEM products include Splunk which improves the detection and response to advanced threats 
through the Splunk User Behavior Analytics which provides broad security intelligence, 
LogRhythm which unifies SIEM, log management and network endpoint monitoring and 
forensics with advanced security analytics and ArcSight which provides big data security 
analytics and intelligence for security information and log management by collecting security 
log data from operating systems and applications and analysing them for patterns of attacks or 
malicious activity. 
OSSIM, an open source SIEM system product provides complete event collection, 
normalization and rule- based correlation. It allows users define the dependencies between 
events via xml file instead of using temporal logic. On the other hand, MASSIF which is based 
on Complex Event Processing translates OSSIM’s directives into complex event processing 
queries that run in parallel (Vianello, et al., 2013) 
Complex Event Processing is a technique involves collecting events from different sources, 
filtering, and transforming, detecting patterns, correlating and aggregating them to complex 
  
16 
 
event. These systems employ temporal logic to some extent via various event processing 
languages in their processes. Esper and StreamBase are examples of Complex Event Processing 
frameworks used today together with event processing languages as EventFlow and 
StreamSQL (Albek & Bax. 2005). 
Ahmed et al. (2011) implemented MSFOMTL to define misuse patterns which are transformed 
to StreamSQL queries and run in Streambase platform. (Herrerías & Gómez, 2010) designed 
an Automated Forensics Diagnosis System that reconstructs attacks after incidents to carry out 
log analysis with the event correlation module. Therefore, the system detects multi-step attacks 
reducing false positives. The main objective of this system is reducing time required to search 
for digital evidence by forensic investigators as well as eradicating complexities involved in 
this process. 
2.7 Conclusion 
Computer systems, networks and software applications events are recorded in their log files. 
Log files serve as an important source of information for purposes of analysis of security 
breaches in the system. Since most systems maintain their events in log files, this is beneficial 
in identifying problems and security threats in the system through analysis of log files for 
pattern identification indicating suspicious system behaviour.  In the past, log file analysis was 
carried out manually which would lead to missing some event logs containing important 
information. Log files are large in size and this would prolong the process of log analysis. 
Similar to research done on improving the analysis of large log files, the proposed approach 
based on temporal logic and log reconstruction serves as a solution towards large log files 
analysis by minimizing the investigation scope using time windows. The Many Sorted First 
Order Metric Temporal Logic (MSFOMTL), a unique case of temporal logic will be used in 
this case study as it allows for the  specification of packets arrival time and enables 
representation of attacks formally and conduct runtime verification to detect their occurrences 
in the stream of incoming events.. It will be applied in modeling attack patterns for analysing 
log events over time periods. 
  
17 
 
CHAPTER THREE: METHODOLOGY 
3.1 Introduction 
A methodology involves a systematic approach to a resolution of an existing problem. It offers 
a theoretical understanding of which methods can be applied to specific cases to produce 
specific results (Irny & Rose, 2005).  This chapter is concerned with describing the 
methodology that will be employed to enable the proposed system answer the research 
questions outlined in Chapter 1. In this chapter the System Design and Analysis methods, 
Implementation methods, Testing and Evaluation methods are discussed. 
3.2 System Development Methodology 
The system design method used in this case study was the agile software development 
methodology. It provides opportunities to assess the direction of a project throughout the 
development lifecycle (Harvin, 2016). 
 
Figure 3. 1: Agile Software Development Methodology (Harvin, 2016) 
 
 
 
  
18 
 
In this methodology, each system task was allotted a time slot ensuring delivery of specific 
features for each system release based on the previous system functionality. The system was 
developed in incremental iterations or cycles (Szalvay, 2004). Each system release was tested 
ensuring continuous system quality. The following are the steps involved in system 
development process: 
 Planning: This step involves outlining the necessary procedures required to achieve 
this study’s research objectives. This included reviewing how web log analysis has been 
addressed by various researchers and existing systems and tools used to perform web 
log analysis as well as their merits and limitation. 
 System and Requirement Analysis: This involves analysis of data collected so as to 
identify requirements needed for system development. This included emulating web 
attack and extracting the web logs in preparation for analysis. Once the system 
requirements have been understood, a system design can be done determining the 
system structure and system functionalities.  
 System Design: The system requirements were translated into a technical description 
and design. It involved a web log analysis system architecture which was modelled in 
various design diagrams as use cases and sequence diagrams. 
 System Implementation: This included the development of the actual system 
dependent on the design diagrams produced in the previous phase. This was achieved 
through extraction of web logs from Apache web server and analysis using Esper 
Complex Event Processing engine running on a Linux distribution. 
 System Testing:  The system developed was presented to various system 
administrators for use. Sample web logs were loaded to the Esper, CEP engine as events 
for analysis. This aided in determining if the system functionalities satisfy the study’s 
research objectives, Feedback was collected and was used for further enhancements in 
the system under development.  
 System validity and reliability was checked by evaluating the success rate of tests 
during system requirement analysis. This was done to ensure he developed system 
answered the outlined research questions of this study.  
 
 
  
19 
 
3.3 System Analysis  
The system requirements were identified and analysed. The results were used to answer the 
research questions as well as aid in designing the web log analysis system. 
3.3.1 Feasibility Study 
This was conducted through literature review on existing and other proposed methods used for 
web server log analysis. 
3.3.2 Research Design 
The research design used was dependent on the specific research objectives and had to relate 
with the research questions (Kothari, 2004). It serves as a road map indicating how a researcher 
goes about answering specific case study research questions (Bryman & Bell, 2007). 
The quasi-experimental research design will be used for this test case. It tests the relations in 
given environments with the aim of analysing outcomes of interest based on treatments (Levy 
& Ellis, 2011). This research design has been chosen as the research analyses if the proposed 
web log analysis using temporal logic approach enables system network and security 
administrators as well as other information security professionals identify and define web 
server misuse and attack patterns as well as accurately analyse large log file in a timely manner. 
3.4 System Design  
This involves the sectioning of the system to be developed into components for purposes of 
studying how these components work together to achieve system functionality (Gemino & 
Parker, 2009).  The use case diagram models the system functionalities and gives an illustration 
of how system actors interact with the system processes known as use cases (Mishra & 
Mohanty, 2012). Use cases will be represented in texts and describe actions an actor can do in 
the system. In this system, the actors include a malicious web user who launches attacks on the 
web application and the forensic investigator who analyses the web server log files to extract 
evidence of the web attacks launched. Sequence and data flow diagrams will be used to 
illustrate how the developed system will handle different data flows between system processes. 
It aids in analysing the system to determine if the required system data and processes have been 
defined (Mohapatra & Joseph, 2014). 
  
20 
 
3.5 System Implementation  
The system included the Apache web server, WordPress Content Management System hosted 
on the Apache server, Apache web server logs and Esper, a complex event processing (CEP) 
engine, to query the attack patterns.  
i. Apache HTTP  Server 
Due to their publicity, these types of servers are mostly targeted by attackers and malicious 
web users. Web logs from these server will be extracted for analysis to identify the presence of 
web server attacks. 
ii. Web Application 
Widely available themes such as WordPress, Joomla, Drupal and Magento are used as web 
server development basis. Different web server activities built on a similar theme will leave 
same traces on log records. Thus, once patterns specific to a particular theme are defined, then 
it is possible to search for similar patterns indicating malicious events on other web sites built 
on the same theme (Gunestas & Bilgin, 2016). The diagram below indicates that 27.3% of all 
the websites use WordPress that is a content management system market share of 58.5%. 3.4% 
of all the websites use Joomla that is a content management system market share of 7.2% while 
2.2% of all the websites use Drupal that is a content management system market share of 4.8%. 
 
 
Figure 3. 2: Usage of Content Management Systems (World Wide Web Technology Survey, 
2016) 
 
  
21 
 
In this dissertation, WordPress has been selected as the main focus of this case study to define 
misuse patterns. The web application developed will be based on WordPress theme. The 
hosting of the website will be done on an Apache HTTP Server. 
iii. Complex Event Processing Engine 
Temporal logic, a form of modal logic is used widely in verifying the correctness of critical 
computer systems (Huth & Ryan, 2004). A special subset of temporal logic referred to as Many 
Sorted First Order Temporal Logic, MSFOTL, will be used to model web server attack patterns 
for analysing web logs which span over time. MSFOMTL entails special features that are 
efficient and concise in describing log events requiring investigation. After modelling the web 
attack patterns, the Event Processing Language, EPL, will be used to define these patterns into 
queries. A Complex Event Processing engine, Esper, will be used to query these patterns 
(Herreman, 2006).  
3.6 System Testing  
The system was tested against its specifications to verify whether it complies with the 
functional requirements. The V-Model testing methodology also referred to the verification 
and validation model was employed for system testing. In the V-Model, development and 
quality assurance activities are done simultaneously (Borba & Cavalcanti, 2007). There is no 
discrete phase called Testing, rather testing starts right from the requirement phase. As 
illustrated in Figure 3.4 below, User Acceptance Testing, System Testing, Integration Testing 
and Unit Testing occur simultaneously as verification and validation activities go hand in hand.  
 
 
 
  
22 
 
 
Figure 3. 3: V-Model Testing Methodology (Borba & Cavalcanti, 2007) 
 
3.7 System Evaluation 
The system will be evaluated by the developer to establish its validity and whether the study’s 
research objectives are achieved. This is essential as it will indicate if the system developed 
will enhance web log analysis process. 
3.8 Conclusion 
This chapter has provided an overview of methods that will be used to ensure that the proposed 
system meets the research objectives as well as answering the research questions. 
 
 
 
 
 
  
23 
 
CHAPTER FOUR: SYSTEM DESIGN AND ARCHITECTURE 
4.1 Introduction 
The web server log analysis system based on temporal logic and reconstruction was 
implemented with the sole aim of enabling network and system security administrators define 
web server attack patterns for fast and easier identification of the attacks. This chapters 
illustrates the system architecture, system design and the components used to implement the 
proposed web server log analysis system. Interaction diagrams have been used to aid in 
illustrating the interaction between users and the system as well as data flow between various 
system modules required for proper system function.   
4.2 System Architecture 
The web log analysis system comprises of a web server, web server logs, defined web attack 
patterns using temporal logic and a complex event processing engine used to query and analysis 
web attack patterns. 
 
Figure 4. 1: Web Log Analysis System Architecture 
 
  
24 
 
4.2.1 Web Server 
The web server selected for this system is Apache web server due to its ease of use as well as 
its vulnerability to most web attacks including SQL injection attacks, brute force attacks and 
remote shell injection attacks. Web logs from the Apache web server will be extracted for 
analysis and identification of these attacks. 
 
4.2.2 Web server Logs 
There exists two types of Apache log files namely access and error logs. Access logs contain 
information related to client requests to web server and are used for analysing to traffic to the 
web server (Fry, 2011). The system will specifically analyse the access logs. 
I. Access Logs 
These logs record all client requests sent to the web server. The Custom log directive controls 
the location and contents of access logs. The log format directive is used to simplify the 
selection of log contents. The log format is specified and the logging can optionally be made 
conditional on request characteristics using environment variables. These access logs are 
formatted to three standards: 
A. Common Log Format 
The log format is used by the AWS access logs, the format is indicated below (Aulds, 2000): 
[remotehost] [identd] [authuser] [date] [request URL] [status] [bytes] 
The remotehost indicates the IP address of the clients that sent request to the web server. The 
second field, identd, contains the identity of the visitor such as email address or any other 
unique identifier. The authuser field contains the clients’ username credentials. This field 
appears only when the client requests a protected document requiring a user ID and password 
(Wainwright, 2008). The common format was modified to custom log file format as shown 
below: 
LogFormat    "%h  %l   %u  %t     \ " %r \ "   %>s     %b" common 
 
Example: 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif 
HTTP/1.0" 200 2326 
  
25 
 
The format string consists of directive percentages, each informing the server to log a particular 
piece of information. The entries give details about a client who had made a request to the web 
server. Table 4.1 gives a detailed description of each directive in the log format (Grace, 
Maheswari, & Nagamalai, 2011). This system works mostly with this format. 
Log Field Directive Example Description 
Remote 
Host 
%h (127.0.0.1) IP address of client who made request to web 
server. 
Identity %l    (-) The hyphen after the client’s IP address 
indicates that the requested information is 
unavailable 
AuthUser %u   (frank) This is the user ID of client requesting a 
document. It is determined by HTTP 
authentication. 
Date %t   ( [10/Oct/2000:13:55:36 
-0700]) 
This indicates the time and date. It resembles 
the format [day/month/year: hour: minute: 
second zone] 
Request 
URL 
\" %r \ "   ("GET /apache_pb.gif 
HTTP/1.0") 
This is the client’s request in quotes. GET is 
the method used apache_pb.gif is the 
information requested by the client. The 
protocol used by the client is given as 
HTTP/1.0 
Status %>s   (200) This is the status code sent by the server. The 
codes beginning with 2 for successful 
response, 3 for redirection, 4 for error caused 
by the client, 5 for error in the server 
Bytes %b   (2326) The last entry indicates the size of the object 
returned to the client by the server, not 
including the response headers. If there is no 
content returned to the client, this value will be 
"-" 
 
Table 4. 1: Log Format Directive Percentages 
  
26 
 
B. Combined Log Format 
The Combined Log Format is another frequently used format string as shown below.z 
LogFormat "%h  %l  %u  %t  \"%r\"  %>s  %b  \"%{Referer}i\"  \"%{User-agent}i\" 
" combined 
This format resembles the Common Log Format with an additional two fields, each uses the 
directive percentage %{header}i, where header can be any HTTP request header. 
[remotehost] [identd] [authuser] [date] [request URL] [status] [bytes] [referrer] [agent] 
The referrer field indicates that the client request was generated by clicking a link from 
different page in another web site and header content shows the page URL. The agent field 
represents the browser used by the client, requester (Fry, 2011). 
C. Multiple Access Logs 
They are created by specifying multiple Custom Log directives in configuration file. The 
configuration is as shown below. 
LogFormat "%h %l %u %t \"%r\" %>s %b" common 
CustomLog logs/access_log common 
CustomLog logs/referer_log "%{Referer}i -> %U" 
CustomLog logs/agent_log "%{User-agent}i" 
There are three files created as access logs containing client information. It is a combination of 
common log format and combined log format. The first line is basic common log format 
information and the second line is referrer and browser information (Grace et al., 2011). 
4.2.3 Web Attack Patterns 
Many Sorted First Order Temporal Logic, MSFOTML, a special subset of temporal logic is 
used to model the web server attack patterns. For illustration purposes,  
P (X1,X2, X3, X4,X5,X6,X7,X8,X9,X10) where; 
X1- represents client IP address. 
X2- represent the host. 
  
27 
 
X3- represent the username or user ID. 
X4- date-time variable representing the Timestamp. 
X5- represents Request Method. 
X6- represents Request URL 
X7 -represents Request code. 
X8- represents sent bytes. 
X9- represents the referrer. 
X10- represents user agent. 
There are four pattern types which are classified into; 
i. Single Record Pattern 
From a single log record, one can deduce a specific log activity hence defining a pattern 
representing corresponding activity. While defining these patterns, keywords or regular 
expressions matches are employed over log entry values. An example of such a pattern can be 
written to represent a SQL injection attack using GET method. 
 
Figure 4. 2: SQL Injection Attack using GET method Pattern 
 
ii. Multiple Record Patterns 
These record patterns are used in representing complex events comprising of more than one 
events spanned over time that are related to each other. Not only are related multiple record 
patterns defined but also those corresponding to evidence of complex patterns by including 
multiple single record patterns. An example of such a pattern can be written to represent a brute 
force login attack against a WordPress login page which displays when unsuccessful login 
attempts occur. 
  
28 
 
 
Figure 4. 3: Brute Force Attack Pattern 
 
iii. Compound Record Patterns 
It is possible to develop complex event patterns comprising of single and multiple record 
patterns in cases where single record patterns are inadequate in addressing log activities 
precisely. This aids in corroborating evidence of an activity that occurred in the system. 
Compound record patterns are defined in the event that single or multiple record patterns fail 
to differentiate an expected activity leading to false positives alerts. 
iv. Abstract Record Patterns 
Some compound patterns have similarities hence can be abstracted from. After abstraction, 
general record patterns that address more system misuse activities are obtained.   
4.3 Requirement Analysis 
Based on the study’s research objectives and the system user requirements, this section outlines 
various requirements that ought to be met by the system developed. 
4.3.1 Functional Requirements 
1. Connected to Apache2 web server and view logs related to HTTP requests and responses 
based on WordPress CMS. 
a. No duplication of web server access logs. 
b. Retrieved metadata of each access log.  
c. Log2CSV parser to convert log metadata to CSV form and store in CSV file with the 
following headers; 
i. Host 
ii. Log Name 
iii. Date Time 
  
29 
 
iv. Time Zone 
v. Method 
vi. URL 
vii. Response Code 
viii. Bytes Sent 
ix. Referrer 
x. User Agent 
2. Created a CSV Input Adapter to send log data stored in CSV file as events to Esper CEP 
engine for analysis. 
3. Performed analysis on each web log to identify web misuse cases based on defined web 
attack patterns. 
4.3.2 Non-functional Requirements 
4.3.2.1 Software Requirements 
i. Apache2 web server. 
ii. Simple web application based on WordPress theme. The web application is hosted on 
the Apache2 web server. 
iii. Log2CSV parser to convert the access log metadata to CSV form. 
iv. Eclipse IDE for Java Developers (Neon.3) for system code development. 
v. Esper 6.0.1, a Complex Event Processing engine used for log event analysis. 
vi. Java, a general purpose programming language for coding. 
vii. Windows 10 operating system 
4.3.2.2 Hardware Requirements 
i. Machine: Intel Core i5 or higher 
ii. Clock speed and processor: 2.5OGHz or higher 
iii. System Memory: 4 GB. 
 
4.4 System Design 
The system design gives a description of the architectures, components, modules, interfaces 
and data for a specific system. It aids in studying how system components interact and function 
(Gemino & Parker, 2009). The sections below include notations used to describe the designed 
system. 
 
  
30 
 
4.4.1 Context Diagram 
A context diagram specifies details of the system design. It illustrates the external entities 
which interact with the system. It displays the inputs and outputs to and from the external 
entities. The figure below shows the context diagram of the web log analysis system. 
 
 
Figure 4. 4: Web Log Analysis Context Diagram 
 
4. 5 System Model 
The system model includes the system inputs, system processes and system outputs. The UML 
notation used in describing the system model includes the use case diagram and the sequence 
diagrams.    
 
4.5.1 Use Case Diagram 
A use case diagram is a representation of the interactions between a user and a system or 
between a system and another system under observation hence capturing the functional aspects 
of the system (Aggarwal, 2005). Figure below gives an illustration of the use case for the web 
log analysis system. 
  
31 
 
 
Figure 4. 5: Use Case Diagram for Web Server Log Analysis Platform 
 
i. View Apache Access Logs Use Case Description 
 
Use Case Title View Apache Access Logs 
Description This use case scenario describes the steps of viewing Apache 
web access logs. 
Actors System Administrator (User) 
Pre-Conditions User has logged in as root. 
Basic Flow 1. User changes directory to /var /log/apache2/access.log  
2. The terminal displays access log metadata. 
  
32 
 
Post Conditions User is able to view the access logs and their metadata. 
Frequency of 
Use 
Many 
 
Table 4. 2: View Apache Logs Use Case 
 
ii. Create Apache Access Log Datasets Use Case Description 
 
Use Case Title Create Apache Access Log Datasets 
Description This use case scenario describes the steps of creating access log 
datasets. 
Actors System Administrator (User) 
Pre-Conditions User has logged in as root. 
Basic Flow 1. Create a log2CSV parser 
2. Run the parser across access logs 
3. Terminal displays access log metadata in CSV form. 
Post Conditions User is able to convert access log metadata to CSV format. 
Frequency of 
Use 
Many 
 
Table 4. 3: Create Apache Logs Dataset Use Case 
 
iii. Load Datasets to Web Log Analysis System Use Case Description 
 
Use Case Title Create Apache Access Log Datasets 
Description This use case scenario describes the steps of loading the datasets 
to the web log analysis system 
Actors System Administrator (User) 
Pre-Conditions User accesses the Esper CEP engine. 
Basic Flow 1. Create a CSV Input Adapter java class. 
2. CVS Input Adapter reads log metadata from the CSV 
files and sends them to Esper as events. 
  
33 
 
Post Conditions User is able to convert access log metadata to CSV format. 
Frequency of 
Use 
Once 
 
Table 4. 4: Load Datasets Use Case 
 
iv. Create EPL Requests 
 
Use Case Title Create Apache Access Log Datasets 
Description This use case scenario describes the steps of loading the datasets 
to the web log analysis system 
Actors System Administrator (User) 
Pre-Conditions User accesses the Esper CEP engine. 
Basic Flow 1. User creates SQL like statements using Event Processing 
Language defining a specific web attacks. 
2. User runs these queries across log datasets. 
Post Conditions User is able to create EPL queries. 
Frequency of 
Use 
Many 
 
Table 4. 5: Create EPL Requests Use Case 
 
v. View EPL Responses 
 
Use Case Title Create Apache Access Log Datasets 
Description This use case scenario describes the steps of loading the datasets 
to the web log analysis system 
Actors System Administrator (User) 
Pre-Conditions User accesses the Esper CEP engine. 
  
34 
 
Basic Flow 1. User runs EPL queries across log datasets. 
2. System displays EPL results to user. 
Post Conditions User is able to view EPL queries results. 
Frequency of 
Use 
Many 
 
Table 4. 6: View EPL Responses Use Case 
 
vi. Identify Web Attack 
 
Use Case Title Create Apache Access Log Datasets 
Description This use case scenario describes the steps of loading the datasets 
to the web log analysis system 
Actors System Administrator (User) 
Pre-Conditions User accesses the Esper CEP engine. 
Basic Flow 1. System displays EPL queries results to user. 
2. User identifies log metadata containing web attack 
evidence. 
Post Conditions User is able to identify web attacks. 
Frequency of 
Use 
Many 
 
Table 4. 7: Identify Web Attack Use Case 
 
4.5.2 Sequence Diagrams 
A sequence diagram displays how external system actors generate their order of intersystem 
events (Larman, 2002). Figures 4.6 and 4.7 below illustrate sequence diagrams indicating 
objects interaction.  
  
35 
 
 
Figure 4. 6: Sequence Diagram for Data Collection 
 
Figure 4.6 above represents the first stage in the system. The process is activated when the user 
runs the python script. Figure 4.7 below represents the second stage in the system, this include 
the steps taken for log analysis. 
 
 
Figure 4. 7:  Sequence Diagram for Log Analysis 
 
Figure 4.7 above represents the second stage in the system. The process is activated when the 
user loads the Apache log CSV files to Esper CEP engine for processing. Esper then presents 
the results to user for analysis.  
  
36 
 
CHAPTER FIVE: SYSTEM IMPLEMENTATION AND TESTING 
5.1 Introduction 
This chapter describes the implementation of the proposed system which involved Apache log 
parsing and Apache log analysis using temporal logic to detect HTTP attacks. The systems’ 
key features are highlighted as well as system screenshots indicating users’ screens and back 
end screens. This section of this chapter describes the various system tests carried out for its 
functionalities and usability according to the system objectives. 
5.2 System Specification 
The system holding the application will have the following specifications: 
i. Windows 10 64-bit operating system 
ii. 4GB RAM 
iii.  20GB disk. 
iv. Eclipse IDE for Java Developers (Neon.3) for system code development. 
v. Esper version 6.0.1 which is the core complex event processing engine for log event 
analysis. 
vi. Log2CSV parser to convert the access log metadata to CSV form.  
vii. Apache 2 which is the web server  installed on a virtual machine with the following 
specifications; 
 Ubuntu 16.04 operating system 
 Memory of 512 MB RAM 
 Hard disk size of 9 GB  
 
5.3 System Implementation and Testing 
5.3.1 Web Server Configuration 
Apache2 web server was installed in the Ubuntu virtual machine and a WordPress based web 
site was created and hosted on the Apache web server, see Appendix A. In order to capture the 
web server logs consisting of specific user activities, Apache2 service has to be started as 
shown below in figure 5.1. 
  
37 
 
 
Figure 5. 1: Apache2 Start Service 
The web server logs are stored in the access.log file under /var/log/apache2 directory. 
 
Figure 5. 2: Web Server Logs Directory Location 
5.3.1.1 CSV (Comma Separated Values) 
CSV files were also used to store retrieved Apache access log metadata in comma separated 
values (CSV) form as in the example shown in figure 5.3 below. 
 
Figure 5. 3: Web Server Logs CSV Format 
5.3.2 Web Log Analysis Configuration 
The main component of the log analysis, Esper, was installed in the Eclipse Neon.3 IDE 
running on Windows 10 operating system, see Appendix B. In order to reconstruct the web 
activities through Apache server log entries, the records from the access log file are first read 
then sent to the Esper engine as events. To do so, an Apache Log Input adapter was developed 
in Java for reading the access log metadata as input streams for processing.  
5.3.2.1 Apache Log Input Adapter Configuration 
Currently Esper provides only 7 types of adapters such as AMQP, CSV and HTTP, none of 
them supports Apache log files hence an Apache Log input adapter had to be developed in Java 
language, see Appendix D. First the log records are converted to a CSV file via a log parser. 
The log parser, accesslog2csv.pl, written in Perl language was downloaded from GitHub 
website link http://github.com/woonsan/accesslog2csv/blob/master.accesslog2csv.pl (Last 
accessed 20th April 2018). 
  
38 
 
 
Figure 5. 4: Log Parser Script 
The input adapter is set to generate log events and read through the EPL queries. The code 
snippet below shows how the adapter is set up to retrieve log metadata from the log.java class, 
see Appendix C. 
 
Figure 5. 5: Input Adapter Event Generation 
The adapter loops over all EPL queries, this depends on the number of queries. For the adapter 
to loop over only one query, the system user can modify the for-loop in the code. 
  
39 
 
 
Figure 5. 6: Input Adapter for- loop 
The adapter reads through the Apache logs in the CSV form and generates EPL responses based 
on the EPL queries. 
 
Figure 5. 7: Input Adapter CSV Configuration 
See Appendix D for full configuration 
Figure 5.8 below indicates the system network topology consisting of the system devices used. 
 
Figure 5. 8 : Web Log Analysis System Network Topology 
  
40 
 
5.4 System Features 
Following the complete installation and configuration of all system components on the Ubuntu 
virtual and the Windows 10 machines, the following were the features available on the web log 
analysis platform. 
5.4.1 Viewing Apache Access Logs 
Specific system user activities on the web site are captured in the web server logs. In order to 
view the logs from the Apache 2 web server, the system user accesses the var/log server 
directory. Figure 5.9 shows an example of the access logs that were viewed by the system 
administrator using the command sudo tail -100 /var/log/apache2/access.log which displays 
the last 100 web server logs.  
 
Figure 5. 9: Apache Web Server Access Logs 
5.4.2 Creating Web Server Logs Datasets 
The web server access logs datasets were created through the process of log parsing. To do so, 
the Apache access log records were converted to CSV (Comma Separated Values) form. This 
was done through an Apache log parser written in perl script see figure 5.4 above.  
Access log metadata included; 
i. Host 
ii. Log Name 
  
41 
 
iii. Date Time 
iv. Time Zone 
v. Method  
vi. URL 
vii. Response Code 
viii. Bytes Sent 
ix. Referrer 
x. User Agent 
Figure 5.10 below shows the raw format of the access logs. 
 
Figure 5. 10: Apache Logs in Raw Format 
To convert the access log data to CSV, the command in figure 5.11 below was run and the 
records were saved in the accesslogs.csv file. 
 
Figure 5. 11: Log Data Conversion to CSV Command 
When parsed the raw access log format in figure 5.9 above can be displayed in a more detailed 
output with specified fields giving information on each metadata of the access log as seen in 
figure 5.12 below. 
 
 
  
42 
 
 
Figure 5. 12: Apache Log CSV File 
5.4.3 Loading Datasets to Web Log Analysis System 
Following the creation of the Apache logs datasets in CSV format, the log files are then loaded 
to the CEP engine, Esper, for processing by the Apache Log Input adapter already created.  
 
Figure 5. 13: Log CSV Loaded to Analysis System 
5.4.4. Creating EPL Requests 
The temporal formulas in MSFOMTL as indicated in figure 4.2 are translated to EPL (Event 
Processing Language) requests which are applied over Esper. Simple EPL requests are 
designed to filter the web log records one at a time over one input stream. This results in small 
memory space consumption. Fields, additional expressions or user- defined functions over 
those fields with key roles in introducing the activity to the system user in the SELECT clause. 
Filters are designed through WHERE clause where logical operators along with built-in or user 
  
43 
 
define functions are employed. Figures 5.13 and 5.14 shows SQL-like EPL queries defining 
SQL injection attack using GET method and POST method. 
 
Figure 5. 14: SQL Injection Attack Using GET method EPL Query 
The query above receives Apache log records as shown in line 2, filters out records of which 
response value is ‘400’, method is equal to ‘GET’ and url contains the signature , ‘%or%=%’ 
that most SQL injection attacks feature in line 3. . In line 1, output the filtered records as an 
intended activity that address the host, timestamp, time zone, url, response code and user agent 
containing the signature and agent used through the attack.    
 
 
Figure 5. 15: SQL Injection Attack Using POST method EPL Query 
The query above receives Apache log records as shown in line 2, filters out records of which 
response value is ‘200’, method is equal to ‘POST’, url value contains location of the dynamic 
query page ($querypage) which has potential for SQL injection attack as shown in line 3. Since 
Apache log records do not store content through POST methods, one can define a maximum 
size value ($threshold) for responses. In line 1, output the filtered records as an intended 
activity that address the host,  timestamp, time zone , url and user agent containing the signature 
and agent used through the attack. 
5.4.5 Viewing EPL Responses 
EPL requests created are run through the Apache logs with the aid of the Apache Log Input 
adapter. The EPL query defining a SQL injection attack is run through the 
Apacheaccesslog.CSV and the EPL response is as seen in figure 5.16 below. 
  
44 
 
 
Figure 5. 16: EPL Response 
5.4.6 Identifying Web Attacks 
In order for the system administrator to identify a web attack through the log analysis, the 
following system steps are performed: 
i. The web attack is logged into the Apache log file 
Traces of the SQL Injection attack using GET method are logged into the access.log file of the 
Apache web server as shown in the red box in figure 5.17 below. 
 
Figure 5. 17: SQL Injection using GET Method Traces in Apache Log 
ii. Raw data log file records are converted to CSV and loaded to Esper 
The raw data log files as shown in figure 5.17 above are then converted to CSV data format by 
running the command shown in figure 5.11. The CSV file is then loaded to the CEP engine, 
Esper, for processing as indicated in figure 5.13. 
iii. The CSV database is queried  
The CSV database containing the log file records is then queried using temporal logic formula 
in Event Processing Language. The system administrator runs the SQL Injection attack using 
GET method EPL query, see figure 5.13. 
  
45 
 
iv. EPL result analysis 
The EPL response is as shown in the figure 5.18 below is then analysed by the system 
administrator.  
 
Figure 5. 18: SQL Injection Attack using GET method Log Output 
From the EPL query, the web log analysis system generates logs with response code equals to 
400, method is equal to ‘GET’ and url contains the signature ‘%or%=%’ as shown in the red 
box in figure 5.17 above.  
5.5 System Testing 
This section describes the various tests carried out on the system developed to ensure that it 
works well. The system was evaluated against both functional and non-functional requirements 
System testing was categorized into two sections, developer testing and web system user 
testing. System tests performed by the developer were to ensure that the system’s various 
functionalities were working well, tests included user acceptance testing and unit testing. 
5.5.1 User Acceptance Testing 
This testing was done by performing various web server user activities that were captured via 
the web server logs and were analysed by the web analysis platform. These activities were: 
i. Successful user login with administrator account. 
ii. Failed user login attempt. 
iii. Brute force attack. 
iv. SQL injection attack  
v. Identification of the attacks. 
This proposed web server log analysis platform majorly focused on WordPress theme as the 
case study. The results for the above tests are displayed in the figures below, where these user 
activities were captured in the web logs then detected by the web server log analysis system. 
WordPress successful login is a web activity where the web user successfully logs into the web 
site as shown in figure 5.18 below.  
  
46 
 
 
Figure 5. 19 : WordPress Successful Login 
The website responds to the client with code 302 which corresponds to redirection of the page 
from wp-login.php. The green rectangle in figure 5.19 below indicates successful user login 
traces captured in the Apache logs. 
 
Figure 5. 20: WordPress Login Traces in Apache Log 
WordPress failed login is a web activity where the user attempts to log into the web site 
unsuccessfully as shown in figure 5.20 below.  
  
47 
 
 
Figure 5. 21: WordPress Failed Login 
WordPress login page, wp-login.php is designed to accept login credentials through POST 
method. The website responds with code 200. This means the page returns successfully with a 
response message indicating a login failure. 
 
Figure 5. 22: WordPress Failed Login Traces in Apache Logs 
In web server misuse cases where the web user tries to retrieve the login credentials through a 
trial and error method.  
  
48 
 
 
Figure 5. 23 : Failed Login Attempt One 
 
Figure 5. 24 : Failed Login Attempt Two 
The continual failed login attempts traces are captured in the Apache logs as web activities 
entailing a repeating pattern. 
 
Figure 5. 25: Brute Force Login Attack Traces in Apache Logs 
The system administrator then proceeds to creating EPL query emulating the attack as shown 
in figure 5.25 below. 
 
Figure 5. 26 : Brute Force EPL Query 
  
49 
 
The query above receives Apache log records as shown in line 2, filters out records of which 
response value is ‘200’ and method is equal to ‘POST’ that most brute force attacks feature in 
line 3.. In line 1, output the filtered records as an intended activity that address the host, 
timestamp, time zone, url, response code and user agent containing the signature and agent 
used through the attack 
 
Figure 5. 27 : Brute Force Log Output 
5.5.2   Unit and Integration Testing 
In unit testing, the specific system units were tested for operation. The software components 
was tested separately to ensure that each performed their required function. Integration testing 
was performed when two or more system components were integrated and tested for their 
system functionality. During unit and integration testing, some system errors were encountered 
and rectified. Tests were done by checking if each system component were displaying the 
required output. 
5.5.2.1 Apache2 Web server 
Apache2 web server was installed in Ubuntu 16.04 environment. A WordPress web site was 
designed and hosted in the web server. The test was to check if the Apache web server activities 
were being captured .This was done by first starting the Apache service and checking the logs 
in the access.log file. 
  
50 
 
 
Figure 5. 28: Web Server Activities Captured in Apache Logs 
Figure 5.20 above shows us that the web server activities were being captured hence the Apache 
service was running properly. 
5.5.2.2 Esper Complex Processing Engine 
The Esper CEP was configured to run through the log files with the aid of the Apache Log 
Input Adapter and EPL queries. The figure 5.21 below shows the engine’s output once queried 
to select all log file events. 
 
Figure 5. 29: All Log Events Output 
 
 
  
51 
 
CHAPTER SIX: DISCUSSION OF RESULTS 
6.1 Introduction 
The purpose of the dissertation was to study, understand and examine typical attacks on web 
servers focusing on HTTP attacks, to identify and generate log files of these attacks for 
analysis, to review the existing systems available for web server log analysis, to design and 
develop a web server log analysis platform that can be used to define attacks patterns using 
temporal logic and to validate the capability of the proposed solution in increasing the quality 
of log analysis to detect web attacks. This was done in order to develop a suitable technique 
that will be used by system and network administrators to monitor and identify web server 
misuse activities. 
6.2 Findings and Achievements 
A review of the literature indicated that there is an increased rate of web server misuse.  Web 
servers’ events are stored in the log files which server as important sources of evidence during 
web server misuse or crime incidents. The  existing systems used for the extraction and analysis 
of these evidence face major challenges due to the enormous sizes of web log files and 
complexities in understanding the attack patterns connected to the crime. This leads to slow 
log analysis which is time consuming. Currently, system network and security professional 
lack efficient standard platforms to define and share attack patterns in regards to log analysis. 
The developed system focused on web server misuse and analysis of Apache log files. Formal 
attack patterns were developed in MSFOMTL (Many Sorted First Order Metric Temporal 
Logic) and the related queries in EPL. The analysis platform is built on Esper, and tests carried 
out reveal that the system increases the quality of log analysis process and would aid in digital 
forensics aspects to identify system misuse cases in a timely manner. The developed system is 
based on open-source tools that are widely documented and supported meaning that the cost is 
not a major factor when choosing this system. EPL queries can be saved in EPL libraries, this 
in turn speeds up misuse investigations as it saves time by focusing on something novel instead 
of searching through already know attack patterns.   
 
 
  
52 
 
6.3 Review of Research Objectives 
This dissertation identifies the challenges faced by system and network administrators in 
identifying web server misuses cases. A web server log analysis was developed with a selected 
technique from the literature review and the system results from the system analysis. This 
research was guided by the five research objectives outlined in Chapter 1.  
The first objective was to study, understand and examine typical attacks on web servers 
focusing on HTTP; to the extent of this objective, these include cross site scripting, SQL 
injection, file inclusion and brute force attacks. The second objective was to identify and 
generate log files of these attacks for analysis; the main log file focused in this research was 
the access log file where traces of these web attacks were identified.  
The third objective was to review the existing systems available for web server log analysis; a 
number of web log analysis applications and mechanisms were identified and their merits and 
limitations were reviewed and documented. The forth objective was to develop a web server 
log analysis system used to define web attack patterns using temporal logic; attack patterns 
were defined through MSFOMTL while corresponding queries were written in EPL.  The EPL 
queries were run on Esper, an open source CEP engine, resulting to identification of web 
attacks. The fifth and final objective was to test the system developed to proof its capability in 
increasing the quality of log analysis to detect web attacks; through system testing and 
evaluation, the expected results were verified in accordance to the system functional 
requirements.   
 
 
 
 
 
 
 
  
53 
 
CHAPTER SEVEN: CONCLUSIONS AND RECOMMENDATIONS  
7.1 Introduction 
This chapter gives a summary of the research study. The various existing methods and web log 
analysis techniques researched and implemented by different authors were reviewed and 
technological gaps and options available for this system were identified.  The design, 
architecture and requirements of this system were identified as well. Temporal logic approach 
was used for system implementation. System testing and evaluation was performed throughout 
system development.  
7.2 Conclusions 
The sole purpose of this research was to develop a system that would aid network and system 
security administrators identify HTTP attacks, specifically SQL injection and brute force 
attacks through analysis of web server logs using temporal logic approach and log 
reconstruction.  This comes as an aid for business and IT entities to ensure that they are able to 
identify web server misuses. The system tests revealed this analysis platform increases the 
quality of log analysis process thus contributing to digital forensics in various aspects.  This 
approach could be used to enhance host based intrusion detection mechanisms.  
7.3 Recommendations 
From the results discovered during the research study, the following came out as 
recommendations; 
i. The system should be developed to analyse more web attacks such as cross site 
scripting, file inclusion, Xpath injection and command execution detection. This gives 
the system more flexibility. 
ii. The system accuracy can be enhanced by having more functionalities that focus on 
analysing logs located in the error.log files. 
iii. The system should be made available for other web themes including Joomla and 
Drupal. 
iv. The system should be developed to analyse other log files such as system security and 
audit events not just web server events. 
  
54 
 
7.4 Future Work 
With further research, this system may be applicable in the business intelligence sector in use 
case analysis. In addition, it may aid in enhancing SIEM and intrusion detection mechanisms 
in building misuse and anomaly based Network Intrusion Detection Systems in which temporal 
formalisms for representing attack patterns are combined. Network and System Security 
administrators will then be able to run a number of queries previously stored in the EPL 
libraries. This will greatly save on time during web server attack investigations as the focus 
will be on searching for new attacks rather than searching for attack patterns already known 
and stored. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
  
55 
 
REFERENCES 
Ahmed, A., Lisitsa, A., & Dixon, C. (2011). A misuse-based network intrusion detection system 
using temporal logic and stream processing. Network and System Security 
International Conference. 
Albek, E., Bax, E., Billock, G., Chandy, K. M., & Swett, I. (2005). An event processing 
language (epl) for building sense and respond applications. IEEE International Parallel 
and Distributed Processing Symposium. 
Arasteh, R. A., Debbabi, M., Sakha, A., & Saleh, M. (2007). Analyzing multiple logs for 
forensic evidence. Digital Investigation. 
Aulds, C. (2000). Linux Apache Web Server Administration (Craig Hunt Linux Library Series). 
SYBEX Inc. 
Barret, B. (2014, May 28). Retrieved January 7, 2017, from Home of the Webalizer: 
http://www.webalizer.org/ 
Borba, P., & Cavalcanti, A. (2007). Testing Techniques in Software Engineering. Brazil: PSSE. 
Bryman , J. M., & Bell, E. (2007). Business Research Methods Revised Editon. Oxford 
University Press. 
Calyptix. (2016, August 1). Top 5 Cyber Attacks Types in 2016. Retrieved January 4, 2017, 
from http://www.calyptix.com/top-threats/top-5-cyber-attack-types-in-2016-so-far/ 
Calzarossa, M. C., & Massari, L. (2011). Analysis of web logs: challenges and findings. 
Springer Berlin Heidelberg. 
Cohen, M. I. (2008). PyFlag–An advanced network forensic framework. Digital investigation. 
Destailleur, L. (2015). Retrieved January 7, 2017, from AWStats log file analyzer: 
http://www.awstats.org/ 
Fry, A. (2011). A Forensic web Log Analysis Tool: Techniques and implementation . Québec, 
Canada: Concordia University Montréal. 
Gemino, A., & Parker, D. (2009). Use case diagrams in support of use case modeling: Deriving 
understanding from the picture. Journal of Database Management. 
Goel, N., & Jha, K. C. (2013). Analyzing Users’ Behavior from Web Access Logs using 
Automated Log Analyzer tool. International Journalof Computer Applications. 
Gordey, S. (2010). Web Application Security Statistics. Retrieved January 4, 2017, from 
http://projects.webappsec.org/w/page/13246989/Web%20Application%20Security%2
0Statistics 
  
56 
 
Grace, J. L., Maheswari, V., & Nagamalai, D. (2011). Analysis of Web Logs and Web User in 
Web Mining. International Journal of Network Security & Its Applications. 
Gunestas, M., & Bilgin, Z. (2016). Log Analysis Using Temporal Logic and Reconstruction 
Approach: Web Server Case. The Journal of Digital Forensics, Security and Law. 
Halfond, W. G., Viegas, J., & Orso, A. (2006). A classification of SQL-injection attacks and 
countermeasures. International Symposium on Secure Software Engineering . 
Harvin, H. (2016, August 27). Agile Estimation and Planning. Retrieved from Henry Harvin 
Education: http://certificationcourses.henryharvin.com/Agile-development-
methodology-provides-opportunities-to-assess-the-direction-of-a-project-throughout-
the-development-lifecycle-Agile-methodologies-are-an-alternative-to-waterfall-or-
traditional-seq/b94 
Haven, R. W., Lunt, B., & Teng, C. C. (2012). Naïve Bayesian filters for log file analysis: De-
spam your logs. In 2012 IEEE Network Operations and Management Symposium. 
Herreman, D. (2006). EsperTech Event Series Intelligence. Retrieved December 17, 2016, from 
http://www.espertech.com/esper/ 
Herrerías, J., & Gómez, R. (2010). Log analysis towards an automated forensic diagnosis 
system. Availability, Reliability, and Security International Conference. 
Huth, M., & Ryan, M. (2004). Logic in Computer Science: Modelling and reasoning about 
systems. Cambridge University Press. 
Irny, S. I., & Rose, A. A. (2005). Designing a Strategic Information Systems Planning 
Methodology for Malaysian Institutes of Higher Learning. Issues in Information 
Systems , VI (1). 
Jain, K. R., Kasana, R. S., & Jain, S. (2009). Efficient Web Log Mining using Doubly Linked 
Tree. International Journal of Computer Science and Information Security. 
Janot, E., & Zavarsky, P. (2008). Preventing SQL Injections in Online Applications. Ghent, 
Belgium: Application Security Conference. 
Jayathilake, P. W. (2011). A novel mind map based approach for log data extraction. 
International Conference on Industrial and Information Systems . 
Jerkovic, J. I. (2009). Essential Techniques for Increasing Web Visibility SEO warrior. 
O'Reilly Media, Inc. Retrieved from http://yourproseo.com/wp-
content/uploads/2014/10/seo_warrior.pdf 
Kalamatianos, T., Kontogiannis, K., & Matthews, P. (2012). Domain independent event 
analysis for log data reduction. In 2012 IEEE 36th Annual Computer Software and 
Applications Conference. 
  
57 
 
Kothari, C. (2004). Research Methodology. New Age International. 
Koymans, R. (1990). Specifying real-time properties with metric temporal logic. Real-time 
systems. 
Kumar, C. (2016). Apache Web Server Hardening & Security Guide. Retrieved February 2, 
2017, from https://geekflare.com/apache-web-server-hardening-security/ 
Levy, Y., & Ellis, T. J. (n.d.). 2011.  
Levy, Y., & Ellis, T. J. (2011). A guide for novice researchers on experimental and quasi-
experimental studies in information systems research. Interdisciplinary Journal of 
information, knowledge, and management. 
Makanju, A., Nur Zincir-Heywood, A., & Milios, E. E. (2009). Clustering event logs using 
iterative partitioning. In Proceedings of the 15th ACM SIGKDD international 
conference on Knowledge discovery and data mining. 
Mishra, J., & Mohanty, A. (2012). Software Engineering. New Delhi, India: Dorling 
Kindersley. 
Mohapatra, S., & Joseph, P. (2014). Management Information Systems in the Knowledge 
Economy. PHI Learning. 
Oliver, B. (2010). Teaching Fellowship: Benchmarking Partnerships for Graduate 
Employability. Cutin University. 
OWASP. (2016). Retrieved February 2, 2017, from Comman Injection: 
https://www.owasp.org/index.php/Command_Injection 
OWASP Application Security Community. (n.d.). Retrieved December 15, 2016, from 
Welcome to OWASP: https://www.owasp.org/index.php/Main_Page 
PyFlag. (n.d.). Retrieved January 7, 2017, from PyFlag Tutorial: 
http://pyflag.sourceforge.net/Documentation/tutorials/ 
Ristic, I. (2005). Apache Security. Retrieved December 14, 2016, from The complete guide to 
securing your Apache Server: https://www.feistyduck.com/library/apache-
security/online/index.html 
Roll- Hansen, N. (2009). Why the distinction between basic (theoretical) and applied 
(practical) reserch is important in the politics of science. London: Centre of Philosophy 
of Natural and Socila Science Contigency and Dissent in Science. 
Sawmill. (2010). Retrieved January 7, 2017, from Analyze, Monitor, Alert. Sawmill, Universal 
Log File Analysis and Reporting: https://www.sawmill.net/index.html 
  
58 
 
Sharma, A. K., & Gupta, P. C. (2013). Analysis of Web Server Log Files to Increase the 
Effectiveness of the Website Using Web Mining Tool. International Journal of Advanced 
Computer and Mathematical Sciences. 
Singer, A., & Bird, T. (2004). Building a Logging Infrastructure. The Usenix Association. 
Sowmya, G., & Kumar, A. N. (2013). Brute Force Attack - Blocking Techniques.  
Srivastava, J., Cooley, R., Deshpande, M., & Tan, P.-N. (2000). Web Usage Mining: Discovery 
and Applications of Usage Patterns from Web Data. ACM Sigkdd Explorations 
Newsletter. 
Stockley, M. (2016, June 15). Naked Security. Retrieved February 8, 2017, from The web 
attacks that refuse to die: https://nakedsecurity.sophos.com/2016/06/15/the-web-
attacks-that-refuse-to-die/ 
Suneetha, K. R., & Krishnamoorthi, D. R. (2009). Identifying user behavior by analyzing web 
server access log file. IJCSNS International Journal of Computer Science and Network 
Security. 
Szalvay, V. (2004). An Introduction to SAgile Software Development. Danube Technologies. 
Tutorials Point. (2006). Retrieved from SDLC Agile Model: 
https://www.tutorialspoint.com//sdlc/sdlc_agile_model.htm 
Tyagi, A., & Choudhary, S. (2015). Web Usage Mining Using Web Log Expert Tool. 
International Journal of Advanced Research in Computer Science and Software 
Engineering. 
Vernekar, S. S., & Buchade, A. (2013). Map-Reduce based log file analysis for system threats 
and problem identification. In Advance Computing Conference (IACC), 2013 IEEE 
3rd International. 
Vianello, V., Gulisano, V., Jimenez-Peris, R., Patiño-Martínez, M., Torres, R., Díaz, R., & 
Prieto, E. (2013). A Scalable SIEM correlation engine and its application to the 
Olympic games IT infrastructure. Availability, Reliability and Security International 
Conference. 
Wainwright, P. (2008). Pro Apache (Third ed.). Appres. 
Web Technologies. (2016). Retrieved February 2, 2017, from Usage of web servers for 
websites: https://w3techs.com/technologies/overview/web_server/all 
Web Technology Survey. (2016). Retrieved January 2, 2017, from Usage of content 
management systems for websites: 
https://w3techs.com/technologies/overview/content_management/all 
  
59 
 
WebLog Expert. (n.d.). Retrieved January 7, 2017, from Powerful log analyzer: 
http://www.weblogexpert.com/ 
 (2014, May 28). Retrieved January 7, 2017, from Home of the Webalizer: 
http://www.webalizer.org/ 
 
 
APPENDICES 
Appendix A: Apache2 Web Server Installation 
Pre- requisites- Ubuntu 16.04 operating system with 512 MB RAM and Hard disk size of 9 GB  
The local package index has to be updated to reflect the latest upstream changes. Afterward 
Apache was installed using Ubuntu’s package manager, apt, as shown in figure below. 
 
To confirm Apache installation, start the Apache service and direct web browser in use to 
http://192.168.56.101, 10.0.3.15, and the local host, to view Apache2 default web page. 
 
  
60 
 
 
 
 
Appendix B: Esper Configuration in Eclipse IDE 
To add Esper to the project, right click on the project name select Build path -> Add External 
Archives. Browse to the esper- 6.0.1.jar file. 
 
 
  
61 
 
 
In order to run the project, add the following external archives 
 Antlr- runtime- 4.5.3.jar 
 Cglib-nodep-3.2.4.jar 
 Log4j-1.2.7.jar 
 
Appendix C: Log.java class 
This is the event class. The events consist of the following attributes, 
 Host 
 Time  
 Zone 
 Method 
 Url 
 Response 
 Bytes 
 Referer 
 User 
 
Package Logs; 
 
public class Log { 
 Private String symbol; 
 
 Private String host;      //Host 
 Private String      time;      //DateTime 
 Private String zone;      //TimeZone 
 Private String method;    //Method 
 Private String url;       //URL 
 Private String response;   ResponseCode 
 Private String bytes;     //Bytes Sent 
 Private String      referer;    //Referer 
 Private String      user;       //UserAgent 
  
62 
 
 Public Log (String symbol) { 
     this.symbol = symbol; 
 } 
 
 public Log(String symbol, String host, String time, String zone, String 
method, String url, String response, String bytes, String referer, String user) { 
             this.symbol = symbol; 
  this.host = host; 
  this.time= time; 
  this.zone = zone; 
  this.method = method; 
  this.url = url; 
  this.response = response; 
  this.bytes = bytes; 
  this.referer = referer; 
  this.user =user; 
 } 
 
 Public String getSymbol() {  
  return symbol;  
  } 
  
 public void setSymbol(String symbol) { 
  this.symbol = symbol; 
 } 
  
 public String getHost() {  
  return host;  
  } 
 public void setHost(String host) { 
  this.host = host;  
  } 
  
 public String getTime() {  
  return time;  
  } 
 public void setTime(String time) { 
  this.time = time;  
  } 
 public String getZone() {  
  return zone;  
  } 
 public void setZone(String zone) { 
  this.zone = zone;  
  } 
 public String getMethod() {  
  return method;  
  } 
 public void setMethod(String method) {  
  this.method = method;  
  } 
 public String getUrl() {  
  return url; 
  } 
 public void setUrl(String url) {  
  this.url = url;  
  } 
 public String getResponse() { 
  return response;  
  
63 
 
  } 
 public void setResponse(String response) {  
  this.response = response;  
  } 
 public String getBytes() { 
  return bytes;  
  } 
 public void setBytes(String bytes) {  
  this.bytes = bytes;  
  } 
 public String getReferer() { 
  return referer;  
  } 
 public void setReferer(String referer) { 
  this.referer =referer; 
  } 
 public String getUser() {  
  return user; 
  } 
 public void setUser(String user) {  
  this.user = user; 
  } 
 
 @Override 
 public String toString() { 
  //DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd"); 
 
    return 
   "Symbol: " + symbol + "  " + 
   "Host: " + host.toString() + "   \t   " + 
   "Time: " + time.toString() + "   \t   " + 
   "Zone: " + zone.toString() + "   \t   " + 
   "Method: " + method.toString() + "    \t    " + 
   "Url: " + url.toString() + "    \t    " + 
   "Response: " + response.toString() + "   \t   " + 
   "Bytes: " + bytes.toString() + "   \t   " + 
   "Referer: " + referer.toString() + "   \t   " + 
   "User: " + user.toString(); 
 } 
} 
 
 
 
 
 
 
 
 
  
64 
 
Appendix D: Apache Log Input Adapter Configuration 
The code is used to generate the log events and read through the EPL queries. 
Package Logs; 
import com.espertech.esper.client.*; 
import java.util.StringTokenizer; 
import java.text.*; 
import java.io.File; 
import java.io.FileReader; 
import java.io.FileNotFoundException; 
import java.io.BufferedReader; 
import java.util.Scanner; 
 
public class ApacheLogInputAdapter { 
 
 public static class CEPListener implements UpdateListener { 
  public void update(EventBean[] newData, EventBean[] oldData) { 
   System.out.println("EVENT! " + newData[0].getUnderlying()); 
  } 
 } 
 
 public static void main(String[] args) { 
 
// how the adapter is set up to retrieve log metadata from the log.java class 
 
  Configuration cepConfig = new Configuration(); 
  cepConfig.addEventType("Apachecsv", Log.class.getName()); 
 
// Configuration instance is then passed to EPServiceProviderManager to obtain a 
configured Esper engine. 
 
//getProvider () method returns Instance of Esper Engine 
// code snippet shows how to send events to the engine 
 
EPServiceProvider cep =EPServiceProviderManager.getProvider("ApacheLogInputAdapter 
", cepConfig); 
  EPRuntime cepRT = cep.getEPRuntime(); 
 
  EPAdministrator cepAdm = cep.getEPAdministrator(); 
 
// the adapter loops over all EPL queries, this depends on the number of queries 
 
  for(int queryCounter = 1; queryCounter <= 10; queryCounter++) { 
   System.out.println("------------------"); 
   System.out.println("PERFORMING QUERY " + queryCounter); 
   System.out.println("------------------"); 
 
   try { 
    String query = new Scanner(new File("query_" + 
queryCounter + ".epl")).useDelimiter("\\Z").next(); 
    cepAdm.destroyAllStatements(); 
    EPStatement cepStatement = cepAdm.createEPL(query); 
    cepStatement.addListener(new CEPListener()); 
   } catch(Exception e) { 
    System.err.println("Error: " + e.getMessage()); 
   } 
 
  
65 
 
 
//The adapter reads through the Apache logs in the CSV form and generates EPL 
responses based on the EPL queries 
 
   try { 
    File file = new File("Apacheaccesslogs.CSV"); 
    BufferedReader reader  = new BufferedReader(new 
FileReader(file)); 
    String line = null; 
    StringTokenizer st = null; 
 
    while((line = reader.readLine()) != null) { 
     Log log = new Log("Apacheaccesslogs"); 
      
     st = new StringTokenizer(line, ","); 
                     
                   
                    while(st.hasMoreTokens()) { 
     
                     
     //DateFormat dateFormat = new 
SimpleDateFormat("yyyy-MM-dd"); 
     try { 
     
 //log.setTimestamp(dateFormat.parse(tokenizer.nextToken())); 
     } catch(Exception e) { 
      System.err.println("Error: " + 
e.getMessage()); 
       
     } 
      
      
     log.setHost(st.nextToken()); 
     log.setTime(st.nextToken()); 
     log.setZone(st.nextToken()); 
     log.setMethod(st.nextToken()); 
     log.setUrl(st.nextToken()); 
     log.setResponse(st.nextToken()); 
     log.setBytes(st.nextToken()); 
     log.setReferer(st.nextToken()); 
     log.setUser(st.nextToken()); 
     
     //System.out.println("LOG!  " + log); 
     cepRT.sendEvent(log); 
    } 
 
    //reader.close(); 
    } 
   } 
    catch(Exception e) { 
    System.err.println("Error: " + e.getMessage()); 
   } 
  } 
 } 
}