|dc.description.abstract||As technology advances, networks become more sophisticated. As a result, the attack surface available to hackers has continued to expand, leading to a rapid increase in insecurity; there is therefore a need for a line of defence that is both reactive and predictive. Traditional protection techniques such as data encryption, user authentication and the avoidance of programming errors exist and act as the first line of defence for computer security; however, these techniques are not sufficient to protect against malicious code and insider attacks. Programming errors in particular are unavoidable because system and application software is complex and evolving rapidly, and they leave behind weaknesses that attackers can exploit.
Research on Intrusion Detection Systems (IDS) has been considered a critical area for bridging this gap. The challenge in network-based detection lies in modelling the behaviour of normal and abnormal traffic. This calls for a reliable system that can learn the structure of network data and distinguish normal traffic from abnormal traffic. Because many applications use different Internet protocols, an IDS finds it difficult to detect all kinds of attacks efficiently. It is also difficult to build robust detection schemes, and false alarm rates increase as a result of weak feature selection, inefficient classifier generation and noise arising from imbalanced data. For this reason, predictive machine learning (ML) algorithms have been proposed, since they are capable of addressing such problems.
Various ML methods have previously been applied to network intrusion detection; however, the Bayesian network-based approach has been considered preferable owing to its notable features. In this study, experiments were carried out using the KDD99 data set. The first experiment was conducted using Weka, a machine learning tool, and the second was conducted in Python. First, the data was pre-processed: the most relevant features were selected from the entire data set before classification, and data noise such as class imbalance was then addressed. Naive Bayes, a Bayesian network-based classifier, was used as the classifier and was trained and tested on the data in Weka. Python was then used to train and test the same classifier. In both experiments, the training and testing ratios were 0.67 and 0.33 respectively. The algorithm obtained an accuracy of 92% using Weka and 90% using Python (JupyterLab).||en_US
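The Python experiment summarised above (a Naive Bayes classifier trained and tested with a 0.67/0.33 split) can be sketched as follows. This is a minimal, hypothetical illustration, not the study's actual code: it assumes all features are numeric and modelled as Gaussian (Gaussian Naive Bayes), uses synthetic data in place of KDD99, and omits the feature-selection and class-imbalance steps. The function names `train_test_split`, `fit`, `predict` and `accuracy` are illustrative helpers written from scratch, not a library API.

```python
import math
import random
from collections import defaultdict


def train_test_split(rows, train_ratio=0.67, seed=1):
    """Shuffle rows and split them into training and test sets (default 0.67/0.33)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


def fit(train):
    """Estimate each class prior plus a per-feature mean and variance (Gaussian Naive Bayes)."""
    by_class = defaultdict(list)
    for *features, label in train:  # the last column of each row is the class label
        by_class[label].append(features)
    model = {}
    for label, vectors in by_class.items():
        prior = len(vectors) / len(train)
        stats = []
        for column in zip(*vectors):
            mean = sum(column) / len(column)
            # A tiny constant keeps the variance positive for constant features.
            var = sum((x - mean) ** 2 for x in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model


def log_gaussian(x, mean, var):
    """Log of the normal density N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(model, features):
    """Return the class with the highest log posterior (up to a shared constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        # Naive independence assumption: per-feature log-likelihoods simply add up.
        score = math.log(prior) + sum(
            log_gaussian(x, m, v) for x, (m, v) in zip(features, stats)
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, test):
    """Fraction of test rows whose predicted label matches the true label."""
    correct = sum(predict(model, row[:-1]) == row[-1] for row in test)
    return correct / len(test)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for two pre-processed numeric features per connection record.
    data = [(rng.gauss(0, 1), rng.gauss(0, 1), "normal") for _ in range(300)]
    data += [(rng.gauss(3, 1), rng.gauss(3, 1), "attack") for _ in range(300)]
    train, test = train_test_split(data)
    model = fit(train)
    print(f"train={len(train)} test={len(test)} accuracy={accuracy(model, test):.2f}")
```

Scores are computed in log space so that multiplying many small per-feature likelihoods does not underflow. A run on real KDD99 records would additionally need its categorical features (for example `protocol_type`) encoded or modelled with a non-Gaussian likelihood.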