SU+ @ Strathmore 

University Library 
 

Electronic Theses and Dissertations 

 
This work is availed for free and open access by Strathmore University Library.  

It has been accepted for digital distribution by an authorized administrator of SU+ @Strathmore University. 

For more information, please contact library@strathmore.edu 

 
2021 

 
Classification of X-rays images using Deep 

Convolutional Neural Network: COVID-19 

 
Bore, Laban Kipchirchir 
Strathmore Institute of Mathematical Sciences  
Strathmore University 
 
 
Recommended Citation 

Bore, L. K. (2021). Classification of X-rays images using Deep Convolutional Neural Network: COVID-19 [Thesis, 

Strathmore University]. http://hdl.handle.net/11071/12816 

 
Follow this and additional works at: http://hdl.handle.net/11071/12816 



Classification of X-rays Images Using Deep Convolutional

Neural Network: COVID-19

Laban Kipchirchir Bore

The thesis presented in fulfillment of the academic requirement for the

degree of Masters of Statistical Science (Statistics) of Strathmore

University

September 2021 


Declaration and Approval

Declaration

I declare that this work has not been previously submitted and approved for the award

of a degree by this or any other University. To the best of my knowledge and belief, the

thesis contains no material previously published or written by another person except

where due reference is made in the thesis itself.

© No part of this thesis may be reproduced without the permission of the author

and Strathmore University.

Laban Kipchirchir Bore

Signature . . . . . . . . . . . . . . .

Date . . . . . . . . . . . . . . .

Approval

The thesis of Laban Kipchirchir Bore was reviewed and approved by the following:

Dr. Collins Odhiambo,

Senior Lecturer, Institute of Mathematical Sciences,

Strathmore University

Dr Godfrey Achono Madigu,

Dean, Institute of Mathematical Sciences,

Strathmore University

Dr. Bernard Shibwabo,

Director of Graduate Studies,

Strathmore University


27/09/2021


Dedication
This thesis is dedicated to the memory of my father, Symon Kangogo Chebore, who

played a critical role in my education; offered support and encouragement during my

early education.

Special thanks to...

...my wife through challenging and glorious moments, your kindness, love, and support

have always been present during these critical times of my life.

...my mother whose relentless love, support, and prayers kept me going.

...finally, to my supervisor Dr. Elphas Okango for supervising and guiding the thesis

and my sister Asenath Bore for cheering me.



Acknowledgment
First, I wish to extend my special thanks to my supervisor Dr. Elphas Okango who

guided and supervised me in this project. I would also like to thank Dr. Collins

Odhiambo for his advice on this project and supervision.

I wish to acknowledge Strathmore University, Institute of Mathematical Sciences (SIMS), for the world-class education and the practical and technical knowledge.



Abstract
The growing number of labeled X-ray image archives has triggered increased research into the application of statistics, machine learning, deep learning, and computer vision across different domains. Recent studies applying deep transfer learning (60) with CNNs to detect and classify COVID-19 cases from small datasets have had major success. COVID-19 data has been collected since the outbreak of the virus in the fourth quarter of 2019. The COVID-19 virus has complicated the diagnosis, treatment, and care of patients because there is no cure and the virus mutates into different fatal variants. This has led to thousands of deaths and to increased admissions to hospital beds, ICUs, and other health facilities. Hundreds of thousands of new infections are reported daily across the world. The overburdening of the health system by COVID-19 has made access to other health services difficult in the under-served world (89). Traditionally, medical doctors carry out several tests, such as full blood count tests to ascertain whether the body is fighting certain pathogens, sputum tests, and chest X-rays. Doctors also examine patients’ medical history and carry out physical exams, such as listening to the lungs with a stethoscope for abnormal crackling sounds. The success of this traditional diagnosis process depends on the doctors’ experience and skills, the quality of the X-ray images, and the availability of the patient’s historical records. This is almost unattainable and unsustainable in the under-served countries of Africa. The motivation of this paper is to complement the traditional diagnosis and analysis of chest X-ray images by introducing machine classification approaches and the state-of-the-art deep residual network ResNet18 (14, 35). According to WHO (58), diagnosis is a process and requires classification steps to inform research, health policies, and care of the patients. An alternative definition is a “pre-existing set of categories agreed upon by the medical profession to designate a specific condition” (43).

We applied a statistical learning model to separate and classify all the X-ray images with patchy areas into one distinct class for further research, examination, analysis, and care of the patients. The observed white patchy areas in the X-ray images were our statistical variables of interest in classifying chest X-ray images into COVID-19 and non-COVID-19 (Figure 3.2). In addition, the final model can be replicated on other non-COVID datasets and extended to other related classification tasks. A deep CNN classification model (ResNet18), treated as a non-parametric statistical method, was used for classifying and predicting COVID-19 positive images. The dataset comprised COVID-19 positive cases (184) and COVID-19 negative cases (5000) aggregated from different sources. The COVID-19 negative cases came from 10 disease categories (pneumonia, pneumothorax, lung opacity, fracture, atelectasis, edema, pleural, etc.). The fine-tuned deep CNN model (ResNet18) performed well, with precision (87.5%), sensitivity (75%), and specificity (99.8%). Rerunning the model on a larger dataset, obtained by adding noise through data augmentation, gave sensitivity (90%) and specificity (100%). Hence, when more data is fed into the neural model, classification performance measures such as precision, AUC, and recall improve. This classification model can be used to aid radiologists or medical practitioners in chest X-ray image diagnosis and treatment (59) through categorization, diagnosis, detection, and prediction. Further extension of this research will focus on using larger COVID-19 or non-COVID-19 datasets, with more focus on systematic review around data acquisition, data certification, model development and pitfalls, and explanation construction (39).



Contents

Abbreviations
1 Introduction
1.1 Background to the study
1.2 Problem Statement
1.3 Research Objectives
1.4 Significance of research
2 Literature review
2.1 Introduction
3 Methodology
3.1 Introduction
3.2 Data Sources
3.2.1 Data Preparation
3.2.2 X-Rays Features and Variables
3.3 Convolution Neural Network Architectural
3.3.1 Convolution Layers
3.3.2 Image Analysis
3.3.3 Filters/Kernel
3.4 Stride and Padding
3.4.1 Padding
3.4.2 Stride
3.5 Activation Function
3.5.1 ReLU
3.6 Pooling
3.7 Fully Connected Layers
3.8 Training CNN
3.8.1 Loss Function
3.8.2 Softmax Layers
3.9 Proposed CNN: ResNet18
4 Results
4.1 Chest X-Rays Exploration
4.2 Model Analysis and Performance Metrics
4.2.1 Model Analysis
4.2.2 Performance Evaluation
4.2.3 Confusion Matrix
4.2.4 Receiver Operating Characteristics, Area Under the Curve
4.2.5 Confidence Interval
5 Discussion and Conclusion
5.1 Discussion
5.2 Conclusion
References
Appendix A Python Codes
A.1 Trained model
A.2 Inferences
Appendix B Similarity Report
Appendix C Ethical Approval Letter



Abbreviations
ROC Receiver Operating Characteristic

ResNet18 Residual Network

AUC Area Under the Curve

CI Confidence Intervals

CNN Convolution Neural Network

DL Deep Learning

ReLu Rectified Linear Unit (ReLU)

ANN Artificial Neural Networks

Res Residue

CXR Chest X-Rays

CT Computed Tomography

MRI Magnetic Resonance Imaging

SVM Support Vector Machine

PCANet Principal Component Analysis Network

KHIS Kenya Health Information System

KNBS Kenya National Bureau of Statistics

WHO World Health Organization

SOR Society of Radiography in Kenya

NIH National Institutes of Health

Kag Kaggle

Img Image

Bir Bird

etc et cetera



Chapter 1

Introduction
This chapter introduces the background of the study, problem statement, research ob-

jectives, and significance of the research.

1.1 Background to the study

In 1895, Professor Wilhelm Röntgen discovered X-rays (32). X-ray imaging is based on passing ionizing radiation through the patient’s body and projecting the resulting image onto a photosensitive plate. The body tissues are recorded on the plate, which displays the presence of abnormalities. Nuclear medicine, in which patients are infused with radionuclides in combination with pharmaceuticals, was introduced in 1950 (32). This technique records images and detects disease using a gamma camera. Around 1970, the CT scan and MRI were developed, the latter built on nuclear magnetic resonance technology. In MRI, strong magnetic fields are directed at the body to reveal the alignment of protons in cells. The images are investigated further and the problem in the body tissue is identified by a physician. To date, there has been continued improvement in medical imaging and in the accuracy of diagnosing medical conditions. As a result of these technological successes, medical practitioners are performing less exploratory surgery and analysis.

With recent high-precision technology, the availability of large datasets, fast computing power, and the need to make accurate diagnoses and decisions quickly, radiologists are under immense pressure, and subject to decision fatigue, to provide instant and reliable examination reports. The challenge of detecting medical problems in high-resolution images by examination with the naked eye may be partially met by a highly experienced radiographer and radiologist, who inspect the images visually for abnormalities. This is not always the case when it comes to diagnosing emerging, challenging diseases or a new variant of a disease evident in the image. Practitioners, depending on their level of skill, may fail to diagnose or may misinterpret the anomalies displayed in the image and hence produce incorrect results.

A recent study on artificial neural network (ANN) machine learning suggests that the technology performs better than humans in visual and auditory recognition tasks (48).

The technology has achieved landmark successes by learning rules from data, rather than being programmed with the rules, and by finding the underlying statistical structure needed to perform specific tasks. Some of its applications are in the classification of images, the association of photos with specific tags, and computer vision. The image classification problem is difficult for classical machine learning models because it is complex and the information in an image is represented by thousands to millions of pixels. For the same reason, the task is poorly suited to classical statistical methods such as Bayesian analysis.

This research paper will focus on the application of non-parametric statistical classification and analysis of images in the medical field using a deep convolution neural network, as opposed to a shallow convolution neural network, which is built from one or two layers.

1.2 Problem Statement

Even experienced and skilled medical practitioners and radiologists in the developing world find it challenging to accurately diagnose, classify, and interpret patients’ X-ray images. This is due to poor image quality and to situations complicated by the development and mutation of existing or rare diseases. Underlying disease changes and damage in the human lungs can complicate the diagnosis process, which can lead to misinterpretation or prolonged delay in diagnosis. In 2019, WHO projected a shortfall of 18 million health workers by 2030 in low- and middle-income countries, hindering progress toward universal health coverage and the health-related Sustainable Development Goals. The shortfall in the health system is worsened by the lack of quality health and diagnostic equipment, by a limited number of experienced staff, and by unskilled recruits.

According to the Society of Radiography in Kenya, 1070 registered radiographers serve a national medical system with a population of over 50 million people, who are exposed to various pathogens and need body imaging and diagnostic attention. Radiographers and radiologists have to undergo clinical training to keep abreast of advances in imaging technology, to provide accurate diagnosis and quality imaging services, and to build competencies.

I believe the application of deep CNNs to the analysis of medical big data can be useful in extracting important information, performing automatic abnormality detection, producing varied diagnoses, and preparing introductory radiology reports. A deep CNN-based analytical framework in the practice of radiology will complement the irreplaceable and remarkable skills of physicians. It will help radiologists improve the accuracy of diagnosis and analysis, decision making, and the interpretation of underlying body-tissue conditions in chest X-ray images. Dr. Watson, one of IBM’s radiology applications, incorporates some of the functions discussed in this paper (30).

To aid practitioners in accurate diagnosis, decision making, and interpretation of body images, we will build and deploy a classification model to detect, discriminate, and classify chest X-ray images into distinct classes.

1.3 Research Objectives

The purpose of this study is to apply the deep ResNet-18 CNN model to classify 5000+ publicly available labeled chest X-ray images into distinct classes, for ease of differentiation, diagnosis, and interpretation by radiologists and medical professionals. The study uses the transfer learning version of the non-parametric deep convolution neural network: the knowledge learned by the pre-trained ResNet model is applied to the target X-ray images to solve the classification problem. Constructing the deep ResNet model from scratch would have been time-consuming and was impractical because of insufficient certified COVID-19 samples.



1.4 Significance of research

This research will aid practitioners in interpreting and classifying the target medical images into different categories in a clinical setting, for further disease diagnosis and analysis, examination, and treatment of patients. The statistical model will be packaged as software and used in hospitals to classify X-ray images into different disease categories. The classification model will achieve the following:

• It will improve accuracy in image diagnosis, analysis, and interpretation in hospitals.

• Its diagnostic results will support the decision-making process by informing patient care, policies, and research & development in the field of health and healthcare.

• It will detect, discriminate, and classify chest X-ray images into distinct groups.



Chapter 2

Literature review

2.1 Introduction

Diagnostic analysis is one of the important steps in the examination of patients in clinics and hospitals. The diagnosis results inform patient care, policies, research, and development. According to Jutel (43), diagnosis is a “pre-existing set of categories agreed upon by the medical profession to designate a specific condition”. It can be viewed as both a process and a classification task (56). The clinical decision-making process depends on the outcome of the diagnosis, which should be accurate and timely so that the correct doses are provided and the patient’s disease is treated. Diagnosis involves collecting patient data and analysing it clinically to reach a conclusion on the patient’s health condition. In most cases, the process requires cognitive skills, collaboration, and concentration around the patient.

Typically, the diagnosis process is as follows. The patient seeks help from a clinic after experiencing certain symptoms. The physician collects the patient’s data, then aggregates and interprets it to determine the most probable working diagnosis. Understanding the patient’s health problems involves gathering data through interviews, physical examination, diagnostic tests, and consultation with relevant experts in the health system (17). These tasks are carried out in a feedback loop, and health information technology and other tools are used in the diagnostic process. All components involved in the diagnosis process interact (23). Depending on the outcome of the history and the patient interview, the doctor might direct the patient to the radiology lab for body scanning. Pictures of the targeted body parts are taken and returned to the doctor for further examination.

Imaging techniques such as CXR and CT scans are crucial in the diagnosis of respiratory diseases such as COVID-19 and pneumonia. Significant progress has been made in using convolution neural networks to classify medical images, as a result of large annotated chest X-ray datasets. The CNN method provides representation learning for quality annotated images.

With the fast research and development of digitized medical imaging and storage infrastructure, medical image diagnosis and understanding by doctors and computers has become an active topic in the statistical and machine learning disciplines and in their application to specific problems (6). Previous work (83) on the image classification problem can be grouped into traditional and deep methods. The traditional methods rely on low-level features such as texture, edge, and color, used with classifiers such as SVM. The deep learning approach to image classification has been discussed for both deep and shallow CNNs applied to lung image patches. Much medical image labeling work has been done to create fast and accurate annotated images, labeled by specific pathology category (82). Images produced across hospitals and regions may vary in quality, features, color, shape, and texture. With the emergence of high-resolution image scanning technologies, the use of traditional methods to classify different categories of medical images has proved inefficient (53).

Traditional methods of image classification classify images based on color, texture, and shape (18). These methods give features that generally describe the background of the image in terms of color and texture. Celebi extracted features such as color, texture, and shape and classified the images with an SVM algorithm (24), achieving sensitivity (93%) and specificity (92%). The drawbacks of support vector methods are that their performance is not consistent, constructing them is slow, and selecting and extracting features is time-consuming (84). In this paper, we advance from the descriptive traditional methods of classifying images to gaining more clarity on what the image is and which group it falls under. The low-level traditional features are of little interest to this research. The advancement of deep learning models in computer science and statistics has been boosted by the availability of powerful servers and applications. The technique has been used on both non-medical and medical images. The theoretical concept of the deep learning framework was introduced by Hinton et al. (29).

Since 2006, researchers have developed many methods and improved on existing ones to remedy the challenges faced in training CNNs (33). Among the notable CNN methods, Alex Krizhevsky et al. (5) developed a classical CNN framework that outperformed the previous state-of-the-art methods on image classification problems. The architecture of AlexNet is similar to that of LeNet but uses more parameters and 8 layers to model the 1.2 million images of the ImageNet dataset (69). AlexNet uses five convolutional layers, two fully connected hidden layers, and one fully connected output layer, together with the ReLU activation function, while LeNet uses two convolutional layers, two fully connected hidden layers, one fully connected output layer, and the sigmoid activation function. The success of AlexNet inspired further research to improve CNN performance. Among these works are ResNet (36), SqueezeNet (37), and GoogLeNet (76). The PCA network model provided a baseline for face recognition, achieving accuracy (99.8%) on single rather than multiple images (25). Other deep learning methods include the Visual Geometry Group (VGG) approach used with linear SVM classifiers (13), with accuracy (87.5%), specificity (81%), and sensitivity (93.5%). The successful research studies in this field have attracted different

disciplines working on varied domain problems. These approaches and methods have been used to solve real-life medical and non-medical image classification challenges. Typically, image classification is split into two steps: first, the extraction of features, and second, the use of the extracted features to classify the image (73). In a traditional setup, professional doctors use their accumulated years of experience to extract features and classify the images into different classes (31). This is complex, time-consuming, and tedious, and it is sometimes prone to error depending on the state of the doctor, such as fatigue. The method is unstable and does not produce sufficiently repeatable results. The emerging application of medical image classification has advantages over this traditional method (63). Research and development is still under way in this field to produce reliable and efficient classifiers for further medical diagnosis and study. Doctors can combine the proposed model for classifying images with their prior professional experience.



Chapter 3

Methodology

3.1 Introduction

The use of deep CNN methodology in previous research has centered on the move from shallow to deep networks and on the application of pre-trained CNN models to new datasets, such as COVID-19 patient chest X-ray images. Increasing the depth leads to the extraction of better features of the object, with an increase in non-linearity. However, non-linearity makes the network more challenging to optimize and can easily lead to overfitting.

Due to the limited number of clearly labeled images, the lack of enough data to build the model from scratch, and other technical constraints such as memory and computing power, I opted for transfer learning from the state-of-the-art pre-trained model ResNet18 (36), with a few modifications of the parameters.

3.2 Data Sources

Data from the public NIH Chest X-ray database and the COVID-19 2020 database (26) was researched and collected over three months. Images with unclear outcomes or poor image quality were archived, and only those with a physician-diagnosed outcome were used in this study. The X-ray images were collected, grouped, and stored in two folders, the test and training datasets, each with sub-folders labelled 'Covid' and 'Non-covid', as shown in Figure 3.1.

Figure 3.1: Flow chart of data set-up in the computer folders.

The positive COVID-19 images were certified by a radiologist and were previously used in the research paper by Shervin (55), based in Canada. The test and training sets were split in the ratio 40%:60% respectively. There were 184 positive COVID-19 images, split into 100 training images and 84 test images; we retained all 184 certified positive COVID-19 images. The initial negative COVID-19 dataset had 2400 images, but after aggregating from different sources we increased the number to 5000 negative COVID-19 images. The negative COVID-19 dataset was aggregated from the National Institutes of Health (NIH) and Kaggle. It was split into 3100 training and 2084 testing images. The negative COVID-19 dataset contains images from 10 categories (pneumonia, atelectasis, edema, pleural, etc.).
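The folder layout described above can be checked with a short script. The following is a minimal sketch, not the thesis code: the split folder names `train` and `test`, the helper `count_images`, and the placeholder file `example.png` are assumptions made for illustration; only the sub-folder names 'Covid' and 'Non-covid' come from the text.

```python
import tempfile
from pathlib import Path


def count_images(root):
    """Count files in every <split>/<class> sub-folder under root."""
    counts = {}
    for split_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            n_files = sum(1 for f in class_dir.iterdir() if f.is_file())
            counts[(split_dir.name, class_dir.name)] = n_files
    return counts


# Build a throwaway copy of the layout with one placeholder file per class.
with tempfile.TemporaryDirectory() as root:
    for split in ("train", "test"):           # assumed split folder names
        for label in ("Covid", "Non-covid"):  # sub-folder names from the text
            folder = Path(root) / split / label
            folder.mkdir(parents=True)
            (folder / "example.png").touch()  # hypothetical placeholder image
    counts = count_images(root)
```

Run against the real data directory, a count like this gives a quick check that each split and class holds the expected number of images before training.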

3.2.1 Data Preparation

We collected only a limited number of certified positive COVID-19 images, so we could not use this small dataset as it was. To remedy this, we applied data augmentation procedures, i.e., flipping or rotating the positive COVID-19 images, to double our existing set of positive COVID-19 cases and to help optimize the network (71, 80). We kept the negative COVID-19 dataset as it was, since it was much larger. Since the images were of different resolutions, we resized them to 224 × 224. Color was not an important feature in this study. Each image was then fed in as 3 channels, giving a final image input shape of 224 × 224 × 3. We rescaled the image pixel values by applying a standardization procedure to the dataset, to save training time and to build a stable model.
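The preparation steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis code: the helper names (`flip_horizontal`, `to_three_channels`, `standardize`) and the tiny 2 × 3 image are hypothetical, standardization is assumed to mean zero mean and unit variance per image, and resizing to 224 × 224 is omitted because it needs an image library.

```python
from statistics import mean, pstdev


def flip_horizontal(image):
    """Mirror a 2-D grayscale image (a list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def to_three_channels(image):
    """Copy each grayscale pixel into 3 identical channels: H x W -> H x W x 3."""
    return [[[pixel, pixel, pixel] for pixel in row] for row in image]


def standardize(image):
    """Rescale pixels to zero mean and unit variance over the whole image."""
    pixels = [p for row in image for p in row]
    mu, sigma = mean(pixels), pstdev(pixels)
    if sigma == 0:  # constant image: avoid division by zero
        return [[0.0 for _ in row] for row in image]
    return [[(p - mu) / sigma for p in row] for row in image]


# Hypothetical 2 x 3 grayscale image with intensities in [0, 255].
xray = [[0, 128, 255],
        [64, 64, 192]]

# Flipping adds one new sample per original, doubling the positive set.
augmented = [xray, flip_horizontal(xray)]
prepared = [to_three_channels(standardize(img)) for img in augmented]
```

Each entry of `prepared` has shape height × width × 3, which mirrors the 224 × 224 × 3 input shape once real images are resized.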

3.2.2 X-Rays Features and Variables

Figure 3.2 shows labelled chest X-ray images with ill-defined peripheral airspace opacity. Three sample images were taken from patients infected with COVID-19, and the corresponding areas are marked in the figure. The pattern seen in the marked areas might be a challenge to identify visually. Since we have a limited number of trained expert radiologists, such subtle abnormalities or patchy areas can pass undetected.

We apply a statistical learning model to separate all the X-ray images with patchy areas and predict them into one distinct class. The observed patchy areas in the X-ray images were our statistical variables of interest in classifying chest X-ray images into COVID-19 and non-COVID-19. Input images are converted into input values and then mapped to output values. White patches in the image take higher output values, a signal of abnormality. The model counts such instances and calculates specificity and other model evaluation metrics (47).

The model was used to detect such images, classify them together, and output probability values.
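How counts of classified images turn into evaluation metrics can be sketched as follows. The labels and probabilities are hypothetical values chosen for the example, not results from this study, and the 0.5 decision threshold and the function names are assumptions.

```python
def confusion_counts(labels, probabilities, threshold=0.5):
    """Count TP, FP, TN, FN for binary labels (1 = COVID-19, 0 = non-COVID-19)."""
    tp = fp = tn = fn = 0
    for label, prob in zip(labels, probabilities):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def evaluation_metrics(tp, fp, tn, fn):
    """Sensitivity (recall), specificity, and precision from the four counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Hypothetical ground-truth labels and model output probabilities.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
probabilities = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.3, 0.05]

tp, fp, tn, fn = confusion_counts(labels, probabilities)
metrics = evaluation_metrics(tp, fp, tn, fn)
```

Lowering the threshold trades specificity for sensitivity, which is why both metrics are reported together.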



Figure 3.2: Chest X-ray images of patients who were infected with the COVID-19 virus. Image source: (55).

3.3 Convolution Neural Network Architectural

The human brain comprises billions of interconnected neurons. Information is sent from one neuron to another across synapses, depending on which part of the brain has been triggered or activated. This information could be an image, a sound, or text. Human beings can discriminate this data by looking at specific features like edges, curves, color, and other attributes. Machines can mirror the same behavior and interpret multiple levels of representation and abstraction using pixel values.



Figure 3.3: Human neuron and mathematical interpretation. Source:(Img)

3.3.1 Convolution Layers

Before the successful application of deep learning networks, computer vision for recognition depended on two separate and complementary steps. First, the input data was transformed through a set of hand-designed operations into a required form. The transformed information is an abstract representation of the input data, changed in such a way that it can be separated by a classifier. Second, the transformed data was used to train a classifier algorithm to recognize the information in the input signal (49, 51). The transformation used affects the performance of any classifier.



Convolution layers can be defined as computational models that allow the computer to extract useful information from the input data by representing it at multiple levels of abstraction. Unique or important features of the input are amplified at higher layers in the network and become more robust to insignificant variations. The multilayer network stacks several blocks of modules with alternating linear and nonlinear functions.

A CNN is made up of an input layer $x$, an output layer $y$, and a stack of multiple hidden layers $h$, where each layer consists of several units. A hidden unit, $h_j$, receives input from the units of the previous layer and is defined as a weighted combination of the inputs passed through a nonlinear function:

$$h_j = F\Big(b_j + \sum_{i} w_{i,j}\, x_i\Big) \qquad (3.1)$$

where $w_{i,j}$ are the weights controlling the strength of the connection between input unit $i$ and hidden unit $j$, and $b_j$ is the bias of hidden unit $j$. The bias is added to the weighted sum, and the result is passed through the non-linear activation function $F(\cdot)$ to yield the output $h_j$. The bias $b_j$ allows the activation function to be shifted left and right.
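Equation 3.1 can be illustrated with a short Python sketch. The input values, weights, and biases below are hypothetical, and ReLU is used for F only as one example of a nonlinear activation.

```python
def relu(z):
    """Rectified linear unit: passes positive values, clips negatives to 0."""
    return max(0.0, z)


def hidden_unit(inputs, weights, bias, activation=relu):
    """Compute h_j = F(b_j + sum_i w_ij * x_i) for one hidden unit."""
    weighted_sum = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)


x = [1.0, 2.0, 3.0]       # hypothetical inputs x_i
w_j = [0.5, -0.25, 0.5]   # hypothetical weights w_ij for unit j
b_j = 0.5                 # hypothetical bias

h_j = hidden_unit(x, w_j, b_j)         # weighted sum 1.5, plus bias 0.5
h_shifted = hidden_unit(x, w_j, -3.0)  # a negative bias shifts the unit off
```

With the first bias the unit outputs 2.0; with a bias of -3.0 the pre-activation is negative and ReLU returns 0.0, which shows how the bias shifts the point at which the unit activates.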

3.3.2 Image Analysis

A convolutional neural network works in a similar way, by comparing the pixel values in an image. The features in an image are differentiated by the intensity of the information they carry. The activation function activates when the filter touches edges, color, shape, or other distinct areas of the image. We apply a defined filter to identify this information, or abnormalities, in an image. A common activation function is the sigmoid, which activates when the filter identifies a feature such as a curve in an image. We focused on the ReLU function in this research.

Figure 3.4 illustrates the image of a dog and a bird, with a pixel array of numbers on top of the images. The pixel values are higher around the surface edges of the objects. This is where we have distinct information. A CNN can identify this information by comparing the intensity of neighboring pixels.

Figure 3.4: Convolution; Image and Filter

In our corresponding marked images, see 3.2, we analyse the marked white patch areas. These areas contain distinct abnormalities, which the model will treat differently by amplifying their output values.

3.3.3 Filters/Kernel

A filter, or kernel, is convolved with the input image to compute feature maps through the convolution operation. An element-wise nonlinear activation function is then applied to the convolved output. Each neuron in a feature map is linked to a receptive region in the previous layer. The filter is moved across all spatial locations of the input image. Several filters can be used to obtain a full set of feature maps.

Simple mathematical representation:

$$y = X * f$$

where the symbol $*$ denotes the convolution operator, $y$ is the convolved result, $X$ is the input image, and $f$ is the filter or kernel.

To illustrate convolution, consider an image of size 3 × 3 and a filter of size 2 × 2.

Figure 3.5: Convolution; Image and Filter.

In Figure 3.5, the filter moves over the patches of the input image. At each location it performs an element-wise multiplication between the values in the filter and the corresponding values in the image, and the element-wise products are summed (a cross-correlation), as demonstrated below. The sum of the element-wise products is the output value for the destination pixel in the output image. This process is repeated for all locations: the filter is moved across the surface of the image to the right, then down, and so on. Areas of the image that carry information such as edges, colors, patterns, and shapes have higher output values (85).

In Figure 3.5, $X$ is an input image with a height of 3 and a width of 3, while the height and width of the filter $f$ are both 2. The filter size gives the shape of the kernel window.

The output size is determined by the input size $n_h \times n_w$ and the size of the convolution kernel $k_h \times k_w$: in each dimension, the output is the input size minus the kernel size, plus one.

Mathematically:

$$(n_h - k_h + 1) \times (n_w - k_w + 1) \tag{3.2}$$

where $n_h$ and $n_w$ are the input height and width respectively, while $k_h$ and $k_w$ are the kernel height and width. This is a pairwise computation (42).

The output has dimension 2 × 2, with its four entries computed as shown below:

(17 × 1 + 17 × 1 + 16 × 0 + 10 × 1) = 44

(17 × 1 + 9 × 1 + 10 × 0 + 0 × 1) = 26

(16 × 1 + 10 × 1 + 18 × 0 + 10 × 1) = 36

(10 × 1 + 0 × 1 + 10 × 0 + 0 × 1) = 10
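The four sums above can be reproduced with a short cross-correlation routine. This is a minimal sketch in plain Python; the 3 × 3 image and 2 × 2 filter values are not printed in the text and are inferred here from the terms of the four sums, so their arrangement should be read as an assumption.

```python
def cross_correlate_2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the
    element-wise products at each location."""
    n_h, n_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    output = []
    for i in range(n_h - k_h + 1):
        row = []
        for j in range(n_w - k_w + 1):
            total = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k_h)
                for b in range(k_w)
            )
            row.append(total)
        output.append(row)
    return output


# Values inferred from the worked sums (assumption, not taken from Figure 3.5).
image = [
    [17, 17, 9],
    [16, 10, 0],
    [18, 10, 0],
]
kernel = [
    [1, 1],
    [0, 1],
]

feature_map = cross_correlate_2d(image, kernel)
print(feature_map)  # [[44, 26], [36, 10]]
```

The output has shape (3 − 2 + 1) × (3 − 2 + 1) = 2 × 2, in agreement with eq 3.2.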

3.4 Stride and Padding

Stride is used to reduce the dimension of a large input image, while padding helps to keep interesting information along the boundaries of the input image. A convolution kernel with height and width greater than 1 produces an output smaller than its input, which can result in the loss of information along the boundaries of the input image. We address this loss by applying the padding technique, and we reduce the dimension of a massive image using the stride technique.

3.4.1 Padding

The padding method (57, 77) helps to keep the information along the boundaries of the original input image. We tend to lose pixels on the perimeter of the input image when we apply a convolution kernel. A single kernel loses some pixel values along the boundaries of the image, and stacking more kernels loses more. To remedy this, we add extra filler pixels along the boundaries of the input image and set their values to zero, which increases the size of the input image (34). For example, when an image with dimension 4×4 is padded with one pixel on every side, it increases in size to 6×6 (21, 44). Extending eq 3.2, if we add a total of $p_h$ rows of padding (half on top and half on the bottom) and a total of $p_w$ columns of padding (half on the left and half on the right), the output shape will be

$$(n_h - k_h + 1 + p_h) \times (n_w - k_w + 1 + p_w) \tag{3.3}$$



We choose an odd kernel size such as 1, 3, 5, or 7 so that the padding can be split evenly between the two sides, which preserves the spatial dimension of the input image. Padding with $p_h = k_h - 1$ and $p_w = k_w - 1$ gives the output image the same height and width as the input image.

3.4.2 Stride

Stride (88) is used to reduce the dimension/resolution of a large input image. The stride is the number of rows and columns traversed per slide. When computing the cross-correlation (27, 42), we start with the convolution window at the top-left corner of the input tensor and then slide it over all locations, both down and to the right (90). We may move the kernel one element at a time or skip several elements. We slide the kernel window by more than one element, skipping the intermediate locations, for computational efficiency or when downsampling.

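Mathematically, when the stride for the height is $s_h$ and the stride for the width is $s_w$, the output shape is given by

$$\lfloor (n_h - k_h + p_h + s_h)/s_h \rfloor \times \lfloor (n_w - k_w + p_w + s_w)/s_w \rfloor \tag{3.4}$$

where $\lfloor \cdot \rfloor$ rounds down to the nearest integer.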
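Equations 3.2 to 3.4 can be checked with a small helper. The function below is an illustrative sketch with a hypothetical name; it treats $p_h$ and $p_w$ as the total padding per dimension, as in eq 3.3, and uses integer division for the rounding in eq 3.4. The last call assumes the common ResNet first-layer setting of three padded pixels per side, which the text does not state.

```python
def conv_output_shape(n, k, p=(0, 0), s=(1, 1)):
    """Output (height, width) of a convolution, following eq 3.4.

    n, k, p, s are (height, width) pairs for the input size, kernel size,
    TOTAL padding per dimension, and stride.
    """
    n_h, n_w = n
    k_h, k_w = k
    p_h, p_w = p
    s_h, s_w = s
    out_h = (n_h - k_h + p_h + s_h) // s_h
    out_w = (n_w - k_w + p_w + s_w) // s_w
    return out_h, out_w


# No padding, stride 1: reduces to eq 3.2.
print(conv_output_shape((3, 3), (2, 2)))                # (2, 2)

# Odd 3x3 kernel with p = k - 1: the input size is preserved.
print(conv_output_shape((4, 4), (3, 3), p=(2, 2)))      # (4, 4)

# 7x7 kernel, stride 2, assumed padding of 3 per side on a 224x224 input.
print(conv_output_shape((224, 224), (7, 7), p=(6, 6), s=(2, 2)))  # (112, 112)
```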

3.5 Activation Function

The relationship between image pixels and class labels is non-linear, so to capture complex information we introduce an activation function into the CNN architecture. The activation function captures desirable nonlinear features in an image. A linear activation function produces a linear decision boundary, which cannot capture some of the useful information in the image features.

Figure 3.6 below demonstrates that we cannot separate the red points from the green points using a linear decision boundary (right figure).

Figure 3.6: Decision boundary for linear and non-linear activation function. Source: (CNN)

Mathematically:

$$a^l_{i,j,k} = a(y^l_{i,j,k})$$

where $a^l_{i,j,k}$ is the activation value of the convolutional feature $y^l_{i,j,k}$.

There are many activation functions, and the choice depends on the problem being solved. We use the ReLU (8) activation function for the image classification problem because of its consistent gains in classification accuracy across deeper models (67).

3.5.1 ReLU

$$f(z) = \max(0, z)$$

ReLU (8) is easy to compute and has its non-linearity at $z = 0$.

Figure 3.7: ReLU Activation Function. Source: (Rel)
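To make the definition concrete, the sketch below applies ReLU element-wise to a small feature map. The feature-map values are hypothetical and chosen only to show that negative entries are set to zero while positive entries pass through unchanged.

```python
def relu(z):
    """ReLU activation: f(z) = max(0, z)."""
    return max(0.0, z)


def relu_map(feature_map):
    """Apply ReLU to every entry of a 2-D feature map."""
    return [[relu(value) for value in row] for row in feature_map]


# Hypothetical convolution output, for illustration only.
feature_map = [
    [-3.0, 2.0],
    [0.5, -1.0],
]

activated = relu_map(feature_map)
print(activated)  # [[0.0, 2.0], [0.5, 0.0]]
```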

3.6 Pooling

The pooling layer reduces the resolution of its input and is placed between two convolutional layers to decrease the number of connections between them. It discards unwanted parts of the preceding convolution output and keeps only useful features. According to Tobler's First Law of Geography (15), "everything is related to everything else, but near things are more related than distant things" (Tobler, 1970). Applying the same reasoning to the spatial features of an input image, pixel values that are close together are more likely to be alike than those further apart, so the initial output of a convolution layer will produce similar values for these neighboring pixels (52). This is redundant information from which we learn nothing new. To address this, pooling achieves shift-invariance (33) by pooling the pixel values of the input together. A specific feature map is linked to its counterpart feature map in the following convolutional layer. Pooling operators have no parameters; they are deterministic (10, 86). Typically we calculate the maximum, minimum, or average pixel value in the pooling window (75). The pooling window starts at the top left of the input image and slides to the right and down, outputting the max, min, or average value of the region it covers. A pooling window can be of shape $p \times q$.

Mathematically, denote the pooling function as pool(·). For each feature map $a^l_{m,n,k}$ we obtain:

$$y^l_{i,j,k} = \mathrm{pool}(a^l_{m,n,k}), \quad \forall (m,n) \in R_{i,j}$$

where $R_{i,j}$ is the neighborhood around location $(i, j)$. The known pooling operations are max, min, sum, and average pooling (22). Under max-pooling, we choose the pool size, apply it to each region of the image, extract the maximum pixel value in that region, and place it at the corresponding pixel of the output image. The output width and height of the preceding convolution layer are divided by the pooling size.
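A minimal max-pooling sketch is given below. It assumes non-overlapping windows, that is, a stride equal to the pool size, which matches the statement that the output width and height are divided by the pooling size. The 4 × 4 feature map is hypothetical.

```python
def max_pool_2d(feature_map, pool_h, pool_w):
    """Max-pooling with non-overlapping windows (stride equal to pool size)."""
    n_h, n_w = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, n_h - pool_h + 1, pool_h):
        row = []
        for j in range(0, n_w - pool_w + 1, pool_w):
            window_max = max(
                feature_map[i + a][j + b]
                for a in range(pool_h)
                for b in range(pool_w)
            )
            row.append(window_max)
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map, for illustration only.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 5, 6],
]

pooled = max_pool_2d(feature_map, 2, 2)
print(pooled)  # [[6, 4], [7, 9]]
```

A 2 × 2 pool halves each dimension, so the 4 × 4 input becomes a 2 × 2 output.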

We can stack several convolutional and pooling layers to extract higher-level features

of the object datasets. These convolution and pooling outputs lead to one or more fully

connected layers that extract high-level feature representations.

3.7 Fully Connected Layers

The outputs of the convolution and pooling layers (16, 19), that is, the information extracted from the data, are used as input to this layer, which in turn generates the final results (81). A fully connected layer works with 1-dimensional data, with each input image in its own row. The output values from the previous convolutional and pooling layers are converted into a one-dimensional format. Each neuron of the previous layer is connected to every neuron in the current layer, generating a web of interconnected semantic information (82).

The figure below shows the conversion of a matrix to a vector, or 1-dimensional data.

$$\begin{bmatrix} 44 & 26 \\ 36 & 10 \end{bmatrix} \Rightarrow \begin{bmatrix} 44 \\ 26 \\ 36 \\ 10 \end{bmatrix}$$



Individual values are separate features that represent an image. A fully connected layer performs two mathematical operations: a linear and a non-linear transformation (91).

The commonly used classification methods are the softmax and SVM classifiers (64), both of which are supervised methods. Softmax achieves a better classification performance compared to SVM. It outputs a probability distribution, which can be used to classify input data. These methods are combined with a CNN to solve classification problems (7).

The linear transformation of the data is of the form:

$$y^l_{i,j,k} = W_k^T X^l_{i,j} + b^l_k \tag{3.5}$$

where $(i, j)$ is the location of the feature value, $k$ indexes the feature map, and $l$ indexes the layer, and

$W_k$ = weight vector of the $k$-th filter,

$X^l_{i,j}$ = input patch centered at location $(i, j)$ of the $l$-th layer,

$b^l_k$ = constant bias term of the $k$-th filter of the $l$-th layer.

The weight $W_k$ is randomly initialized, is used to generate $y^l_{i,j,k}$, and is shared. The sharing of weights reduces the complexity of the model and makes the network easier to train.

For illustration, consider $m$ features from an input image and $n$ neurons. The weight matrix will then have size $m \times n$. When $m = 4$ and $n = 2$, the linear transformation takes the following form.

Denote:



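$$X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \text{ (input)}, \qquad W = \begin{bmatrix} W_{1,1} & W_{1,2} \\ W_{2,1} & W_{2,2} \\ W_{3,1} & W_{3,2} \\ W_{4,1} & W_{4,2} \end{bmatrix} \text{ (weight matrix)}, \qquad b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} \text{ (bias)}$$

This is how it appears in eq 3.5, i.e. $y^l_{i,j,k} = W_k^T X^l_{i,j} + b^l_k$:

$$y = \begin{bmatrix} W_{1,1} & W_{2,1} & W_{3,1} & W_{4,1} \\ W_{1,2} & W_{2,2} & W_{3,2} & W_{4,2} \end{bmatrix} \times \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$$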
Consider a vector $\theta$ of weights and bias terms. To find the optimum parameters for a classification task, we minimize the defined loss function.
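The flattening step and the linear transformation with four input features and two neurons can be sketched in plain Python as follows. The input reuses the 2 × 2 convolution output shown earlier in this section; the weight and bias values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def flatten(matrix):
    """Convert a 2-D matrix into a 1-D list, row by row."""
    return [value for row in matrix for value in row]


def fully_connected(x, W, b):
    """Linear transformation y = W^T x + b.

    W has one row per input feature and one column per neuron, so for four
    features and two neurons it is a 4 x 2 matrix.
    """
    n_inputs, n_neurons = len(W), len(W[0])
    return [
        sum(W[i][j] * x[i] for i in range(n_inputs)) + b[j]
        for j in range(n_neurons)
    ]


x = flatten([[44, 26], [36, 10]])      # [44, 26, 36, 10]

# Hypothetical weights and biases, for illustration only.
W = [
    [1, 0],
    [0, 1],
    [1, 0],
    [0, 1],
]
b = [1, -1]

y = fully_connected(x, W, b)
print(y)  # [81, 35]
```

The first neuron sums features 1 and 3 plus its bias (44 + 36 + 1), and the second sums features 2 and 4 plus its bias (26 + 10 − 1).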

3.8 Training CNN

Our primary purpose is to construct a deep CNN model that classifies an image into COVID-19 positive and COVID-19 negative (or other disease) categories with a certain probability by finetuning the pre-trained ResNet18 CNN model.

The transfer learning version of a deep CNN is viable for this task: it extracts fixed image features and predicts the class of each image, COVID-19 positive or COVID-19 negative. We trained the pre-trained ConvNet model using a set of 3100 training images, then finetuned the model to obtain a very small loss/error in the network. We evaluate model performance by computing the distance scores between the predicted values of the convolutional network and the actual target output values, and we compare these error scores. The smaller the distance score (loss), the better the deep CNN model.



3.8.1 Loss Function

We use the cross-entropy loss function (40), whose mathematical expression is as follows:

$$L = \frac{1}{N} \sum_{i=1}^{N} l(\theta; y_i, o_i) \tag{3.6}$$

where $N$ is the number of training images, $y_i$ is the target label, and $o_i$ is the network output for the $i$-th image. We minimize the total loss over the whole dataset in eq 3.6 to find the best-fitting set of parameters. One of the commonly used methods to optimize a CNN is stochastic gradient descent (33).

The predicted value $f(x_i; W)$ is compared with the actual value $y_i$ through the loss $L(f(x_i; W), y_i)$.

3.8.1.1 Loss Optimization

The loss function is optimized by updating the weights in the network until we find the weights that result in a minimum loss over the training dataset (50). This is called loss optimization and is expressed mathematically as follows:

$$W^* = \arg\min_{W} L(f(x_i; W), y_i) \tag{3.7}$$

where $W$ is a vector of weights and $W^*$ are the updated weights that minimize the loss in eq 3.7. We compute the gradient $\partial L / \partial W$ of the loss with respect to the weights at a chosen point and iterate the process until we reach a minimum. Each iteration returns a new set of weights.

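Implementation:

1. Initialize the weights randomly, drawing them from a normal distribution $W \sim \mathcal{N}(\mu, \sigma^2)$.

2. Iterate steps 3 and 4 until convergence.

3. Compute the gradient $\partial L / \partial W$.

4. Update the weights: $W \leftarrow W - \eta \, \partial L / \partial W$, where $\eta$ is the learning rate.

5. Return the final weights.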
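The five steps above can be sketched for a single weight as follows. This is an illustration rather than the training code used in this study: the quadratic loss $L(w) = (w - 3)^2$, the learning rate, the tolerance, and the random seed are all hypothetical choices made so that the minimum ($w = 3$) is known in advance.

```python
import random


def gradient_descent(gradient, w_init, learning_rate=0.1, tol=1e-8, max_iter=10_000):
    """Repeat w <- w - learning_rate * dL/dw until the update is smaller than tol."""
    w = w_init
    for _ in range(max_iter):
        w_new = w - learning_rate * gradient(w)   # step 3 and step 4
        if abs(w_new - w) < tol:                  # step 2: convergence check
            return w_new
        w = w_new
    return w                                      # step 5


def toy_gradient(w):
    """Gradient of the hypothetical loss L(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


random.seed(0)                       # fixed seed so the run is repeatable
w_init = random.gauss(0.0, 1.0)      # step 1: draw the weight from N(0, 1)
w_star = gradient_descent(toy_gradient, w_init)
print(w_star)
```

With this learning rate each update shrinks the distance to the minimum by a constant factor, so the loop stops well before `max_iter`.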



3.8.2 Softmax Layers

Softmax (40) turns the output values of the network into probabilities. It produces the prediction of the deep CNN model and is used in multiclass classification problems. The outputs of the softmax transform are always in the range [0, 1] and sum to 1. Hence, they form a probability distribution.

It is mathematically expressed as:

$$s(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$

where $s(x_i)$ is the probability assigned to the $i$-th value for $i = 1, 2, \ldots, n$, $e$ is Euler's number (a mathematical constant), $x_i$ are the given values, and $\sum_{j=1}^{n} e^{x_j}$ is the sum over all $j$. The bigger the value of $x_i$, the larger its probability.

Softmax helps us quantify and predict the probability of an object belonging to a particular class. This is useful in training and evaluating the best-performing CNN algorithm.

We compute the cross-entropy loss (40), which accounts for the certainty of each prediction. The mathematical expression of the cross-entropy loss is:

$$L = -\ln(p_c)$$

where $L$ is the binary cross-entropy loss, $c$ is the correct image class, $p_c$ is the predicted probability for class $c$, and $\ln$ is the natural log. A lower loss $L$ is better than a larger one.
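The softmax transform and the loss $L = -\ln(p_c)$ can be sketched together in plain Python. The two network scores are hypothetical, and the ordering of the classes (index 0 for COVID-19 negative, index 1 for COVID-19 positive) is an assumption made for the example. Subtracting the largest score before exponentiating is a standard numerical-stability step that leaves the result unchanged.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)                       # stability shift, cancels out below
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, correct_class):
    """Cross-entropy loss for one image: L = -ln(p_c)."""
    return -math.log(probs[correct_class])


# Hypothetical scores for [negative, positive], for illustration only.
scores = [2.0, 0.5]
probs = softmax(scores)

loss_if_negative = cross_entropy(probs, 0)   # correct class has the larger probability
loss_if_positive = cross_entropy(probs, 1)   # correct class has the smaller probability
print(probs, loss_if_negative, loss_if_positive)
```

The loss is small when the model assigns a high probability to the correct class and grows as that probability falls.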

3.9 Proposed CNN: ResNet18

There has been significant improvement in deep convolutional neural networks (66) since the notable success of earlier research such as AlexNet in 2012 (5). We apply transfer learning from a pre-trained, state-of-the-art residual learning model, ResNet (37, 87), which won the ILSVRC 2015 classification competition (78) after training on over a million ImageNet images, to our COVID-19 image datasets. The purpose is to apply the knowledge learned by the pre-trained ResNet model to the X-ray images to solve the classification problem. Constructing the deep ResNet model from scratch would have been time-consuming and was impractical given the insufficient number of certified COVID-19 samples.

ResNet-18 is a deep CNN with 18 layers in its framework. It can classify images into 1000 object categories, and the size of its input image is 224×224. ResNet was introduced in 2015 (36), and a 101-layer variant was used in place of VGG-16 (35). ResNet has demonstrated that it performs efficiently with more layers (70, 74).

Residual learning is expressed mathematically as follows. Consider $H(x)$ to be a mapping that is fitted by some stacked layers, with $x$ the input to those layers. Following the assumption that multiple non-linear layers can asymptotically approximate complex functions, we can equally assume that they can asymptotically approximate the residual function $H(x) - x$ (36). Rather than expecting the stacked layers to approximate $H(x)$ directly, we let these layers approximate the residual function $F(x) := H(x) - x$, so that the original mapping becomes $F(x) + x$.

The advantage of ResNet-18 is that it addresses the vanishing gradient problem using identity mappings and can be trained efficiently without increasing the training error (12, 36). Unlike non-residual neural models, residual networks provide residual connections straight to earlier layers (79).
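The idea of learning $F(x)$ and adding the input back can be sketched with a toy residual block. This is a simplified illustration under stated assumptions: small dense layers stand in for the 3×3 convolutions of ResNet, batch normalization is omitted, and the weight matrices are hypothetical.

```python
def relu_vec(values):
    """Apply ReLU to each entry of a vector."""
    return [max(0.0, v) for v in values]


def linear(W, x):
    """Matrix-vector product; each row of W produces one output value."""
    return [sum(w * a for w, a in zip(row, x)) for row in W]


def residual_block(x, W1, W2):
    """Toy residual block: output = ReLU(F(x) + x), with F two stacked layers."""
    f_x = linear(W2, relu_vec(linear(W1, x)))          # residual function F(x)
    return relu_vec([f + a for f, a in zip(f_x, x)])   # skip connection adds x


x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]      # hypothetical weights: F(x) = 0
identity = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical weights: F(x) = x for x >= 0

passthrough = residual_block(x, zeros, zeros)        # input is carried through unchanged
doubled = residual_block(x, identity, identity)      # F(x) + x = 2x
print(passthrough, doubled)
```

When the stacked layers contribute nothing, the block reduces to the identity mapping, which is why adding more residual blocks need not increase the training error.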

In this paper, we use the residual network with 18 layers, i.e. ResNet18 (87). We start from the existing parameters of the pre-trained ResNet18, first freezing them to use the model as a fixed feature extractor and then finetuning the model so that the parameters are updated. Table 3.1 below showcases the ResNet18 framework.



Table 3.1: ResNet18 Framework. Source:(Res)

Table 3.1 displays a first convolution layer with a kernel size of 7×7 and a stride of 2. Its output passes through a 3×3 max-pooling layer and then through two convolution layers with kernel size 3×3 and 64 channels; the input to these layers is added to their output, which forms the first residual block. The results from this block are added to the output of the two convolution layers with kernel size 3×3 and 128 channels. This is the third stage of the network. The outputs from the third stage are added through a skip connection to the output of the two convolution layers with kernel size 3×3 and 256 channels to form the fourth stage. This is again added through a skip connection to the output of the two convolution layers with kernel size 3×3 and 512 channels to form the fifth stage. Average pooling is applied to the output of the fifth stage, and the result is fed into the fully connected layer. This 1-dimensional output is finally fed into the softmax layer.

The ResNet18 algorithm is implemented using Python and R software. The optimization algorithm is adopted from the torch.optim package, and only the parameters of the final layer are optimized. The implementation follows the success in (45, 46, 72). A 224 × 224 crop is randomly sampled from each image. We choose the learning rate (0.01%), weight decay (10%), momentum (90%), cut-off threshold (0.1), batch normalization (BN) (41), and fully connected layers (72).

The learning rate is a significant parameter in fine-tuning the deep CNN model. Large learning rates result in model divergence or unstable training, while small rates lead to slow convergence. After fine-tuning and retraining the model, we used a learning rate of 0.01%, which was the optimum learning rate. We started with a tiny learning rate and increased it after each training run until a large loss or loss expansion was observed. The learning rate was then chosen one step below the rate at which the loss was minimal. For example, we chose a learning rate of 0.01% when the loss was lowest at 0.1%, and so on. The full set of learning rates tried in this study is not discussed here. A momentum of 90% was selected to accelerate training and to help with convergence. The decay parameter Gamma in the algorithm controls how the learning rate is reduced during training; we evaluated five decay rates (10%, 20%, 25%, 35%, and 40%) and their influence on the performance of the model. A decay rate of 10% gave the best-performing model, so we selected it. The batch size was set to 20 images, and the model was trained for 100 epochs. These settings help with updating the weights in the network. Noise in the training was addressed by applying the Adam optimizer (68), which leverages the features of AdaGrad and RMSProp (20). Overfitting was addressed by horizontally flipping the input images. This is a data augmentation technique (62), used to enlarge a small dataset into a relatively large one, since large datasets are a remedy for overfitting. Underfitting (11) was addressed by the increased number of layers, or capacity, of the model.
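The effect of the Gamma decay parameter can be illustrated with a step schedule, in which the learning rate is multiplied by Gamma every fixed number of epochs. This is only a sketch: the text does not state which decay schedule was used, and the step size of 7 epochs is a hypothetical value chosen for the example. The initial rate of 0.0001 corresponds to the 0.01% learning rate, and Gamma = 0.1 to the selected 10% decay.

```python
def step_decay_lr(initial_lr, gamma, step_size, epoch):
    """Learning rate at a given epoch under a step-decay schedule.

    The rate is multiplied by gamma once every step_size epochs.
    """
    return initial_lr * gamma ** (epoch // step_size)


initial_lr = 1e-4   # 0.01 %
gamma = 0.1         # 10 % decay factor
step_size = 7       # hypothetical; not stated in the text

for epoch in (0, 6, 7, 13, 14):
    print(epoch, step_decay_lr(initial_lr, gamma, step_size, epoch))
```

The rate stays constant within each block of epochs and drops by a factor of ten at the start of the next block.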



Chapter 4

Results

4.1 Chest X-Rays Exploration

The X-ray dataset comprised two groups: COVID-19 positive and COVID-19 negative. We used 60% of the data for training and 40% for testing. The test set acted as a validation set as well.

Figure 4.1 displays labelled samples of X-ray images from patients infected with the COVID-19 virus and with other common chest diseases. The images are of both high and low resolution and were collected from different sources (26). Noise was added by duplicating and flipping the images and by adding other X-ray images to the pool (62). The model needs to achieve significant precision on this dataset. In a real clinical setup, it is improbable that X-ray images will be taken in a highly controlled environment that yields quality, clean, high-resolution images. It is all but impossible in a clinic or hospital with limited or poor image scanning equipment in the developing world. All images are normalized to a standard distribution so that the model does not behave erratically. We down-sample all images to a resolution of 224 × 224 before feeding them into the network.

4.2 Model Analysis and Performance Metrics

4.2.1 Model Analysis

We used the pre-trained ConvNet ResNet18 as a fixed feature extractor, freezing the parameters so that gradients are not computed for them in the backward pass. Here the ResNet18 model is loaded with the pre-trained option set to true. This model achieved a test loss of 12.85% and a test accuracy of 95.97%. The best test accuracy was 96.00%, and the training time was 472m 26s. The gradient requirement of the parameters was set to true when fine-tuning the model and to false otherwise. The parameters used in this model were adopted from the state-of-the-art ResNet18 algorithm (36, 45). Table 4.1 displays the parameters and their optimal values.

Figure 4.1: COVID-19 Positive and COVID-19 Negative. Image source (26, 55)



Table 4.1: Training Model Parameters

ResNet18 Pre-trained
Parameter            Value
Learning rate        0.01 %
Momentum             90 %
Epochs               100
Cut-off threshold    10 %
Batch size           20
Criterion            nn.CrossEntropyLoss
Optimizer            Adam
Gamma                0.1

When finetuning the ResNet18 model, we did not freeze the parameters; we allowed the model to learn and optimize new parameters. The parameters of the pre-trained model were used as starting values and were updated as the model was finetuned, trained, and learned from the data.

The finetuned model achieved a test loss of 6.96% and a test accuracy of 97.31%. The best test accuracy, 97.4%, was reached at epoch 13. The training time was 1016m 35s.

4.2.2 Performance Evaluation

We evaluate the performance of our trained deep ResNet18 model using the 40% test dataset (9). The test images are input to the trained model to validate the classification model. The performance metrics are sensitivity/recall and specificity at a selected threshold, AUC and ROC curve, precision, F1-score, accuracy, and Cohen's kappa (14).



Table 4.2: Model Evaluation Metrics

ResNet18
Metric           Score
Recall           75 %
AUC              98.82 %
Precision        87.5 %
Accuracy         98.56 %
F1-score         80.77 %
Cohen's kappa    80.03 %

Table 4.3: Sensitivity and Specificity (28, 54) for ResNet18 models under different cut-off thresholds

ResNet18
Threshold    Sensitivity    Specificity
0.1          0.75           0.995
0.2          0.595          0.999
0.3          0.464          1

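$$\text{Sensitivity} = \frac{\text{Correct prediction of COVID-19 positive}}{\text{Total number of COVID-19 positive dataset}} \tag{4.1}$$

$$\text{Specificity} = \frac{\text{Correct prediction of COVID-19 negative}}{\text{Total number of COVID-19 negative dataset}} \tag{4.2}$$

$$\text{Precision} = \frac{\text{True COVID-19 positive}}{\text{True COVID-19 positive} + \text{False COVID-19 positive}} \tag{4.3}$$

$$\text{F1-score (61)} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4.4}$$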
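Equations 4.1 to 4.4 can be written as small functions of the confusion-matrix counts. In the sketch below the counts are hypothetical toy numbers, not the confusion matrix of this study; only the last line uses reported values, checking that the precision (87.5%) and recall (75%) in Table 4.2 give the reported F1-score.

```python
def sensitivity(tp, fn):
    """Eq 4.1: correctly predicted positives over all actual positives."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Eq 4.2: correctly predicted negatives over all actual negatives."""
    return tn / (tn + fp)


def precision(tp, fp):
    """Eq 4.3: true positives over all predicted positives."""
    return tp / (tp + fp)


def f1_score(precision_value, recall_value):
    """Eq 4.4: harmonic mean of precision and recall."""
    return 2 * precision_value * recall_value / (precision_value + recall_value)


# Hypothetical toy counts, for illustration only.
tp, fn, fp, tn = 3, 1, 1, 15
print(sensitivity(tp, fn), specificity(tn, fp), precision(tp, fp))

# Consistency check using the reported precision and recall.
reported_f1 = f1_score(0.875, 0.75)
print(round(reported_f1 * 100, 2))  # 80.77
```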

4.2.3 Confusion Matrix

The confusion matrix (38) shows that the proportion of COVID-19 positive images correctly classified is 75% and 59.5% at cut-off points 0.1 and 0.2 respectively. The classification sensitivity improves with an increase in the number of quality images.



Figure 4.2: Confusion Matrix at thresh = 0.1 & 0.2 for test datasets

The top panel of Figure 4.2 shows the confusion matrix at threshold 0.1, and the bottom panel shows it at threshold 0.2.



4.2.4 Receiver Operating Characteristics, Area Under the Curve

Figure 4.3: Receiver Operating Characteristics Under Test Datasets

4.2.5 Confidence Interval

Fig 4.4 is a visual representation of the distance scores using histograms. A score below the threshold is classified as COVID-19 negative, while a score above the threshold is classified as COVID-19 positive.
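The thresholding rule can be sketched as follows. The scores are hypothetical, and treating a score exactly equal to the threshold as negative is an assumption, since the text does not cover that case.

```python
def classify(scores, threshold):
    """Label each score as positive if it is above the threshold, else negative."""
    return ["positive" if score > threshold else "negative" for score in scores]


# Hypothetical model scores, for illustration only.
scores = [0.05, 0.15, 0.25, 0.60]

labels_at_01 = classify(scores, 0.1)
labels_at_02 = classify(scores, 0.2)
print(labels_at_01)  # ['negative', 'positive', 'positive', 'positive']
print(labels_at_02)  # ['negative', 'negative', 'positive', 'positive']
```

Raising the threshold moves borderline images from the positive to the negative class, which is why sensitivity falls and specificity rises across the rows of Table 4.3.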



Figure 4.4: Distance scores Histogram under thresh=0.1 & 0.2 for test datasets



Chapter 5

Discussion and Conclusion

5.1 Discussion

The confusion matrix in fig 4.2 shows a significant performance in detecting COVID-19, with a sensitivity of 75%. That is, 75% of all COVID-19 positive images were identified and classified as COVID-19 positive by the model. On the other hand, 99.5% of COVID-19 negative images were classified as COVID-19 negative.

A close look at the cut-off thresholds of 0.1 and 0.2 shows that the model performs well at the lower cut-off point. A significant number of COVID-19 positive images are classified with a probability greater than 0.1. The best threshold for these datasets is 0.1, but as the size of the dataset increases we expect that the cut-off threshold can be increased while still achieving a high classification performance. The first graph in fig 4.4 demonstrates that 75% of COVID-19 positive images were detected with a probability greater than the threshold of 0.1. In the second graph, the COVID-19 negative images were almost all below the threshold of 0.1. This is an indication that the model performed well in detecting both the COVID-19 positive and COVID-19 negative images.

The ROC curve in fig 4.3 demonstrates that COVID-19 cases can be identified by the ResNet18 model with AUC = 98.88%. This suggests that the model can be reliable for detecting COVID-19 cases. A close inspection of the loss shows a test loss of 12.85% when the model is used as a feature extractor. When finetuning, that is, allowing the model to learn from the data and update its parameters, the test loss is cut roughly by half (test loss = 6.96%) and the test accuracy improves to 97.4%. In a super-controlled environment we would expect a test loss of 0%, which is not the case in this study.

A close look at the other metrics, eq 4.3 (precision = 87.5%), eq 4.1 (recall = 75%), eq 4.4 (F1-score = 80.77%), and eq 4.2 (specificity = 99.5%), illustrates that the model was able to detect and classify the COVID-19 positive cases. Precision (87.5%) was used instead of accuracy because we consider it the more reliable metric.

5.2 Conclusion

A deep ResNet18 convolutional neural network with transfer learning (65) was used to identify and classify COVID-19 positive images from COVID-19 negative ones. The COVID-19 positive images were examined and certified as COVID infected, while the COVID-19 negative samples are any images collected and stored in public databases before the outbreak of the COVID virus in 2019. These comprise the classes pneumonia, normal, no finding, edema, and pleural effusion chest X-ray images. The initial training to extract features was done using the pre-trained version of ResNet18.

The second training finetuned the model so that it learned its own parameters from the data it was fed. The finetuning improved the model. The model analysis shows that the ResNet18 model finetuned using transfer learning performed better than the pre-trained state-of-the-art ResNet18 model constructed from ImageNet data. We added noise by flipping images, adding non-COVID-19 images from different sources, and manipulating the images to double the size of the training image set. The pre-trained model achieved a test accuracy of 95.97% and a test loss of 12.85%. The finetuned model, on the test dataset at threshold 0.1, achieved a test accuracy of 97.31% and a test loss of 6.96%, with precision = 87.5%, F1-score = 80.77%, sensitivity = 75%, and specificity = 99.5%. We hope that this paper will help radiologists and medical doctors in the developing world to detect abnormalities in chest X-ray images and will offer first-hand information to aid in the inspection, detection, and diagnosis of chest diseases. At the time of concluding this paper, we had limited COVID-19 images available. Further extension of this research will focus on using much larger COVID-19 and non-COVID-19 datasets, with more focus on systematic review around data acquisition, data certification, model development and pitfalls, and explanation construction, in addition to demonstrating and expanding on the statistical aspects of the research (39). The success of the model can be extended to real clinical trials.



References
[Img] https://towardsdatascience.com/deep-learnings-mathematics-f52b3c4d2576.

Accessed: 2021-05-11.

[CNN] https://makshay.com/neural-network-basics-the-perceptron. Accessed:

2021-05-10.

[Rel] https://medium.com/@toprak.mhmt/activation-functions-for-deep-learning-13d8b9b20e.

Accessed: 2021-05-10.

[Res] https://www.researchgate.net/figure/ResNet-18-Architecture_tbl1_322476121. Accessed: 2021-05-21.

[5] (December 2012). Imagenet classification with deep convolutional neural networks.

Advances in neural information processing systems. NIPS’12: Proceedings of the 25th

International Conference on Neural Information Processing Systems - Volume 1.

[6] A. Cruzroa, J. C. C. and Gonzalez, F. A. (2011). Visual pattern mining in histology image collections using bag of features. Artificial Intelligence in Medicine, 52(2):91–106.

[7] Agarap, A. F. (2017). An architecture combining convolutional neural network

(cnn) and support vector machine (svm) for image classification. arXiv preprint

arXiv:1712.03541.

[8] Agarap, A. F. (2018). Deep learning using rectified linear units (relu). arXiv preprint

arXiv:1803.08375.

[9] Ahuja, S., Panigrahi, B. K., Dey, N., Rajinikanth, V., and Gandhi, T. K. (2021).

Deep transfer learning-based automated detection of covid-19 from lung ct scan slices.

Applied Intelligence, 51(1):571–585.

[10] Akhtar, N. and Ragavendran, U. (2020). Interpretation of intelligence in cnn-pooling processes: a methodological survey. Neural Computing and Applications, 32(3):879–898.



[11] Alzubaidi, L., Zhang, J., Humaidi, A. J., Al-Dujaili, A., Duan, Y., Al-Shamma,

O., Santamaŕıa, J., Fadhel, M. A., Al-Amidie, M., and Farhan, L. (2021). Review of

deep learning: concepts, cnn architectures, challenges, applications, future directions.

Journal of big Data, 8(1):1–74.

[12] Ardakani, A. A., Kanafi, A. R., Acharya, U. R., Khadem, N., and Mohammadi, A.

(2020). Application of deep learning technique to manage covid-19 in routine clinical

practice using ct images: Results of 10 convolutional neural networks. Computers in

Biology and Medicine, 121:103795.

[13] Awais, M., Müller, H., Tang, T. B., and Meriaudeau, F. (2017). Classification of

sd-oct images using a deep learning approach. In 2017 IEEE International Conference

on Signal and Image Processing Applications (ICSIPA), pages 489–492. IEEE.

[14] Ayyachamy, S., Alex, V., Khened, M., and Krishnamurthi, G. (2019). Medical

image retrieval using resnet-18. In Medical Imaging 2019: Imaging Informatics for

Healthcare, Research, and Applications, volume 10954, page 1095410. International

Society for Optics and Photonics.

[15] Bacon, F. and Tobler, W. (2010). Data, information and knowledge.

[16] Bai, C., Huang, L., Pan, X., Zheng, J., and Chen, S. (2018). Optimization of

deep convolutional neural network for large scale image retrieval. Neurocomputing,

303:60–67.

[17] Balogh, E. P., Miller, B. T., and Ball, J. R. (2015). Improving diagnosis in health

care.

[18] Barata, C., Ruela, M., Francisco, M., Mendonça, T., and Marques, J. S. (2013).

Two systems for the detection of melanomas in dermoscopy images using texture and

color features. IEEE systems Journal, 8(3):965–979.

[19] Basha, S. S., Dubey, S. R., Pulabaigari, V., and Mukherjee, S. (2020). Impact

of fully connected layers on performance of convolutional neural networks for image

classification. Neurocomputing, 378:112–119.

38


[20] Bera, S. and Shrivastava, V. K. (2020). Analysis of various optimizers on deep

convolutional neural network model in the application of hyperspectral remote sensing

image classification. International Journal of Remote Sensing, 41(7):2664–2683.

[21] Boddeti, V. N. (2012). Advances in correlation filters: vector features, structured

prediction and shape alignment. PhD thesis, Carnegie Mellon University.

[22] Boureau, Y.-L., Ponce, J., and LeCun, Y. (2010). A theoretical analysis of feature

pooling in visual recognition. In Proceedings of the 27th international conference on

machine learning (ICML-10), pages 111–118.

[23] Carayon, P., Hundt, A. S., Karsh, B., Gurses, A. P., Alvarado, C., Smith, M., and

Brennan, P. F. (2006). Work system design for patient safety: the seips model. BMJ

Quality & Safety, 15(suppl 1):i50–i58.

[24] Celebi, M. E., Kingravi, H. A., Uddin, B., Iyatomi, H., Aslandogan, Y. A., Stoecker,

W. V., and Moss, R. H. (2007). A methodological approach to the classification of

dermoscopy images. Computerized Medical imaging and graphics, 31(6):362–373.

[25] Chan, T.-H., Jia, K., Gao, S., Lu, J., Zeng, Z., and Ma, Y. (2015). Pcanet: A

simple deep learning baseline for image classification? IEEE transactions on image

processing, 24(12):5017–5032.

[26] Cohen, J. P., Morrison, P., Dao, L., Roth, K., Duong, T. Q., and Ghassemi, M.

(2020). Covid-19 image data collection: Prospective predictions are the future. arXiv

preprint arXiv:2006.11988.

[27] Cong, J. and Xiao, B. (2014). Minimizing computation in convolutional neural

networks. In International conference on artificial neural networks, pages 281–290.

Springer.

[28] Dixon, W. J. and Mood, A. M. (1948). A method for obtaining and analyzing

sensitivity data. Journal of the American Statistical Association, 43(241):109–126.

[29] G. E. Hinton, S. O. and Teh, Y.-W. (2006). A fast learning algorithm for deep

belief nets,. Neural Computation, 18(7):1527–1554.

39


[30] Gaynor, M., Wyner, G., and Gupta, A. (2014). Dr. watson? balancing automation

and human expertise in healthcare delivery. In International Symposium On Lever-

aging Applications of Formal Methods, Verification and Validation, pages 561–569.

Springer.

[31] Giri, C. (2020). 3d convolution neural networks for medical imaging; classification

and segmentation: A doctor’s third eye. Master’s thesis, University of Agder.

[32] Glasser, O. (1993). Wilhelm Conrad Röntgen and the early history of the Roentgen

rays. Number 1. Norman Publishing.

[33] Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., Liu, T., Wang, X.,

and Wang, G. (2015). Recent advances in convolutional neural networks. CoRR,

abs/1512.07108.

[34] Hashemi, M. (2019). Enlarging smaller images before inputting into convolutional

neural network: zero-padding vs. interpolation. Journal of Big Data, 6(1):1–13.

[35] Hassan, M. (2019). Resnet (34, 50, 101): Residual cnns for image classification

tasks. Neurohive. io.

[36] He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep residual learning for image

recognition.

[37] He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image

recognition. In Proceedings of the IEEE conference on computer vision and pattern

recognition, pages 770–778.

[38] Heidari, M., Mirniaharikandehei, S., Khuzani, A. Z., Danala, G., Qiu, Y., and

Zheng, B. (2020). Improving the performance of cnn to predict the likelihood of covid-

19 using chest x-ray images with preprocessing algorithms. International journal of

medical informatics, 144:104284.

[39] Hryniewska, W., Bombiński, P., Szatkowski, P., Tomaszewska, P., Przelaskowski,

A., and Biecek, P. (2020). Checklist for responsible deep learning modeling of medical

images based on covid-19 detection studies. arXiv preprint arXiv:2012.08333.

40


[40] Huang, J.-T., Li, J., Yu, D., Deng, L., and Gong, Y. (2013). Cross-language

knowledge transfer using multilingual deep neural network with shared hidden layers.

In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing,

pages 7304–7308. IEEE.

[41] Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network

training by reducing internal covariate shift. In International conference on machine

learning, pages 448–456. PMLR.

[42] Jeon, J. and Lee, S. (2018). Reconstruction-based pairwise depth dataset for depth

image enhancement using cnn. In Proceedings of the European Conference on Com-

puter Vision (ECCV), pages 422–438.

[43] Jutel, A. (2009). Sociology of diagnosis: a preliminary review. Sociology of health

& illness, 31(2):278–299.

[44] Kim, J., Lee, J. K., and Lee, K. M. (2016). Accurate image super-resolution using

very deep convolutional networks. In Proceedings of the IEEE conference on computer

vision and pattern recognition, pages 1646–1654.

[45] Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification

with deep convolutional neural networks. Advances in neural information processing

systems, 25:1097–1105.

[46] Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2017). Imagenet classification

with deep convolutional neural networks. Communications of the ACM, 60(6):84–90.

[47] Lee, H. and Song, J. (2019). Introduction to convolutional neural network using

keras; an understanding from a statistician. Communications for Statistical Applica-

tions and Methods, 26:591–610.

[48] Lee, J.-G., Jun, S., Cho, Y.-W., Lee, H., Kim, G. B., Seo, J. B., and Kim, N.

(2017). Deep learning in medical imaging: general overview. Korean journal of

radiology, 18(4):570.

[49] Liu, J.-e. and An, F.-P. (2020). Image classification algorithm based on deep

learning-kernel function. Scientific programming, 2020.

41


[50] Loshchilov, I. and Hutter, F. (2017). Decoupled weight decay regularization. arXiv

preprint arXiv:1711.05101.

[51] Lotte, F., Bougrain, L., Cichocki, A., Clerc, M., Congedo, M., Rakotomamonjy,

A., and Yger, F. (2018). A review of classification algorithms for eeg-based brain–

computer interfaces: a 10 year update. Journal of neural engineering, 15(3):031005.

[52] Lv, Z., Zhang, P., and Atli Benediktsson, J. (2017). Automatic object-oriented,

spectral-spatial feature extraction driven by tobler’s first law of geography for very

high resolution aerial imagery classification. Remote Sensing, 9(3).

[53] M. R. Zare, A. Mueen, M. A. and Seng, W. C. (2013). Automatic classification

of medical x-ray images: hybrid generative-discriminative approach,. IET Image

Processing, 7(5):523–532.

[54] Maior, C. B., Santana, J. M., Lins, I. D., and Moura, M. J. (2021). Convolutional

neural network model based on radiological images to support covid-19 diagnosis:

Evaluating database biases. Plos one, 16(3):e0247839.

[55] Minaee, S., Kafieh, R., Sonka, M., Yazdani, S., and Soufi, G. J. (2020). Deep-covid:

Predicting covid-19 from chest x-ray images using deep transfer learning. Medical

image analysis, 65:101794.

[56] Mwadulo, M. W. (2016). A review on feature selection methods for classification

tasks.

[57] Nguyen, A.-D., Choi, S., Kim, W., Ahn, S., Kim, J., and Lee, S. (2019). Distribution

padding in convolutional neural networks. In 2019 IEEE International Conference on

Image Processing (ICIP), pages 4275–4279. IEEE.

[58] Organization(WHO), W. H. (2012). International classification of diseases (icd).

[59] Ozturk, T., Talo, M., Yildirim, E. A., Baloglu, U. B., Yildirim, O., and Acharya,

U. R. (2020). Automated detection of covid-19 cases using deep neural networks with

x-ray images. Computers in biology and medicine, 121:103792.

[60] Pathak, Y., Shukla, P. K., Tiwari, A., Stalin, S., and Singh, S. (2020). Deep transfer

learning based classification model for covid-19 disease. Irbm.

42


[61] Pehrson, J. and Lindstrand, S. (2020). Support unit classification through super-

vised machine learning.

[62] Perez, L. and Wang, J. (2017). The effectiveness of data augmentation in image

classification using deep learning. arXiv preprint arXiv:1712.04621.

[63] Pham, D. L., Xu, C., and Prince, J. L. (2000). Current methods in medical image

segmentation. Annual review of biomedical engineering, 2(1):315–337.

[64] Qi, X., Wang, T., and Liu, J. (2017). Comparison of support vector machine and

softmax classifiers in computer vision. In 2017 Second International Conference on

Mechanical, Control and Computer Engineering (ICMCCE), pages 151–155. IEEE.

[65] Rahman, T., Chowdhury, M. E., Khandakar, A., Islam, K. R., Islam, K. F., Mah-

bub, Z. B., Kadir, M. A., and Kashem, S. (2020). Transfer learning with deep con-

volutional neural network (cnn) for pneumonia detection using chest x-ray. Applied

Sciences, 10(9):3233.

[66] Rajpurkar, P., Irvin, J., Zhu, K., Yang, B., Mehta, H., Duan, T., Ding, D., Bagul,

A., Langlotz, C., Shpanskaya, K., et al. (2017). Chexnet: Radiologist-level pneumonia

detection on chest x-rays with deep learning. arXiv preprint arXiv:1711.05225.

[67] Ramachandran, P., Zoph, B., and Le, Q. V. (2017). Searching for activation func-

tions. arXiv preprint arXiv:1710.05941.

[68] Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv

preprint arXiv:1609.04747.

[69] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z.,

Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Fei-Fei, L. (2015). Imagenet

large scale visual recognition challenge. 115(3):211–252.

[70] Shen, F., Gan, R., and Zeng, G. (2016). Weighted residuals for very deep networks.

In 2016 3rd International Conference on Systems and Informatics (ICSAI), pages

936–941. IEEE.

43


[71] Shijie, J., Ping, W., Peiyi, J., and Siping, H. (2017). Research on data augmenta-

tion for image classification based on convolution neural networks. In 2017 Chinese

automation congress (CAC), pages 4165–4170. IEEE.

[72] Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for

large-scale image recognition. arXiv preprint arXiv:1409.1556.

[73] Song, M. and Civco, D. (2004). Road extraction using svm and image segmentation.

Photogrammetric Engineering & Remote Sensing, 70(12):1365–1371.

[74] Srivastava, R. K., Greff, K., and Schmidhuber, J. (2015). Training very deep

networks. arXiv preprint arXiv:1507.06228.

[75] Sun, M., Song, Z., Jiang, X., Pan, J., and Pang, Y. (2017). Learning pooling for

convolutional neural network. Neurocomputing, 224:96–104.

[76] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D.,

Vanhoucke, V., and Rabinovich, A. (2015). Going deeper with convolutions. In

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

(CVPR).

[77] Tang, H., Ortis, A., and Battiato, S. (2019). The impact of padding on image

classification by using pre-trained convolutional neural networks. In International

Conference on Image Analysis and Processing, pages 337–344. Springer.

[78] Tsang, S.-H. (2018). Review: Resnet—winner of ilsvrc 2015 (image classification,

localization, detection).

[79] Veit, A., Wilber, M., and Belongie, S. (2016). Residual networks behave like en-

sembles of relatively shallow networks. arXiv preprint arXiv:1605.06431.

[80] Wang, J., Perez, L., et al. (2017). The effectiveness of data augmentation in image

classification using deep learning. Convolutional Neural Networks Vis. Recognit, 11.

[81] Wang, S.-H., Phillips, P., Sui, Y., Liu, B., Yang, M., and Cheng, H. (2018). Classi-

fication of alzheimer’s disease based on eight-layer convolutional neural network with

leaky rectified linear unit and max pooling. Journal of medical systems, 42(5):1–11.

44


[82] Xiaosong Wang, Yifan Peng, L. L. Z. L. M. B. R. S. (2017). Chestx-ray8: Hospital-

scale chest x-ray database and benchmarks on weakly-supervised classification and

localization of common thorax diseases, pp. 3462-3471.

[83] Y. Zhang, Z. Dong, A. L. e. a. (2015). Magnetic resonance brain image classification

via stationary wavelet transform and generalized eigenvalue proximal support vector

machine,. Journal of Medical Imaging and Health Informatics, 5(7):1395–1403.

[84] Yadav, S. S. and Jadhav, S. M. (2019). Deep convolutional neural network based

medical image classification for disease diagnosis. Journal of Big Data, 6(1):1–18.

[85] Yang, M.-H., Kriegman, D. J., and Ahuja, N. (2002). Detecting faces in images: A

survey. IEEE Transactions on pattern analysis and machine intelligence, 24(1):34–58.

[86] Yu, D., Wang, H., Chen, P., and Wei, Z. (2014). Mixed pooling for convolutional

neural networks. In International conference on rough sets and knowledge technology,

pages 364–375. Springer.

[87] Yu, X. and Wang, S.-H. (2019). Abnormality diagnosis in mammograms by transfer

learning based on resnet18. Fundamenta Informaticae, 168(2-4):219–230.

[88] Zaniolo, L. and Marques, O. (2020). On the use of variable stride in convolutional

neural networks. Multimedia Tools and Applications, 79(19):13581–13598.

[89] Zhao, Z., Li, X., Liu, F., Zhu, G., Ma, C., and Wang, L. (2020). Prediction of the

covid-19 spread in african countries and implications for prevention and control: A

case study in south africa, egypt, algeria, nigeria, senegal and kenya. Science of the

Total Environment, 729:138959.

[90] Zhou, M., Pan, Z., Liu, Y., Zhang, Q., Cai, Y., and Pan, H. (2019). Leak detection

and location based on islmd and cnn in a pipeline. IEEE Access, 7:30457–30464.

[91] Zoumpourlis, G., Doumanoglou, A., Vretos, N., and Daras, P. (2017). Non-linear

convolution filters for cnn-based learning. In Proceedings of the IEEE International

Conference on Computer Vision, pages 4761–4769.

45


Appendix A

Python Codes
The Python code is divided into a training section and an inference section. The training section first uses the pre-trained ResNet18 model as a fixed feature extractor: the pre-trained parameters are left unchanged and only a new final layer is trained. It then fine-tunes the model, updating all parameters with our own training settings. The inference section loads the saved fine-tuned model and measures its performance on the test dataset.
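To make the difference between the two training modes concrete, the following minimal sketch counts how many parameters each mode would update. It assumes the standard ResNet18 layout, in which the final fully connected layer receives 512 features, and it takes the commonly documented total of 11,689,512 parameters for the 1000-class ImageNet model as given. The helper name `linear_param_count` is hypothetical and used for illustration only.

```python
# Sketch: trainable-parameter counts for the two training modes.
# Assumptions (not taken from the thesis code): ResNet18's final layer has
# 512 input features, and the 1000-class ImageNet model has 11,689,512 parameters.

def linear_param_count(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

IMAGENET_TOTAL = 11_689_512
NUM_FTRS = 512

imagenet_head = linear_param_count(NUM_FTRS, 1000)  # original 1000-class layer
backbone = IMAGENET_TOTAL - imagenet_head           # everything except the final layer
covid_head = linear_param_count(NUM_FTRS, 2)        # new two-class layer

feature_extractor_trainable = covid_head            # backbone frozen
fine_tuning_trainable = backbone + covid_head       # all parameters updated

print("feature extractor:", feature_extractor_trainable)
print("fine-tuning:", fine_tuning_trainable)
```

Under these assumptions the feature-extractor mode trains only the small two-class layer, whereas fine-tuning updates the whole network.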

A.1 Trained model

# Laban Bore
# April 2021
# Training code for COVID-19 detection by fine-tuning ResNet18
# Inference code for COVID-19 detection

import argparse
import copy
import glob
import os
import time

import cv2
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sn
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision  # packages for loading the data
from PIL import Image
from sklearn.metrics import confusion_matrix
from torch.autograd import Variable  # legacy tensor wrapper
from torch.optim import lr_scheduler
from torchvision import datasets, models, transforms

start_time = time.time()

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

plt.ion()  # interactive mode

# Command-line arguments
parser = argparse.ArgumentParser(description='COVID-19 Positive Detection from X-ray Images')
parser.add_argument('--test_covid_path', type=str, default='C:/Users/Borel/x-rays-datasets/test/covid/',
                    help='Positive COVID-19 test samples directory')
parser.add_argument('--test_non_covid_path', type=str, default='C:/Users/Borel/x-rays-datasets/test/non/',
                    help='Negative COVID test samples directory')
parser.add_argument('--epochs', type=int, default=100,
                    help='number of epochs to train (default: 100)')
parser.add_argument('--trained_model_path', type=str, default='C:/Users/Borel/covid_resnet18_epoch2.pt',
                    help='The path and name of trained model')
parser.add_argument('-f', '--fff', default='1',
                    help='a dummy argument to fool ipython')
parser.add_argument('--cut_off_threshold', type=float, default=0.2,
                    help='cut-off threshold. Any sample with probability higher than this is considered COVID-19 (default: 0.2)')
parser.add_argument('--batch_size', type=int, default=20,
                    help='input batch size for training (default: 20)')
parser.add_argument('--num_workers', type=int, default=0,
                    help='number of workers to train (default: 0)')
parser.add_argument('--learning_rate', type=float, default=0.0001,
                    help='learning rate (default: 0.0001)')
parser.add_argument('--momentum', type=float, default=0.9,
                    help='momentum (default: 0.9)')
args = parser.parse_args()

# Data augmentation and normalization for training; resizing and normalization only for testing
data_transforms = {
    'train': transforms.Compose([
        transforms.Resize(224),
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'test': transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
}

data_dir = "C:/Users/Borel/x-rays-datasets/"  # local directory containing the train and test folders

# data_dir = args.dataset_path

# Apply the matching transform to the images in the train and test folders
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
                  for x in ['train', 'test']}

# Load the images under each folder in shuffled batches
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=args.batch_size,
                                              shuffle=True, num_workers=args.num_workers)
               for x in ['train', 'test']}

dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'test']}  # number of images per split
class_names = image_datasets['train'].classes  # folder names ['covid', 'non'], i.e. 0: covid, 1: non
class_names_test = image_datasets['test'].classes
print(dataset_sizes)

def imshow(inp, title=None):
    """Display a normalized image tensor."""
    inp = inp.numpy().transpose((1, 2, 0))  # C x H x W -> H x W x C
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean  # undo the normalization
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated

# Get a batch of training data and visualize a few images
inputs, classes = next(iter(dataloaders['train']))

# Make a grid from batch
out = torchvision.utils.make_grid(inputs)

imshow(out, title=[class_names[x] for x in classes])

def train_model(model, criterion, optimizer, scheduler, batch_size, num_epochs=20):
    """Train the model and keep the weights with the best test accuracy."""
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    best_epoch = 0
    train_acc = list()
    test_acc = list()

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch + 1, num_epochs))
        print('-' * 10)

        # Each epoch has a training and validation/test phase
        for phase in ['train', 'test']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()  # Set model to evaluate/test mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            cur_batch_ind = 0
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward; track history only in the training phase
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

                # per-batch accuracy; the last batch may be smaller than batch_size
                cur_acc = torch.sum(preds == labels.data).double() / len(labels)
                cur_batch_ind += 1
                print("\npreds:", preds)
                print("label:", labels.data)
                print("%d-th epoch, %d-th batch (size=%d), %s acc= %.3f \n" % (epoch + 1, cur_batch_ind, len(labels), phase, cur_acc))

                if phase == 'train':
                    train_acc.append(cur_acc)
                else:
                    test_acc.append(cur_acc)

            if phase == 'train':
                scheduler.step()  # update the learning rate after this epoch's optimizer steps

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f} \n\n'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'test' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_epoch = epoch + 1
                best_model_wts = copy.deepcopy(model.state_dict())

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best test Acc= %.3f at Epoch: %d' % (best_acc, best_epoch))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, train_acc, test_acc

def visualize_model(model, num_images=64):
    """Show test images with their predicted labels on an 8-column grid."""
    was_training = model.training
    model.eval()
    images_so_far = 0
    fig = plt.figure()

    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloaders['test']):
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images // 8, 8, images_so_far)  # integer number of rows
                ax.axis('off')
                ax.set_title('predicted: {}'.format(class_names[preds[j]]))
                imshow(inputs.cpu().data[j])

                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training)

# 1. ConvNet as fixed feature extractor

# Here, we freeze all of the network except the final layer.
# Setting requires_grad = False freezes the parameters so that
# their gradients are not computed in backward().

# Load the pretrained ResNet18 model and reset the final fully connected layer
model_conv = torchvision.models.resnet18(pretrained=True)
for param in model_conv.parameters():
    param.requires_grad = False

# Parameters of newly constructed modules have requires_grad=True by default.

# Here the size of each output sample is set to 2.
# Alternatively, it can be generalized to nn.Linear(num_ftrs, len(class_names)).
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)

model_conv = model_conv.to(device)
criterion = nn.CrossEntropyLoss()
# criterion = nn.BCELoss()

# Only the parameters of the final layer are optimized here.
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=args.learning_rate, momentum=args.momentum)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)

# Train and evaluate
if __name__ == "__main__":
    model_conv, train_acc, test_acc = train_model(model_conv, criterion, optimizer_conv, exp_lr_scheduler,
                                                  args.batch_size, num_epochs=args.epochs)
    model_conv.eval()
    torch.save(model_conv, './covid_resnet18_epoch%d.pt' % args.epochs)  # save the model for inference

    end_time = time.time()
    print("total_time transfer learning=", end_time - start_time)

    visualize_model(model_conv)  # visualize the model predictions

# 2. Fine-tuning the convnet

# Here all parameters are updated, starting from the pre-trained weights.
# requires_grad is left at its default of True, so the gradients of every
# layer are computed in backward().
model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features

# Here the size of each output sample is set to 2.
# Alternatively, it can be generalized to nn.Linear(num_ftrs, len(class_names)).
model_ft.fc = nn.Linear(num_ftrs, 2)

model_ft = model_ft.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=args.learning_rate, momentum=args.momentum)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)

# Train and evaluate
if __name__ == "__main__":
    model_ft, train_acc, test_acc = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                                                args.batch_size, num_epochs=args.epochs)
    model_ft.eval()
    torch.save(model_ft, './covid_resnet18_epoch%d.pt' % args.epochs)  # save the model for the inference part

    visualize_model(model_ft)
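The step decay used in both training runs (a factor of 0.1 every 7 epochs) can be illustrated with a small standalone sketch. It assumes the scheduler is stepped once per epoch, after that epoch's optimizer updates, so that the rate in effect during zero-based epoch e is the base rate times gamma raised to floor(e / step_size). The function name `step_lr` is hypothetical and is not part of PyTorch.

```python
# Sketch of a step decay schedule: lr = base_lr * gamma ** (epoch // step_size).
# step_lr is an illustrative helper, not a PyTorch function.

def step_lr(base_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate in effect during the given zero-based epoch."""
    return base_lr * gamma ** (epoch // step_size)

base_lr = 0.0001  # default of --learning_rate in the listing above
for epoch in (0, 6, 7, 14, 99):
    print("epoch %3d: lr = %.1e" % (epoch, step_lr(base_lr, epoch)))
```

Under these assumptions the rate is divided by 10 fourteen times over 100 epochs, so the later epochs change the weights very little; a larger `step_size` or fewer epochs may be worth considering when adapting the script.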

A.2 Inferences

# This section continues the script in A.1 and uses its imports,
# parsed arguments (args) and start_time.

# Utility function to find sensitivity and specificity for different cut-off thresholds
def find_sens_spec(covid_prob, noncovid_prob, thresh):
    sensitivity = (covid_prob >= thresh).sum() / (len(covid_prob) + 1e-10)
    specificity = (noncovid_prob < thresh).sum() / (len(noncovid_prob) + 1e-10)
    print("sensitivity= %.3f, specificity= %.3f" % (sensitivity, specificity))
    return sensitivity, specificity

class_names = ['covid', 'non']

# Sample display of positive and negative COVID-19 images
fig, axs = plt.subplots(3, 5, figsize=(100, 100), dpi=100)
fig.subplots_adjust(hspace=-0.5, wspace=0.2)
axs = axs.ravel()
sample_paths = glob.glob('C:/Users/Borel/x-rays-datasets/common-disease/*')
for i, img in enumerate(sample_paths[:len(axs)]):  # at most 15 images fit the 3 x 5 grid
    image = cv2.imread(img)
    axs[i].axis('off')
    axs[i].imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    axs[i].set_title(img[-20:-3], fontsize=50)
# plt.savefig('./chest-xrays-images.png')
fig.savefig('chest-xrays-images.png')

# Test on trained model
model_name = args.trained_model_path
# The whole model object was saved in A.1; on PyTorch 2.6 or later,
# torch.load needs weights_only=False to unpickle it.
model = torch.load(model_name, map_location='cpu')
model.eval()  # inference mode: batch normalization uses its running statistics

# loading new images
imsize = 224
loader = transforms.Compose([transforms.Resize(imsize),
                             transforms.CenterCrop(224),
                             transforms.ToTensor(),
                             transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

def image_loader(image_name):
    """Load an image and return it as a 1 x 3 x 224 x 224 CPU tensor."""
    image = Image.open(image_name).convert("RGB")
    image = loader(image).float()
    image = image.unsqueeze(0)  # add the batch dimension expected by the model
    return image

sm = torch.nn.Softmax(dim=1)  # softmax over the class dimension
# Get the predicted probabilities of all samples
test_covid = glob.glob(os.path.join(args.test_covid_path, '*'))  # 84
test_non = glob.glob(os.path.join(args.test_non_covid_path, '*'))  # 2000

covid_pred = np.zeros([len(test_covid), 1]).astype(int)
non_pred = np.zeros([len(test_non), 1]).astype(int)

covid_prob = np.zeros([len(test_covid), 1])
non_prob = np.zeros([len(test_non), 1])

with torch.no_grad():  # gradients are not needed for inference
    for i in range(len(test_covid)):
        cur_img = image_loader(test_covid[i])
        model_output = model(cur_img)
        cur_pred = model_output.max(1, keepdim=True)[1]
        cur_prob = sm(model_output)
        covid_prob[i, :] = cur_prob.numpy()[0, 0]  # probability of class 0 ('covid')
        print("%03d Covid predicted label:%s" % (i, class_names[cur_pred.item()]))

    for i in range(len(test_non)):
        cur_img = image_loader(test_non[i])
        model_output = model(cur_img)
        cur_pred = model_output.max(1, keepdim=True)[1]
        cur_prob = sm(model_output)
        non_prob[i, :] = cur_prob.numpy()[0, 0]  # probability of class 0 ('covid')
        print("%03d Non-Covid predicted label:%s" % (i, class_names[cur_pred.item()]))

# Find sensitivity and specificity at several cut-off thresholds
for thresh in [0.1, 0.2, 0.3]:
    sensitivity, specificity = find_sens_spec(covid_prob, non_prob, thresh)

# Derive labels based on probabilities and the last cut-off threshold (0.3)
covid_pred = np.where(covid_prob >= thresh, 1, 0)
non_pred = np.where(non_prob >= thresh, 1, 0)

# Derive confusion matrix
covid_list = [int(p) for p in covid_pred.ravel()]
covid_count = [(x, covid_list.count(x)) for x in set(covid_list)]

non_list = [int(p) for p in non_pred.ravel()]
non_count = [(x, non_list.count(x)) for x in set(non_list)]


y_pred_list = covid_list + non_list
y_test_list = [1] * len(covid_list) + [0] * len(non_list)  # 1: COVID-19 positive, 0: negative

y_pred = np.asarray(y_pred_list, dtype=np.int64)
y_test = np.asarray(y_test_list, dtype=np.int64)

cnf_matrix = confusion_matrix(y_test, y_pred, labels=[0, 1])
np.set_printoptions(precision=2)

# Plot the confusion matrix; rows and columns follow the label order 0 (negative), 1 (positive)
cm_labels = ['COVID-19 Negative', 'COVID-19 Positive']
df_cm = pd.DataFrame(cnf_matrix, index=cm_labels, columns=cm_labels)

plt.figure()
ax = sn.heatmap(df_cm, cmap=plt.cm.Blues, annot=True, cbar=False, fmt='g')
ax.set_title("Confusion matrix")
plt.savefig('./confusion_matrix.png')  # dpi = 200

# Plot the predicted probability distributions
bins = np.linspace(0, 1, 30)
plt.figure()  # new figure so the histograms do not overlap the previous plot
plt.subplot(211)
plt.hist(covid_prob.ravel(), bins, color='#70cab9', histtype='bar', label='Probabilities of COVID-19 positive at thresh=0.1')
plt.ylim([0, 10])
plt.legend(loc='upper right')
plt.subplot(212)
plt.hist(non_prob.ravel(), bins, color='red', label='Probabilities of COVID-19 negative at thresh=0.1')
plt.legend(loc='upper right')
plt.savefig('./scores_histogram-thesis-best.png')  # dpi = 200

# ROC curve and AUC
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score

y_test_res18 = [1] * len(covid_prob) + [0] * len(non_prob)
y_pred_res18 = np.concatenate([covid_prob.ravel(), non_prob.ravel()])  # 1-D array of scores

auc_res18 = roc_auc_score(y_test_res18, y_pred_res18)
ns_fpr_res18, ns_tpr_res18, _ = roc_curve(y_test_res18, y_pred_res18)

plt.figure()
plt.plot(ns_fpr_res18, ns_tpr_res18, color='blue', linewidth=3, label='ResNet18, AUC= %.3f' % auc_res18)
plt.ylim([0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title("ROC Curve")
# show the legend
plt.legend(loc='lower right')
plt.savefig('./ROC_covid19-thesis-best.png')  # dpi = 200

# Accuracy, precision, recall, F1, Cohen's kappa and ROC AUC
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score
from sklearn.metrics import cohen_kappa_score
from sklearn.metrics import confusion_matrix

# NOTE: y_pred must be in the same order as y_test_res18
# (positives first, then negatives) for these scores to be valid.
precision = precision_score(y_test_res18, y_pred)
print('precision: %f' % precision)

accuracy = accuracy_score(y_test_res18, y_pred)
print('accuracy: %f' % accuracy)

recall = recall_score(y_test_res18, y_pred)
print('recall: %f' % recall)

f1 = f1_score(y_test_res18, y_pred)
print('f1: %f' % f1)

kappa = cohen_kappa_score(y_test_res18, y_pred)
print('Cohens kappa: %f' % kappa)

# AUC is computed from the predicted probabilities, not the hard labels
auc = roc_auc_score(y_test_res18, y_pred_res18)
print('ROC AUC: %f' % auc)

matrix = confusion_matrix(y_test_res18, y_pred)
print(matrix)
# precision_score(y_test_res18, y_pred_res18, average=None)
# auc = roc_auc_score(y_test_res18, y_pred)
# print('ROC AUC: %f' % auc)

# Report the total running time (start_time is set at the top of the script)
end_time = time.time()
tot_time = end_time - start_time
print("\nTotal Time:", tot_time)
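
The scores printed by the listing above can all be derived from the four confusion-matrix counts (true positives, true negatives, false positives, false negatives). The following is a minimal sketch of that arithmetic in plain Python. It is not part of the thesis code: the function name binary_metrics and the toy labels are hypothetical, and returning 0.0 when a denominator is zero is an assumption made here for illustration.

```python
from collections import Counter


def binary_metrics(y_true, y_pred):
    """Compute binary-classification scores from 0/1 labels (hypothetical helper)."""
    # Count each (true label, predicted label) pair.
    counts = Counter(zip(y_true, y_pred))
    tp = counts[(1, 1)]  # positives predicted positive
    tn = counts[(0, 0)]  # negatives predicted negative
    fp = counts[(0, 1)]  # negatives predicted positive
    fn = counts[(1, 0)]  # positives predicted negative
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    # Assumption: return 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not the thesis data.
    toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    for name, value in binary_metrics(toy_true, toy_pred).items():
        print("%s: %f" % (name, value))
```

Writing the scores this way makes explicit that accuracy, precision, recall, F1 and kappa depend only on the thresholded predictions, whereas the ROC AUC in the listing is computed from the predicted probabilities.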


Appendix B

Similarity Report

Document Information

Analyzed document Msc_Graduate-Thesis_Statistical_Science_Laban_Bore.pdf (D114144369)

Submitted 2021-10-03 22:56:00

Submitted by

Submitter email borelaban@gmail.com

Similarity 8%

Analysis address library.strath@analysis.urkund.com

Sources included in the report

URL: https://su-plus.strathmore.edu/bitstream/handle/11071/6789/Consumer%20credit%20risk%20modelling%20using%20machine%20learning%20algorithms.pdf?sequence=3&isAllowed=y

Fetched: 2021-06-15 08:15:58

6

Face recognition report.pdf
Document Face recognition report.pdf (D107947409)

3

URL: https://arxiv.org/pdf/2007.14777

Fetched: 2021-05-31 00:43:41
1

Image_Classification_using_DL__A_Survey(3).pdf
Document Image_Classification_using_DL__A_Survey(3).pdf (D108434188)

1

URL: https://www.researchgate.net/publication/350202823_Deep-Chest_Multi-Classification_Deep_Learning_Model_for_Diagnosing_COVID-19_Pneumonia_and_Lung_Cancer_Chest_Diseases

Fetched: 2021-10-03 23:01:00

1

URL: https://jeremyweidner.github.io/Stop_Sign_Detection.pdf

Fetched: 2020-12-20 22:37:19
1

URL: https://www.medrxiv.org/content/medrxiv/early/2021/02/08/2021.02.06.21251271.source.xml

Fetched: 2021-05-05 08:21:08
4

RR.pdf
Document RR.pdf (D109057024)

2

Xiaoyu Chen_Dissertation_xc866.pdf
Document Xiaoyu Chen_Dissertation_xc866.pdf (D112371002)

11

URL: https://arxiv.org/pdf/2011.05543

Fetched: 2021-08-06 14:22:39
3

final.pdf
Document final.pdf (D72406536)

1

URL: https://arxiv.org/pdf/1807.10406


Appendix C

Ethical Approval Letter

