Electronic Theses and Dissertations
2022
A Predictive analytics model for pharmaceutical inventory management.
Musimbi, Patience Musanga
Strathmore School of Computing and Engineering, Strathmore University
Recommended Citation
Musimbi, P. M. (2022). A Predictive analytics model for pharmaceutical inventory management [Strathmore University]. http://hdl.handle.net/11071/13188
Follow this and additional works at: http://hdl.handle.net/11071/13188
This work is availed for free and open access by Strathmore University Library. It has been accepted for digital distribution by an authorized administrator of SU+ @Strathmore University. For more information, please contact library@strathmore.edu

A Predictive Analytics Model for Pharmaceutical Inventory Management
By
Musimbi Patience Musanga
136507
Master of Science in Information Technology
2022

A Predictive Analytics Model for Pharmaceutical Inventory Management
By
Musimbi Patience Musanga
136507
Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Information Technology at Strathmore University.
School of Computing and Engineering Sciences
Strathmore University
Nairobi, Kenya.
October 2022

This thesis is available for Library use on the understanding that it is copyright material and that no quotation from the thesis may be published without proper acknowledgement.

Declaration and Approval
Declaration
I declare that this work has not been previously submitted and approved for the award of a degree by this or any other University. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made in the thesis itself.
© No part of this thesis may be reproduced without the permission of the author and Strathmore University.
Student's Name: Musimbi Patience Musanga
Sign: ________________________
Date: ____________________
Approval
The thesis of Musimbi Patience Musanga was reviewed and approved for examination by the following:
Dr. Allan Omondi, School of Computing & Engineering Sciences, Strathmore University
Dr. Julius Butime, Dean, School of Computing & Engineering Sciences, Strathmore University
Dr. Bernard Shibwabo, Director of Graduate Studies, Strathmore University

Abstract
Inefficient inventory management affects pharmacies in Kenya. Unpredictable weather patterns during the traditional long and short rain seasons have caused these seasons to start earlier or later than expected. Seasonal illnesses such as flu may spike when temperatures drop or when the rainy season begins, increasing sales of drugs that treat and prevent flu; the opposite happens when these conditions reverse. Because of this unpredictability, pharmacies may fail to adjust their stock for each season, since they are unprepared and do not know what to stock or when. Ineffective drug management has a significant financial impact on pharmacies. Inventory management ensures that needed drugs or medicines are always available, in sufficient quantities, of the right type and quality, and are used rationally.
An effective drug management process ensures the availability of drugs of the right type and in the right amount according to need, thereby avoiding both shortages and excesses. This research proposed a predictive analytics tool that would predict the drugs or medicines required before they are needed, based on sales and seasonality. A further predictor was the time of year in which a given disease tends to be common. This research discussed the stocking and inventory management of pharmaceutical products and how predictive analytics with machine learning algorithms could be applied to improve the inventory management process in a pharmacy. The purpose of the study was to examine the inefficient stocking of medicines in pharmacies and to use predictive analytics to predict future stock requirements. It reviewed various methods previously used for pharmaceutical inventory management and proposed a SARIMAX model, based on time series analysis, for stock prediction. The result was a model that predicted the quantity of drugs to be stocked for the next six weeks, with a Root Mean Squared Error (RMSE) of 5.5.
Key words: ARIMA model, machine learning, inventory management, SARIMAX, time series.

Table of Contents
Declaration and Approval
Abstract
List of Figures
List of Tables
List of Models
Abbreviations/Acronyms
Operational Definition of Terms
Chapter 1: Introduction
1.1 Background to the study
1.2 Problem Statement
1.3 Objectives
1.3.1 General Objective
1.3.2 Specific Objectives
1.4 Research Questions
1.5 Scope and Limitation
1.6 Justification
Chapter 2: Literature Review
2.1 Introduction
2.2 Theoretical Framework
2.2.1 History of Predictive analytics
2.2.2 ARIMA
2.2.3 SARIMAX
2.3 Empirical Framework
2.3.1 Pharmacy Inventory Management in Kenya
2.3.2 Climate in Kenya
2.4 Previously used Models
2.4.1 Auto-Regressive Integrated Moving Average (ARIMA) and Long Short-Term Memory Models (LSTM)
2.4.2 Support Vector Machine (SVM) and Artificial Neural Networks (ANN)
2.5 Current Methods of Inventory Management
2.5.1 Perpetual Inventory Systems
2.5.2 Automatic Dispensing Systems
2.5.3 RFID and Barcode Technology
2.5.4 Other Related Methods
2.6 Medical Inventory Prediction Model
2.7 Conceptual framework
Chapter 3: Research Methodology
3.1 Introduction
3.2 Research Design
3.3 System Analysis and Design
3.4 Target Population
3.5 System Development
3.5.1 Data Collection
3.5.2 Data Analysis
3.5.3 Data Pre-processing
3.5.4 Model Training
3.5.5 Model Validation
3.5.6 Deployment
3.6 Research Quality
3.7 Ethical Considerations
Chapter 4: System Analysis, Designs and Architecture
4.1 Introduction
4.2 Data analysis
4.3 Requirement analysis
4.3.1 Functional Requirements
4.3.2 Non-Functional Requirements
4.4 System Architecture
4.5 System Design
4.5.1 Context Diagram
4.5.2 Data Flow Diagram
4.5.3 Entity relationship diagram
4.6 Wireframes
4.6.1 Log in page
4.6.2 Admin Dashboard
4.6.3 Medication Page
Chapter 5: System Implementation and Testing
5.1 Introduction
5.2 Development requirements
5.2.2 Hardware requirements
5.2.3 Software requirements
5.3 Model Architecture
5.4 Model Development
5.4.1 Dataset overview
5.4.2 Pre-processing
5.4.3 Correlation
5.4.4 Time series Decomposition
5.4.5 Stationarity analysis
5.4.6 Autocorrelation
5.5 Training the model
5.5.1 Time series Forecasting
5.6 ARIMA model
5.7 SARIMAX model
5.8 Validating the model
5.8.1 Model Performance Results
5.8.2 Root Mean square
5.8.3 Forecasting Inventory with deployed model
Chapter 6: Discussion
6.1 Introduction
6.2 Challenges faced in stocking pharmaceutical inventory
6.3 Previously Used Methods for Inventory Management
6.4 To identify how pharmacies currently stock
6.5 Design of the Model using Predictive Analytics
6.6 Validation of the Model
6.7 Advantages of the Developed Model
6.8 Research Contributions
6.9 Challenges Encountered
Chapter 7: Conclusion and Recommendations
7.1 Conclusion
7.2 Recommendations
7.3 Suggestions for Future Research
References
Appendices
Appendix A: Gantt Chart
Appendix B: Sample drug and weather data collected
Appendix C: Ethical Approval
Appendix D: Data Collection Approval
Appendix E: Sample code
Appendix F: Similarity index

Acknowledgements
I would like to express my sincere gratitude to the Lord God Almighty for good health, strength, time and grace to undertake this study. I would also like to sincerely thank my immediate supervisor, Dr. Allan Omondi, for his constant support, patience and dedication throughout this study. His guidance and commitment were key to achieving the research objectives and are greatly appreciated. I would also like to thank the thesis coordinator, Dr. Omwenga, who helped set timely milestones that supported the achievement of the research objectives. My father, Mr.
James Musanga, also offered great support, for which I am grateful. Lastly, I would like to thank my colleagues and coursemates for their insight and critique during this research.

List of Figures
Figure 2.1: Proposed model
Figure 2.2: Conceptual framework of the study
Figure 3.1: Iterative Methodology
Figure 3.2: Sample Drug sales data collected
Figure 3.3: Sample weather data collected
Figure 4.1: System architecture
Figure 4.2: Context diagram
Figure 4.3: Level 1 Data Flow Diagram
Figure 4.4: Entity Relationship Diagram
Figure 4.5: Log in page
Figure 4.6: Admin Dashboard
Figure 4.7: Predicted medication page
Figure 5.1: Merged dataset
Figure 5.2: Visualization of data
Figure 5.3: Time series decomposition view
Figure 5.4: Autocorrelation
Figure 5.5: SARIMAX forecast
Figure 5.6: Zefcoln forecast
Figure 5.7: Accuracy report

List of Tables
Table 2.1: A comparison of prediction methods
Table 5.1: Hardware requirements
Table 5.2: Software requirements

List of Models
1. ARIMA model
2. SARIMAX model

Abbreviations/Acronyms
AI - Artificial Intelligence
ANN - Artificial Neural Network
ARIMA - Auto-Regressive Integrated Moving Average
BN - Bayesian Network
CART - Classification and Regression Trees
DFD - Data Flow Diagram
DSS - Decision Support System
LSTM - Long Short-Term Memory
ML - Machine Learning
MSE - Mean Squared Error
NDC - National Drug Code
PPB - Pharmacy and Poisons Board
RFID - Radio-Frequency Identification
RMSE - Root Mean Squared Error
SARIMAX - Seasonal Auto-Regressive Integrated Moving Average with eXogenous factors
SDLC - Software Development Life Cycle
SSAD - Structured System Analysis and Design
SVM - Support Vector Machine
WHO - World Health Organization

Operational Definition of Terms
Algorithm - A procedure used to find patterns in data and learn from them. Algorithms are implemented in code and run on data.
Dispense - To prepare and distribute medication and other necessities to the sick, and to fill medical prescriptions.
Forecast - The process of using historical data as input to make informed estimates for prediction.
Medicine/Drug - Chemicals or compounds used to treat, stop or prevent disease, relieve symptoms, or aid in the diagnosis of illnesses.
Model - The stored output of an algorithm after training. It represents what the algorithm learned from the data and contains a specific set of learned parameters.

Chapter 1: Introduction
1.1 Background to the study
Drug management in pharmacies is an important component of hospital and pharmacy management. Apart from medical services such as surgery and other types of treatment, it is also the component that absorbs the most funding. According to the WHO (2015), good drug management comprises rational selection, efficient quantification and forecasting, procurement, storage, and distribution of drugs. Drug management ensures that needed drugs or medicines are always available, in sufficient quantities, of the right type and quality, and are used rationally (International Finance Corporation, UKAID, & IQVIA, 2020). Ineffective drug management has a significant financial impact on pharmacies. An effective drug management process ensures the availability of drugs of the right type and in the right amount according to need, thereby avoiding both shortages and excesses. The Pharmacy and Poisons Act (Cap. 244 of the Laws of Kenya) requires the Pharmacy and Poisons Board (PPB) to regulate medical products and health technologies. All parties involved in distributing medical products and health technologies are responsible for ensuring that the quality of the products and the integrity of the distribution chain are maintained throughout the distribution process, from manufacture to dispensing to the end user (Pharmacy and Poisons Board, 2019). Under Article 43(1)(a) of the Constitution of Kenya, 2010, every person has the right to the highest attainable standard of health, which includes the right to health care services.
This standard of health is attainable only if medical products and health technologies on the market are of the right quality and are dispensed correctly. The aim of this study was to shed light on the quantification and forecasting of medicines for stocking purposes and on how predictive analytics with machine learning algorithms can be applied to improve this process. It is therefore worthwhile to define these concepts before the main discussion. Firstly, stock refers to the quantity of finished products that are ready to be sold to an end user. Forecasting, on the other hand, is the process of using historical data as input to make informed estimates of future values. This process can be carried out in a clinic, hospital, health center or community pharmacy setting (Management Sciences for Health, 2012). While medicine administration aims to improve patient care, a lack of medication to administer can cause severe harm to a patient. All the resources spent beforehand may be wasted if the patient does not receive the right medication at the right time, which increases overall operating costs through wastage and shelving. According to Mukherjee (2017), Artificial Intelligence (AI) and Machine Learning (ML) have been critical in the pharmaceutical industry and the consumer healthcare business. For example, machine learning has been used to recognize diseases and has supported the development of different types of drugs to treat cancer. Additionally, personalized treatment based on a patient's health patterns is considered more effective at improving health outcomes. Finally, many studies anticipate a widespread increase in the use of microdevices and biosensors, creating an opportunity for enhanced diagnosis, monitoring, and treatment of numerous diseases.
These devices promise to capture large amounts of data on patients, drug trends and the seasonality of diseases. Unfortunately, existing knowledge and tools fail to extract timely insights from this big data. Predictive analytics addresses this gap by analyzing the data to produce medical insights that improve stock efficiency in drug management.
1.2 Problem Statement
The provision of the right medication at the right time is a cornerstone of patient well-being. With different diseases becoming prevalent in different areas and seasons, most pharmacies are unable to stock their shelves ahead of patients' needs because the demand for the drugs is unknown. Additionally, stocking without insight into which drugs are needed leads to understocking of necessary drugs or overstocking of unpopular drugs (Okoneko, Khrustsky, Veber, Egorova, & Antropova, 2019). Understocking leaves patients without the drugs they require, while overstocking creates excess inventory that may expire. This lowers customer (patient) satisfaction, creates unwanted inventory costs through overstocking, and may lead to patients being harmed or, worse, dying. Therefore, using predictive analytics to gain insight into seasonal patterns and drug sales, and so predict which drugs to stock at a given time, would be essential for both pharmacies and patients.
1.3 Objectives
1.3.1 General Objective
The purpose of this study was to apply predictive analytics to fine-tune inventory management by monitoring drug sales and seasonal variation, and to advise on how best to stock a given pharmaceutical product in order to help control diseases.
1.3.2 Specific Objectives
(i) To investigate the challenges faced in stocking pharmaceutical inventory.
(ii) To review the previously used methods for stocking pharmaceutical inventory.
(iii) To identify how pharmacies currently stock their inventory to best suit the needs of their customers.
(iv) To design a model to predict future pharmaceutical inventory using predictive analytics.
(v) To validate the designed model.
1.4 Research Questions
(i) What challenges do pharmacies currently face while stocking their inventory?
(ii) What methods for stocking pharmaceutical inventory have been used previously?
(iii) How do pharmacies currently stock their inventory to best suit the needs of their customers?
(iv) How can a model that predicts future pharmaceutical inventory using predictive analytics be designed?
(v) How can the designed model be validated?
1.5 Scope and Limitation
This study aimed to find a solution, using predictive analytics, that could enhance inventory management in pharmacies. Data for this research was collected from local pharmacies and online repositories and was limited to the number of sales per day and the weather data for each day. As discussed, the research was intended to assist with inventory management in the pharmaceutical sector. Data was collected online partly to reduce research costs and time. However, this scope had its setbacks: the findings appear to apply mainly to areas characterized by high customer traffic and unpredictable demand. The findings may therefore be less applicable to pharmacies located in less busy regions.
1.6 Justification
This study examined the inefficient stocking of medicines in pharmacies. Using sales and weather data, it aimed to develop a model that pharmacists in medical institutions could use to categorize medicines automatically and to stock and administer medicines appropriately. The system was intended to benefit pharmacies and pharmacists by predicting the quantity of drugs to stock, so that pharmacies would be neither understocked nor overstocked, thereby maximizing profits.
It also aimed to help ensure that patients receive the right medication at the right time, thereby increasing customer satisfaction and reducing health complications, or even deaths, caused by a lack of drugs or the administration of expired drugs.

Chapter 2: Literature Review
2.1 Introduction
Inventory control has become an important component of supply chain management, and accurate prediction is one of the critical success factors in inventory management. Many researchers have used different approaches to forecast product demand for inventory control purposes. According to Kerkkanen, Korpela and Huiskonen (2009), demand forecasting is commonly applied in companies that operate in consumer markets. Projections based on historical demand are typically very accurate when demand patterns are comparatively smooth and continuous, and success stories about demand forecasting typically report lower inventory levels and improved customer service. In medicine, many problems have benefited from predictive analytics approaches. Sufficiently large medical datasets have been available for a long time, yet despite thousands of studies applying machine learning algorithms to medical data, very few have made a meaningful contribution to pharmaceutical care. This chapter reviewed the literature on approaches previously used in this research's context, namely predictive analytics for inventory management to improve pharmaceutical care. Section 2.2 discussed computational learning theory as a fundamental principle of machine learning, with emphasis on predictive analytics, machine learning and regression. Section 2.3 presented the empirical literature on the history of pharmacy inventory management in Kenya and on Kenya's climate.
Section 2.4 discussed the methods and algorithms previously used for inventory management, while section 2.5 described the traditional methods by which medical inventory management has been conducted. Finally, section 2.6 gave a summary of the proposed model and how it would be tested and implemented, and section 2.7 presented the conceptual framework.

2.2 Theoretical Framework
Drug scarcity is a complex issue that affects every aspect of the health-care system. According to (Baumer, et al., 2015), drug shortages have a wide-ranging impact, with more than half of health-care practitioners believing that shortages have influenced practice and resulted in subpar patient care. Substituting other drugs when the required drugs are unavailable may have a negative impact and result in medication errors (WHO, 2015). Excess drug stock, on the other hand, causes problems for hospitals because money is wasted and the excess drugs expire and become unsuitable for human consumption. Procurement of drugs that is not based on patient needs results in drug stockpiles accumulating (Nursalam, Saafi, & Munir, 2020). If pharmacy management does not plan for larger storage space, the drugs become damaged and expire due to inactivity. Regular orders in small quantities can be placed to reduce storage costs. It should be noted, however, that stock-outs still occur, and the unplanned purchases needed to cover them can be costly because of the high value of drugs (Management science of health, 2012). Even though much research has been conducted on stock planning in pharmacies, many pharmacies and medical institutions still use traditional methods, which results in improper planning. Improper planning results in budget waste, stagnation, and stockouts. Both traditional inventory systems and other inventory management systems have been used previously. They are discussed in later sections of this chapter.
2.2.1 History of Predictive Analytics
This research was based on computational learning theory, which seeks to comprehend the fundamental principles of learning as a computational process (Sally, 2010). This field seeks to understand, at a precise mathematical level, what capabilities and information are fundamentally required to learn various types of tasks successfully. It also studies the basic algorithmic principles involved in training computers to learn from data and improve performance through feedback. The theory aids in the development of better automated learning methods as well as in understanding fundamental issues in the learning process itself. One notable aspect of computational learning theory is the development of algorithms that learn quickly even in the presence of a large amount of distracting information. Studies have proposed intelligent ways of predicting inventory requirements, bearing in mind the different factors to consider when ordering and managing inventory. Three related approaches, namely predictive analytics, machine learning and regression, were discussed and then selected for this research.

Predictive analytics is the analysis of current and historical facts to determine the likelihood of future events using data and statistical techniques from data mining, predictive modelling, regression, and machine learning (Nysce, 2007; IBM, 2021). It is divided into three disciplines. First, predictive models evaluate the likelihood that a specific unit in a different sample will perform similarly. Second, descriptive models establish the relationships in the data required for classification. Third, decision models relate the data, the decision, and the result of the forecast. Predictive analytics is forward-looking: it uses past events to anticipate the future.
Although predictive analytics has been around for decades and has been used for various applications, it has recently gained popularity as more businesses employ it to gain insights about the future. This is mainly due to advances in technology and increasing reliance on data (Korn, 2011). Many sectors, including banking and finance, energy, and government and the public sector, use predictive analytics to gain insights for future use (Ukhalkar, 2018). This research focuses on the regression and machine learning techniques described below.

Regression analysis is a method for estimating relationships among variables. It focuses on developing mathematical equations as a model representing the interactions between variables. It is intended for continuous data with a normal distribution and is mostly used to determine specific quantities such as price (Predictive Analytics: What it is and why it matters, 2021). Regression examines how the value of the dependent variable changes when the values of the independent variables change in a modelled relation (Armstrong, 2012). Regression analysis is mainly used for prediction and forecasting. In some situations, it is also used to infer causal relationships between the dependent and independent variables. A wide variety of regression models can be applied when carrying out predictive analytics. They include linear regression, logistic regression, duration analysis, Classification and Regression Trees (CART) and discrete choice models.

Machine learning refers to the automated detection of meaningful patterns in data (Shai & Ben-David, 2014). It is a kind of AI that allows a system to learn from data and improve through experience without explicit programming (Hurwitz & Kirsch, 2018). It employs several algorithms that learn from data in order to improve, describe, and predict outcomes.
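As a minimal illustration of the regression analysis described above, the sketch below fits a simple linear regression by ordinary least squares. It estimates how one dependent variable (weekly units of a flu medication sold) changes with one independent variable (weekly mean humidity). The figures are hypothetical and are not the study's data, and the function name `fit_simple_linear_regression` is illustrative only. The study itself relied on time series models rather than this plain regression.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary for the slope to be defined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx                  # change in y per unit change in x
    intercept = y_bar - slope * x_bar  # fitted line passes through the means
    return intercept, slope


# Hypothetical weekly data: mean relative humidity (%) and flu-medication units sold.
humidity = [55, 60, 65, 70, 75, 80]
units_sold = [40, 44, 47, 53, 55, 61]

intercept, slope = fit_simple_linear_regression(humidity, units_sold)
predicted = intercept + slope * 72  # hypothetical week at 72% humidity
print(f"units_sold = {intercept:.2f} + {slope:.2f} * humidity")
print(f"predicted units at 72% humidity: {predicted:.1f}")
```

The slope is the regression's answer to "how much does the dependent variable change per unit of the independent variable". A fitted line like this ignores seasonality and autocorrelation, which is why time series models are discussed next.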
Machine learning has become popular for predictive analytics thanks to techniques that perform well on large and noisy datasets (Linda, Joseph, & Ed, 2021). This involves training algorithms and neural networks to analyze data and output findings. There are two types of learning: supervised learning and unsupervised learning. Supervised learning creates predictive models using data that contains the outcomes being predicted. Unsupervised learning does not use previously known labels to train its models (Julianna Delua, 2021); instead, it employs descriptive statistics to investigate the natural patterns and relationships that emerge from the data. Machine learning employs a variety of approaches, including Decision Tree Learning, Support Vector Machines (SVMs), Artificial Neural Networks (ANN), and Bayesian Networks, among others (Educba, 2021). This research used predictive analytics to predict the required inventory. A time series approach, which forecasts the future behaviour of variables using time as an input parameter, was adopted. It used the Seasonal Auto-Regressive Integrated Moving Average with eXogenous factors (SARIMAX) model, which is well suited to predicting a dependent variable that varies with the seasons.

2.2.2 ARIMA
The ARIMA model is characterized by three terms, p, d and q, where:
(i) p is the order of the AR term, i.e., the number of lags of Y to be used as predictors;
(ii) q is the order of the MA term, i.e., the number of lagged forecast errors that go into the ARIMA model;
(iii) d is the minimum number of differencing operations needed to make the time series stationary.

$$Y_t = \alpha + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + \epsilon_t + \Phi_1 \epsilon_{t-1} + \Phi_2 \epsilon_{t-2} + \dots + \Phi_q \epsilon_{t-q}$$

In this equation, $Y_t$ is the series after it has been differenced $d$ times, $\alpha$ is a constant, $\beta_1$ to $\beta_p$ are the autoregressive coefficients, $\epsilon_t$ is the error term at time $t$, and $\Phi_1$ to $\Phi_q$ are the moving-average coefficients.

2.2.3 SARIMAX
ARIMA, however, does not use seasonal differencing.
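The sketch below makes the role of d, and the contrast with seasonal differencing, concrete. It implements the differencing operator and a one-step prediction from the ARIMA equation in Section 2.2.2, applied to a series that has already been differenced. The series, the season length of 4 and the coefficient values are all hypothetical. In practice, the coefficients would be estimated from data by a fitting routine, such as a statistics library's ARIMA/SARIMAX implementation, rather than chosen by hand.

```python
def difference(series, lag=1, order=1):
    """Apply a lag-`lag` difference `order` times.

    lag=1, order=d  -> ordinary differencing, as in ARIMA
    lag=s, order=D  -> seasonal differencing, as in SARIMA/SARIMAX
    """
    result = list(series)
    for _ in range(order):
        if len(result) <= lag:
            raise ValueError("series is too short for this lag and order")
        result = [result[t] - result[t - lag] for t in range(lag, len(result))]
    return result


def arma_one_step(history, past_errors, alpha, ar_coefs, ma_coefs):
    """One-step-ahead prediction for an already differenced series:
    alpha + sum(beta_i * Y[t-i]) + sum(Phi_j * e[t-j]).
    The current error e[t] is unknown, so its expected value (zero) is used.
    """
    if len(history) < len(ar_coefs) or len(past_errors) < len(ma_coefs):
        raise ValueError("not enough past values for the requested orders")
    ar_part = sum(b * history[-i] for i, b in enumerate(ar_coefs, start=1))
    ma_part = sum(f * past_errors[-j] for j, f in enumerate(ma_coefs, start=1))
    return alpha + ar_part + ma_part


# Hypothetical sales with a pattern that repeats every s = 4 periods.
sales = [10, 14, 12, 18, 12, 16, 14, 20]
print("first difference (d=1):", difference(sales))
print("seasonal difference (s=4, D=1):", difference(sales, lag=4))

# Hypothetical ARMA(2,1) coefficients applied to the seasonally differenced series.
y = difference(sales, lag=4)
forecast = arma_one_step(y, past_errors=[0.5, -0.3], alpha=0.4,
                         ar_coefs=[0.5, 0.2], ma_coefs=[0.3])
print("one-step forecast of the differenced series:", round(forecast, 2))
```

On this toy series, ordinary first differencing still leaves a pattern that repeats every 4 periods, while seasonal differencing at lag 4 removes it. To turn a forecast of the seasonally differenced series back into a sales forecast, it would be added to the observed value one season (s periods) earlier.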
To capture seasonality, we use SARIMAX, which applies seasonal differencing and accepts exogenous variables. In other words, it uses external data to forecast, in this case weather data such as the amount of rainfall and humidity. When external data is included, the model can respond to its effect more quickly than if it relied only on the influence of lagged terms. The SARIMAX formula is given by:

$$\Theta(L)^p \, \theta(L^s)^P \, \Delta^d \, \Delta_s^D \, y_t = \Phi(L)^q \, \phi(L^s)^Q \, \Delta^d \, \Delta_s^D \, \epsilon_t + \sum_{i=1}^{n} \beta_i x_t^i$$

The terms are defined as follows:
- $L$ is the lag operator and $s$ is the length of the season.
- $\Theta(L)^p$ and $\theta(L^s)^P$ are the non-seasonal and seasonal autoregressive polynomials of orders $p$ and $P$.
- $\Phi(L)^q$ and $\phi(L^s)^Q$ are the non-seasonal and seasonal moving-average polynomials of orders $q$ and $Q$.
- $\Delta^d$ and $\Delta_s^D$ are the non-seasonal and seasonal differencing operators.
- $x_t^i$ are the $n$ exogenous variables, with coefficients $\beta_i$.

2.3 Empirical Framework
2.3.1 Pharmacy Inventory Management in Kenya
Pharmacies are key players in providing access to medicines and other pharmaceutical products in Kenya (Toroitich, Dunford, Armitage, & Tanna, 2022). Their influence is reflected in the growing interest in including them in the provision of essential pharmaceutical products and services. Though the scope of pharmacies varies, it usually includes registered and unregistered pharmacies, which are governed by regulations similar to those of other health service providers. These regulations cover personnel qualifications, structural design features of the premises, the provision of adequate, good-quality medicines, good medicine storage and good dispensing practices. However, studies have shown pharmacies in Kenya to be poor regulatory compliers (Wafula, Abuya, Amin, & Goodman, 2014). According to (Deidre, Karrar, & Jayasree, 2018), one of the reasons is the lack of a mechanism to maintain the availability of pharmaceutical stock, owing to the challenges pharmacies face while stocking inventory. Some of the challenges faced in stocking pharmaceutical inventory include, but are not limited to:
i. The need to maintain the availability of products which are essential to human health or life itself without overstocking.
ii. The need to maximize sales.
iii. Strict regulation of the value chain, from research and development to marketing.
iv. Perishable medical materials.
v. Fragile production processes.
vi. Reliance, at times, on a very small number of supply options.
With these challenges, optimizing inventory becomes more difficult for pharmacies and pharmaceutical companies. Calculating how much of a product to stock, and how often, therefore becomes a very important step in optimizing inventory. Medications and biologics are expensive to make and cannot be held for long periods, so it would seem logical that pharmacies would benefit from a lean, just-in-time (JIT) inventory strategy. However, COVID-19 has illustrated the problems that supply chain uncertainty poses to that model. At the same time, the perishable nature of pharmaceuticals, and their loss of potency as expiry dates approach, undermine resiliency methods that rely on keeping safety stocks on hand. Finding the right balance when optimizing inventory, based on the product portfolio as well as historical demand and shipment patterns, is a constant task. When stocking, most pharmacies are affected by the bullwhip effect, in which small fluctuations in demand at the retail level cause progressively larger fluctuations further up the supply chain. In pharmaceuticals, this is amplified by high shipment values, product perishability, and the urgency with which medicines are often needed. Some of the causes are chronic, such as infrastructure congestion or labor shortages, while others include viruses and changes in weather patterns.

2.3.2 Climate in Kenya
Kenya has a variety of climates, such as a tropical rainforest climate in the southwest and a tropical monsoon climate in central Kenya and the southeast. Most places in Kenya have a rainy season and a dry season, whose timing depends on the location (World Bank Group, 2022). Kenya's temperatures vary, with the highlands experiencing considerably cooler temperatures than the coastal and lowland regions. Kenya's average annual precipitation is typically 680 mm. In general, the warmest period is from February to March, while the coolest is from July to August.
There is a long rains period from March to May and a short rains period from October to December. In Nairobi, highs hover around 23/24 °C (74/75 °F) in the coolest months (June, July and August) and around 27/28 °C (80/82 °F) in the warmest months (January, February and March). Lows drop to around 12/13 °C (54/55 °F) from June to September and rise to 14/15 °C (57/59 °F) from January to April. In July and August, the sky is often cloudy even though there is little rain, and nights can be cold, with temperatures sometimes dropping to around 5 °C (41 °F). However, Kenya's climate is changing. Rainfall patterns have changed, with the long rainy season becoming shorter and drier and the short rainy season longer and wetter (Government of the Republic of Kenya, 2018). Different climates may be associated with different types of illness. For example, flu is most common during the cold season. According to (Emukule, et al., 2016), multiple flu epidemics occur each year, each lasting a median of 2 months. The first epidemic occurs between February and March and the second between July and November. Humidity is independently and negatively associated with flu, and combinations of low temperature and low humidity are significantly associated with increased flu. This research combines climatic patterns, drug sales and dates to predict the quantity of drugs to be stocked in pharmacies to prevent or cure these illnesses.

2.4 Previously Used Models
Existing research in the field of pharmacology identifies several effective and accurate methods. They include linear regression (LR), the random forest (RF) method, time series prediction using a neural network (NN), the Auto-Regressive Integrated Moving Average method (ARIMA), the long short-term memory model (LSTM), support vector regression (SVR) and the Levenberg-Marquardt algorithm (LMA).
A few of them are discussed in this section.

2.4.1 Auto-Regressive Integrated Moving Average (ARIMA) and Long Short-Term Memory Models (LSTM)
The ARIMA model is a time series prediction model that can capture a suite of standard temporal structures in time series data (Adam Hayes, 2021). It uses the correlation between current and past observations. For example, (Matsumoto & Ikeda, 2015) examined demand forecasting for auto parts manufacturing using time series analysis of actual shipment data from an independent remanufacturer. (Fattah, Ezzine, Aman, Moussami, & Lachhab, 2018) also used historical demand information to develop several ARIMA models with the Box-Jenkins time series procedure to forecast future demand in food manufacturing. Furthermore, ARIMA and LSTM techniques have been used to build rolling forecast models, which significantly improve the accuracy and efficiency of demand and inventory forecasting. The forecast models, developed from historical data, were evaluated and verified using the root mean squared error and the mean absolute percentage error in an actual case application (Wang, Chien, & J.C.Trappey, 2021). In that study, ARIMA and LSTM models were used to predict the top five products, and the predictions were validated against actual data using the Root Mean Squared Error (RMSE) to evaluate each model's performance. LSTM had the smallest forecast error in the short-term forecast. A disadvantage of ARIMA, however, is that the time series must be stationary after differencing. Another disadvantage is that ARIMA essentially captures only linear relationships, not nonlinear ones (Lu, Chunxue, & Neal, 2022). It is also suitable only for short-term predictions.

2.4.2 Support Vector Machine (SVM) and Artificial Neural Networks (ANN)
Artificial Neural Networks have previously been used, but they require large training datasets (Gutierrez, Solis, & Mukhopadhyay, 2008).
(Kourentzes, Petropoulos, & Trapero, 2014) proposed estimating time series structural components across multiple frequencies and then optimally extrapolating and combining them, with promising empirical results for long-term forecasting. Artificial neural networks (ANN) have performed well on classification and regression problems in the pharmaceutical sector over the past few decades, and they have attracted considerable interest as time series forecasting techniques. However, ANNs have certain drawbacks, including long development times and the large amount of data they need. Medications are frequently updated and historical sales data for individual preparations is scarce. An effective model for predicting pharmaceutical sales therefore needs to be developed using one of the machine learning techniques (Keny, Nair, Nandi, & Khachane, 2021). SVMs and neural networks were later proposed in combination as forecasting methods with improved forecasting performance (Petropoulos, Nikolopoulos, Spithourakis, & Assimakopoulos, 2013). In the automobile industry, SVM has previously been used for demand forecasting of automobile parts (Agarwal & Jayant, 2019). Both neural networks (NNs) and support vector machines (SVMs) are common machine learning approaches with applications in prediction based on time series data. Neural networks have been used successfully for pattern classification and recognition, weather forecasting, data mining and knowledge discovery. They have also been used in time series prediction tasks such as financial market prediction and stock price and foreign exchange forecasting (Lucas & James, 2010). However, both approaches have disadvantages. For SVMs, minor fluctuations in the training data cause a decrease in predictive ability, while for ANNs, predictions become worse as the noise variation increases.
2.5 Current Methods of Inventory Management
2.5.1 Perpetual Inventory Systems
Perpetual inventory systems continuously record the quantity of a specific medication as prescriptions are filled through a point-of-sale system (Gupta, 2020). After each prescription is filled and dispensed to the patient, the medication used is removed from the inventory so that the quantity on hand in the computer is always current. Deliveries and returns are also recorded automatically as they happen. Perpetual systems are designed to update available quantities automatically as prescriptions are filled. They also generate automated and manual reports that allow pharmacy staff to analyze and monitor inventory (Katie Ingersoll, 2017). They can often track turnover rates, predict future drug needs, alert pharmacy staff when potential errors are detected, and even order more medication automatically based on predefined reorder points. Many pharmacies use periodic automatic replacement (PAR) levels in perpetual inventory systems to order more medication automatically when stock is low. When a medication's stock level in the system falls to the pre-set minimum, the system automatically orders enough medication to reach the maximum level. This simplifies ordering and reduces the workload of pharmacy technicians.

2.5.2 Automatic Dispensing Systems
Automated pharmacy dispensing systems are another method used for dispensing medicines. They are management systems that allow medicines to be stored and dispensed near the point of use (Nicole, Clifford, Michele, & Kieran, 2014), offering computer-controlled medication storage, dispensing, and tracking. The Pyxis and Omnicell machines, for example, are commonly used in hospitals to maintain stock of prescription medications to assist patients with their medication needs.
Automated pharmacy dispensing systems have been recommended to improve the efficiency of medication distribution. They enable a more streamlined dispensing process and increase the pharmacy's ability to track system users and the items they add or remove. They also report on which drugs need to be refilled in the cabinets. They provide secure medication storage on patient care units as well as electronic tracking of controlled drugs (Ingersoll, 2015). However, their ability to reduce medication errors depends on a variety of factors, including how users design and implement the systems. To enhance and sustain the evolution of pharmaceutical care, automated dispensing should be improved (Berdot, et al., 2019). Other emerging technologies that could improve hospital pharmacies' efficacy and possibly decrease the likelihood of adverse drug effects have also been recommended (Tsao, Lo, Babich, Bansback, & Shah, 2014).

2.5.3 RFID and Barcode Technology
Radio Frequency Identification (RFID) is a technology that uses radio waves to identify and track tagged objects. A barcode, on the other hand, is a method of representing data in a visual, machine-readable format. Many medications have barcodes on their packaging to facilitate product identification in a computer system. The barcode includes the product's National Drug Code (NDC) number, which tells the computer the product's name and package size. Barcode applications require line-of-sight identification, whereas RFID tags are robust and do not (Peak Technologies, 2019). This technology helps eliminate the need for human intervention. It uses programmable tags that contain information such as destination, weight, and a time stamp. RFID allows for warehouse space optimization and efficient goods tracking, which reduces costs and improves customer service.
RFID tags can also communicate in real time and provide accurate information (Laquanda, Kamal, & Peebles, 2017). Using RFID technology to manage hospital supplies has the potential to significantly reduce hospital inventory levels, which matters because inventory is always a cost to any business. The main advantage of RFID technology is the ability to track goods in real time throughout the supply chain. Real-time tracking of delivery times enables Just-in-Time (JIT) manufacturing and retailing, and JIT assists hospital purchasing committees in making strategic decisions (Joseph, Joshin, & Kumar, 2013).

2.5.4 Other Related Methods
Different machine learning methods have been proposed for prediction, some of which have been used to predict stock markets. Table 2.1 compares these prediction methods.

Table 2.1: A comparison of prediction methods

| Method | Advantage | Disadvantage | Parameters used |
|---|---|---|---|
| Support vector machine (SVM) for stock prediction | Difficult to lose accuracy even when applied to a sample from outside the training sample | Minor fluctuations in training data cause a decrease in predictive ability | Consumer investment, net revenue, net income, unemployment rate |
| Artificial neural network | Lower prediction error | The greater the noise variation, the worse the prediction | Stock closing price |
| Hidden Markov Model | Used for optimization | Evaluation, decoding and learning | Stock market trend |
| ARIMA | Efficient and robust | Only suitable for short-term predictions | Stock price |

2.6 Medical Inventory Prediction Model
Each year, millions of people fall ill due to a lack of medication, as estimated by the National Academy of Science (Kamalanabanand & Premkumar, 2018). To address this problem, a model has been proposed that finds the ideal decision variables affecting the target variable while parsing the relevant features. In this study, both ARIMA and SARIMAX were computed and compared according to accuracy.
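Comparing ARIMA and SARIMAX "according to accuracy" requires a common error measure, and the study used RMSE (Section 3.5.5). The sketch below computes RMSE for two candidate forecasts of the same held-out weeks and picks the one with the lower error. The observed values and forecasts are invented for illustration, and the candidate names are placeholders. Nothing here reproduces the study's results.

```python
from math import sqrt


def rmse(predicted, observed):
    """Root mean squared error: sqrt(sum((P_i - O_i)^2) / n)."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


# Hypothetical: the last four weeks of sales, held out from training,
# and forecasts for those weeks from two placeholder candidate models.
observed = [52, 58, 61, 57]
forecasts = {
    "candidate_a": [50, 54, 55, 56],
    "candidate_b": [51, 57, 60, 58],
}

scores = {name: rmse(pred, observed) for name, pred in forecasts.items()}
best = min(scores, key=scores.get)  # lower RMSE means smaller typical error
for name, score in scores.items():
    print(f"{name}: RMSE = {score:.2f} units")
print("lower RMSE:", best)
```

RMSE squares each error before averaging. As a result, it penalizes a few large misses more heavily than many small ones, and it is expressed in the same units as the sales being forecast.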
The proposed algorithm adopted the SARIMAX model and addressed the problem by showing how an algorithm that learns from data can handle large-scale data and produce a prediction. It demonstrated the potential of data-driven AI, which can be used autonomously in purely data-driven systems or together with domain knowledge in hybrid systems.

Figure 2.1: Proposed model (labelled elements: structured data; unstructured data; dates, sales, drugs; medical records/reports; seasonal diseases, data trends; classification based on sales and seasonality; time series (based on time); SARIMAX; user interface)

2.7 Conceptual Framework
The study's aim was to apply predictive analytics to fine-tune inventory management by monitoring drug sales and seasonality. It would advise on how best to stock particular pharmaceutical products in order to control prevalent diseases. Most patients go to the pharmacy after attending the outpatient clinic, while some are walk-in customers. This model was expected to give the highest accuracy in comparison to other models. It would also help reduce patient waiting time for drug dispensing, since the pharmacist would be expected to already have what patients request. Changing the pharmacy workflow in this way would increase patient satisfaction and improve the overall quality of care. In this model, the dependent variable is what the research aims to achieve, in this case enhanced pharmaceutical inventory management. The independent variable, on the other hand, is predictive analytics, which captures the machine learning and regression techniques and is correlated with the required output. These variables form the foundation upon which predictive analysis can begin. A series of tests would be performed on the data set and the resulting performance of the algorithm computed. These results would then be analyzed to see how the algorithm performs when applied to a data set.
Figure 2.2 shows the conceptual framework of the model.

Figure 2.2: Conceptual framework of the study

Chapter 3: Research Methodology
3.1 Introduction
This chapter outlines the research methodology that was used and presents the research design chosen and the population selected. Section 3.2 covers the research design and section 3.3 the system analysis and design. Section 3.4 describes the target population, while section 3.5 gives a brief description of the model development process. Sections 3.6 and 3.7 discuss research quality and ethical approvals respectively.

3.2 Research Design
Research design is a framework for how the various aspects of the research will be organized and conducted in line with the research objectives. This study combines correlational research with an applied design. Correlational research gave the researcher the opportunity to describe the relationship between two measured variables (Cresewell, 2011), that is, whether the weather was related to the quantity of medicines purchased by patients at certain periods of the year. Correlational research was also selected because it tests the strength of association between variables and because other variables may play a role in the relationship. This research design also allows conclusions to be generalized to other populations or settings with greater confidence. This study is intended to help pharmacies receive stock quantity recommendations so they can stock appropriately for future seasons.

3.3 System Analysis and Design
The final product of this study was a model, intended to be integrated into the pharmacy's current inventory system, that demonstrated key system functionality. For system analysis and design, Structured System Analysis and Design (SSAD) was used, since its focus is on the processes and procedures of the system.
The design diagrams drawn included Data Flow Diagrams (DFDs), context diagrams and entity relationship diagrams. The iterative methodology was selected to develop the application. The iterative methodology is a Software Development Life Cycle (SDLC) model that proceeds in iterations, each comprising design, development, testing and review phases. Each iteration was reviewed to identify further requirements, and this continued until the final product was achieved. Figure 3.1 shows the iterative methodology.

Figure 3.1: Iterative Methodology (Adapted From (Trivedi & Ashwani))

3.4 Target Population
This research targeted a study population of two pharmacies in Ruaka town, Nairobi. Data was also acquired from a public repository, Visual Crossing. Access to the data was requested in line with local requirements.

3.5 System Development
To develop the model, the collected data was analysed. This process involved determining the factors influencing overstocking and understocking of pharmaceutical inventory in pharmacies. The retrieved data was cleaned and transformed into a weekly time series of cumulative sales for different pharmaceutical products. Data was analysed using Python and visualized graphically to draw inferences. The system was developed using the following steps:
i) Data collection
ii) Data pre-processing
iii) Model training and fitting
iv) Model validation
v) Deployment

3.5.1 Data Collection
This study used data from two sources: local pharmacies, which provided drug sales data, and an online repository, which provided weather data. Drug sales data was collected from two pharmacies in Ruaka town, both of which have been in operation for the last 10 years. A brief discussion with the owners of the pharmacies, who were both qualified and experienced, indicated that there was a defined pattern in the sales data at different times of the year.
For instance, every July in the years covered there was a cold season with a high number of flu infections. The data extracted from the drug dataset included the date of drug purchase, the drug name and the total quantity sold per drug. The dataset contained 994 records of drugs purchased for different illnesses over a period of 3 years. The data extracted from the weather dataset included the date, average temperature, average precipitation (rainfall) and humidity. The online repository mentioned above was Visual Crossing (https://www.visualcrossing.com/). More than 1558 records for Nairobi city, covering the years 2011 to 2013, were retrieved from its database of historical weather data. This data was used to conduct data analysis for the purpose of this study. The accuracy of forecasts was determined by considering how well the model performed on new data.

3.5.2 Data Analysis
To gain an in-depth understanding of the effectiveness of the already available models, the data was analyzed. This process involved determining the factors influencing the effectiveness of previous models and establishing the challenges in their implementation. It also included determining the factors influencing overstocking and understocking of pharmaceutical inventory in pharmacies and hospitals. Data was cleaned and transformed into a weekly time series of cumulative sales for different drugs. It was then analyzed using Python and visualized graphically to draw inferences. The data used in this research was quantitative, since it could be quantified or measured. It was therefore analyzed and presented using graphs to make the findings clear. The research made use of inferential statistics, i.e., it analyzed relationships between variables. It accounted for sampling errors and included assumptions regarding population distribution parameters.
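The weekly transformation and correlation analysis described above can be sketched with the Python standard library. This is an illustration under stated assumptions, not the study's code. Records are grouped by ISO calendar week, daily sales are summed into weekly totals, and daily humidity is averaged into weekly means. A Pearson correlation is then computed over the weeks present in both series. The records and the helper names `to_weekly` and `pearson_r` are hypothetical. In practice, a library such as pandas would typically be used for this step.

```python
from collections import defaultdict
from datetime import date
from math import sqrt


def to_weekly(daily_records, how="sum"):
    """Group (date, value) pairs by ISO (year, week) and sum or average each week."""
    sums, counts = defaultdict(float), defaultdict(int)
    for day, value in daily_records:
        iso_year, iso_week, _ = day.isocalendar()
        sums[(iso_year, iso_week)] += value
        counts[(iso_year, iso_week)] += 1
    if how == "sum":
        return dict(sums)
    if how == "mean":
        return {week: sums[week] / counts[week] for week in sums}
    raise ValueError("how must be 'sum' or 'mean'")


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Hypothetical daily records (not the study's data).
daily_sales = [(date(2022, 1, 3), 5), (date(2022, 1, 5), 7),
               (date(2022, 1, 10), 9), (date(2022, 1, 12), 11),
               (date(2022, 1, 17), 4)]
daily_humidity = [(date(2022, 1, 3), 60), (date(2022, 1, 4), 64),
                  (date(2022, 1, 11), 80), (date(2022, 1, 18), 55)]

weekly_sales = to_weekly(daily_sales, how="sum")          # cumulative units per week
weekly_humidity = to_weekly(daily_humidity, how="mean")   # mean humidity per week
weeks = sorted(set(weekly_sales) & set(weekly_humidity))  # weeks present in both

r = pearson_r([weekly_humidity[w] for w in weeks],
              [weekly_sales[w] for w in weeks])
print(f"weeks: {weeks}")
print(f"Pearson r (humidity vs. weekly sales): {r:.2f}")
```

With only three weeks of data, the coefficient shows the mechanics rather than a meaningful relationship. A real analysis would use the full weekly series and would also check whether the correlation is statistically significant.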
Correlation tests investigated the relationship between the variables and estimated the magnitude of the relationship.

3.5.3 Data Pre-processing
This involved systematically searching and arranging the collected data in a clearly understood way. The data used in this research was quantitative, since it could be quantified or measured. It was therefore analyzed and presented using graphs to make the findings clear. The research made use of inferential statistics, i.e., it applied autocorrelation to analyze relationships between variables and make comparisons. It accounted for sampling errors and included assumptions regarding population distribution parameters. Correlation tests investigated the relationship between the variables and estimated the magnitude of the relationship.

3.5.4 Model Training
The model was trained and fitted on the training data using the parameters of the regular ARIMA model, (p,d,q), and of the seasonal ARIMA model, (P,D,Q,s), i.e., the order and the seasonal order.

3.5.5 Model Validation
Validation was done using the Root Mean Squared Error (RMSE), which is the square root of the mean of the squared differences between the predicted and actual values. It was carried out using a structured walk-through in which predicted outcomes were compared with observed outcomes. The formula for RMSE is:

$$\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (P_i - O_i)^2}{n}}$$

Here, $P_i$ is the predicted value, $O_i$ is the observed value and $n$ is the number of observations.

3.5.6 Deployment
The model was deployed in a web application developed in the Python programming language using the Flask library. The front end was developed using HTML5 and JavaScript.

3.6 Research Quality
The system was subjected to functional, non-functional, compatibility and integration tests to determine whether everything worked within the stipulated requirements. Based on the feedback received, the model was improved accordingly until the required product was realized. Research quality was measured along several dimensions, including integrity, inclusiveness, and relevance.
It was considered quality research once these dimensions were met.

3.7 Ethical Considerations
The data used was collected from pharmacies and open data repositories. This data is highly private; it was treated with a high degree of confidentiality and used solely for the intended purposes. Additionally, all literature obtained from other sources, such as journals, periodicals and books, was referenced and cited appropriately in this paper. An application for institutional ethical approval was made to Strathmore University.

Chapter 4: System Analysis, Designs and Architecture
4.1 Introduction
This chapter reports on the system analysis, design and architecture, based on the requirements collected in Chapter 3 through the available datasets. It clarifies the functionality of the developed system and the interaction between its components. Section 4.2 presents the analysis of the datasets used, and Section 4.3 the functional and non-functional requirements of the system. Section 4.4 describes the system architecture. Section 4.5 presents the system design with diagrams. Lastly, Section 4.6 shows the system wireframes and how they will appear.

4.2 Data analysis
This involved systematically searching and arranging the collected data in a clearly understood way. The data used in this research was quantitative, since it could be quantified or measured; it was therefore analyzed and presented using line plots to make the findings clear. The analysis showed that high humidity and precipitation were associated with higher sales of flu drugs, suggesting that more people got flu during that period of the year, between the fifth and the ninth month. As depicted in Figure 4.1, the total quantity of drugs bought per week increased when humidity and precipitation were high.
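One way to put a number on the humidity and precipitation relationship described above is a Pearson correlation between weekly sales and each weather variable. The sketch below is illustrative only: the function name and the column names `total_weekly_stock`, `humidity` and `precip` are assumptions, and a strong correlation shows association, not that the weather causes the sales.

```python
import pandas as pd


def weather_sales_correlation(weekly: pd.DataFrame,
                              target: str = "total_weekly_stock",
                              drivers: tuple = ("humidity", "precip")) -> pd.Series:
    """Pearson correlation between weekly sales and each weather variable.

    Values near +1 mean sales rise with the variable, near -1 that they fall,
    and near 0 that there is no linear relationship.
    """
    # corrwith correlates every selected column with the target series.
    return weekly[list(drivers)].corrwith(weekly[target])
```

The result is a small Series indexed by variable name, which is convenient to print next to the line plots.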
Pharmacists were therefore required to stock up during that period because of the high demand.

Figure 4.1: Data analysis

4.3 Requirement analysis
Requirement analysis involved reviewing the functional, non-functional and operational requirements to ensure that the model took into account all the stakeholders' needs, as per the initial objectives of this study. The initial requirements involved understanding the basics of the product, especially the user interface of the application in question. Considering the systems previously in use, the study aimed to make the Graphical User Interface (GUI) more user-friendly. The system was expected to meet the functional requirements, meaning that its actual performance from the end user's point of view would be as required. It was also required to meet the operational requirements of the organization, i.e., user authentication and sign-in, selection of the drug being forecasted and loading of the CSV data used for forecasting. In this way it would address the needs of the organization, namely the need for pharmacies to be well stocked in preparation for their customers. Technical requirements defined the technical needs: the system would be installed on existing equipment or a few new devices, thus reducing costs. To meet the transitional requirements, various types of transition requirements were considered, such as data conversion and migration, user access and security rights, user acceptance training, user preparation and transition, pilot testing, and infrastructure transition.

4.3.1 Functional Requirements
These are requirements that concern the results or behaviours provided by a function of the system. They specify a function that a system or system component must make available to its users, independently of design and implementation aspects. The functional requirements are listed below.
i.
The system should allow a raw dataset to be uploaded as a CSV file for training and testing.
ii. The system should allow the admin to enter medicine factors on behalf of the pharmacist.
iii. The system should provide recommendations on the drug inventory to be stocked based on previous sales and seasonal characteristics.
iv. The system should allow previous purchase records to be documented for future forecasting.

4.3.2 Non-Functional Requirements
Non-functional requirements define the desired qualities of the system to be developed and often influence the system architecture more than functional requirements do. They describe the non-behavioural aspects of a system, capturing the constraints under which it must operate. The non-functional requirements are as follows:
i. Availability: The system should be dependable and is therefore expected to be functional around the clock.
ii. Transparency: The system shows how specific results were obtained, to reduce issues with trust and transparency.
iii. Security and privacy: The system should address privacy concerns when using the data acquired. It will also be safeguarded against deliberate and intrusive faults from both internal and external sources.
iv. User friendliness: The system should have a user-friendly interface, making it easier to learn and interact with.
v. Integrity: The system data should be maintained accurately and authentically, without corruption.
vi. Confidentiality: The data used should be private and confidential. The system will protect this sensitive data by allowing only authorised access to it.
vii. Efficiency: The system should be able to handle the required capacity and throughput within the specified amount of time.

4.4 System Architecture
The system architecture for the predictive analytics model for pharmaceutical inventory management is shown in Figure 4.2. It explains the general interaction of the various components to achieve system functionality.
Raw data was pre-processed to create training and testing datasets. The predictive analytics model was then converted into a format that could be embedded into the existing system. The interface displays predictions as per users' requests.

Figure 4.2: System architecture

4.5 System Design
The system design describes how the system was designed to meet inventory needs. The logical design pertained to an abstract representation of the data flow, the inputs and the outputs. The physical design, on the other hand, involved the design of interfaces and processes to generate suitable specifications for the end product. To arrive at the final system, the research followed the major tasks of the system design process. First, the design definition was initialized to plan for and identify the technologies that would implement the system's elements and their physical interfaces. Secondly, the study established design characteristics relating to the architectural characteristics. In addition, it assessed alternatives for obtaining system elements and managing the design. To understand the model, various pictorial and graphical representations were used in the design stage. This section therefore presents the context diagram, data flow diagram and entity relationship diagram.

4.5.1 Context Diagram
Context diagrams illustrate the boundaries of the system, its environment and the entities that interact with it, that is, the inputs and outputs exchanged between the system and its different entities. In the proposed model, the main entities that interact with the system are the user, who is the pharmacist, and the administrator. The administrator maintains the required inventory prediction factors, which are the previous sales data and the seasonal weather patterns. The user (pharmacist) uploads the CSV file that contains previous sales data and the time of year.
The model then calculates and predicts the quantities of medicines required to be stocked for the next six weeks. The administrator frequently updates the required factors for inventory prediction. Figure 4.3 illustrates the context diagram.

Figure 4.3: Context diagram

4.5.2 Data Flow Diagram
The data flow diagram (DFD) illustrates the processes and entities of a system, demonstrating how data flows from the entities through the processes, and captures where the processes store data. It allows users to better understand the system. The level 1 DFD gives a more detailed view by illustrating the processes contained in the module, the data stores and the entities. Arrows depict the flow of data among the components of the DFD. Process 1.0 depicts the administrator adding medicine information, together with the factors affecting inventory, into the system, where it is saved in the database. In process 2.0, the user, who is the pharmacist, uploads a CSV file, which is stored in the CSV database for future use. The previous inventories are maintained as shown in process 3.0. In process 4.0, the administrator frequently updates the factors required to determine the inventory predictions. Feedback on the medicine quantities to be stocked is sent to the user from the pharmaceutical inventory prediction model. Figure 4.4 presents the level 1 data flow diagram.

Figure 4.4: Level 1 Data Flow Diagram

4.5.3 Entity relationship diagram
Figure 4.5: Entity relationship diagram

4.6 Wireframes
4.6.1 Log in page
This is the page where the user signs into the system by entering their name, employee number and password. This ensures that only authorised employees, meaning only experienced pharmacists, can log in to the system. A user who is not able to log in is requested to sign up at the bottom of the page. Figure 4.6
below shows the log in page.

Figure 4.6: Log in page

4.6.2 Admin Dashboard
This is where sales data is represented graphically to show the performance of different categories at specific times and seasons, and which drugs are being bought by people in which locations. Figure 4.7 shows the admin dashboard.

Figure 4.7: Admin Dashboard

4.6.3 Medication Page
Here, the illness or condition is entered and the drug to be stocked for that illness is selected. The stock prediction is then displayed when the pharmacist clicks "Predict". Figure 4.8 displays the final stage of dispensing the medication.

Figure 4.8: Predicted medication page

Chapter 5: System Implementation and Testing
5.1 Introduction
This chapter focuses on the implementation and testing of the model. Section 5.2 discusses the hardware and software requirements for model development. Section 5.3 describes the model architecture, and Section 5.4 reviews the systematic approach to how the data was pre-processed, analysed and visualized. In Sections 5.5 to 5.7, the models were trained and fitted, after which they were validated in Section 5.8.

5.2 Development requirements
The model was developed using Jupyter Notebook, which makes it easier to work on projects because code can be run cell by cell instead of executing a whole Python script. Python extensions were also installed. Python packages such as Plotly were used primarily in this study. A pre-processor, adapted to medication data from other sources, was also used. MySQL was used to store data because of its high performance and scalability.

5.2.2 Hardware requirements
Table 5.1: Hardware requirements
Hardware                        Specifications
Central Processing Unit (CPU)   Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz (8 CPUs), ~2.7GHz
Memory                          8GB RAM
Disk                            256GB Solid State Drive
Integrated Graphics chipset     Intel(R) HD Graphics 530

5.2.3 Software requirements
Python 3.10.8 was used for programming since it is a general-purpose language and is widely used for building machine learning models.
Python is readable and well structured, and it is also widely used for deep learning. The Pandas library was used for data processing because it provides fast and convenient in-memory data structures for tabular data, which enabled faster data analysis and visualization.

Table 5.2: Software requirements
Library      Version
Pandas       1.4.1
Numpy        1.22.3
Jupyter      1.00
Matplotlib   3.5.1
Flask        2.0.3
Plotly       5.9.0

5.3 Model Architecture
ARIMA builds on the autoregressive moving-average (ARMA) model to analyse and forecast equally spaced univariate time series data, transfer function data and intervention data. An ARIMA model predicts a value in a response time series as a linear combination of its own past values and past errors, and, when other time series are included as inputs, of the present and past values of those series. In more detail, the AR term describes the series in terms of its p past observations weighted by autoregressive coefficients, where p is the number of lagged observations used to predict the current value of the series. The MA term, rather than using past observations, models the current value as a linear combination of the q most recent forecast errors. Additionally, the integrated part is characterized by a parameter d, which represents the order of differencing applied to make the series stationary.

5.4 Model Development
The model was developed using the Python language in Jupyter notebooks. Data was pre-processed and manipulated using the Pandas library. Python was selected because it is well suited to implementing machine learning programs and because Python code is easy to understand. The sections below describe how the model was developed.

5.4.1 Dataset overview
The data retrieved was structured and had previously been used in another research study. However, it contained some information that was not relevant to the creation of the prediction model.

5.4.2 Pre-processing
The drug dataset contained data on the dates, drug names and quantities of drugs sold.
During pre-processing, columns that were not needed for prediction, including the condition and rating, were dropped from the dataset. The weather dataset consisted of all the climate data accumulated over three years. The drug dataset was then merged with the weather dataset to form a single dataset that was used for prediction. Figure 5.1 shows a sample of the data contained in the final dataset. Seasonality had the biggest effect on drug volumes; therefore, the sales column was decomposed to estimate the seasonal effects, which were used to create and present seasonally adjusted values. Dates with missing values were dropped. A few drugs were then selected at random to perform the forecast on: Centrizine syrup, Coldcaps and Zefcoln.

Figure 5.1: Merged dataset

5.4.3 Correlation
During visualization, humidity was plotted on a line graph against the days of the year. Another line graph was plotted to show how humidity and precipitation relate to the patterns in which drugs are sold. As shown in Figure 5.2, the total weekly stock sold tended to be higher between May and September: high precipitation and humidity coincided with higher drug purchases in pharmacies.

Figure 5.2: Correlation

5.4.4 Time series Decomposition
Trends and seasonality were explored in the time series decomposition view. This helped to gauge how much of the variation remained in the residuals of the decomposed data, which was used as an indication of predictability: when decomposition is used, larger residuals generally mean lower predictability, and vice versa. Figure 5.3 shows the trends in drug sales; as the trend rises, so do the sales.

Figure 5.3: Time series decomposition view

5.4.5 Stationarity analysis
The data was tested for stationarity using the Augmented Dickey-Fuller (ADF) test, which tests the null hypothesis that a time series has a unit root, i.e., is non-stationary.
Time series forecasting models such as the Vector Autoregressive model depend on the stationarity of the time series, hence the need for the test. For Centrizine syrup and Zefcoln, the p-value was not below the 0.05 significance level, so the null hypothesis could not be rejected and the series were concluded to be non-stationary; they were made stationary by differencing. The Coldcaps time series, however, was stationary since its p-value was less than 0.05.

5.4.6 Autocorrelation
Autocorrelation was computed for 'week', 'total weekly stock', 'average humidity' and 'average precipitation'. Autocorrelation analysis illustrates the potential for predicting time series data: it summarizes the strength of the relationship between an observation in a time series and observations at prior time steps. In the autocorrelation graphs, count and humidity show a close autocorrelation.

Figure 5.4: Autocorrelation

5.5 Training the model
The model was fitted on the training data and tested on the test data. This was done to identify stationary data. The corpus was first split in a ratio of 80:20, with 80% of the data used for training and 20% for testing. Both SARIMAX and ARIMA models were fitted and compared using the Root Mean Squared Error (RMSE) to determine which of the two performed better.

5.5.1 Time series Forecasting
The ARIMA method was used to carry out short-term (rolling) and long-term forecasting on the test data. Before each forecast was made, the hyper-parameters (p, d, q) of the ARIMA model were optimized, starting with the initial p and q parameters. Rolling and long-term forecasting were then carried out with the optimal set of parameters. SARIMAX was used because the dataset had seasonal cycles.

5.6 ARIMA model
The ARIMA model was fitted with p, d and q values of 1, 1 and 1 respectively to predict the total weekly stock. Figure 5.8 shows a graphical representation of how the forecast was made.
Model 1: ARIMA model

5.7 SARIMAX model
The SARIMAX model was fitted with p, d and q values of 1, 0 and 1 respectively and seasonal values (P, D, Q) of 1, 1 and 1. The figure below shows how the SARIMAX model was fitted.

Model 2: SARIMAX model

SARIMAX predicted the total weekly stock and gave an accuracy of 6.231343822062039. Figure 5.5 shows a graphical representation of how the forecast was made.

Figure 5.5: SARIMAX forecast

When a sample drug, e.g., Zefcoln, is selected and fitted with SARIMAX, the current and predicted total weekly stock of the drug over six weeks looks as shown in Figure 5.6.

Figure 5.6: Zefcoln forecast

5.8 Validating the model
The testing data was used to validate the model. Model validation was conducted by assessing the model's error rates on validation data that it had not encountered during the training phase. The validation data consisted of approximately 20% of the collected dataset. The model was tasked with forecasting the quantity of drugs to stock over the next six weeks.

5.8.1 Model Performance Results
The experiment then determined the performance of each model by calculating its error rate, comparing the predicted drug quantity for stocking with the previous/expected drug quantity.

5.8.2 Root Mean Squared Error
The accuracy of the model was tested using RMSE, the square root of the mean of the squared differences between the predicted and actual values. The RMSE of the ARIMA model was 6.2, while that of SARIMAX was 5.5; these values were somewhat high because outliers had not been removed from the data. The accuracy of the SARIMAX model was 88%, as shown in Figure 5.7, hence the model was considered good for prediction.

Figure 5.7: Accuracy report

5.8.3 Forecasting Inventory with deployed model
A backend RESTful Application Programming Interface (API) was developed in the Python programming language using the Flask library, in Visual Studio Code.
The API provides an interface to the model. Users can upload a CSV file of past sales data containing the drug name, quantity and date sold. The data is fed into the model, and a six-week prediction of the quantity of each drug is presented to the user. The frontend is a user-friendly web application that acts as an interface between the pharmacist and the API backend. After logging in successfully, the pharmacist can trigger the prediction for the coming weeks.

Chapter 6: Discussion
6.1 Introduction
This chapter discusses the results of the research in relation to its objectives. These discussions are a culmination of the tests and results obtained from the implementation and testing presented in Chapter 5. This research focused on the criteria used for stocking inventory in pharmacies. After reviewing different methods, the researcher chose the time series method, which was best suited to seasonal data analysis, combined with SARIMAX forecasting.

6.2 Challenges faced in stocking pharmaceutical inventory
The first objective was to investigate the challenges faced in stocking pharmaceutical inventory. Based on the literature reviewed, the researcher found that optimizing inventory is difficult for pharmacies, which cite a myriad of challenges they face while stocking. The research also revealed that the factors affecting stocking include seasons, drug trends and weather patterns, among many others.

6.3 Previously Used Methods for Inventory Management
The second objective was to review previously used methods, i.e., the ARIMA model (Adam Hayes, 2021), SVM (Gutierrez, Solis, & Mukhopadhyay, 2008) and ANN, and current methods such as perpetual inventory systems (Gupta, 2020) and automatic dispensing systems (Nicole, Clifford, Michele, & Kieran, 2014). The literature review showed that existing stocking methods only considered whether a drug was currently unavailable or needed restocking.
The literature review also revealed that drugs could be reordered, but not on the basis of drug trends, seasons or disease prevalence, meaning that they could still be understocked or overstocked.

6.4 To identify how pharmacies currently stock
The third objective was to determine the methods pharmacies currently use to plan for future inventory stocking. Based on the literature reviewed, the researcher found that most pharmacies stock inventory through guesswork and personal intuition. Some pharmacies still use traditional methods, which results in improper planning and consequently in budget waste, stagnation and stockouts.

6.5 Design of the Model using Predictive Analytics
The fourth objective was to develop a prediction model for stocking pharmaceutical inventory. The research findings revealed that the developed model predicted the drugs to be stocked six weeks ahead, based on the sales made and the seasonality of each drug.

6.6 Validation of the Model
The fifth objective was to validate the proposed model, which predicts the drugs to be stocked based on seasonality and sales. Validation was achieved using a structured walk-through in which predicted outcomes were compared with observed outcomes. The model was also tested by uploading CSV files containing sales and weather data for previous months, after which the quantity of drugs to be stocked for the next six weeks was displayed. The drugs were determined by the season (weather) and by the sales during that season in previous years.

6.7 Advantages of the Developed Model
The most significant advantage of the developed model was that it used both sales and weather data for prediction, which made its insights more precise. It also requested the pharmacist to input the required drugs so that this data could be stored for future purposes, e.g., making refined predictions.
The developed model required little human interaction: the pharmacist only interacted with it when they needed information on the quantity required for a particular drug, while the administrator only interacted with it when entering the factors required for the prediction, i.e., weather patterns and previous drug sales. The user interface provided a dashboard showing all drugs for the condition entered and alerts for restocking or reordering drugs, depending on the user's settings. The final product was a system that could be accessed through the web and did not depend on the operating systems of different devices. It could therefore be integrated with other web-based systems.

6.8 Research Contributions
The developed model provided a solution for estimating the future stock of drugs in pharmacies. It gave pharmacists insights into the expected demand for the next six weeks based on demand in the same season in previous years. The model could be used in pharmacies and hospitals to plan and budget for future stock, avoid wasting resources, maximize profits and help ensure optimal pharmaceutical care for patients.

6.9 Challenges Encountered
The most significant challenge was obtaining data. Most pharmacies and pharmaceutical companies do not disclose their sales information; however, two pharmacies agreed to provide data. Some of the data obtained online lacked most of the factors needed or was too small to predict with. Going through the datasets was also time consuming, as the data was not structured in an understandable way. Secondly, different drugs had different seasonality patterns, so a single fitted model could not work for all of them. It had to be fine-tuned, i.e., the p, d and q parameters had to be changed to predict for different drugs. Other challenges included the loss of a laptop and of data that had already been acquired.
Chapter 7: Conclusion and Recommendations
7.1 Conclusion
The main objective of this research was to develop a solution to the problem of pharmacies stocking without knowing what to stock and when, a problem brought about by factors such as the climatic conditions of different seasons. The research discussed models that had been used previously and reviewed the literature on previous research, its shortcomings and past implementations. This research focused on sales and weather patterns to improve the decision-making process for stocking. The main deliverable was a model that predicts pharmaceutical inventory for stocking purposes. The final deliverable was a web application showing the prediction for a drug over the next six weeks. The impact of this deliverable on the pharmaceutical industry is that the web application is easy to navigate, saving time while still producing accurate stocking predictions. This involved using existing datasets documenting daily drug purchases in previous years and previous weather data as the basis for the forecasts. It also involved merging the two datasets and splitting them into training and test data.

7.2 Recommendations
The model for predicting inventory for stocking in pharmacies was a suitable solution for pharmacies. However, more could still be done in this area, and the following recommendations are made with regard to the research:
i. Data was quite difficult to find. Local pharmacies were advised to keep records of their sales data and of the methods they previously used to stock inventory, to allow for improvement of the model.
ii. Pharmacies should be willing to integrate the model into their current systems to allow for easy inventory management.
iii.
The model can be fitted to predict for more or fewer than six weeks in preparation for future demand; therefore, other retail businesses can also use the developed model for their inventory management.

7.3 Suggestions for Future Research
Future research suggestions include the following:
i. The research could not include drug trends. Tweets could therefore be crawled from Twitter to capture the current trends of a drug instead of relying only on the entered data. This would help strengthen the reliability of the model.
ii. The model parameters could not be automated in this research. In future, the model fitting process could be automated to fine-tune the parameters used for each drug being forecasted.
iii. The model could be fitted to predict which types of medication to stock, and in what quantity, instead of the drug type being entered manually.
iv. More web application features could be developed to cater for other pharmacy functionalities.

References
Government of the Republic of Kenya. (2018). National Climate Change Action Plan 2018-2022. Nairobi: Ministry of Environment and Forestry.
Adam Hayes. (2021, October 12). Autoregressive Integrated Moving Average (ARIMA). Investopedia.
Agarwal, A., & Jayant, A. (2019). Support Vector Machine Model for Demand Forecasting in Automobile parts industry. Research Journal of Applied Sciences, Engineering and Technology, 33-49.
Armstrong, J. S. (2012). Illusions in Regression analysis. International Journal of Forecasting, 689-694.
Baumer, A. M., Clark, A. M., Witmer, D. R., Geize, S. B., Vermeulen, L. C., & Deffenbaugh, H. J. (2015). National survey of the impact of drugs shortages in acute care hospitals. Pubmed.
Berdot, S., Blanc, C., Chevalier, D., Bezie, Y., Le, L., & Sabatier, B. (2019). Impact of drug storage system: a quasi-experimental study with and without an automated-drug dispensing cabinet. International Journal for Quality in Health Care, 225-230.
Cresewell, J. (2011).
Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research. Pearson.
Deidre, C., Karrar, K., & Jayasree, K. I. (2018). Shortages, stockouts and scarcity: Issues facing the security of antibiotic supply and the role for pharmaceutical companies. Access to Medicine Foundation.
Doctors and Health practitioners in Nairobi, K. (2021). Allianz Care. Retrieved from Allianz Worldwide Care, International Medical Provider Finder: https://apps.allianzworldwidecare.com/poi/hospital-doctor-and-health-practitioner-finder?PROVTYPE=HOSPITALS&CON=Africa&COUNTRY=Kenya&CITY=Nairobi
Embrey, M. A. (1982). MDS-3: Managing Access to Medicines and Health Technologies. Kumarian Press.
Emukule, G., Mott, J., Spreeuwenberg, P., Viboud, C., Commanday, A., Muthoka, P., . . . Paget, W. (2016). Influ