Predicting the Success of Early-Stage African Startups Using Machine Learning Maureen Wambui Ndung’u 133482 Submitted in partial fulfillment of the requirements for the Degree of Bachelor of Business Science in Financial Engineering at Strathmore University Strathmore Institute of Mathematical Sciences Strathmore University Nairobi, Kenya January 2025 This Research Project is available for Library use on the understanding that it is copyright material and that no quotation from the Research Project may be published without proper acknowledgement. i DECLARATION I declare that this work has not been previously submitted and approved for the award of a degree by this or any other University. To the best of my knowledge and belief, the Research Project contains no material previously published or written by another person except where due reference is made in the Research Project itself. © No part of this Research Project may be reproduced without the permission of the author and Strathmore University Maureen Wambui Ndung’u [Name of Candidate] [Signature] 31st January 2025 [Date] This Research Project has been submitted for examination with my approval as the Supervisor. Edwin Adoyo Obonyo [Name of Supervisor] [Signature] 31st January 2025 [Date] Strathmore Institute of Mathematical Sciences Strathmore University ii ABSTRACT Africa's share of global venture funding is estimated to be around 1%; meaning that only a very small portion of worldwide venture capital investment goes towards African startups. This presents a challenge for entrepreneurs, investors, and policymakers seeking to foster innovation and economic growth. This study aims to bridge this gap by leveraging machine learning models to predict the success of African startups based on key factors: business operating status, number of funding rounds, and business age. Unlike prior research, which has predominantly focused on Western markets and defined success through acquisitions or IPOs, this study specifically examines African startups, addressing the continent’s unique entrepreneurial landscape. The research utilizes CrunchBase data spanning from 2000 to 2024, encompassing 28,851 startups, applying three machine learning models—Logistic Regression, Support Vector Machines, and Random Forest—to evaluate startup success. The dataset was split into training and validation sets, ensuring robust model performance assessment. Results indicate an exceptionally high accuracy of 99-100%, with strong sensitivity but lower specificity, highlighting potential dataset imbalance. Despite this, the machine learning models outperform traditional probability-based approaches by capturing non-linear relationships and complex interactions between startup success factors. This provides a more nuanced and data-driven approach to early-stage business evaluation compared to simplistic probabilistic models. The findings offer practical implications for investors by enabling more informed decision- making, for entrepreneurs by identifying key success drivers, and for policymakers by informing strategies that enhance startup ecosystems in Africa. Future work should focus on balancing the dataset, incorporating additional predictive features, and expanding testing to ensure greater generalizability. This study contributes to the growing body of research on startup success prediction, offering a tailored approach for the African market and providing valuable tools for practitioners in the entrepreneurial and investment space. iii ACKNOWLEDGMENT This research would not have been possible without the support and guidance of many individuals. I would like to express my deepest gratitude to my supervisor, Dr. Edwin Obonyo, for his invaluable insights, encouragement, and constructive feedback throughout this project. I am also grateful to Dr. Marian Chatoro for her support with Chapters 4 and 5. A special thank you goes to Matilda Bosire for her expertise and assistance in reviewing the machine learning methodology. Special thanks to my family and friends for their unwavering support and belief in my abilities. To my professors at Strathmore University and my peers, who inspired me with their commitment to academic excellence, I am truly grateful. Finally, I acknowledge the use of Crunchbase as a comprehensive data source, which significantly contributed to the depth and rigor of this study. This project is a testament to the collective efforts of all who supported me, and I am deeply appreciative of their contributions. iv Table of Contents DECLARATION ......................................................................... Error! Bookmark not defined. ABSTRACT ................................................................................................................................. ii ACKNOWLEDGMENT ............................................................................................................ iii Preliminary .............................................................................................................................. vii List of Figures ...................................................................................................................... vii List of Tables ........................................................................................................................ vii Chapter 1: Introduction ............................................................................................................ 1 1.1 Background ...................................................................................................................... 1 1.2 Problem Statement ........................................................................................................... 6 1.3 Research Objectives ......................................................................................................... 7 1.3.1 General Objective ...................................................................................................... 7 1.3.2 Specific Objectives .................................................................................................... 7 1.4 Research Questions .......................................................................................................... 7 1.5 Justification Of Study ....................................................................................................... 7 1.6 Significance Of Study ...................................................................................................... 8 1.7 Scope Of The Study ......................................................................................................... 9 Chapter 2: Literature Review .................................................................................................. 11 2.1 Startup Success Factors .................................................................................................. 11 2.1.1 Product-Market Fit .................................................................................................. 11 2.1.2 Financing ................................................................................................................. 12 2.1.3 Headquarters Location ............................................................................................. 13 2.1.4 Team Composition .................................................................................................. 14 2.1.5 Business Strategy ..................................................................................................... 14 2.2 Predicting The Success Of Businesses Using Machine Learning .................................. 16 2.2.1 Logistic Regression ................................................................................................. 16 2.2.2 Random Forest ......................................................................................................... 17 2.2.3 Support Vector Machine .......................................................................................... 18 2.2.4 Gradient Boosting .................................................................................................... 19 2.2.5 Neural Networks ...................................................................................................... 21 2.2.6 Naive Bayes ............................................................................................................. 22 2.2.7 Decision Trees ......................................................................................................... 23 2.2.8 K-Nearest Neighbours ............................................................................................. 24 2.3 The Relationship Between Startup Success Factors And Machine Learning ................ 25 2.4 Gaps Found In The Literature ........................................................................................ 27 2.5 Conceptual Framework .................................................................................................. 29 Chapter 3: Methodology .......................................................................................................... 31 3.1 Introduction .................................................................................................................... 31 v 3.2 Research Design ............................................................................................................. 31 3.3 Population And Sampling .............................................................................................. 31 3.4 Data Collection ............................................................................................................... 32 3.4.1 Data Collection For Objective 1 .............................................................................. 32 3.4.2 Data Collection For Objective 2 .............................................................................. 33 3.5 Data Analysis ................................................................................................................. 33 3.5.1 Identifying Success Factors ..................................................................................... 33 3.5.2 Model Selection ....................................................................................................... 34 3.5.3 Model Training And Validation .............................................................................. 37 3.5.4. Model Evaluation ................................................................................................... 37 Chapter 4: Results And Analysis .............................................................................................. 40 4.1 Dataset Overview: Focus on African Startups ............................................................... 40 4.1.1 Data Preparation and Feature Selection ................................................................... 41 4.1.2 Data Transformation and Encoding ......................................................................... 41 4.1.3 Success Criteria ....................................................................................................... 42 4.1.4 Dataset Overview After Implementing the Success Criteria ................................... 43 4.1.5 Advanced Data Transformation and Model Optimization Techniques ................... 44 4.2 Objective 1: To identify the critical factors that influence startup success. ................... 47 4.2.1 Correlation Analysis and Key Insights .................................................................... 47 4.3 Objective 2: Performance Analysis of the Machine Learning Predictive Models ......... 48 4.3.1 Performance Analysis for the Dependent Variable: Success_Status ....................... 48 4.3.2 Performance Analysis for the Dependent Variable: Success_Age .......................... 51 4.3.3 Performance Analysis for the Dependent Variable: Success_Rounds .................... 53 4.3.4 Further Performance Analysis: F1 Score and Matthew’s Correlation Coefficient (MCC) ............................................................................................................................... 56 Chapter 5: Conclusion ............................................................................................................. 59 5.1 Introduction .................................................................................................................... 59 5.1.1 Discussion on the Analysis of Key Success Factors ............................................... 59 5.1.2 Discussion of Machine Learning Model Results ..................................................... 60 5.2 Limitations and Challenges ............................................................................................ 63 5.2.1 Class Imbalance ....................................................................................................... 63 5.2.2 Survivorship Bias .................................................................................................... 64 5.2.3 Insufficient Feature Granularity .............................................................................. 64 5.3 Future Actions for Reconsideration ............................................................................... 65 5.3.1 Data Enhancements ................................................................................................. 65 5.3.2 Alternative Success Definitions ............................................................................... 66 5.3.3 Exploring Alternative Models ................................................................................. 66 5.3.4 Threshold Tuning .................................................................................................... 67 5.3.5 Incorporating Temporal Features ............................................................................ 67 5.3.6 Segmenting Aggregated Variables .......................................................................... 68 5.3.7 Adding Qualitative Features .................................................................................... 68 5.4 Contribution of predictive machine learning to startup success .................................... 69 vi Bibliography ............................................................................................................................ 71 vii Preliminary List of Figures Figure 1: Relationships of the Crunchbase's datasets .............................................................. 27 Figure 2: Conceptual Framework. ........................................................................................... 30 Figure 3: Continent Visualization ............................................................................................ 40 Figure 4: Distribution of Startups per Country ........................................................................ 41 Figure 5: Data Distribution of Original Variables ................................................................... 42 Figure 6: Data Distribution of Derived Variables .................................................................... 43 Figure 7: Heatmap of Data Variables ...................................................................................... 47 List of Tables Table 1: Past Research on Predicting Startup Success Using Logistic Regression ................. 17 Table 2: Past Research on Predicting Startup Success Using Random Forest ........................ 17 Table 3: Past Research on Predicting Startup Success Using Support Vector Machines ........ 19 Table 4: Past Research on Predicting Startup Success Using Gradient Boosting ................... 20 Table 5: Past Research on Predicting Startup Success Using Neural Networks ..................... 21 Table 6: Past Research on Predicting Startup Success Using Naive Bayes ............................ 22 Table 7: Past Research on Predicting Startup Success Using Decision Trees ......................... 23 Table 8: Past Research on Predicting Startup Success Using K-Nearest Neighbours ............. 24 Table 9: Dataset overview after applying the success criteria ................................................. 44 Table 10: PCA Components Analysis ..................................................................................... 46 Table 11:Results for success_status Using Validation Data .................................................... 49 Table 12:Results for success_status Using Test Data .............................................................. 50 Table 13:Results for success_age Using Validation Data ....................................................... 52 Table 14:Results for success_age Using Test Data ................................................................. 53 Table 15:Results for success_rounds Using Validation Data .................................................. 54 Table 16:Results for success_rounds Using Test Data ............................................................ 55 Table 17:Further Results for success_status Using Validation Data ....................................... 56 Table 18:Further Results for success_age Using Validation Data ........................................... 57 Table 19: Further Results for success_rounds Using Validation Data .................................... 58 1 Chapter 1: Introduction 1.1 Background The success of early-stage startups has been a subject of extensive study, highlighting their importance to economic growth, innovation, and employment generation (Mehmeti & Musabelli, 2024). These studies span various geographical contexts, including Europe, where research has shown that early-stage companies are crucial drivers of the economy (Mehmeti & Musabelli, 2024). In Asia, the importance of startups has been recognized, particularly in the technology sector, where they contribute significantly to the digital economies (Misra, Jat, & Mishra, 2021). Similarly, in North America, research underscores the role of startups in fostering innovation and competition within markets (Mehmeti & Musabelli, 2024). In the Middle East, the government’s support for startups has been linked to the rapid growth of tech- based ventures, contributing to the region's global economic standing (Żbikowski & Antosiuk, 2021). Common themes in these studies include the critical role of innovation, the importance of access to capital, the challenges of market entry, and the significant impact of government policies on the success of early-stage companies (Vasquez, Santisteban, & Mauricio, 2023). Machine learning has increasingly been employed to predict the success of startups across various countries, leveraging a range of algorithms and data sources to enhance predictive accuracy. Ensemble models, which combine multiple machine learning algorithms to improve prediction accuracy, have demonstrated significant effectiveness in forecasting startup success. For instance, studies by Ross, Das, Sciro, & Raza (2021) utilized ensemble methods to merge data from Crunchbase and patent databases. These models often outperform individual algorithms by aggregating their strengths and mitigating their weaknesses. The use of ensemble techniques such as Random Forest, Gradient Boosting, and eXtreme Gradient Boosting (XGBoost) has shown high accuracy rates and robust performance in various studies (Arroyo, Corea, Jimenez-Diaz, & Recio-Garcia, 2019; Krishna, Agrawal, & Choudhary, 2016; Ünal & Ceasu, 2019; Corea, Bertinetti, & Cervellati, 2021; Bangdiwala, Mehta, Agrawal, & Ghane, 2022). Key outcomes include improved prediction accuracy and enhanced ability to handle complex, high-dimensional data. Hybrid intelligence methods, which integrate human expertise with machine learning models, address the complexities and uncertainties inherent in startup predictions. Dellermann et al. (2017) explored these methods by combining collective human judgments with machine learning algorithms. This approach allows for the incorporation of intuitive insights from 2 experts alongside analytical rigor provided by algorithms. The results highlighted that hybrid models could effectively capture nuanced patterns that purely algorithmic or human-based approaches might miss (Dellermann, Lipusch, Ebel, Popp, & Leimeister, 2017). This integration has led to better identification of success factors and more informed decision- making for investors. Deep learning models have been employed to analyze large and complex datasets, such as those from Crunchbase and Kaggle (Ferrati, Chen, & Muffatto, 2021; Potanin, Chertok, Zorin, & Shtabtsovsk, 2023). These models, including neural networks and their variants, have shown promise in predicting startup success. For example, Ferrati et al. (2021) developed a deep learning model with high recall rates, indicating its effectiveness in identifying successful startups. Deep learning's ability to handle unstructured data and learn from intricate patterns has contributed to more precise predictions and deeper insights into startup performance. The application of machine learning models has yielded several key insights. Models such as Random Forest and XGBoost have demonstrated impressive accuracy and precision, with some studies reporting precision levels exceeding 90% (e.g., Bangdiwala et al., 2022), indicating their effectiveness in reliably predicting startup success. Additionally, hybrid and ensemble models have shown a superior ability to handle uncertainty and complexity by integrating various sources of information and analytical approaches, offering a more detailed assessment of potential success. Furthermore, machine learning has proven crucial in identifying critical success factors, with features such as funding stages, company age, and market trends emerging as significant predictors. For instance, Ünal and Ceasu (2019) highlighted that XGBoost and Random Forest prioritized funding details and company age as important features in their analyses. The growing application of machine learning in startup success prediction underscores its transformative potential. By utilizing high-quality data sources and advanced analytical techniques, machine learning offers a powerful tool for enhancing predictive accuracy and informing investment decisions. The integration of various models and methodologies continues to advance the field, promising even more refined and actionable insights for stakeholders in the startup ecosystem. Startup success is a multidimensional concept, often defined subjectively depending on the perspective of the founder, venture capitalists, or investors (Baskoro, Prabowo, Meyliana, & Gaol, 2022). Researchers have identified various criteria to define this success. Some measure 3 success by whether a startup is acquired (Cholil, et al., 2024) or has issued an IPO (initial public offering) or attained unicorn status (Potanin, Chertok, Zorin, & Shtabtsovsk, 2023). Others assess it based on the startup's operational status, acquisition, or IPO issuance (Ünal & Ceasu, 2019). Additionally, success can be defined by achieving an IPO or undergoing a merger and acquisition (M&A) (Gangwani & Zhu, 2024; Thirupathi, Alhanai, & Ghassemi, 2022; Bangdiwala, Mehta, Agrawal, & Ghane, 2022). Securing Series A funding is also considered a milestone of success (Te, et al., 2022; Dellermann, Lipusch, Ebel, Popp, & Leimeister, 2017; Sharchilev, et al., 2018), as is profitability (Tomy & Pardede, 2018; Vasquez, Santisteban, & Mauricio, 2023). Alternatively, repeated financing rounds can indicate a startup's success (Piskunova, Ligonenko, Klochko, Frolova, & Bilyk, 2021; Arroyo, Corea, Jimenez-Diaz, & Recio-Garcia, 2019). There is a growing need to apply machine learning to predict the success of early-stage African businesses, given the unique challenges and opportunities present in the continent (McKenzie & Sansone, 2019). African startups often face distinct obstacles, such as limited access to capital, infrastructural challenges, and diverse market conditions, which necessitate a tailored approach to predicting their success (African Scalecraft, n.d.). The use of machine learning in this context could help identify patterns and factors that are specific to African startups, thereby enabling more accurate predictions and better decision-making for investors and entrepreneurs. Moreover, the integration of localized data sources and contextual knowledge with machine learning algorithms could provide insights that are more relevant to the African market, ultimately contributing to the growth and sustainability of startups in the region (Gichohi, 2023). African startups operate in a context marked by a youthful demographic, rapid urbanization, a growing middle class, and increasing mobile phone penetration, which collectively offer a fertile ground for entrepreneurial activities (United Nations, 2021; United Nations, 2023). There are notable trends in the African entrepreneurship space, with several emerging tech- enabled companies significantly shaping the landscape. The significance of digital technology in the global economy renders it a strategic sector, both economically and politically (Smart Africa, 2020). For example, in 2023, the total capitalization of Google, Apple, Microsoft, Meta, Amazon, and NVIDIA is more than 3 times the total GDP of the entire African continent (Smart Africa, 2020; STATISTA, 2024). This highlights the significant impact of ICT on the global 4 economy and underscores its role as a powerful driver of economic and social development in developing countries (Smart Africa, 2020). In recent years, technology has become a central focus in Africa’s private capital investment sector, fundamentally altering traditional investment paradigms and industry trends (AVCA, 2024). Another notable trend is the increasing interest of international investors in African startups. Global venture capital firms, development finance institutions, and corporate investors are recognizing the potential of Africa's innovation ecosystem and are making significant investments (AVCA, 2024). Local investors are also playing a crucial role in supporting African startups. African venture capital firms, angel investors, and corporate venture arms are providing not only funding but also mentorship and strategic support. Initiatives such as the African Business Angels Network (ABAN) and various startup incubators and accelerators are fostering a vibrant entrepreneurial ecosystem by connecting startups with investors and resources. The exit environment for African startups has been difficult, with a 48% decrease in exits in 2023 compared to the previous year (AVCA, 2024). Mergers and acquisitions (M&A) are becoming an increasingly prevalent exit strategy for these startups (EAVCA, 2024; AVCA, 2024). The proportion of sales to private equity buyers rose to 33% in 2023 from 23% in 2022 (AVCA, 2024). This strategy is growing in popularity among fund managers aiming to expand their platform countries. Although initial public offerings (IPOs) remain relatively rare in Africa, they represent another potential exit route. Strengthening capital markets and regulatory frameworks could make IPOs a more attractive option for startups. Jumia’s successful IPO on the New York Stock Exchange in 2019 has set a precedent, showing that African startups can garner significant global investor interest. Investment and entrepreneurial activity vary significantly across different regions of Africa (AVCA, 2024). Southern Africa, with South Africa at its core, continues to be a major hub for startups and investment (AVCA, 2023). West Africa, home to vibrant startup ecosystems in countries like Nigeria and Ghana, has traditionally attracted significant venture capital investment (AVCA, 2023). North Africa, which includes countries like Egypt, Morocco, and Algeria, has also seen growing entrepreneurial activity (AVCA, 2024). East Africa, with Kenya as a leading hub, continues to attract investment and foster innovation (EAVCA, 2023). The unique environment of African startups is shaped by a complex interplay of socio- economic and political factors (Ajayi-Nifise, Tula, Asuzu, Mhlongo, & Ibeh, 2024), along with 5 emerging trends in technology and investment (AVCA, 2024). Understanding these dynamics is crucial for predicting the success of early-stage businesses in Africa. Despite the challenges, the resilience and adaptability of African entrepreneurs, coupled with increasing technological advancements and a youthful demographic, present promising opportunities for growth and innovation in the continent's startup ecosystem (United Nations, 2021; Raj, 2023). By leveraging machine learning approaches, stakeholders can better predict and enhance the success of African startups, driving economic development and fostering sustainable growth. 6 1.2 Problem Statement Understanding the location of a company is crucial for predicting startup success, as entrepreneurial ecosystems vary significantly across different regions (Ferrati & Muffatto, 2020). Most existing research on startup success has predominantly focused on developed economies such as North America, Europe, and Asia, often neglecting emerging markets like Africa (Azeem & Khanna, 2023). For instance, the study by Ünal and Ceasu (2019) on using machine learning to predict startup success, excluded data from Africa and Oceania due to zero or near-zero variance, effectively overlooking these regions' unique dynamics and potential. This geographical bias in research not only diminishes the relevance of predictive models for African startups but also misses out on valuable insights specific to the African entrepreneurial ecosystem. By incorporating African data into predictive models, we can develop a more nuanced understanding of local success factors, which is essential for accurate predictions and effective support for startups in the region. Failing to address this gap risks perpetuating a cycle of underinvestment and missed opportunities in Africa’s burgeoning startup sector. Moreover, the global venture capital (VC) market has been seen to continue underinvesting in the African VC ecosystem compared to other regions, as noted by Truman (2023). This underinvestment is further exacerbated by the tendency of American venture capital and private equity to predominantly fund white foreign founders, leaving African entrepreneurs at a disadvantage (The Guardian, 2020; Data Driven VC, 2024). The reliance on traditional investment methods, which often lack inclusivity and fail to address biases, limits the growth potential of African startups. Machine learning models present an opportunity to counteract these biases by providing data-driven insights that can encourage more equitable investment practices. By leveraging machine learning, we can create more inclusive and accurate predictive models that better reflect the realities of the African startup landscape. Without these advancements, Africa risks continued underrepresentation and inequality in global venture capital, stifling innovation and hindering economic development. For African startups, this lack of tailored predictive models exacerbates challenges such as limited access to capital and support, which are critical for scaling businesses in emerging markets (BIC Africa, 2021; Azeem & Khanna, 2023). Furthermore, studies reveal that without localized data, predictive models may continue to perpetuate biases, leading to unequal investment opportunities and hindering the growth of the African venture capital ecosystem (Turman, 2023; Ganesan, Mahalingam, Nathan, Ware, & Weinberg, 2023). Machine learning 7 models, when adapted to include African-specific data, can mitigate these issues by offering more accurate predictions and promoting fairer investment practices. This is not merely a technological update but a necessary step towards fostering a more equitable entrepreneurial environment that can drive sustainable development and innovation across the continent (Data Driven VC, 2024). 1.3 Research Objectives 1.3.1 General Objective The main objective of this study is to develop a model that predicts the success of early-stage business in Africa using supervised machine learning. 1.3.2 Specific Objectives 1. To identify and analyze key success factors for early-stage African startups. 2. To develop a supervised machine learning model that predicts the success of these businesses based on identified factors. 1.4 Research Questions 1. What are the critical success factors for early-stage African startups? 2. How can these success factors be quantified, weighted, and incorporated into a supervised machine learning predictive model? 1.5 Justification of Study Investing in startups presents inherent risks, particularly in emerging markets like Africa, where the entrepreneurial landscape is both dynamic and under-researched. Despite the continent's rich reservoir of innovation and entrepreneurial talent, many African startups struggle to secure the necessary funding to scale their ventures. This difficulty often stems from the uncertainty surrounding their potential for success and the lack of predictive tools tailored to the unique characteristics of the African market. The absence of reliable predictive models exacerbates the risk for investors, who may face substantial losses from investing in ventures that fail to thrive. This uncertainty hampers the growth of promising startups and stifles the broader economic development of the region. To address this challenge, the development of a robust predictive tool specifically designed for African startups is essential. By analyzing data from startups established between 2000 and 2023 across all 54 African countries and diverse industries, this study aims to create a predictive model that captures the distinctive factors influencing startup success in Africa. Such a model 8 would provide valuable insights into which startups are likely to succeed, thereby reducing the risk of investment and enhancing the confidence of investors. Accurate prediction of startup success is crucial not only for mitigating investment risks but also for fostering a more vibrant and inclusive entrepreneurial ecosystem. With more precise and tailored predictions, investors, including angel investors and venture capitalists, will be better positioned to support high-potential startups. This increased confidence can lead to greater investment in early-stage ventures, providing African entrepreneurs with the financial backing they need to scale their businesses and contribute to economic growth. As a result, this research not only benefits investors and entrepreneurs but also contributes to the broader economic and social advancement of Africa. 1.6 Significance Of Study The research on predicting the success of early-stage African businesses using a machine learning model holds substantial significance across multiple domains. This study aims to bridge the gap in understanding the unique success factors that drive the growth and sustainability of startups in the African context. By identifying these factors and developing a robust predictive model, the research will provide insights and tools that can benefit various stakeholders, including African entrepreneurs and startups, investors and venture capitalists, policymakers and economic planners, academia and researchers, and business support organizations and incubators. This paper will be immensely beneficial for African entrepreneurs and startups. By understanding the critical success factors identified through this study, entrepreneurs can make informed decisions, adopt best practices, and strategically plan their business operations to enhance their chances of success. The predictive model developed will serve as a tool for self- assessment, enabling startups to evaluate their potential for success based on historical data and identified success factors. Investors and venture capitalists will gain a deeper understanding of the elements that contribute to the success of early-stage African businesses. The predictive model will provide a data-driven approach to assessing the viability and potential of investment opportunities. This can lead to more informed investment decisions, reduced risk, and optimized allocation of resources. By identifying promising startups, investors can better support innovation and economic growth in Africa. 9 Policymakers and economic planners will benefit from insights into the success factors of African startups, which can inform the development of policies and programs that foster entrepreneurship growth. The research findings can guide the creation of supportive regulatory frameworks, financial incentives, and infrastructure development initiatives. Ultimately, this can lead to a more conducive environment for business development, economic diversification, and job creation. The academic community and researchers will find this study valuable as it contributes to the body of knowledge on entrepreneurship and SME success in the African context. The research methodology, findings, and predictive model can serve as a foundation for further studies, facilitating scholarly discourse and the advancement of research in this field. Additionally, the study can be incorporated into academic curricula, enhancing the education of future entrepreneurs and business leaders. Business support organizations and incubators play a crucial role in nurturing startups. The insights from this research will enable these organizations to tailor their support services, mentorship programs, and resources to address the specific needs of African entrepreneurs. The predictive model can be used as a diagnostic tool to identify areas where startups require assistance, thereby enhancing the effectiveness of incubation programs and increasing the overall success rate of supported businesses. This research has the potential to make a significant impact on the African entrepreneurial ecosystem by providing actionable insights, reducing investment risks, informing policy, advancing academic research, and strengthening support structures for startups. The development and validation of a machine learning model to predict business success will not only empower individual entrepreneurs but also contribute to the broader economic development of the African continent. 1.7 Scope Of The Study This study will examine African startups established between 2000 and 2024. By examining startups founded during this 24-year period, the study will capture the evolution of the African entrepreneurial landscape, from its early days of mobile technology adoption (GSMA, 2023) to the present era of digital transformation (World Bank Group, 2024) and increased global investment (AVCA, 2024). This broad timeframe allows for the identification of key factors that have influenced startup success over the years, providing valuable insights into both historical and contemporary dynamics in the African context. 10 The research will encompass all industries to ensure a broad understanding of startup success factors across various sectors. This inclusive approach aims to identify common success factors as well as industry-specific dynamics within the African entrepreneurial ecosystem. Provided that a startup is based in Africa, it will be considered in the analysis. This approach ensures a comprehensive examination of the startup landscape across the entire continent, capturing the diverse entrepreneurial environments and regional variations within Africa. 11 Chapter 2: Literature Review 2.1 Startup Success Factors The success of startups is influenced by multiple factors that determine their ability to survive, scale, and sustain a competitive edge. This section examines key success factors highlighted in recent literature, including product-market fit, financing, headquarters location, and team composition. 2.1.1 Product-Market Fit Product-market fit is a fundamental determinant of a startup's success, representing the alignment between a product's offerings and market demands. Initially popularized by Andreessen in the early 2000s, this concept has evolved, with recent studies emphasizing the iterative process required to achieve and maintain this fit. Meijer (2019) argues that startups must engage in a continuous cycle of building, measuring, and learning to adapt their products to market needs. The lean startup methodology promotes launching a "minimum viable product" (MVP), a simplified version aimed at testing market demand with minimal effort and cost (Ries, 2011; Meijer, 2019; Maurya, 2016). This approach enables startups to gather critical customer feedback and refine their product to better achieve product-market fit (PMF) (Dennehy, Kasraian, O’Raghallaigh, & Conboy, 2016). The Lean Startup Process underscores the importance of connecting deeply with the target audience, converting customer insights into actionable strategies. Maurya (2016) emphasizes that continuous customer feedback is essential for refining a product to meet evolving market needs. This iterative collaboration between the company and its customers accelerates the refinement of the MVP, ultimately enhancing its alignment with PMF. The timing of product- market fit is also crucial (Gross, 2015). Introducing a product too early can result in a mismatch between the product's capabilities and market needs, while late entry can lead to missed opportunities and market saturation (Gurbuz, 2018). Ahmad et al. (2024) highlight the role of agile methodologies in enabling startups to iterate and pivot rapidly, thereby increasing the chances of achieving timely product-market fit. This agility allows startups to capitalize on emerging opportunities while avoiding strategies that may no longer be viable. Kartika (2024) expands on the concept by examining the role of product innovation and scalability in achieving product-market fit. While innovation is essential for differentiating a product in a crowded market (Faster Capital, 2024), scalability ensures that the product can 12 meet growing demand without compromising quality (Spacenco & Mandari, 2020). However, achieving product-market fit is not without challenges. Studies suggest that an overemphasis on innovation can lead to products that are too advanced for current market conditions (Pampillo, 2023; True Digital, 2017), while excessive focus on market demands can stifle innovation. Startups must carefully navigate these trade-offs to achieve and sustain product- market fit. 2.1.2 Financing Access to adequate financing is another critical factor that influences startup success. Literature consistently emphasizes the importance of securing sufficient capital to support growth, attract talent, and scale operations. Kaplan and Lerner (2016) find that startups with greater access to funding are more likely to survive and achieve significant milestones, such as subsequent funding rounds or exits through acquisitions or IPOs. Startups secure funding from a variety of sources, such as angel investors, venture capital (both traditional and corporate), crowdfunding, friends and family, bootstrapping, grants, and debt financing (Janaji, Ibrahim, & Ismail, 2021). Marullo, Casprini, Di Minin, and Piccaluga (2018) assert that access to venture capital significantly enhances a startup's chances of success. Venture capital-backed startups are more likely to achieve critical milestones, such as expanding into new markets or launching new products, due to the combination of financial resources and strategic guidance provided by venture capital firms (Zeng, 2023). However, excessive dependence on venture capital can cause founders to lose control, as investors may advocate for aggressive growth strategies that conflict with the startup's long-term goals (Sulillari, 2023; Stripe, 2024). Therefore, maintaining a balance between securing necessary funding and preserving strategic autonomy is crucial (LinkedIn Community, 2023). Crowdfunding has also emerged as a viable financing option, offering both capital and market validation (Cornelius & Gokpinar, 2021). Mollick and Robb (2016) found that successful crowdfunding campaigns provide necessary funds and generate early customer engagement, which can be critical for refining products and achieving product-market fit. However, they caution that the pressure to deliver on promises made during the campaign can strain a startup's resources. Sauvage, Zeisberger, and Varadan (2022) suggest that startups carefully evaluate whether a Corporate Venture Capital (CVC) fund aligns with their strategic goals, as CVCs offer unique benefits and risks compared to traditional venture capital and angel investors. Financial 13 management also plays a crucial role in sustaining a startup's growth. Ampong (2024) emphasizes the importance of closely monitoring cash flow, managing burn rate, and making strategic investments that align with long-term goals. Effective resource management, including strategic allocation of human and technological resources, is essential for maximizing output (Mahmudur, 2023; Symeonidou, Leiponen, Autio, & Bruneel, 2022). 2.1.3 Headquarters Location The geographical location of a startup's headquarters significantly influences its access to critical resources, including capital, talent, and markets (Guzman, 2018). Startups located in established entrepreneurial ecosystems, such as Silicon Valley or London, often enjoy distinct advantages (Guzman & Stern, 2015; Ahluwalia & Kassicie, 2024). Research shows that location choice is relevant for entrepreneurship, as proximity to venture capital firms and skilled labor pools can facilitate easier access to funding, networking opportunities, and mentorship, all of which are crucial for early-stage growth (Yu & Artz, 2019; Stam, 2015; Díaz-Santamaría & Bulchand-Gidumal, 2021). Geographical factors, such as infrastructure and resources, also play a significant role in shaping venture capital activities. A well-established infrastructure-comprising advanced transportation systems, modern communication technologies-and a favourable business climate, promotes entrepreneurial growth and attracts venture capital investments (Zeng, 2023). Areas with prestigious universities and research institutions often attract top talent, boosting the chances that startups in these regions will secure venture capital funding due to the abundance of highly skilled human resources (Zeng, 2023; Kézaia & Skalac, 2024). Conversely, startups in regions with less developed ecosystems may face challenges in accessing these resources (Nims, 2023). The impact of location on startup success also extends to regulatory environments. Zeng (2023) emphasizes that sound laws and regulations provide a secure foundation for business operations, ensuring smooth growth by minimizing legal uncertainties and reducing regulatory obstacles. Additionally, cultural fit can influence a startup's operations and success. Bojadjiev, Mileva, Misoska, and Vaneva (2023) demonstrate that startups aligning with local cultural norms are more likely to gain traction in those markets, affecting everything from marketing strategies to product development. Hemmert et al. (2019) further note that variations in entrepreneurship across countries are shaped by market conditions and cultural values, which differ based on the entrepreneurial ecosystem. 14 2.1.4 Team Composition The composition of the founding team is another critical factor in startup success. The literature suggests that the skills, experience, and diversity of the founding team are pivotal in determining a startup's ability to navigate challenges and seize opportunities. Mol (2019) argues that a successful startup team requires more than prior experience and industry-specific skills; shared entrepreneurial passion and a collective strategic vision are equally important. While experience enhances decision-making, alignment in vision and passion drives team performance. Mol (2019) finds that teams with high levels of experience but lacking in passion and vision tend to underperform in areas such as innovation and customer satisfaction, whereas teams with strong alignment in soft skills perform significantly better. D’Acunto, Tate, and Yang (2019) argue that diverse founding teams—those with members from various backgrounds, industries, and skill sets—are more likely to succeed. Diversity enhances creativity and problem-solving abilities, particularly in the early stages of a startup when innovation and adaptability are crucial. The experience of the founding team also significantly impacts a startup's chances of success. Mol (2019) highlights that experienced entrepreneurs are better equipped to make strategic decisions, avoid common pitfalls, and build resilient businesses, as they are more likely to recognize patterns and trends that inform critical business decisions. However, team dynamics can present challenges. Conflicts within the founding team can hinder a startup's progress. Faster Capital (2024) notes that misalignments in vision, strategy, or decision-making can lead to disputes that distract from the startup's objectives and erode team cohesion. Effective communication, clear role definitions, and shared goals are essential for maintaining a productive team dynamic. Access to influential networks and mentors is another critical aspect of team composition. Daradkeha and Mansoora (2023) argue that startups with strong networks can access valuable resources, insights, and opportunities unavailable to less connected teams. Mentorship provides guidance and support that helps startups overcome challenges and accelerate growth (Zeng, 2023). Kabatunzi (2022) adds that founders with a strong personal brand, characterized by leadership, vision, resourcefulness, and resilience, are more likely to lead successful businesses (Mol, 2019; Elsafty, Abadir, & Shaarawy, 2020; Indrianti, Sasmoko, Abdinagoro, & Rahim, 2024). 2.1.5 Business Strategy The effectiveness of a startup's business strategy is often shaped by various external and internal influences. According to Teece, (2018), strategic decisions in startups are significantly 15 influenced by market conditions, technological advancements, and regulatory environments. Startups that can effectively adapt their strategies in response to these influences are better positioned to capitalize on emerging opportunities and mitigate risks. For instance, the ability to pivot—a concept popularized by Ries (2011) in The Lean Startup—allows startups to change their business models or product offerings in response to market feedback, thereby enhancing their chances of success. Moreover, Bradley, Hirt, & Smit (2018) highlight that startups with a clear understanding of their competitive landscape are more likely to develop strategies that differentiate them from competitors, thus securing a competitive edge. Marketing and distribution are pivotal components of a startup's business strategy, directly impacting customer acquisition, retention, and overall market positioning. Research by Chaffey & Ellis-Chadwick (2022) suggests that startups must prioritize digital marketing strategies to reach broader audiences and create stronger brand recognition. The use of data analytics and customer insights enables startups to tailor their marketing efforts, thereby increasing the effectiveness of their campaigns and improving return on investment (ROI). Additionally, startups that invest in omnichannel distribution strategies, integrating both online and offline channels, are more likely to succeed in today's highly competitive markets (Kotler, Kartajaya, & Setiawan., 2017). Furthermore, Nagle & Müller (2018) emphasize the importance of value-based pricing strategies in aligning product offerings with customer perceptions of value, which can enhance profitability and customer satisfaction. In line with this, Ries (2011) argues that a startup's marketing strategy should be closely aligned with its product development process to ensure that the product-market fit is achieved early on, which is essential for sustained success. The ultimate goal of a startup's business strategy is to create value for stakeholders while securing a competitive advantage in the market. According to Porter & Heppelmann (2018), startups can achieve this by leveraging innovative technologies and business models that disrupt traditional industries. The capacity to continuously innovate and adjust to evolving market conditions is vital for sustaining a competitive advantage (Kaniawati, Sukma, & Oktaviani, 2024). Moreover, Teece (2018) argues that startups should focus on building sustainable business models that not only generate immediate profits but also ensure long-term viability. This involves optimizing operational efficiency, managing supply chains effectively, and investing in technology infrastructure to scale the business. For instance, startups that adopt 16 lean operations and agile methodologies are better equipped to respond to market changes and customer needs, thereby enhancing their operational efficiency (Womack & Jones, 2015). In terms of branding and positioning, startups that successfully differentiate themselves from competitors through unique value propositions and strong brand identities are more likely to achieve market dominance (Keller & Swaminathan, 2020). This is particularly important in highly competitive industries where brand loyalty can be a significant driver of growth. 2.2 Predicting The Success of Businesses Using Machine Learning Over the past decade, numerous machine learning models have been developed and applied to the prediction of startup success. These models typically leverage large datasets such as Crunchbase, CB Insights, and other similar repositories to predict whether a startup will succeed in terms of acquisition, repeated funding rounds, IPOs, or profitability. This section reviews eight common machine learning models used for this purpose, with a specific focus on the top three that will be used in this study: logistic regression, random forest and support vector machines (SVM). Each model is discussed in terms of its advantages, disadvantages, and overall performance in the prediction of startup success. 2.2.1 Logistic Regression Logistic Regression is a fundamental classification algorithm used to predict binary outcomes, such as success or failure (James, Hastie, Witten, & Tibshirani, 2021). It operates by modeling the probability of a class through a logistic function, also referred to as the sigmoid function, which produces values ranging from 0 to 1 (Pan, Gao, & Luo, 2018). This model is both simple and interpretable, offering a clear understanding of how features contribute to the predicted probability (James, Hastie, Witten, & Tibshirani, 2021). Its computational efficiency is another advantage, making it a suitable choice for straightforward classification problems. However, logistic regression assumes a linear relationship between the features and the log-odds of the outcome, which may restrict its ability to capture more complex patterns (James, Hastie, Witten, & Tibshirani, 2021). Additionally, it can be sensitive to outliers and may perform poorly if the data does not meet its assumptions. 17 Table 1: Past Research on Predicting Startup Success Using Logistic Regression Authors Data Source Definition of success Machine Learning Model Accuracy Sensitivit y F1 Score Krishna et al. (2016) Crunchbase Acquired Logistic Regression Dellermann, Lipusch, Ebel, Popp, & Leimeister (2017) Crunchbase, Mattermark, and Dealroom Series A funding Logistic Regression not aim of study Pan, Gao, & Luo (2018) Crunchbase M&A or IPO Logistic Regression 0.7254 0.442 Shah & Mcgaugh (2019) Crunchbase Acquired Logistic Regression 0.859 0.76 Piskunova, Ligonenko, Klochko, Frolova, & Bilyk, 2021 Ukrainian Dealroom Repeated Funding rounds Logistic Regression 0.6 0.45 0.486 Żbikowski & Antosiuk (2021) Crunchbase and Web-based information Operating with Series B financing, acquired or IPO Logistic Regression 0.86 0.21 0.33 Bangdiwala, Mehta, Agrawal, & Ghane (2022) Crunchbase IPO or M&A Logistic Regression 0.925 2.2.2 Random Forest Random Forest are an ensemble learning technique that builds multiple decision trees and merges their outputs to produce a final prediction (Piskunova, Ligonenko, Klochko, Frolova, & Bilyk, 2021). By using bootstrapping and feature randomness, Random Forest create diverse trees that collectively improve predictive accuracy and robustness (James, Hastie, Witten, & Tibshirani, 2021). This approach is less likely to overfit compared to individual decision trees and can manage both classification and regression tasks (James, Hastie, Witten, & Tibshirani, 2021; Krishna, Agrawal, & Choudhary, 2016). However, the complexity of Random Forest reduces their interpretability compared to single trees and demands more computational resources and memory (Cholil, et al., 2024). Training can also be slower, especially with a large number of trees. Table 2: Past Research on Predicting Startup Success Using Random Forest 18 Authors Data Source Definition of success Machine Learning Model Accurac y Sensitivity F1 Score Krishna et al. (2016) Crunchbase Acquired Random Forest Classifier Dellermann, Lipusch, Ebel, Popp, & Leimeister (2017) Crunchbase, Mattermark, and Dealroom Series A funding Random Forest Classifier Not aim of the study Pan, Gao, & Luo (2018) Crunchbase M&A or IPO Random Forest Classifier 0.843 0.391 Arroyo, Corea, Jimenez-Diaz, & Recio-Garcia (2019) Crunchbase Acquired, IPO or repeat funding round Random Forest Classifier 0.818 Ünal & Ceasu (2019) Crunchbase Operating, acquired or IPO Random Forest Classifier 0.941 Piskunova, Ligonenko, Klochko, Frolova, & Bilyk, 2022 Ukrainian Dealroom Repeated Funding rounds Random Forest Classifier 0.57 0.358 0.399 Bangdiwala, Mehta, Agrawal, & Ghane (2022) Crunchbase IPO or M&A Random Forest Classifier 0.9243 Cholil, et al. (2024) Kaggle Acquired Random Forest Classifier 0.8393 2.2.3 Support Vector Machine Support Vector Machines (SVMs) are robust classifiers that identify the hyperplane which optimally divides data into distinct classes (James, Hastie, Witten, & Tibshirani, 2021). The goal is to maximize the margin between these classes. They perform well in high-dimensional spaces and can manage non-linear classification using kernel functions (Żbikowski & Antosiuk, 2021). Additionally, SVMs are resilient to overfitting, especially when there are many dimensions compared to the number of samples (Tomy & Pardede, 2018). Nevertheless, SVMs can be memory-intensive and require careful parameter tuning, such as the choice of kernel and regularization parameters (James, Hastie, Witten, & Tibshirani, 2021). The model’s interpretability is also limited, particularly with complex kernels (Felgueiras, Batista, & Carvalho, 2020). 19 Table 3: Past Research on Predicting Startup Success Using Support Vector Machines Authors Data Source Definition of success Machine Learning Model Accuracy Sensitivity F1 Score Dellermann, Lipusch, Ebel, Popp, & Leimeister (2017) Crunchbase, Mattermark, and Dealroom Series A funding Support Vector Machine (SVM) Not aim of the study Tomy & Pardede (2018) Australian dataset Profitable Support Vector Machine (SVM) 0.7347 0.75 Arroyo, Corea, Jimenez- Diaz, & Recio-Garcia (2019) Crunchbase Acquired, IPO or repeat funding round Support Vector Machine (SVM) 0.817 Felgueiras et al. (2020) Crunchbase Support Vector Machine (SVM) 0.421 Żbikowski & Antosiuk (2021) Crunchbase and Web-based information Operating with Series B financing, acquired or IPO Support Vector Machine (SVM) 0.87 0.2 0.32 Vasquez, Santisteban, & Mauricio (2023) Australian dataset Profitability Support Vector Machine (SVM) 0.97 2.2.4 Gradient Boosting Gradient Boosting is an ensemble technique that constructs models in a sequential manner, with each new model aiming to address the errors of the previous ones (Cholil et al., 2024). By aggregating multiple weak learners, usually decision trees, Gradient Boosting develops a robust predictive model that frequently achieves high accuracy (Arroyo, Corea, Jimenez-Diaz, & Recio-Garcia, 2019). This method is effective at capturing intricate patterns and feature 20 interactions. However, it is computationally intensive and may be susceptible to noisy data and outliers (James, Hastie, Witten, & Tibshirani, 2021). Additionally, Gradient Boosting has a potential risk of overfitting if not carefully tuned, particularly with an excessive number of boosting stages (Żbikowski & Antosiuk, 2021). Table 4: Past Research on Predicting Startup Success Using Gradient Boosting Authors Data Source Definition of success Machine Learning Model Accuracy Sensitivity F1 Score Arroyo, Corea, Jimenez-Diaz, & Recio- Garcia (2019) Crunchbase Acquired, IPO or repeat funding round Gradient Tree Boosting (GTB) 0.822 Ünal & Ceasu (2019) Crunchbase Operating, acquired or IPO Extreme Gradient Boosting 0.945 Corea et al. (2021) Crunchbase + LinkedIn Acquired, IPO or repeat funding round Gradient Boosting Machine Precision: ~0.7 Żbikowski & Antosiuk (2021) Crunchbase and Web- based information Operating with Series B financing, acquired or IPO Extreme Gradient Boosting 0.86 0.17 0.28 Bangdiwala, Mehta, Agrawal, & Ghane (2022) Crunchbase IPO or M&A Gradient Tree Boosting (GTB) 0.9196 Thirupathi, Alhanai, & Ghassemi (2022) Crunchbase IPO or M&A Extreme Gradient Boosting 0.84 Vasquez, Santisteban, & Mauricio (2023) Australian dataset Profitability Gradient Boosting 0.91 Cholil, et al. (2024) Kaggle Acquired Extreme Gradient Boosting 0.881 Light Gradient Boosting Machine 0.881 Gradient Boosting 0.875 21 2.2.5 Neural Networks Neural Networks are composed of layers of interconnected nodes (neurons) that emulate the structure of the human brain (James, Hastie, Witten, & Tibshirani, 2021). They offer significant flexibility and are adept at learning intricate patterns and representations from data through backpropagation (Bangdiwala, Mehta, Agrawal, & Ghane, 2022). Neural Networks are particularly effective with large datasets and varied data types, such as images, text, and time- series (James, Hastie, Witten, & Tibshirani, 2021). They can automatically extract relevant features from raw data, which is a notable advantage. However, they require considerable data and computational power for successful training (Gichohi, 2023). The complexity of Neural Networks often renders them a "black box," making it difficult to interpret how decisions are derived. Table 5: Past Research on Predicting Startup Success Using Neural Networks Authors Data Source Definition of success Machine Learning Model Accuracy Sensitivity F1 Score Dellermann, Lipusch, Ebel, Popp, & Leimeister (2017) Crunchbase, Mattermark, and Dealroom Series A funding Artificial Neural Network (ANN) Not aim of the study Ferrati et al. (2021) Crunchbase, United States and Patents Office and CB insights top investors Acquired or IPO Neural Network 0.93 Bangdiwala, Mehta, Agrawal, & Ghane (2022) Crunchbase IPO or M&A Neural Network 0.9186 Gichohi (2023) Crunchbase Funding rounds, attributes of entrepreneur, M&A Artificial Neural Network (ANN) 0.86 22 2.2.6 Naive Bayes Naive Bayes is a probabilistic classifier grounded in Bayes' theorem, assuming feature independence for simplicity (James, Hastie, Witten, & Tibshirani, 2021). It estimates the probability of a class based on the features and assigns the class with the highest probability (James, Hastie, Witten, & Tibshirani, 2021). This model is straightforward, fast, and works well with small datasets and text classification tasks. It can also manage missing data by disregarding absent features during training (Felgueiras, Batista, & Carvalho, 2020). However, the assumption of feature independence may be unrealistic, potentially leading to suboptimal performance if the features are dependent (James, Hastie, Witten, & Tibshirani, 2021). Additionally, Naive Bayes has limited capacity to model complex relationships between features. Table 6: Past Research on Predicting Startup Success Using Naive Bayes Authors Data Source Definition of success Machine Learning Model Accuracy Sensitivity F1 Score Krishna et al. (2016) Crunchbase Acquired Naive Bayes Dellermann, Lipusch, Ebel, Popp, & Leimeister (2017) Crunchbase, Mattermark, and Dealroom Series A funding Naive Bayes Not aim of the study Tomy & Pardede (2018) Australian dataset Profitable Naive Bayes 0.7755 0.805 Felgueiras et al. (2020) Crunchbase Naive Bayes 23 2.2.7 Decision Trees Decision Trees partition data into subsets based on feature values, forming a tree-like structure of decisions (James, Hastie, Witten, & Tibshirani, 2021). Each internal node represents a test of a feature, each branch signifies an outcome, and each leaf node denotes a class label or regression result (Piskunova, Ligonenko, Klochko, Frolova, & Bilyk, 2021). They are straightforward to understand and visualize, and they effectively handle both numerical and categorical data. Additionally, Decision Trees offer insights into feature importance (Bangdiwala, Mehta, Agrawal, & Ghane, 2022). However, they are susceptible to overfitting, especially with deep trees and complex datasets (James, Hastie, Witten, & Tibshirani, 2021). Decision Trees can also be unstable, as minor changes in the data may result in different tree structures. Table 7: Past Research on Predicting Startup Success Using Decision Trees Authors Data Source Definition of success Machine Learning Model Accuracy Sensitivity F1 Score Krishna et al. (2016) Crunchbase Acquired ADTrees Precision: 0.88-0.97 Arroyo, Corea, Jimenez-Diaz, & Recio-Garcia (2019) Crunchbase Acquired, IPO or repeat funding round Decision Trees 0.746 Piskunova, Ligonenko, Klochko, Frolova, & Bilyk, 2023 Ukrainian Dealroom Repeated Funding rounds Decision Trees (DT) 0.612 0.347 0.523 Bangdiwala, Mehta, Agrawal, & Ghane (2022) Crunchbase IPO or M&A Decision Trees (DT) 0.9243 24 2.2.8 K-Nearest Neighbours K-Nearest Neighbors (KNN) is an instance-based learning algorithm that classifies a sample based on the majority class among its k nearest neighbors in the feature space (James, Hastie, Witten, & Tibshirani, 2021). It is easy to understand and implement, requiring no training phase as all computations are performed during prediction. KNN can be applied to both classification and regression tasks (James, Hastie, Witten, & Tibshirani, 2021). However, it can be computationally intensive and slow, particularly with large datasets due to the need for distance calculations (James, Hastie, Witten, & Tibshirani, 2021). Additionally, KNN is sensitive to noisy data and irrelevant features, and its performance is greatly influenced by the choice of the parameter k and the distance metric (James, Hastie, Witten, & Tibshirani, 2021). Table 8: Past Research on Predicting Startup Success Using K-Nearest Neighbours Authors Data Source Definition of success Machine Learning Model Accuracy Sensitivity F1 Score Pan, Gao, & Luo (2018) Crunchbase M&A or IPO K-Nearest Neighbors 0.7333 0.464 Tomy & Pardede (2018) Australian dataset Profitable K-Nearest Neighbors 0.7143 0.777 In this research on predicting early-stage startup success, the selection of Logistic Regression, Random Forest and Support Vector Machines (SVM) as the top three models was driven by their complementary strengths and the ability to address each other’s limitations. Logistic Regression offers simplicity and interpretability, essential for understanding the impact of different features on predictions. Random Forest deliver robustness and high accuracy through ensemble learning, reducing the risk of overfitting seen in individual decision trees. SVMs are adept at managing high-dimensional data and complex relationships via kernel functions, making them effective for capturing intricate patterns The downsides of one model are effectively complemented by the strengths of the others; for example, the linear assumptions of Logistic Regression are offset by the non-linear capabilities of SVMs. Similarly, the computational intensity of Support Vector Machines is balanced by the efficiency of Logistic Regression, while the robustness of Random Forest can counterbalance the interpretability challenges posed by SVMs. This combination of models 25 allows for a comprehensive approach to predicting startup success, leveraging the unique strengths of each while addressing their individual limitations. 2.3 The Relationship Between Startup Success Factors And Machine Learning The application of machine learning models in the private capital space is rapidly gaining traction (Ferrati & Muffatto, 2020; Gerdin, 2022). These models are increasingly being used to predict startup success, forecast the timing of future funding rounds, and assess the likelihood of an investor committing to a specific company (Ferrati & Muffatto, 2020). Furthermore, machine learning models assist equity investors in their decision-making by categorizing companies into industries. This classification helps identify similar firms and aligns investors with companies within their sectors of interest (Ferrati & Muffatto, 2020). This emerging trend, often referred to as "data-driven venture capital," is praised for its efficiency, effectiveness, and inclusivity compared to traditional venture capital practices (Data Driven VC, 2024). Traditional venture capital process, as described by Gompers et al. (2020), follows a funnel-like process where the number of startups progressing to the next stage decreases exponentially, with only one out of every 101 deals considered ultimately closing (Gompers, Gornall, Kaplan, & Strebulaev, 2020). This approach is resource-intensive, requiring significant time, personnel, and financial investment, and may prove ineffective if a venture capital (VC) firm overlooks outlier opportunities (Data Driven VC, 2024). Additionally, traditional venture capital is prone to bias, resulting in a disproportionate allocation of capital (Data Driven VC, 2024). Globally, disparities in funding are evident, with venture capital in North America being 52 times greater than in the Latin American (LATAM) region (Data Driven VC, 2024). In Africa, most venture capital firms are led by individuals of American and European descent, resulting in a preference for funding white founders, often sidelining African entrepreneurs with equally fund-worthy startups (Obonyo & Zeisberger, 2024; Turman, 2023; The Guardian, 2020; Ganesan, Mahalingam, Nathan, Ware, & Weinberg, 2023). As noted, “Companies led by white males or Africans with strong ‘Western’ backgrounds attract more VC investment than comparable companies in the African startup ecosystem, as investors undervalue the need and advantages of local knowledge” (Turman, 2023). In response to the challenges and limitations inherent in the traditional venture capital process, data-driven venture capital introduces three significant changes (Gerdin, 2022). First, VC firms can source deals on a larger scale (Weibl & Hess, 2019). Second, the quality of scrutinized 26 deals is enhanced, as each company is scored and evaluated before being presented to the investment team (Corea, Bertinetti, & Cervellati, 2021; Weibl & Hess, 2019). Third, by automating the sourcing process, the investment team can focus more on value-added activities (Weibl & Hess, 2019). These changes would significantly address the downsides of the traditional venture capital process. Many researchers utilize Crunchbase data to train machine learning models for predicting startup success (Krishna, Agrawal, & Choudhary, 2016; Pan, Gao, & Luo, 2018; Arroyo, Corea, Jimenez-Diaz, & Recio-Garcia, 2019; Ünal & Ceasu, 2019; Żbikowski & Antosiuk, 2021; Bangdiwala, Mehta, Agrawal, & Ghane, 2022; Felgueiras, Batista, & Carvalho, 2020; Thirupathi, Alhanai, & Ghassemi, 2022; Gichohi, 2023). However, the definition of startup success varies among stakeholders. Piskunova et al. (2021) outline different success metrics, including investment success (securing additional financing), customer success (achieving target user growth), market success (reaching sales targets or market share), adaptive success (surviving beyond five years), and financial success (achieving an IPO or acquisition, allowing founders and investors to exit and monetize their investments Financial success aligns with the "classic understanding" of startup success (Piskunova, Ligonenko, Klochko, Frolova, & Bilyk, 2021). It is widely recognized that the key milestone marking a venture-backed company as financially successful is the exit event (Ferrati & Muffatto, 2020). A venture-backed company can achieve an exit through two primary strategies: by conducting an Initial Public Offering (IPO) or by being acquired by a larger company through mergers and acquisitions (M&A). Crunchbase's datasets are extensive, including seventeen .csv files that cover five major areas (Ferrati & Muffatto, 2020). These datasets provide essential information, including company status (operating, closed, acquired, or IPO), total funding amounts, most recent funding dates, employee counts, and geographical details, making them highly valuable for machine learning classification models. 27 Figure 1: Relationships of the Crunchbase's datasets In reviewing the literature, 14 out of 20 studies used the company status variable as the dependent variable for predicting financial success, with the remaining studies focusing on repeat funding and profitability as alternative success metrics. With a well-constructed machine learning model, access to reliable and sufficient data, and a clearly defined target variable, it is possible to predict the success of any startup with greater accuracy. 2.4 Gaps Found In The Literature In the realm of predicting startup success, existing literature presents several critical gaps that undermine the comprehensiveness and accuracy of current models. This section highlights these gaps, underscoring the need for more nuanced approaches that incorporate varied datasets, sectoral diversity, geographical representation, and larger, more robust datasets, especially in the African context. Most studies in this area rely on cross-sectional data, which captures information about startups at a single point in time. This approach fails to account for the dynamic nature of startups, whose trajectories and outcomes evolve over time. The lack of longitudinal data, which tracks startups over extended periods, limits our understanding of how and why startups fail. Incorporating panel data that captures growth metrics, such as changes in employee count and funding rates, could significantly enhance the accuracy of predictive models by offering insights into the triggers of startup success or failure over time. The scarcity of longitudinal 28 studies tracking startup performance is a significant gap, as it overlooks the temporal dimensions critical to understanding the evolution of startups. Much of the research on startup success is centered around regions with well-established venture capital ecosystems, such as North America and Europe. There is a notable scarcity of studies focusing on Africa, where the venture capital scene is still emerging. The literature available often excludes African data points since they have zero or near-zero variance, as noted by Ünal & Ceasu (2019). This exclusion could suggest that African data points may act as outliers on the downward side when compared to those from other continents, further skewing the analysis. The underrepresentation of African startups in the literature is a significant gap, as it ignores the unique challenges and opportunities within this rapidly growing market. Furthermore, the nascent use of machine learning for predicting startup success in Africa exacerbates this gap, as existing models may not be well-suited to the African context. Another critical gap in the literature is the potential misrepresentation of data. For instance, AVCA (2024) suggests that the global mean sizes of funding rounds portray Africa as being on par with other continents. However, this may be misleading, as Africa has fewer companies, and the mean could create the illusion of parity with more developed markets. This misrepresentation could skew the understanding of Africa's startup ecosystem, leading to inaccurate conclusions and predictions. As of 2020, Crunchbase data indicated that only 4.4% of companies represented were closed businesses, despite a high known failure rate for startups (Ferrati & Muffatto, 2020). This discrepancy may be due to profiles being deleted upon failure, leading to an incomplete picture of startup success and failure. This underreporting of failures is a significant gap, as it prevents a full understanding of the factors contributing to startup demise, which is crucial for developing accurate predictive models. Beyond these points, the literature also fails to adequately address the differences in different entrepreneurial ecosystems. The one-size-fits-all approach often seen in existing models does not account for these variables, further limiting the applicability of the findings to diverse contexts, particularly in Africa. By addressing these gaps, future research can develop more accurate and context-specific models for predicting startup success, particularly in underrepresented regions and sectors. 29 Although this study relies on cross-sectional data, it will deepen its analysis by incorporating a comprehensive set of variables that capture a wide range of factors influencing startup success. To address potential data misrepresentation, the study will conduct a rigorous analysis that accounts for the unique economic and venture capital dynamics in Africa. Instead of relying solely on average funding sizes, the study will examine the distribution of funding amounts, the variance between startups, and the implications of these factors on the perceived success of African startups. Even within the cross-sectional framework, the study will include data on both successful and failed startups to address the gap of underreported failures and survivorship bias. By ensuring the dataset includes startups that did not succeed, the study will analyze the factors contributing to failure, providing a more balanced and realistic model of startup success. This approach will help develop predictive models that are more robust and reflective of the true dynamics within the startup ecosystem. 2.5 Conceptual Framework This conceptual framework is designed to explore the relationship between key success factors and the eventual success of startups, utilizing supervised machine learning models. It identifies critical variables, such as company status, financial metrics, industry, headquarters location, and founder details, as the primary inputs for predicting startup success. The framework demonstrates how these factors are analyzed through machine learning techniques, including Logistic Regression, Random Forest, and Support Vector Machines, to predict outcomes such as repeated funding, acquisition, IPO, and sustained operations beyond five years. This approach aims to enhance the understanding of startup success and improve investment decision-making processes by leveraging data-driven insights. 30 Figure 2: Conceptual Framework. Success Factors •Company status •Closed •Operating •Acquired •IPO •Financial factors •Last funding date •Last funding type •Last funding amount •Total funds raised •Headquarters location •Team Supervised Machine Learning Models •Logistic Regression •Random Forest •Support Vector Machine Startup Success •Repeated Funding (³ 3) •Acquired •IPO •Operated for more than 5 years 31 Chapter 3: Methodology 3.1 Introduction This section introduces the methodology employed in the study, detailing the research design, population and sampling methods, data collection, and analysis procedures. The study uses a purely quantitative approach, utilizing machine learning to analyze the success of African startups. 3.2 Research Design The research employed a quantitative research design to analyze the success of African startups using machine learning approaches. This design was chosen for its ability to manage large datasets while integrating numerical analysis with deeper contextual insights. By utilizing secondary data from Crunchbase, which includes startups from various continents with different funding rounds, the study focused on a comprehensive sample of startups actively operating within the African market. Crunchbase is a credible database for African data points due to its comprehensive coverage of global startups, including those in Africa, and its focus on emerging markets. The platform's data is regularly updated and verified daily, ensuring accuracy, and it provides detailed information on funding rounds, which is crucial for understanding the financial landscape of African startups. Widely recognized and used by investors and researchers, Crunchbase offers reliable and structured data that is well-suited for integration into machine learning models, making it a valuable resource for analyzing the unique dynamics of the African startup ecosystem. This approach provided detailed insights into the broader startup ecosystem in Africa. The cross-sectional nature of the data allowed for a snapshot analysis of the startups at a specific point in time. 3.3 Population And Sampling The population for this study comprised startups listed on Crunchbase globally, representing various continents and stages of funding. The sample selected includes 44,831 startups operating in the African market as of August 31, 2024, across all funding stages. This broad inclusion allows for a comprehensive analysis, incorporating quantitative metrics. Of these, 28,851 startups, founded between 2000 and 2024, will be used for exploratory data analysis. The representativeness of the sample was ensured by including a wide range of startups from various sectors and regions across Africa. This approach provided a comprehensive view of the startup landscape within the African market, minimizing sampling bias and enhancing the generalizability of the research findings. 32 3.4 Data Collection The data collected for this study were both quantitative and qualitative in nature, drawn from multiple datasets provided by Crunchbase, a comprehensive platform that aggregates information on organizations, investors, and related entities. The quantitative data encompassed numerical information such as financial metrics, funding rounds, valuations, investment amounts, and the number of investors involved. Qualitative data provided deeper contextual insights, including details on the startups' operating status, industry classifications, investor profiles, team structures, geographical locations, and significant events such as acquisitions and IPOs. The datasets utilized in this research included: 1. Organization: This dataset provided detailed information on each startup, including its name, country of operation, industry, status, and financial metrics such as total funding received. 2. Funding Rounds: This dataset detailed the various funding rounds that startup underwent, including the amount raised, the type of funding, and the investors involved. It also included temporal data such as the date of funding announcements. 3. Investors: This dataset provided profiles of investors, including their investment history, types, and geographical focus. 4. Acquisitions: Data on acquisitions were used to track exit events, detailing both the acquiring and acquired entities, as well as the transaction value. 5. IPOs: Information on startups that went public, including their IPO dates, share prices, and market valuations. 6. Events: This dataset captured significant events in the lifecycle of the startups, such as product launches, partnerships, and other notable occurrences. Each dataset is joined using unique identifiers such as UUIDs, allowing for a comprehensive analysis across multiple dimensions. 3.4.1 Data Collection For Objective 1 For the first objective, identifying and analyzing key success factors, the data was meticulously extracted from the aforementioned datasets. The focus was on gathering information that could elucidate the critical elements influencing the success of early-stage African startups. This involved selecting key variables such as the total funding amount, number of funding rounds, operating status and business age. 33 3.4.2 Data Collection For Objective 2 For the second objective—developing a supervised machine learning model to predict startup success—the data collected was tailored to feed into the models’ training processes. This involved preparing a dataset that included both the predictor variables (such as financial metrics, operating status and business age) and the target variable, the success or failure of the startup, as indicated by key outcomes like successful exits (acquisitions or IPOs), repeated funding or ongoing operations for more than 5 years. 3.5 Data Analysis The analysis involved a systematic approach to identifying success factors and developing a predictive model based on the data collected. 3.5.1 Identifying Success Factors To achieve the first objective of identifying critical success factors for early-stage African startups, a thorough and methodical approach was employed, combining both exploratory data analysis (EDA) and advanced statistical techniques. 3.5.1.1 Exploratory Data Analysis The initial phase involved conducting an EDA to understand the underlying patterns, distributions, and relationships within the dataset. This step was crucial for uncovering any potential correlations or trends that could influence startup success. The EDA process included: 1. Descriptive Statistics: Calculating measures such as mean, median, standard deviation, and interquartile ranges for independent variables. This provided a summary view of the data, helping to identify any outliers or anomalies. 2. Correlation Analysis: Evaluating the pairwise correlations between different variables, such as the relationship between the amount of funding received and the likelihood of success. This helped in pinpointing variables that have a strong linear relationship with startup success. 3. Visualization: Utilizing visual tools like histograms, box plots, and scatter plots to visually inspect the data. Heatmaps were also used to display the correlation matrix, making it easier to identify significant relationships at a glance. 4. Segment Analysis: Breaking down the dataset into different segments based on categorical variables such as geographical region. This segmentation allowed for a more granular analysis, revealing how specific factors might contribute to success in different contexts. 34 3.5.2 Model Selection To achieve the study's second objective, several supervised learning algorithms were considered, including Logistic Regression, Random Forest and Support Vector Machines (James, Hastie, Witten, & Tibshirani, 2021). These algorithms were selected for their proven effectiveness in handling classification tasks and their ability to manage complex relationships between predictor variables (Żbikowski & Antosiuk, 2021; Krishna, Agrawal, & Choudhary, 2016; Arroyo, Corea, Jimenez-Diaz, & Recio-Garcia, 2019; Piskunova, Ligonenko, Klochko, Frolova, & Bilyk, 2021; Pan, Gao, & Luo, 2018). A comprehensive comparative analysis of all these algorithms will be conducted, and the final model will be chosen based on performance metrics such as accuracy, precision, recall, and the F1 score (Pan, Gao, & Luo, 2018; Piskunova, Ligonenko, Klochko, Frolova, & Bilyk, 2021). Additionally, the selection process will consider each model's interpretability and its ability to generalize to new data. Following the guidelines from the textbook Introduction to Statistical Learning with Applications in R, each model will be implemented and evaluated in detail, and the best-performing model will be selected for the study (James, Hastie, Witten, & Tibshirani, 2021). 3.5.2.1 Logistic Regression Logistic regression is a fundamental classification technique used in machine learning, particularly when the dependent variable is categorical (Bai & Zhao, 2021). It is a linear model that predicts the probability that a given input belongs to a particular class. The model uses the logistic function, also known as the sigmoid function, to map the output of a linear combination of input features to a probability. Mathematically, the logistic function is defined as ℎ!(𝑥) = " "#$!"#$ ,where 𝜃 represents the model parameters (weights) and 𝑥 denotes the input feature vector. The output of this function, ℎ!(𝑥) is a value between 0 and 1, which can be interpreted as the probability of the input belonging to the positive class (James, Hastie, Witten, & Tibshirani, 2021). To make a prediction, logistic regression applies a decision boundary at 0.5. If the predicted probability ℎ!(𝑥)is greater than or equal to 0.5, the model predicts the input belongs to the positive class (label 1); otherwise, it predicts the negative class (label 0) (James, Hastie, Witten, & Tibshirani, 2021). The model’s parameters are optimized by minimizing the cost function, which is defined as the binary cross-entropy or log loss: 35 𝐽(𝜃) = − 1 𝑚,-𝑦(&) log 2ℎ!3𝑥(&)45 + 31 − 𝑦(&)4 log 21 − ℎ!3𝑥(&)457 ( &)" Here, 𝑚 is the number of training examples, 𝑦(&) is the actual label for the 𝑖 -th example, and ℎ!3𝑥(&)4 is the predicted probability for that example (James, Hastie, Witten, & Tibshirani, 2021). The parameters 𝜃 are iteratively updated using gradient descent to minimize the cost function. The update rule for gradient descent is given by 𝜃* ≔ 𝜃* − 𝛼 + +!% 𝐽(𝜃), 𝑤ℎ𝑒𝑟𝑒 𝛼 is the learning rate and 𝑗 indexes the parameters. This process continues until the cost function converges to a minimum, at which point the model is considered trained and ready to make predictions on new data (James, Hastie, Witten, & Tibshirani, 2021). 3.5.2.2 Random Forest Random Forest are a powerful ensemble learning method that builds multiple decision trees during training and outputs the mode of the classes (in classification) or the mean prediction (in regression) of the individual trees (Bai & Zhao, 2021). The basic idea behind random Forest is to create a "forest" of decision trees, where each tree is trained on a random subset of the data, and the final prediction is made by aggregating the predictions of all the trees. This process helps to reduce the variance and improve the overall accuracy of the model. Each decision tree in a random forest is built using a process called bagging, or bootstrap aggregating, where a random subsample of the training data is drawn with replacement (James, Hastie, Witten, & Tibshirani, 2021). This means that some training examples may appear multiple times in the same tree, while others may not appear at all. Additionally, at each split in the tree, a random subset of features is considered for determining the best split, which introduces further randomness and diversity among the trees. The splitting criterion used in classification trees is often the Gini impurity, defined as Gini(𝐷) = 1 − ∑ 𝑝,-. ,)" , 𝑤ℎ𝑒𝑟𝑒 𝑝, is the proportion of instances of class 𝑘 in the dataset 𝐷 , and 𝐾 is the number of classes (James, Hastie, Witten, & Tibshirani, 2021). The goal is to select the split that results in the highest reduction in impurity, thereby creating the most homogeneous child nodes (James, Hastie, Witten, & Tibshirani, 2021). Once all the trees are built, the random forest makes a prediction by combining the outputs of the individual trees. For classification tasks, the final prediction is made by majority voting, 36 where the class with the most votes is selected. For regression tasks, the final prediction is the average of the predictions from all the trees. The ensemble nature of random Forest allows them to achieve high accuracy and robustness, particularly in situations where individual decision trees might be overfit to the training data (James, Hastie, Witten, & Tibshirani, 2021). 3.5.2.3 Support Vector Machine Support vector machines (SVMs) are a class of supervised learning algorithms that are particularly well-suited for classification tasks in high-dimensional spaces (Bai & Zhao, 2021). The core idea behind SVMs is to find the optimal hyperplane that separates the data points of different classes with the maximum margin (James, Hastie, Witten, & Tibshirani, 2021). The margin is defined as the distance between the hyperplane and the nearest data points from either class, which are known as support vectors. SVM seeks to maximize this margin while correctly classifying the training data. The decision function for an SVM is given by 𝑓(𝑥) = 𝑤/𝑥 + 𝑏, where 𝑤 is the weight vector and 𝑏 is the bias term. The optimization problem that SVMs solve is to minimize the norm of the weight vector |𝑤|-, subject to the constraint that all data points are correctly classified with a margin of at least 1 (James, Hastie, Witten, & Tibshirani, 2021). Mathematically, this can be expressed as min 0,2 " - |𝑤|-, subject to 𝑦(&)3𝑤/𝑥(&) + 𝑏4 ≥ 1 for all 𝑖 , where 𝑦(&) is the label of the 𝑖 -th data point and 𝑥(&) is its feature vector (James, Hastie, Witten, & Tibshirani, 2021). In practice, data is often not perfectly separable, so SVMs introduce slack variables 𝜉& to allow for some misclassification (James, Hastie, Witten, & Tibshirani, 2021). This leads to the soft margin SVM, where the optimization problem becomes min 0,2,3 " - |𝑤|- + 𝐶 ∑ 𝜉&( &)" , subject to 𝑦(&)3𝑤/𝑥(&) + 𝑏4 ≥ 1 − 𝜉&and 𝜉& ≥ 0 for all 𝑖. The parameter 𝐶 controls the trade-off between maximizing the margin and minimizing the classification error (James, Hastie, Witten, & Tibshirani, 2021). For non-linearly separable data, SVMs can be extended using the kernel trick, which maps the input features into a higher-dimensional space where a linear separator can be found (James, Hastie, Witten, & Tibshirani, 2021). Common kernel functions include the linear kernel 𝐾(𝑥, 𝑥4) = 𝑥/𝑥4, the polynomial kernel 𝐾(𝑥, 𝑥4) = (𝑥/𝑥4 + 𝑐)5, and the radial basis function (RBF) kernel 𝐾(𝑥, 𝑥4) = exp(−𝛾|𝑥 − 𝑥4|-). The optimization problem can also be 37 reformulated in its dual form, where the solution depends only on the support vectors. In the dual formulation, the objective is to maximize ∑ 𝛼&( &)" − " - ∑ ∑ 𝛼&𝛼*𝑦(&)𝑦(*)𝐾3𝑥(&), 𝑥(*)4( *)" ( &)" , subject to 0 ≤ 𝛼& ≤ 𝐶 and ∑ 𝛼&𝑦(&)( &)" = 0, where 𝛼& are the Lagrange multipliers. This formulation allows SVMs to efficiently handle large datasets and complex decision boundaries (James, Hastie, Witten, & Tibshirani, 2021). 3.5.3 Model Training And Validation The dataset was divided into training and test sets, typically using an 70-30 split. The training set was used to build the model, while the test set provided an unbiased evaluation of the model’s performance. To enhance the model's predictive power, hyperparameter tuning was performed using grid search and cross-validation techniques (Pan, Gao, & Luo, 2018). These methods helped in finding the optimal parameters that improve model accuracy while avoiding overfitting. Regularization techniques were also employed to further mitigate overfitting, ensuring that the model performed well on unseen data (James, Hastie, Witten, & Tibshirani, 2021). 3.5.4. Model Evaluation Once trained, the model was evaluated using a range of metrics. Accuracy, precision, recall, and the F1 score provided a comprehensive view of the model's performance (Pan, Gao, & Luo, 2018). These metrics were interpreted in the context of the African startup ecosystem, with a focus on identifying the key factors that contribute to business success. The insights gained from this analysis not only validated the model but also offered valuable guidance for entrepreneurs and investors (James, Hastie, Witten, & Tibshirani, 2021). 38 3.5.4.1 Evaluation Metrics 1. Acc