Welcome to the "Completed Projects" section. Here, I proudly display projects that have been developed either during my training or as a result of my curiosity about specific subjects. This space is dedicated to showcasing my work specifically in the realms of data visualisation & dashboards and data analysis & machine learning. Each project exemplifies my technical skills, innovative thinking, and my continual journey towards mastering these fields. Importantly, all projects presented here utilise public datasets, demonstrating my ability to extract meaningful insights and create compelling visual narratives from widely accessible information.
Please note that, to uphold strict privacy standards and protect client confidentiality, this section deliberately excludes any real client projects. My commitment to privacy is unwavering: every project is managed with the highest level of discretion and respect for confidentiality. I invite you to explore this section to see the potential and possibilities I bring to every challenge, illustrated through my passion for data visualisation, dashboards, data analysis, and machine learning.
Credit Granting Risk Prediction Model for Financial Decision Making
This project showcases the development of a machine learning model aimed at predicting whether a financial institution should extend credit to new clients. It serves as a vital tool for financial decision-makers seeking to mitigate risks and optimise lending practices. The project was completed in the RStudio IDE, using the R language and the packages required for each task. These tasks included data munging, exploratory analysis, feature engineering, feature selection, and the development and evaluation of machine learning models. Various algorithms and predictors were tested to identify the most accurate models.
Undertaken as part of my coursework in "Big Data Analytics with R and Azure Machine Learning" at the Data Science Academy (www.datascienceacademy.com.br), this project represented an opportunity to apply newfound skills and knowledge in real-world scenarios. While instructors provided a solution to the problem at hand, my aim was to refine and innovate upon existing methodologies, leveraging advanced techniques in data manipulation and model selection.
The primary objective of this project was to develop an improved credit risk prediction model tailored to the specific needs of financial institutions. Through extensive data munging, exploratory analysis, and feature engineering, I sought to enhance the accuracy and reliability of the predictive model. Key algorithms, including Random Forest, Support Vector Machines, and Naive Bayes, were explored to identify the most effective approach.
In this Credit Granting Risk Prediction project, I successfully applied machine learning techniques to tackle real-world challenges faced by financial institutions. I developed and refined a predictive model for assessing credit granting risk, achieving an accuracy of 88.7%, with a precision of 87.7% and a recall of 85.5%.
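For readers less familiar with these metrics, the sketch below shows how accuracy, precision, and recall are derived from a confusion matrix. It is written in Python purely for illustration (the project itself was done in R), and the function name and the counts passed to it are hypothetical; they are not the project's actual results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision and recall from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives
    and true negatives. The name and signature are illustrative only.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts, NOT taken from the credit model described above.
metrics = classification_metrics(tp=45, fp=5, fn=15, tn=35)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a credit setting, precision and recall usually matter as much as accuracy, because wrongly approving a risky client and wrongly rejecting a good one carry different costs.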
You can access the final report, the dataset and all files used in this project in my GitHub repository by clicking here.
Comprehensive Exploration of Sanitation Conditions Across Brazilian States
This project delves into an in-depth exploratory analysis of sanitation data across the diverse states of Brazil. The primary aim was to unveil insights into the relationship between the population served with water supply and sanitary sewage, while also considering the demographic variations across states.
The overarching goal of this project was to conduct a detailed examination of sanitation conditions in Brazil, leveraging data visualisation techniques to uncover patterns and disparities. A key focus was the creation of a scatter plot, illustrating the correlation between the proportion of the population served with water supply and sanitary sewage, with states represented by different colours and the size of dots reflecting population size. Additionally, supplementary analyses, predominantly graphical in nature, were conducted to provide further insights.
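As a rough illustration of the relationship the scatter plot visualises, the following Python sketch computes a Pearson correlation coefficient between two coverage series. The function name, variable names, and values are assumptions made up for this example; they are not figures from the sanitation dataset, and the project itself may have measured the relationship differently.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Illustrative percentages only (share of population served per state).
water_supply_pct = [62.0, 75.5, 81.0, 90.2, 96.4]
sewage_pct = [18.0, 30.5, 47.0, 58.3, 88.1]

r = pearson_r(water_supply_pct, sewage_pct)
print(f"Pearson r = {r:.2f}")
```

A value close to 1 would indicate that states with broader water supply coverage also tend to have broader sewage coverage, which is the kind of pattern the scatter plot is designed to expose.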
The exploratory analysis showed significant differences in sanitation across Brazilian states, with the scatter plot and additional graphs revealing discrepancies in access to water supply and sewage services.
In conclusion, this project offers a comprehensive exploration of sanitation conditions in Brazil, utilising advanced data visualisation techniques to uncover key insights. By providing detailed explanations and transparent methodologies, it serves as a valuable resource for researchers, policymakers, and stakeholders invested in improving sanitation infrastructure and addressing disparities across regions. Through continued analysis and data-driven interventions, strides can be made towards achieving equitable access to sanitation facilities for all residents of Brazil.
Click here to access the final report and all files, including the dataset, in my GitHub repository, or click here to explore an interactive dashboard in Tableau Public, where you can perform analysis by region, state, or volume.
Financial Analysis of Hospital Costs Using SQL and R
This project harnesses the power of SQL and R to conduct an analysis of hospital cost data, obtained from a comprehensive national survey conducted by the US Agency for Healthcare. Specifically focusing on inpatient records for the paediatric age group (0 to 17 years) in Wisconsin, this research implemented a two-step analytical methodology to delve into a few questions concerning hospital costs and patient demographics.
The primary objective of this study was to demonstrate the seamless integration of SQL and R for executing descriptive statistical analyses, crafting simple hypothesis tests, and developing linear models. This approach not only facilitates a good understanding of healthcare expenditure but also identifies key drivers influencing these costs.
In the initial phase, SQL was employed to conduct an exploratory analysis, establishing a solid foundation for subsequent investigations. This was followed by a more advanced statistical exploration using R, where tools such as ANOVA tests and linear regression models were utilised to thoroughly examine patterns of expenditure, model hospitalisation costs, and evaluate the impact of various demographic factors on hospital expenses.
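The two-step workflow described above can be sketched as follows. This is only an illustration under stated assumptions: the table name (admissions), its columns (age, los, charge), and the records are hypothetical, and Python's built-in sqlite3 module plus a hand-written least-squares fit stand in for the database and the R modelling used in the actual project.

```python
import sqlite3

# Hypothetical inpatient records: (age, length_of_stay_days, total_charge).
rows = [
    (1, 2, 2600.0),
    (5, 3, 3700.0),
    (10, 1, 1500.0),
    (17, 4, 4800.0),
    (17, 2, 2500.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admissions (age INTEGER, los INTEGER, charge REAL)")
conn.executemany("INSERT INTO admissions VALUES (?, ?, ?)", rows)

# Step 1 (SQL): descriptive summary of admissions and mean charge per age.
summary = conn.execute(
    "SELECT age, COUNT(*) AS n, AVG(charge) AS mean_charge "
    "FROM admissions GROUP BY age ORDER BY age"
).fetchall()


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Step 2 (modelling): regress total charge on length of stay.
pairs = conn.execute("SELECT los, charge FROM admissions").fetchall()
slope, intercept = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
conn.close()

print(summary)
print(f"charge = {intercept:.1f} + {slope:.1f} * los")
```

Keeping the aggregation in SQL and the modelling in a statistical language mirrors the division of labour in the project: the database answers the descriptive questions, and the fitted model estimates how each factor relates to cost.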
This comprehensive project not only demonstrated the effective utilisation of SQL and R in conducting a financial analysis within the healthcare sector but also addressed pivotal business questions. By employing advanced statistical analyses and predictive modelling, the study delivered significant insights into the factors that drive hospital costs, thus fostering informed decision-making in healthcare management and policy formulation. Using a similar approach with a larger and more robust dataset, stakeholders can better strategise resource allocation and cost management to improve healthcare outcomes.
Access the final report, the dataset and all files used, available in my GitHub repository, by clicking here.
Exploratory Analysis of COVID-19 Cases and Deaths in Brazil
In this comprehensive project, an in-depth exploration and analysis of COVID-19 cases and deaths across Brazil from early 2020 to early 2022 were conducted using R. The main objective was to decipher the extensive data collected during the pandemic to uncover crucial patterns and insights. By focusing exclusively on R for data preprocessing and visual analysis, the project meticulously addressed the complex dynamics of the pandemic’s impact across diverse Brazilian regions.
The dataset, sourced from the official Brazilian health database, provided a robust foundation for this study. Initial steps of data preprocessing involved rigorous cleaning and preparation of the data. Key variables were selected, and any discrepancies in the data were rectified to ensure accuracy in the subsequent exploratory data analysis phase.
The exploratory phase delved deep into statistical techniques and visualisations to uncover temporal and regional trends in COVID-19 case and death rates. This phase was crucial in identifying significant regional differences in the impact of the virus and the effectiveness of public health responses. It also highlighted critical periods of infection spread and shifts in mortality rates, providing essential insights that could guide future public health strategies.
In conclusion, the project not only highlighted the stark regional disparities in the effects of COVID-19 but also emphasised the need for tailored public health responses. These findings are invaluable for policymakers and health professionals as they prepare for future public health crises. The comprehensive analysis, coupled with a detailed dictionary of variables and an innovative use of the R programming language, makes this work a significant reference point for ongoing and future epidemiological studies.
For more information about this project, click here and access my GitHub repository, where you will find the final report, all related files and the dataset used.