The project addressed the prevalent problem of patient no-shows in healthcare services. Patient no-shows lead to inefficient resource allocation and limited access to care. The dataset comprised 110k appointments from public healthcare institutions in a Brazilian city, all of which took place over six weeks in 2016.

The following tasks were undertaken to prepare and understand the data:

  • Data Cleaning
  • Feature Engineering
  • Exploratory Data Analysis


Several supervised learning algorithms were fitted to the data and evaluated with binary classification metrics (accuracy, precision, recall, and F1 score). The Random Forest classifier produced the best results, with an accuracy of 70%.
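
To make the four evaluation metrics concrete, here is a minimal sketch of how they can be computed from true and predicted labels. It is not the project's own evaluation code: the function name `binary_metrics` and the toy labels are assumptions made up for illustration, with 1 standing for a no-show and 0 for an attended appointment.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels.

    Hypothetical helper for illustration; `positive` marks the class
    treated as the event of interest (here, a no-show).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels invented for this example: 1 = no-show, 0 = attended.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]

scores = binary_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Reporting precision, recall, and F1 alongside accuracy matters whenever one class is much rarer than the other: a model that predicts "attended" for every patient can score a high accuracy while having zero recall on no-shows.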

The results obtained represent a starting point for hospitals and other healthcare services to develop an efficient patient no-show classifier.

Project Info


  • Client: Data Science with IBM
  • Languages: Python
  • Platform: Jupyter Notebook
  • Completed: Sep 2021