Diabetes Detector Web App

Machine Learning is a concept that many people struggle to comprehend, and those who do comprehend it often have quite different ideas of what it is. I have been working on Machine Learning and its algorithms for about six months, and after some rigorous training in it and in supporting subjects such as Statistics and Linear Algebra, I’ve finally started to get the hang of it (even though a lot of work is yet to be done).

We launched our first native app, SkillChem, yesterday, and today we are here with our first web app. This app is a diabetes detector: it predicts whether a particular patient has diabetes from parameters such as glucose, insulin and the diabetes pedigree function. According to the creators of the dataset, the diabetes pedigree function scores the likelihood of diabetes based on family history of the disease. The web app is based on Machine Learning and uses the Random Forest Classifier algorithm.

We have deployed the app using Streamlit, an open source framework that allows data science teams to build and deploy web apps fairly easily. It’s one of the best services I’ve used for this, and it’s great for quick and easy deployment of web apps. The app is coded in Python.

The web app uses interactive visualizations to display the outcome and to compare the input parameters given by the user. We trained the model on 80% of the dataset and held out the remaining 20% as the test set. The sidebar sliders let you change the values of the parameters used to determine the result. The graphs compare the patient’s values with those of other patients (both diabetic and non-diabetic). The app also reports the accuracy of the model.

For those who want to learn about Machine Learning, its applications and its algorithms, Professor Andrew Ng has a great course on Coursera. He explains a range of machine learning algorithms and applications in depth. It’s a good course for people of all ages looking to expand their knowledge of Machine Learning. The course is completely free to study. The link for the course is: https://www.coursera.org/learn/machine-learning

This web app was a learning experience for us and has improved our knowledge of Machine Learning significantly. We hope to deploy more apps in the future and share them with you. Feel free to build on this project, and don’t hesitate to share any suggestions. The link for the diabetes detector web app is as follows:

https://share.streamlit.io/pranav-coder2005/diabetes_detector/main/diabetes_detector_app.py

About the dataset : This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.

The dataset consists of several medical predictor variables and one target variable, Outcome. The predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.

 

Disclaimer: This is just a learning project based on one particular dataset, so please do not depend on it to find out whether you have diabetes. Its result might be a false positive or a false negative. A doctor is still the right person to diagnose such diseases.

November is Diabetes Awareness Month, and World Diabetes Day is observed on 14 November each year. Throughout the month, diabetes awareness groups run programs, charitable events and campaigns to spread awareness about the disease all around the world and to fight it together. World Diabetes Day was created in 1991 by the International Diabetes Federation and the World Health Organization in response to growing concerns about the ever increasing health threat posed by diabetes. “The Nurse and Diabetes” was the theme of World Diabetes Day in 2020; the campaign aimed to raise awareness of the crucial role that nurses play in supporting people living with diabetes. The blue circle is the universal symbol for diabetes. It was launched in 2006 to give diabetes a common identity, and it aims to support all existing efforts to raise awareness about diabetes. Let’s show our support for diabetes awareness and help the many patients around the world.

 

Explanation of the code and how you can make this yourself!

Here, I am going to go through the code in a concise and simple manner so that people with even minimal experience in programming or data science can follow along and benefit from it. This app has been coded in Python and deployed on Streamlit, as mentioned before. I’ve also used the Random Forest Classifier algorithm for this particular problem.

Alright, so let’s finally get started. First up, I’ve imported the Python packages and libraries that I’ve used for this app. More information about them is available on the project template of SkillTools.

import streamlit as st
import pandas as pd
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import plotly.figure_factory as ff
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import seaborn as sns

After this I have included a short description of the app as a string, which names the dataset source and the developers. Next we need to load our dataset and define some headings so that users know what they are looking at.

df = pd.read_csv(r'diabetes.csv')
st.sidebar.header('Patient Data')
st.subheader('Training Dataset')
st.write(df.describe())

After this we need to split our data into a training set and a test set. For the purpose of this app, I’ve used a test size of 20% and a train size of 80%.

x = df.drop(['Outcome'], axis = 1)
y = df.iloc[:, -1]
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size = 0.2, random_state = 0)
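
To make the split concrete, here is a minimal sketch on a small synthetic table. The values are made up and only stand in for diabetes.csv, and the names demo_df, x_demo, x_tr and so on are my own placeholders rather than part of the app.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for diabetes.csv (made-up values, for illustration only).
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    'Glucose': rng.integers(50, 200, size=100),
    'BMI': rng.uniform(18, 45, size=100).round(1),
    'Age': rng.integers(21, 80, size=100),
    'Outcome': rng.integers(0, 2, size=100),
})

# Features are every column except the target; the target is Outcome.
x_demo = demo_df.drop(['Outcome'], axis=1)
y_demo = demo_df['Outcome']

# test_size=0.2 holds out 20% of the rows; random_state makes the split repeatable.
x_tr, x_te, y_tr, y_te = train_test_split(x_demo, y_demo, test_size=0.2, random_state=0)

print(x_tr.shape, x_te.shape)  # (80, 3) (20, 3)
```

With 100 rows, 80 go to training and 20 are held back, and no row appears in both parts.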

Once we’re done with this, we need to define the user report and the user report data depending on the various parameters given in the training dataset. For this particular dataset the parameters are Glucose, Insulin, Blood Pressure, BMI, Age, Diabetes Pedigree Function (DPF), Number of pregnancies and Skin Thickness. We also need to mention the range of values of these parameters so that the user can change them using the sliders in the sidebar.

def user_report():
  # Each slider takes (label, minimum, maximum, default value)
  glucose = st.sidebar.slider('Glucose', 0, 250, 120)
  insulin = st.sidebar.slider('Insulin', 0, 850, 90)
  bp = st.sidebar.slider('Blood Pressure', 0, 300, 85)
  bmi = st.sidebar.slider('BMI', 0, 70, 22)
  dpf = st.sidebar.slider('Diabetes Pedigree Function', 0.0, 3.0, 0.8)
  age = st.sidebar.slider('Age', 21, 120, 55)
  pregnancies = st.sidebar.slider('Pregnancies', 0, 10, 1)
  skinthickness = st.sidebar.slider('Skin Thickness', 0, 100, 35)

  # Collect the slider values under short keys
  user_report_data = {
      'glucose': glucose,
      'insulin': insulin,
      'bp': bp,
      'bmi': bmi,
      'dpf': dpf,
      'age': age,
      'pregnancies': pregnancies,
      'skinthickness': skinthickness,
  }
  # A one-row DataFrame, so it can be displayed and passed to the model
  report_data = pd.DataFrame(user_report_data, index=[0])
  return report_data



user_data = user_report()
st.subheader('Patient Data')
st.write(user_data)
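
The slider ranges above are typed in by hand. As an alternative, here is a rough sketch of how the ranges could be read from the dataset instead. The helper slider_bounds is hypothetical (it is not in the app), and the small table is made up for illustration.

```python
import pandas as pd

def slider_bounds(df, column):
    """Return (minimum, maximum, default) for a slider over one numeric column.

    Hypothetical helper: the column median is used as the default value.
    """
    col = df[column]
    lo, hi, default = col.min(), col.max(), col.median()
    # A slider needs all three values to be of the same type (all int or all float).
    if pd.api.types.is_integer_dtype(col):
        return int(lo), int(hi), int(default)
    return float(lo), float(hi), float(default)

# Made-up values, only to show the shape of the result.
demo_df = pd.DataFrame({'Glucose': [85, 120, 99, 168],
                        'BMI': [22.5, 31.0, 27.4, 35.1]})

glucose_bounds = slider_bounds(demo_df, 'Glucose')
bmi_bounds = slider_bounds(demo_df, 'BMI')
print(glucose_bounds)  # (85, 168, 109)
```

In the app this could be used as st.sidebar.slider('Glucose', *slider_bounds(df, 'Glucose')), since the slider takes its minimum, maximum and default value in that order.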

Now here’s the part where we create the Random Forest Classifier, fit it on the training data and run the model on the input given by the user.

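rf = RandomForestClassifier()
rf.fit(x_train, y_train)
# Reorder the user's inputs and relabel them to match the training columns
# (assumes diabetes.csv has its usual column order, starting with Pregnancies and ending with Age)
feature_order = ['pregnancies', 'glucose', 'bp', 'skinthickness', 'insulin', 'bmi', 'dpf', 'age']
user_result = rf.predict(user_data[feature_order].set_axis(x_train.columns, axis=1))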
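
The app also reports an accuracy figure, and accuracy_score is already among the imports. Here is a minimal sketch of how such a figure can be computed on the held-out test set. It runs on synthetic data with a toy rule, so the number it prints says nothing about the real diabetes model, and all the names in it are my own placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (made up for illustration; not the diabetes dataset).
rng = np.random.default_rng(0)
glucose = rng.integers(60, 200, size=200)
demo_df = pd.DataFrame({
    'Glucose': glucose,
    'Age': rng.integers(21, 80, size=200),
    'Outcome': (glucose > 130).astype(int),  # toy rule so there is something to learn
})

x_demo = demo_df.drop(['Outcome'], axis=1)
y_demo = demo_df['Outcome']
x_tr, x_te, y_tr, y_te = train_test_split(x_demo, y_demo, test_size=0.2, random_state=0)

model = RandomForestClassifier(random_state=0)
model.fit(x_tr, y_tr)

# Accuracy = fraction of held-out rows whose predicted label matches the true label.
test_accuracy = accuracy_score(y_te, model.predict(x_te))
print(f'Accuracy: {test_accuracy * 100:.1f}%')
```

In the app, the same call on rf, x_test and y_test would give the value to show to the user, for example with st.write.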

Now we finally come to my favourite part of these web apps: visualizations. I have been experimenting with a number of visualization libraries, and a few of them really stand out for me, so I use them often in my apps. As a convention, I’ve used blue for non-diabetic patients and red for diabetic patients.

st.title('Graphical Patient Report')



if user_result[0]==0:
  color = 'blue'
else:
  color = 'red'

We start off with glucose and code in its visualizations. Here I’ve basically plotted a seaborn scatterplot with age on the x axis and the values of the glucose parameter on the y axis. I have used the purple palette and have scaled the axes according to the data. A value of 0 represents a healthy case whereas a value of 1 represents an unhealthy case.

st.header('Glucose Value Graph (Yours vs Others)')
fig_glucose = plt.figure()
ax3 = sns.scatterplot(x = 'Age', y = 'Glucose', data = df, hue = 'Outcome' , palette='Purples')
ax4 = sns.scatterplot(x = user_data['age'], y = user_data['glucose'], s = 150, color = color)
plt.xticks(np.arange(0,100,5))
plt.yticks(np.arange(0,250,20))
plt.title('0 - Healthy & 1 - Unhealthy')
st.pyplot(fig_glucose)
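
The glucose plot follows a pattern that can be wrapped in a small helper and reused for any parameter. This is only a sketch: plot_parameter is a hypothetical function that is not in the app, the demo tables are made up, and the tick settings are left out because they differ per parameter.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; the figure object is all we need here
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

def plot_parameter(df, user_data, column, user_key, point_color):
    """Scatter one dataset column against Age and overlay the user's own point."""
    fig = plt.figure()
    # Everyone in the dataset, shaded by Outcome (0 or 1)
    sns.scatterplot(x='Age', y=column, data=df, hue='Outcome', palette='Purples')
    # The user's value as one large point in the report colour
    sns.scatterplot(x=user_data['age'], y=user_data[user_key], s=150, color=point_color)
    plt.title('0 - Healthy & 1 - Unhealthy')
    return fig

# Made-up rows standing in for the dataset and for the slider values.
demo_df = pd.DataFrame({'Age': [25, 40, 61, 33],
                        'BMI': [21.0, 33.5, 29.8, 26.1],
                        'Outcome': [0, 1, 1, 0]})
demo_user = pd.DataFrame({'age': 55, 'bmi': 22}, index=[0])

fig_bmi = plot_parameter(demo_df, demo_user, 'BMI', 'bmi', 'blue')
```

In the app, the returned figure would then go to st.pyplot, just like fig_glucose.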

Now that we are done with one parameter, we can very easily do the same for the other parameters. Just swap the glucose column in the snippet above for another parameter (adjusting the axis ticks to suit) and you are set to go. I will leave this as an exercise for you all, and if you have any queries about it, please do ask. After completing the visualizations for all the parameters, we are finally ready to display the outcome and the prediction. I have given the outcome in the form of a user report.

st.subheader('Your Report: ')
output=''
if user_result[0]==0:
  output = 'Congratulations, you are not Diabetic'
else:
  output = 'Unfortunately, you are Diabetic'
st.title(output)

Next, I have duly given the dataset credits to the respective owners and authorities in charge of this dataset and have adhered to its license which is Open Data Commons Public Domain Dedication and License (PDDL) in this case. I have also mentioned where I received the dataset from (UCI Machine Learning Repository) and have cited the original creators of this dataset for their commendable work.

st.sidebar.subheader("""An article about this app: https://proskillocity.blogspot.com/2021/04/official-launch-of-our-first-web-app.html""")
st.write("Dataset citation : Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261--265). IEEE Computer Society Press.")
st.write("Original owners of the dataset: (a) Original owners: National Institute of Diabetes and Digestive and Kidney Diseases (b) Donor of database: Vincent Sigillito (vgs@aplcen.apl.jhu.edu) Research Center, RMI Group Leader Applied Physics Laboratory The Johns Hopkins University Johns Hopkins Road Laurel, MD 20707 (301) 953-6231 (c) Date received: 9 May 1990")
st.write("This dataset is also available on the UC Irvine Machine Learning Repository")
st.write("Dataset License: Open Data Commons Public Domain Dedication and License (PDDL)")

To cap off this web app, I’ve added the disclaimer that I give for all my biotechnology and medical applications of data science: this application is based on one particular dataset, so it cannot be used universally. I have also attached the logo of Skillocity at the end.

So that’s it for this web app. I’ll see you soon with another fun application of Machine Learning / Data Science and some interesting insights. Hasta pronto!
