Mesothelioma Disease Detector

I recently found another disease detection dataset on the UC Irvine Machine Learning Repository and decided to have a go at it. This dataset is not used very often for disease detection: I had never heard of it before coming across it in a research paper. I’d like to shed some light on it and present the Mesothelioma Disease Detector web app.

 
This web app predicts whether you have mesothelioma based on various parameters such as Platelet Count, Blood Lactic Dehydrogenase, Alkaline Phosphatase, Total Protein, Albumin, Glucose, Pleural Lactic Dehydrogenase, Pleural Protein, Pleural Albumin, Pleural Glucose and C-reactive Protein. I’ll explain more about these parameters later in the article. The web app uses the Random Forest Classifier algorithm and is written mostly in Python. Mesothelioma is a type of cancer that occurs in the mesothelium, the thin layer of tissue that covers the majority of our internal organs. It is an aggressive and deadly form of cancer.
 
We have deployed the app using Streamlit, an open source framework that lets data science teams build and deploy web apps fairly easily. It’s one of the best tools I’ve used for quick and easy deployment of web apps.
 
The web app uses interactive graphs to display the outcome and to compare the input parameters given by the user. We split the dataset into training and test sets, holding out 60% of it as the test set. The sidebar sliders change the values of the parameters used to determine the result. The graphs compare the patient’s values with those of others (both patients with mesothelioma and patients without it).
 

This web app was a learning experience for us and has improved our knowledge of machine learning significantly. We hope to deploy more apps in the future and share them with you. Feel free to build on this project and don’t hesitate to drop any suggestions. The link for the Mesothelioma Disease Detector web app is as follows:

https://share.streamlit.io/skillocity/mesothelioma-/main/app.py

About the dataset: Malignant mesotheliomas (MM) are very aggressive tumors of the pleura. These tumors are connected to asbestos exposure; however, they may also be related to previous simian virus 40 (SV40) infection and, quite possibly, to genetic predisposition. Molecular mechanisms can also be implicated in the development of mesothelioma. Rural living is associated with the development of mesothelioma. Soil mixtures containing asbestos can be found in Anatolia, Turkey, where they are known as ‘white-soil’ or ‘corak’, and in Greece, where they are known as ‘Luto’. The mesothelioma disease data set was prepared at Dicle University Faculty of Medicine in Turkey. It contains three hundred and twenty-four patient records, and every sample has 34 features.

 
Disclaimer: This is just a learning project based on one particular dataset, so please do not depend on it to determine whether you have mesothelioma. Its prediction may be a false positive or a false negative. A doctor is still the right person to diagnose such diseases.
 
Mesothelioma Awareness Day is Sept. 26. On this day, patients, family members, doctors and the mesothelioma community raise awareness of the rare cancer to help find a cure. Supporters wear blue and may wear mesothelioma awareness wristbands or ribbons. It was established in 2004 by the Mesothelioma Applied Research Foundation. It wasn’t until 2010 that Congress first declared September 26 as National Mesothelioma Awareness Day. On that note, let’s raise awareness of mesothelioma, show our support and help the many patients around the world.
 

Explanation of the Code and how you can make this yourself !

Here, I am going to go through the code in a concise and simple manner so that people with even minimal experience in programming or data science can follow along and benefit from it. As mentioned before, this app is written in Python and deployed on Streamlit, and it uses the Random Forest Classifier algorithm for this particular problem.

Alright, so let’s finally get started. First up, I’ve imported the Python packages and libraries that I’ve used for this app. More information on them is available on the project template of SkillTools.

 

import streamlit as st
import pandas as pd
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import plotly.figure_factory as ff
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import seaborn as sns
from sklearn import preprocessing

After this I have included a slight description of the app as a string which includes the dataset resource and the developers. I’ve also put in the dataset and included headings that I’ve used.

st.markdown('''
#  Mesothelioma Detector 
This app detects if you have Mesothelioma based on Machine Learning!
- App built by Pranav Sawant and Anshuman Shukla of Team Skillocity.
- Dataset Creators: Abdullah Cetin Tanrikulu from Dicle University, Faculty of Medicine, Department of Chest Diseases, 21100 Diyarbakir, Turkey
- Orhan Er from Bozok University, Faculty of Engineering, Department of Electrical and Electronics Eng., 66200 Yozgat, Turkey
- Note: User inputs are taken from the sidebar. It is located at the top left of the page (arrow symbol). The values of the parameters can be changed from the sidebar.  
''')
st.write('---')

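df = pd.read_csv(r'Mesothelioma data set.csv')
# Assumption: dfnew is not defined in the post; here it keeps the 12 slider features plus 'Outcome'
features = ['Age', 'Platelet_Count', 'Blood_Lactic_Dehydrogenise', 'Alkaline_Phosphatise', 'Total_Protein', 'Albumin', 'Glucose', 'Pleural_Lactic_Dehydrogenise', 'Pleural_Protein', 'Pleural_Albumin', 'Pleural_Glucose', 'Creactive_Protein']
dfnew = df[features + ['Outcome']]
st.sidebar.header('Patient Data')
st.subheader('Training Dataset')
st.write(dfnew.describe())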

After this we need to split our data into training and test sets. For this app, I’ve used 60% of the data as the test set and the remaining 40% as the training set.

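# Features are every column except the label; the label is the 'Outcome' column
x = dfnew.drop(['Outcome'], axis = 1)
y = dfnew['Outcome']
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size = 0.6, random_state = 0)
# Encode the class labels as consecutive integers starting at 0
lab_enc = preprocessing.LabelEncoder()
training_scores_encoded = lab_enc.fit_transform(y_train)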
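
To see what the label encoding step does, here is a minimal sketch on made-up data. The label values 1 and 2 and the class counts are assumptions for illustration only (the post does not show how 'Outcome' is coded in the CSV). The point is that LabelEncoder maps the sorted distinct labels to 0, 1 and so on, which is why the later code compares the prediction against 0. The sketch also passes stratify=y, which is not in the original code, to keep the class ratio similar in both splits.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

# Made-up data: 324 rows, 12 features, labels coded as 1 and 2 (assumed coding,
# arbitrary class counts)
rng = np.random.default_rng(0)
x = rng.normal(size=(324, 12))
y = np.array([1] * 224 + [2] * 100)

# Same 60% test split as in the post, plus stratify=y (an addition)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.6, random_state=0, stratify=y
)

lab_enc = preprocessing.LabelEncoder()
y_train_encoded = lab_enc.fit_transform(y_train)

print(lab_enc.classes_)            # the original labels, sorted
print(np.unique(y_train_encoded))  # the encoded labels the model will see
```

If the labels in the real CSV are already 0 and 1, the encoder leaves them unchanged, so the step is harmless either way.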

Once we’re done with this, we need to define the user report and the user report data based on the parameters in the training dataset. For this particular dataset, the parameters are Age, Platelet Count, Blood Lactic Dehydrogenase, Alkaline Phosphatase, Total Protein, Albumin, Glucose, Pleural Lactic Dehydrogenase, Pleural Protein, Pleural Albumin, Pleural Glucose and C-reactive Protein.

def user_report():
  Age = st.sidebar.slider('Age', 0,100, 54)
  Platelet_Count = st.sidebar.slider('Platelet Count', 0,3500, 315 )
  Blood_Lactic_Dehydrogenise = st.sidebar.slider('Blood Lactic Dehydrogenise', 0,1000, 20 )
  Alkaline_Phosphatise = st.sidebar.slider('Alkaline Phosphatise', 0,500, 92 )
  Total_Protein = st.sidebar.slider('Total Protein', 0.0,10.0, 5.1 )
  Albumin = st.sidebar.slider('Albumin', 0.0,8.0, 1.1 )
  Glucose = st.sidebar.slider('Glucose', 0,500, 10 )
  Pleural_Lactic_Dehydrogenise = st.sidebar.slider('Pleural Lactic Dehydrogenise', 0,8000, 5 )
  Pleural_Protein = st.sidebar.slider('Pleural Protein', 0.0,8.0, 6.5 )
  Pleural_Albumin = st.sidebar.slider('Pleural Albumin', 0.0,6.0, 4.2 )
  Pleural_Glucose = st.sidebar.slider('Pleural Glucose', 0,120, 44)
  Creactive_Protein = st.sidebar.slider('C-reactive Protein', 0,120, 6)

  # Collect the slider values under the column names used for training
  user_report_data = {
      'Age':Age,
      'Platelet_Count':Platelet_Count,
      'Blood_Lactic_Dehydrogenise':Blood_Lactic_Dehydrogenise,
      'Alkaline_Phosphatise':Alkaline_Phosphatise,
      'Total_Protein':Total_Protein,
      'Albumin':Albumin,
      'Glucose':Glucose,
      'Pleural_Lactic_Dehydrogenise':Pleural_Lactic_Dehydrogenise,
      'Pleural_Protein':Pleural_Protein,
      'Pleural_Albumin':Pleural_Albumin,
      'Pleural_Glucose':Pleural_Glucose,
      'Creactive_Protein':Creactive_Protein,
        
  }
  report_data = pd.DataFrame(user_report_data, index=[0])
  return report_data





user_data = user_report()
st.subheader('Patient Data')
st.write(user_data)
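
The twelve slider calls above all follow the same pattern, so they could also be driven from a table. The following is only a sketch of that idea: SLIDER_SPECS, collect_inputs and default_slider are hypothetical names that do not appear in the original app, and the ranges and defaults are copied from the function above. The slider function is passed in as an argument so the sketch can run without Streamlit.

```python
# Hypothetical table: column name -> (label, minimum, maximum, default)
SLIDER_SPECS = {
    'Age': ('Age', 0, 100, 54),
    'Platelet_Count': ('Platelet Count', 0, 3500, 315),
    'Blood_Lactic_Dehydrogenise': ('Blood Lactic Dehydrogenise', 0, 1000, 20),
    'Alkaline_Phosphatise': ('Alkaline Phosphatise', 0, 500, 92),
    'Total_Protein': ('Total Protein', 0.0, 10.0, 5.1),
    'Albumin': ('Albumin', 0.0, 8.0, 1.1),
    'Glucose': ('Glucose', 0, 500, 10),
    'Pleural_Lactic_Dehydrogenise': ('Pleural Lactic Dehydrogenise', 0, 8000, 5),
    'Pleural_Protein': ('Pleural Protein', 0.0, 8.0, 6.5),
    'Pleural_Albumin': ('Pleural Albumin', 0.0, 6.0, 4.2),
    'Pleural_Glucose': ('Pleural Glucose', 0, 120, 44),
    'Creactive_Protein': ('C-reactive Protein', 0, 120, 6),
}


def collect_inputs(slider, specs=SLIDER_SPECS):
    """Call slider(label, minimum, maximum, default) once per parameter.

    Returns a dict keyed by column name, in the same order as the table.
    """
    return {column: slider(*spec) for column, spec in specs.items()}


def default_slider(label, min_value, max_value, value):
    """Stand-in for st.sidebar.slider that simply returns the default value."""
    return value


inputs = collect_inputs(default_slider)
print(inputs)
```

In the app itself, you would call collect_inputs(st.sidebar.slider) and wrap the result with pd.DataFrame(inputs, index=[0]), as the original function does.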

After we’ve defined the user report, we need to train the classifier and run it on the user’s data.

# Train the random forest on the training split, then classify the slider values
rf = RandomForestClassifier()
rf.fit(x_train, training_scores_encoded)
user_result = rf.predict(user_data)
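
The code above imports accuracy_score but never uses it, and the held-out test split is never scored. A minimal sketch of how the split could be evaluated is shown below. It runs on synthetic data as a stand-in for the real CSV, and the confusion matrix and the fixed random_state are additions for illustration, not part of the original app.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and the encoded labels
rng = np.random.default_rng(0)
x = rng.normal(size=(324, 12))
y = (x[:, 0] + rng.normal(scale=0.5, size=324) > 0).astype(int)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.6, random_state=0
)

rf = RandomForestClassifier(random_state=0)
rf.fit(x_train, y_train)
y_pred = rf.predict(x_test)

accuracy = accuracy_score(y_test, y_pred)
# labels=[0, 1] forces a 2x2 matrix: rows are true classes, columns are predictions
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()

print(f'Accuracy: {accuracy:.2%}')
print(f'False positives: {fp}, false negatives: {fn}')
```

In the real app the model is trained on encoded labels, so y_test would first need to go through lab_enc.transform(y_test) before being compared with the predictions.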

Now we finally come to my favourite part of these web apps: visualizations. I have been experimenting with a number of visualization libraries, and a few of them really stand out for me, so I use them often in my apps. As a convention, I’ve used blue for healthy patients and red for unhealthy patients.

st.title('Graphical Patient Report')



if user_result[0]==0:
  color = 'blue'
else:
  color = 'red'

We start off with Platelet Count and code in its visualizations. Here I’ve basically plotted a seaborn scatterplot with age on the x axis and the values of the Platelet Count parameter on the y axis. I have used the purple palette and have scaled the axes according to the data. A value of 0 represents a healthy case whereas a value of 1 represents an unhealthy case.

st.header('Platelet Count Value Graph (Yours vs Others)')
fig_platelets = plt.figure()
# Dataset points, shaded by outcome
sns.scatterplot(x = 'Age', y = 'Platelet_Count', data = df, hue = 'Outcome' , palette='Purples')
# The user's own point, drawn larger in blue or red
sns.scatterplot(x = user_data['Age'], y = user_data['Platelet_Count'], s = 150, color = color)
plt.xticks(np.arange(0,100,5))
plt.yticks(np.arange(0,3500,175))
plt.title('0 - Healthy & 1 - Unhealthy')
st.pyplot(fig_platelets)
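
Since every parameter is plotted the same way, the snippet above can be wrapped in a small helper. The sketch below is one way to do that. Note that it uses plain matplotlib instead of seaborn to stay short, and that feature_vs_age_figure, its arguments and the sample values are hypothetical and not taken from the original app.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch also runs outside Streamlit
import matplotlib.pyplot as plt
import numpy as np


def feature_vs_age_figure(ages, values, outcomes, user_age, user_value,
                          user_color, label):
    """Scatter one feature against age and draw the user's point on top."""
    ages, values, outcomes = map(np.asarray, (ages, values, outcomes))
    fig, ax = plt.subplots()
    # One scatter call per outcome so each class gets its own shade of purple
    for outcome, shade in ((0, 'plum'), (1, 'indigo')):
        mask = outcomes == outcome
        ax.scatter(ages[mask], values[mask], color=shade, label=str(outcome))
    # The user's own value, larger and in blue or red
    ax.scatter([user_age], [user_value], s=150, color=user_color)
    ax.set_xlabel('Age')
    ax.set_ylabel(label)
    ax.set_title('0 - Healthy & 1 - Unhealthy')
    ax.legend(title='Outcome')
    return fig


# Made-up values, only to show the call
fig = feature_vs_age_figure(
    ages=[40, 55, 63, 70], values=[10, 90, 120, 45], outcomes=[0, 1, 0, 1],
    user_age=54, user_value=10, user_color='blue', label='Glucose',
)
```

In the app, each returned figure would be passed to st.pyplot, once per parameter.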

Now that we are done with one parameter, we can very easily do the same for the other parameters as well. Just reuse the above code snippet with the name of each of the other parameters and you are set to go. I will leave this as an exercise for you, and if you have any queries regarding it, please do ask. After completing the visualizations for all the parameters, we are finally ready to display the outcome and the prediction. I have given the outcome in the form of a user report.

st.subheader('Your Report: ')
output=''
if user_result[0]==0:
  output = 'Congratulations, you do not have Mesothelioma'
else:
  output = 'Unfortunately, you do have Mesothelioma'
st.title(output)
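
Because a prediction can be a false positive or a false negative, it may help to show how confident the model is rather than only a yes or no. The sketch below is an optional variation, not what the original app does: format_report is a hypothetical helper, and it assumes the probability comes from the classifier's predict_proba method.

```python
def format_report(predicted_class, probability):
    """Build the report line from the encoded class (0 = healthy) and its probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError('probability must be between 0 and 1')
    if predicted_class == 0:
        verdict = 'no sign of Mesothelioma'
    else:
        verdict = 'signs of Mesothelioma'
    return f'The model predicts {verdict} (estimated probability {probability:.0%}).'


print(format_report(0, 0.87))
```

In the app, the probability could be read with rf.predict_proba(user_data)[0][user_result[0]], assuming the encoded classes are 0 and 1.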

Next, I have duly credited the respective owners and authorities in charge of this dataset. I have also mentioned where I obtained the dataset (the UCI Machine Learning Repository) and have cited its original creators for their commendable work.

To cap off this web app, I’ve included the disclaimer that I give for all my biotechnology and medical applications of data science: this is an application based on one particular dataset, so we cannot use it universally. I have also attached the logo of Skillocity at the end.

So that’s it for this web app. I’ll see you soon with another fun application of Machine Learning / Data Science and some interesting insights. Hasta pronto !


6 thoughts on “Mesothelioma Disease Detector”

    1. I actually found it while reading a research paper. Then found out that it was on the UCI ML repo, yeah apparently nobody on Kaggle mentioned it

  1. Hi, from where did you learn machine learning and what’s the easiest, fastest and best way to learn it so that I can start projects ASAP ?

  2. Hey Roger, I basically learnt machine learning with the help of content available on YouTube and other organisations such as Coursera. You can also read a number of books to get an in-depth and deep knowledge of ML and Computer Vision. Do check out ‘The Data Professor’ and Krish Naik if you want to build some projects.

  3. Also, if you need any help with your projects or have any doubts, you can always contact us.
