Penguin Classification and Prediction Web App

After a friend told me about the Streamlit library, I immediately started working on several data science projects, which not only improved my data science skills but also made me more comfortable with Python, MATLAB and HTML. At the time of writing, a single account can deploy only 3 web apps on Streamlit, although you can request permission to deploy more.

Coming to this project: it’s a penguin classification and prediction web app, similar to the Iris Classification and Prediction Web App. The Palmer penguins dataset it uses is a common alternative to the classic Iris dataset given by R.A. Fisher. The app predicts the species of a Palmer penguin from a number of user input features: bill length, bill depth, flipper length and body mass, together with the sex of the penguin and the island (Biscoe, Dream or Torgersen) where it was observed.

Just like most of my other web apps, this one has a simple design and structure to keep it convenient to use. The sidebar controls let you change the parameter values that determine the result. The app displays the prediction probability along with the predicted Palmer penguin species.

This web app also displays an image of the predicted species. There are three images, one per species, and the one shown depends on the species predicted from the user’s input.

We have deployed the app using Streamlit, an open source framework that lets data science teams build and deploy web apps fairly easily. Its hosting is among the best I’ve used for quick and easy deployment of web apps. The app is written mostly in Python.

This web app helped me gain experience in machine learning, which has carried over into my later projects. Feel free to build on this project, and don’t hesitate to drop any suggestions. Hope you enjoy the app!

Link of the app: https://share.streamlit.io/skillocity/penguin/main/prediction.py

About the dataset: This dataset was created by Dr. Kristen Gorman and members of the Palmer Station, Antarctica LTER. Palmer is one of the three US Antarctic Stations governed by the Antarctic Treaty of 1959. The Palmer LTER is an interdisciplinary polar marine research program established in 1990.

The dataset was uploaded by Allison Horst and is available under a CC0 license in accordance with the Palmer Station LTER Data Policy and the LTER Data Access Policy for Type 1 data. The dataset contains data for 344 penguins. There are 3 different species of penguins in this dataset, collected from 3 islands in the Palmer Archipelago, Antarctica.

Explanation of the Code and How You Can Make This Yourself!

Here, I am going to go through the code in a concise and simple manner so that people with even minimal experience in programming or data science can follow along and benefit from it. As mentioned before, this app is written in Python and deployed on Streamlit. I’ve used the Random Forest Classifier algorithm for this particular problem.
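The app code in this article loads a ready-made model file, penguins_clf.pkl, and does not show how that file was produced. Here is a minimal sketch of how such a model could be trained and saved. It is an illustration, not the exact script behind the app: the function names and the SPECIES_TO_INT mapping are my own, the integer label encoding is assumed from the way the app indexes its species array, and the hyperparameters are scikit-learn defaults apart from a fixed random_state.

```python
import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Assumed label encoding: same order as the species array used in the app.
SPECIES_TO_INT = {'Adelie': 0, 'Chinstrap': 1, 'Gentoo': 2}


def train_penguin_model(penguins: pd.DataFrame) -> RandomForestClassifier:
    """Fit a random forest on a frame shaped like penguins_cleaned.csv."""
    df = penguins.copy()
    # One-hot encode the categorical columns the same way the app does.
    for col in ['sex', 'island']:
        dummy = pd.get_dummies(df[col], prefix=col)
        df = pd.concat([df, dummy], axis=1)
        del df[col]
    # Integer targets let the app look up the species name by position.
    y = df['species'].map(SPECIES_TO_INT)
    X = df.drop(columns=['species'])
    clf = RandomForestClassifier(random_state=0)
    clf.fit(X, y)
    return clf


def save_model(clf, path='penguins_clf.pkl'):
    """Pickle the fitted classifier so the app can load it later."""
    with open(path, 'wb') as f:
        pickle.dump(clf, f)


# Usage (assumes penguins_cleaned.csv sits next to the script):
# clf = train_penguin_model(pd.read_csv('penguins_cleaned.csv'))
# save_model(clf)
```

Whatever training script you use, the feature columns need to be built in the same way and in the same order as in the app, otherwise the loaded model will not line up with the user input.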

Alright, so let’s finally get started. First up, I’ve imported the Python packages and libraries that I’ve used for this app. More information on them is available on the project template of SkillTools.

import streamlit as st
import pandas as pd
import numpy as np
import pickle
from sklearn.ensemble import RandomForestClassifier
from PIL import Image,ImageFilter,ImageEnhance
import os

After this I have included a short description of the app as a string, which mentions the dataset source and the developers. Next we need to define our input features, i.e. the parameters of our dataset. In this case they are island, sex, bill length, bill depth, flipper length and body mass.

def user_input_features():
        island = st.sidebar.selectbox('Island',('Biscoe','Dream','Torgersen'))
        sex = st.sidebar.selectbox('Sex',('male','female'))
        bill_length_mm = st.sidebar.slider('Bill length (mm)', 32.1,59.6,43.9)
        bill_depth_mm = st.sidebar.slider('Bill depth (mm)', 13.1,21.5,17.2)
        flipper_length_mm = st.sidebar.slider('Flipper length (mm)', 172.0,231.0,201.0)
        body_mass_g = st.sidebar.slider('Body mass (g)', 2700.0,6300.0,4207.0)
        data = {'island': island,
                'bill_length_mm': bill_length_mm,
                'bill_depth_mm': bill_depth_mm,
                'flipper_length_mm': flipper_length_mm,
                'body_mass_g': body_mass_g,
                'sex': sex}
        features = pd.DataFrame(data, index=[0])
        return features
input_df = user_input_features()
  
st.subheader('User Input parameters')
st.write(input_df)
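The three numbers after each slider label are the minimum, the maximum and the default value. The values above look like the minimum, maximum and mean of the matching dataset columns, so if you adapt the app to another dataset you could compute them instead of typing them in. A small sketch of that idea, where slider_bounds is a helper name I made up for illustration:

```python
import pandas as pd


def slider_bounds(column: pd.Series) -> tuple:
    """Return (min, max, default) for a numeric slider, with the mean as default."""
    return (
        float(column.min()),
        float(column.max()),
        round(float(column.mean()), 1),
    )


# Usage inside the app (assumes penguins_cleaned.csv sits next to the script):
# penguins = pd.read_csv('penguins_cleaned.csv')
# low, high, default = slider_bounds(penguins['bill_length_mm'])
# bill_length_mm = st.sidebar.slider('Bill length (mm)', low, high, default)
```

All three values are floats, which keeps them consistent with what the sliders above pass in.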

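Once that is done, we load the dataset and drop the species column, since that is what the model has to predict. The user’s input row is stacked on top of the remaining rows so that, when the sex and island columns are one-hot encoded with pd.get_dummies, a column is created for every category and not just for the ones in the user’s input. The original sex and island columns are then deleted, and only the first row (the user’s input) is kept. Finally, the app loads the pickled classifier and runs it on that row to get the prediction and its probabilities.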

penguins_raw = pd.read_csv('penguins_cleaned.csv')
penguins = penguins_raw.drop(columns=['species'])
df = pd.concat([input_df,penguins],axis=0)


encode = ['sex','island']
for col in encode:
    dummy = pd.get_dummies(df[col], prefix=col)
    df = pd.concat([df,dummy], axis=1)
    del df[col]
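df = df[:1] # Selects only the first row (the user input data)

# Load the pre-trained classifier from disk
with open('penguins_clf.pkl', 'rb') as f:
    load_clf = pickle.load(f)

# Predict the species and the probability of each species
prediction = load_clf.predict(df)
prediction_proba = load_clf.predict_proba(df)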
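It may not be obvious why the user’s row is combined with the whole dataset before encoding. The short example below shows the difference. The three reference rows are made-up stand-ins for the dataset, used only to keep the example small.

```python
import pandas as pd

# A single user input row, and a few hypothetical rows standing in for the dataset.
user_row = pd.DataFrame({'island': ['Dream'], 'sex': ['male']})
reference = pd.DataFrame({
    'island': ['Biscoe', 'Dream', 'Torgersen'],
    'sex': ['female', 'male', 'female'],
})

# Encoding the single row alone only creates columns for the categories it contains.
alone = pd.get_dummies(user_row)

# Encoding after stacking it on the reference rows creates a column for every
# category, and the first row is still the user's input.
combined = pd.get_dummies(pd.concat([user_row, reference], axis=0))
first_row = combined[:1]

print(list(alone.columns))
print(list(first_row.columns))
```

The first list has two columns and the second has five. A model trained on all five encoded columns needs all five at prediction time, which is why the app encodes the combined frame and then keeps only the first row.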

After this we just need to print the predicted output and the prediction probability.

st.subheader('Prediction')
penguins_species = np.array(['Adelie','Chinstrap','Gentoo'])
st.write(penguins_species[prediction])

st.subheader('Prediction Probability')
st.write(prediction_proba)
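The probability output above is a bare array, so the reader has to know which column belongs to which species. One possible tweak is to wrap it in a DataFrame with the species names as column headers. This is a sketch that assumes the model’s classes are 0, 1 and 2 in the same order as the species array; label_probabilities is a name I picked for illustration.

```python
import numpy as np
import pandas as pd

penguins_species = np.array(['Adelie', 'Chinstrap', 'Gentoo'])


def label_probabilities(prediction_proba) -> pd.DataFrame:
    """Wrap a predict_proba result in a frame whose columns are species names."""
    return pd.DataFrame(prediction_proba, columns=penguins_species)


# In the app this could be used in place of st.write(prediction_proba):
# st.write(label_probabilities(prediction_proba))
```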

I’ve also included a pretty cool feature in the app: apart from predicting the penguin species, it also displays an image of the predicted species. To do this, you just need to download some openly licensed images of the three penguin species and save them next to the script.

@st.cache
def load_image(img):
    # Open the image file at the given path
    im = Image.open(os.path.join(img))
    return im

# Show the image that matches the predicted species
if penguins_species[prediction] == 'Chinstrap':
    st.text("Showing Chinstrap Penguin")
    st.image(load_image('chinstrap.jpg'))
elif penguins_species[prediction] == 'Gentoo':
    st.text("Showing Gentoo Penguin")
    st.image(load_image('gentoo.jpg'))
elif penguins_species[prediction] == 'Adelie':
    st.text("Showing Adelie Penguin")
    st.image(load_image('adelie.jpg'))
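If you add more classes later, the if/elif chain grows with every species. A dictionary lookup is one way to keep it short. This is just a sketch of an alternative, not the code running in the app; SPECIES_IMAGES and image_for are names I made up, and the file names are the ones used above.

```python
# Maps each species name to the image file saved for it.
SPECIES_IMAGES = {
    'Adelie': 'adelie.jpg',
    'Chinstrap': 'chinstrap.jpg',
    'Gentoo': 'gentoo.jpg',
}


def image_for(species: str):
    """Return (caption, filename) for a predicted species name."""
    return f"Showing {species} Penguin", SPECIES_IMAGES[species]


# In the app:
# caption, filename = image_for(penguins_species[prediction][0])
# st.text(caption)
# st.image(load_image(filename))
```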

To cap off the web app, I’ve linked this article in the app and have also included the logo of Team Skillocity. I have given due credit to the researchers behind this dataset and have cited the original paper. So that’s it for this app, and I’ll be back soon with another cool application of machine learning.


6 thoughts on “Penguin Classification and Prediction Web App”

  1. Hi, I'm currently taking some bioinformatics courses. Any idea where I can learn some ML/Data Science ?

  2. Hi, yeah I've worked on a few bioinformatics projects. You can check out Data Professor (Chanin Nantasenamat, PhD, Mahidol University) on YouTube. He has tutorials for fairly simple bioinformatics web apps deployed on Streamlit. You can also check out Bioinformatics Guy on YouTube. All the best!

  3. Hi, this is a very commonly used dataset and is often used as an alternative to the Iris Classification dataset. It's uploaded by Allison Horst on her website. For the other datasets, you can check the UC Irvine Machine Learning Repository or Kaggle. All the best!

