- Learning Jupyter 5
- Dan Toomey
- 2025-04-04 16:20:21
Python pandas in Jupyter
One of the most widely used Python libraries is pandas. pandas is an open source, third-party library of data analysis tools that can be used freely; it is not part of the Python standard library and has to be installed separately. In this example, we will develop a Python script that uses pandas to see if there is any effect of using it in Jupyter.
I am using the Titanic dataset from https://www.kaggle.com/c/titanic/data. I am sure that the same data is available from a variety of sources.
Here is our Python script that we want to run in Jupyter:
from pandas import *

training_set = read_csv('train.csv')
training_set.head()

male = training_set[training_set.Sex == 'male']
female = training_set[training_set.Sex == 'female']

womens_survival_rate = float(sum(female.Survived))/len(female)
mens_survival_rate = float(sum(male.Survived))/len(male)

womens_survival_rate, mens_survival_rate
The script calculates the survival rate of the passengers for each sex.
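The same per-group calculation can also be sketched with groupby. This is only an illustration under stated assumptions: the sample frame below is made up (it is not taken from train.csv) and simply reuses the Sex and Survived column names from the dataset.

```python
import pandas as pd

# Tiny made-up sample that reuses the train.csv column names (not real Titanic rows).
sample = pd.DataFrame({
    "Sex": ["female", "female", "female", "female",
            "male", "male", "male", "male", "male"],
    "Survived": [1, 1, 1, 0, 0, 0, 0, 0, 1],
})

# Survived holds 0 or 1, so the mean within each group is that group's survival rate.
rates = sample.groupby("Sex")["Survived"].mean()

womens_survival_rate = rates["female"]
mens_survival_rate = rates["male"]
print(womens_survival_rate, mens_survival_rate)
```

Each rate is a proportion within its own group, so the two values are independent of each other and need not add up to 1.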
We create a new Notebook, enter the script into the appropriate cells, add displays of the calculated data at each point, and produce our results.
Here is our Notebook laid out, where we added displays of calculated data at each cell:
On Windows, it is common to use a backslash (\) to separate parts of a filename. However, Python string literals treat the backslash as an escape character. So, I had to change over to using a forward slash (/) in my .csv file path.
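The backslash issue can be seen without touching any file. In this short sketch, the C:\temp location is a hypothetical path chosen for illustration, not the path used in the book.

```python
from pathlib import PureWindowsPath

# Hypothetical Windows location of the dataset, for illustration only.
broken = "C:\temp\train.csv"         # "\t" is read as a tab character, twice
escaped = "C:\\temp\\train.csv"      # each backslash escaped explicitly
raw = r"C:\temp\train.csv"           # raw string: backslashes are kept literally
forward = "C:/temp/train.csv"        # forward slashes, the fix described above

print(repr(broken))                  # shows the embedded tab characters
print(escaped == raw)                # both spell the same path

# Windows path rules treat forward slashes and backslashes as the same separator.
print(PureWindowsPath(raw) == PureWindowsPath(forward))
```

Any of the last three spellings could be passed to read_csv; the forward-slash form is the one that needs no extra escaping.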
The dataset column names are taken directly from the file and are case-sensitive. In this case, I was originally using the sex field in my script, but in the .csv file, the column is named Sex. Similarly, I had to change survived to Survived.
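A minimal sketch of the case sensitivity follows, using a two-row made-up frame rather than the real file. Lowercasing the headers with rename is one optional workaround and an assumption of this example, not something the original script does.

```python
import pandas as pd

# Made-up frame that reuses the header spelling from train.csv.
sample = pd.DataFrame({"Sex": ["male", "female"], "Survived": [0, 1]})

print("Sex" in sample.columns)    # exact spelling matches
print("sex" in sample.columns)    # a different case does not match

# Attribute access with the wrong case fails in the same way.
try:
    sample.sex
    wrong_case_failed = False
except AttributeError:
    wrong_case_failed = True

# Optional workaround: normalize every header to lowercase once, up front.
lowered = sample.rename(columns=str.lower)
females = lowered[lowered.sex == "female"]
print(list(lowered.columns), len(females))
```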
The final script and results look like this when we run it:
I have used the head() function to display the first few lines of the dataset. The amount of detail that is available for each passenger is interesting.
If you scroll down, you will see the results:
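We can see that 74% of the women survived versus just 19% of the men. I would like to think that chivalry is not dead.
Note that these two results are not expected to total 100%: each rate is computed within its own group (survivors among women, and survivors among men), so they are independent proportions. That said, like every other dataset I have seen, there is likely to be missing and/or inaccurate data present.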
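To see how much data is actually missing, one option is to count the missing values per column. The frame below is made up for illustration, and its Age column is an assumed example of a field with gaps; on the real data, the same call could be applied to training_set.

```python
import pandas as pd

# Made-up rows; None entries stand in for values missing from a file.
sample = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, None, 26.0, None],
    "Survived": [0, 1, 1, 0],
})

# isna() marks each missing cell as True; summing counts them per column.
missing_per_column = sample.isna().sum()
print(missing_per_column)
```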