{ "metadata": { "name": "", "signature": "sha256:464eafb331066feb2485434da18ecdda9499a35c0780a8dc37fc4329c2ead536" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Preparation and Basic Statistics\n", "\n", "Author: [Pili Hu](http://hupili.net/)\n", "\n", "Data preparation, data cleansing, and getting basic statistics are the first things to do.\n", "The Titanic competition on Kaggle already gives us a well-formatted dataset.\n", "We'll try to get some basic statistics in this notebook.\n", "\n", "Competition link: [http://www.kaggle.com/c/titanic-gettingStarted](http://www.kaggle.com/c/titanic-gettingStarted)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load data" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "data = pd.read_csv('train.csv')\n", "data[:5]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.0500 | NaN | S |

5 rows × 12 columns
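The table above is the output of `data[:5]`. Because `train.csv` is not bundled here, the sketch below rebuilds only the five rows shown (and only a subset of their columns) in a hypothetical `sample` DataFrame. It checks that an integer slice such as `sample[:3]` returns the same rows as `sample.head(3)`, then counts missing values per column with `isnull().sum()`. In these five rows only `Cabin` has gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train.csv: the five rows printed above,
# restricted to a subset of the columns (Name and Ticket omitted).
sample = pd.DataFrame({
    'PassengerId': [1, 2, 3, 4, 5],
    'Survived': [0, 1, 1, 1, 0],
    'Pclass': [3, 1, 3, 1, 3],
    'Sex': ['male', 'female', 'female', 'female', 'male'],
    'Age': [22.0, 38.0, 26.0, 35.0, 35.0],
    'SibSp': [1, 1, 0, 1, 0],
    'Parch': [0, 0, 0, 0, 0],
    'Fare': [7.25, 71.2833, 7.925, 53.1, 8.05],
    'Cabin': [np.nan, 'C85', np.nan, 'C123', np.nan],
    'Embarked': ['S', 'C', 'S', 'S', 'S'],
})

# On the default RangeIndex, an integer slice selects rows by position,
# so it returns the same rows as head() with the same count.
assert sample[:3].equals(sample.head(3))

# isnull() marks NaN entries as True; summing per column counts them.
missing = sample.isnull().sum()
print(missing[missing > 0])  # only Cabin has missing values here
```

On the real `data` frame the same `data.isnull().sum()` call gives the missing-value count for every column at once.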

|   | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 891.000000 | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
| mean | 446.000000 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min | 1.000000 | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 223.500000 | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
| 50% | 446.000000 | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
| 75% | 668.500000 | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
| max | 891.000000 | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |

8 rows × 7 columns
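The summary above covers only the seven numeric columns: with its default `include=None`, `describe()` leaves out text columns such as `Name`, `Sex`, `Ticket`, `Cabin` and `Embarked`. Two facts can be read off it with simple arithmetic. `Age` has a count of 714 out of 891 rows, so 177 ages are missing. `Survived` is coded 0/1, so its mean of 0.383838 is the survival rate, which works out to 342 survivors. The sketch below checks the column selection on a hypothetical five-row stand-in for `train.csv` and redoes the arithmetic with the figures from the table.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: a subset of columns from the five rows shown earlier.
sample = pd.DataFrame({
    'PassengerId': [1, 2, 3, 4, 5],
    'Survived': [0, 1, 1, 1, 0],
    'Pclass': [3, 1, 3, 1, 3],
    'Sex': ['male', 'female', 'female', 'female', 'male'],
    'Age': [22.0, 38.0, 26.0, 35.0, 35.0],
    'SibSp': [1, 1, 0, 1, 0],
    'Parch': [0, 0, 0, 0, 0],
    'Fare': [7.25, 71.2833, 7.925, 53.1, 8.05],
    'Cabin': [np.nan, 'C85', np.nan, 'C123', np.nan],
    'Embarked': ['S', 'C', 'S', 'S', 'S'],
})

# With the default include=None, describe() keeps only the numeric columns,
# so Sex, Cabin and Embarked do not appear in the summary.
summary = sample.describe()
print(list(summary.columns))

# Arithmetic on the full-data figures printed in the table above.
n_rows = 891              # count for PassengerId, which has no missing values
n_age = 714               # count for Age
survived_mean = 0.383838  # mean of the 0/1 Survived column

missing_ages = n_rows - n_age              # passengers without a recorded age
survivors = round(survived_mean * n_rows)  # mean of a 0/1 column = share of 1s
print(missing_ages, survivors)             # 177 342
```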

|   | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911 | 7.8292 | NaN | Q |
| 1 | 893 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 363272 | 7.0000 | NaN | S |
| 2 | 894 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 240276 | 9.6875 | NaN | Q |
| 3 | 895 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 315154 | 8.6625 | NaN | S |
| 4 | 896 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female | 22.0 | 1 | 1 | 3101298 | 12.2875 | NaN | S |

5 rows × 11 columns
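The test set above has 11 columns against 12 in the training set. A set difference over the column names shows which one is absent. In the sketch below the names are copied from the two table headers. In the notebook they would come from `data.columns` and the test DataFrame's `columns`; the name `test` for that DataFrame is an assumption, since the cell that loaded it is not shown.

```python
# Column names copied from the headers of the training and test tables above.
train_columns = ['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age',
                 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked']
test_columns = ['PassengerId', 'Pclass', 'Name', 'Sex', 'Age',
                'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked']

# Columns present in training but not in test: the prediction target.
target_columns = sorted(set(train_columns) - set(test_columns))
print(target_columns)  # ['Survived']

# Notebook equivalent, assuming the test set is loaded as `test`:
# sorted(set(data.columns) - set(test.columns))
```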

|   | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Survived |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911 | 7.8292 | NaN | Q | 0 |
| 1 | 893 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 363272 | 7.0000 | NaN | S | 1 |
| 2 | 894 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 240276 | 9.6875 | NaN | Q | 0 |
| 3 | 895 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 315154 | 8.6625 | NaN | S | 0 |
| 4 | 896 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female | 22.0 | 1 | 1 | 3101298 | 12.2875 | NaN | S | 1 |

5 rows × 12 columns
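In the five rows shown, the added `Survived` column is 1 for every female passenger and 0 for every male one. That pattern is consistent with a sex-based baseline (predict survival for women, not for men). The cell that produced the column is not shown, though, so the rule below is an assumption inferred from these rows only. The sketch applies it to the five test rows and renders a two-column `PassengerId,Survived` CSV, which is the layout the Kaggle Titanic competition expects for submissions. The file name `submission.csv` in the final comment is hypothetical.

```python
import pandas as pd

# The five test-set rows shown above (only the columns the rule needs).
test = pd.DataFrame({
    'PassengerId': [892, 893, 894, 895, 896],
    'Sex': ['male', 'female', 'male', 'male', 'female'],
})

# Assumed baseline: predict 1 (survived) for female passengers, 0 otherwise.
test['Survived'] = (test['Sex'] == 'female').astype(int)

# Submissions for this competition use the columns PassengerId,Survived.
submission = test[['PassengerId', 'Survived']]
csv_text = submission.to_csv(index=False)  # returns a string when no path is given
print(csv_text)

# To write a file instead (hypothetical name):
# submission.to_csv('submission.csv', index=False)
```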