Introduction to R - Titanic Baseline

rpi.analyticsdojo.com

Running Code using Kaggle Notebooks

Kaggle uses Docker to provide a fully functional environment for hosting data science competitions.
You can download the kaggle/python Docker image from GitHub and run it as an alternative to the standard Jupyter Stack for Data Science we have been using.
Kaggle has created an incredible resource for learning analytics. You can work through a number of toy examples to understand data science, and also compete on real problems faced by top companies.

train <- read.csv('../../input/train.csv', stringsAsFactors = FALSE)
test  <- read.csv('../../input/test.csv', stringsAsFactors = FALSE)

`train` and `test` set on Kaggle

The train file contains a wide variety of information about each passenger that might be useful in understanding whether that passenger survived. It also includes a `Survived` column recording the actual outcome.
The test file contains all of the columns of the train file except `Survived`. Our goal is to predict whether each passenger in the test file survived.
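One way to confirm which column the test file lacks is to compare the column names of the two data frames. The sketch below uses small stand-in data frames (`train_toy` and `test_toy` are hypothetical names, with a few values copied from the `head()` output) so that it runs without the Kaggle files; on the real data you would pass `train` and `test` instead.

```r
# Stand-in data frames with only a few of the real columns (illustrative only)
train_toy <- data.frame(
  PassengerId = 1:3,
  Survived    = c(0, 1, 1),
  Pclass      = c(3, 1, 3),
  Sex         = c("male", "female", "female"),
  stringsAsFactors = FALSE
)
test_toy <- data.frame(
  PassengerId = 892:894,
  Pclass      = c(3, 3, 2),
  Sex         = c("male", "female", "male"),
  stringsAsFactors = FALSE
)

# Columns that appear in train but not in test
missing_in_test <- setdiff(names(train_toy), names(test_toy))
print(missing_in_test)
```

With these stand-ins the only column missing from the test frame is `Survived`, which is the column we need to predict.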

head(train)

PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Ticket	Fare	Cabin	Embarked
1	0	3	Braund, Mr. Owen Harris	male	22	1	A/5 21171	7.2500		S
2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Thayer)	female	38	1	PC 17599	71.2833	C85	C
3	1	3	Heikkinen, Miss. Laina	female	26	0	STON/O2. 3101282	7.9250		S
4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35	1	113803	53.1000	C123	S
5	0	3	Allen, Mr. William Henry	male	35	0	373450	8.0500		S
6	0	3	Moran, Mr. James	male	NA	0	330877	8.4583		Q

head(test)

PassengerId	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Embarked
892	3	Kelly, Mr. James	male	34.5	0	0	330911	7.8292	Q
893	3	Wilkes, Mrs. James (Ellen Needs)	female	47.0	1	0	363272	7.0000	S
894	2	Myles, Mr. Thomas Francis	male	62.0	0	0	240276	9.6875	Q
895	3	Wirz, Mr. Albert	male	27.0	0	0	315154	8.6625	S
896	3	Hirvonen, Mrs. Alexander (Helga E Lindqvist)	female	22.0	1	1	3101298	12.2875	S
897	3	Svensson, Mr. Johan Cervin	male	14.0	0	0	7538	9.2250	S

Baseline Model: No Survivors

The Titanic problem is one of classification, and for a binary (0/1) outcome the simplest baseline is often to predict the same class for every case.
Even if you aren’t familiar with the history of the tragedy, a quick look at the Wikipedia page shows that the majority of people on board (68%) died.
As a result, our baseline model will predict no survivors.
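The reasoning behind a majority-class baseline can be sketched in a few lines: find the most common class, predict it for everyone, and the accuracy equals that class's share. The vector `survived_toy` below is made-up illustration data, not values from the Kaggle files; on the real data you could use `train$Survived` in its place.

```r
# Hypothetical outcome vector: 7 deaths (0) and 3 survivors (1)
survived_toy <- c(0, 0, 1, 0, 0, 1, 0, 1, 0, 0)

# Share of each class
class_share <- prop.table(table(survived_toy))
print(class_share)

# The most frequent class becomes the prediction for every row
majority_class <- as.integer(names(which.max(class_share)))

# Accuracy of always predicting the majority class
baseline_accuracy <- mean(survived_toy == majority_class)
print(baseline_accuracy)
```

Any later model should beat this number to be worth the added complexity.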

test["Survived"] <- 0

submission <- test[,c("PassengerId", "Survived")]

head(submission)

PassengerId	Survived
892	0
893	0
894	0
895	0
896	0
897	0

# Write the solution to file
write.csv(submission, file = 'nosurvivors.csv', row.names = FALSE)
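Before uploading, it can help to read the file back and check its shape. The sketch below assumes the submission should contain exactly the two columns `PassengerId` and `Survived`; it writes a small stand-in data frame (`submission_toy`, a hypothetical name) to a temporary file rather than touching `nosurvivors.csv`.

```r
# Stand-in submission using the six ids shown in head(submission)
submission_toy <- data.frame(PassengerId = 892:897, Survived = 0)

# Write to a temporary file so nothing in the working directory is overwritten
out_path <- tempfile(fileext = ".csv")
write.csv(submission_toy, file = out_path, row.names = FALSE)

# Read it back and inspect the header and row count
check <- read.csv(out_path)
print(names(check))
print(nrow(check))
```

Setting `row.names = FALSE` matters here: without it, `write.csv` adds an extra unnamed column of row numbers to the file.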

The First Rule of Shipwrecks

You may have seen it in a movie or read it in a novel, but "women and children first" has at its roots something that could provide our first model.
Now let’s recode the prediction based on whether the passenger was a man or a woman.
We use conditionals to select the rows of interest (for example, where `test$Sex == "male"`) and recode the appropriate column.

# We could code this as Survived, but doing so would overwrite our earlier prediction.
# Instead, let's code it as PredGender.

test[test$Sex == "male", "PredGender"] <- 0
test[test$Sex == "female", "PredGender"] <- 1

submission <- test[,c("PassengerId", "PredGender")]
# Rename the PredGender column to Survived
names(submission)[2] <- "Survived"
head(submission)

PassengerId	Survived
892	0
893	1
894	0
895	0
896	1
897	0
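The same rule can also be written in a single step with `ifelse`, and because the train file includes the true outcome, the rule can be scored before submitting. The sketch below is illustrative only: `train_toy` is a hypothetical stand-in built from the six rows shown in `head(train)`, which is far too few rows to estimate real accuracy.

```r
# Sex and Survived for the six passengers shown in head(train)
train_toy <- data.frame(
  Sex      = c("male", "female", "female", "female", "male", "male"),
  Survived = c(0, 1, 1, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Vectorized recode: women predicted to survive, everyone else not
train_toy$PredGender <- ifelse(train_toy$Sex == "female", 1, 0)

# Share of rows where the rule matches the recorded outcome
rule_accuracy <- mean(train_toy$PredGender == train_toy$Survived)
print(rule_accuracy)
```

On these six rows the rule happens to match every outcome; running the same two lines on the full `train` data frame gives a more honest estimate.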

names(submission)[2]<-"new"
submission

PassengerId	new
892	0
893	1
894	0
895	0
896	1
897	0
898	1
899	0
900	1
901	0
902	0
903	0
904	1
905	0
906	1
907	1
908	0
909	0
910	1
911	1
912	0
913	0
914	1
915	0
916	1
917	0
918	1
919	0
920	0
921	0
⋮	⋮
1280	0
1281	0
1282	0
1283	1
1284	0
1285	0
1286	0
1287	1
1288	0
1289	1
1290	0
1291	0
1292	1
1293	0
1294	1
1295	0
1296	0
1297	0
1298	0
1299	0
1300	1
1301	1
1302	1
1303	1
1304	1
1305	0
1306	1
1307	0
1308	0
1309	0

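# Restore the column name before writing, since it was changed to "new" above
names(submission)[2] <- "Survived"
write.csv(submission, file = 'womensurvive.csv', row.names = FALSE)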
