{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"[AnalyticsDojo](http://rpi.analyticsdojo.com)\n",
"# Introduction to R - DataFrames\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Introduction to R DataFrames\n",
"- A data frame is a collection of vectors of the same length; the vectors can be of different types.\n",
"- It is a special type of list.\n",
"- Data frames are used for standard rectangular (record-by-field) datasets, similar to a spreadsheet.\n",
"- Data frames set R apart from some languages (e.g., MATLAB) and provide functionality similar to statistical packages (e.g., Stata, SAS) and Python's pandas package.\n"
]
},
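{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the points above, the next cell builds a small data frame by hand from three equal-length vectors of different types. The `flowers` data frame and its column names are made up for illustration and are not part of the iris file used in the rest of this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical example data: three vectors of the same length, different types.\n",
"name <- c('a', 'b', 'c')          # character\n",
"petal_count <- c(5L, 4L, 6L)      # integer\n",
"fragrant <- c(TRUE, FALSE, TRUE)  # logical\n",
"\n",
"# stringsAsFactors = FALSE keeps `name` as character in every R version.\n",
"flowers <- data.frame(name, petal_count, fragrant, stringsAsFactors = FALSE)\n",
"\n",
"class(flowers)          # a data.frame ...\n",
"is.list(flowers)        # ... which is also a list, one element per column\n",
"sapply(flowers, class)  # each column keeps its own type\n",
"dim(flowers)            # rows x columns"
]
},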
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"'data.frame'"
],
"text/latex": [
"'data.frame'"
],
"text/markdown": [
"'data.frame'"
],
"text/plain": [
"[1] \"data.frame\""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" sepal\\_length & sepal\\_width & petal\\_length & petal\\_width & species\\\\\n",
"\\hline\n",
"\t 5.1 & 3.5 & 1.4 & 0.2 & setosa\\\\\n",
"\t 4.9 & 3.0 & 1.4 & 0.2 & setosa\\\\\n",
"\t 4.7 & 3.2 & 1.3 & 0.2 & setosa\\\\\n",
"\t 4.6 & 3.1 & 1.5 & 0.2 & setosa\\\\\n",
"\t 5.0 & 3.6 & 1.4 & 0.2 & setosa\\\\\n",
"\t 5.4 & 3.9 & 1.7 & 0.4 & setosa\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"sepal_length | sepal_width | petal_length | petal_width | species | \n",
"|---|---|---|---|---|---|\n",
"| 5.1 | 3.5 | 1.4 | 0.2 | setosa | \n",
"| 4.9 | 3.0 | 1.4 | 0.2 | setosa | \n",
"| 4.7 | 3.2 | 1.3 | 0.2 | setosa | \n",
"| 4.6 | 3.1 | 1.5 | 0.2 | setosa | \n",
"| 5.0 | 3.6 | 1.4 | 0.2 | setosa | \n",
"| 5.4 | 3.9 | 1.7 | 0.4 | setosa | \n",
"\n",
"\n"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width species\n",
"1 5.1 3.5 1.4 0.2 setosa \n",
"2 4.9 3.0 1.4 0.2 setosa \n",
"3 4.7 3.2 1.3 0.2 setosa \n",
"4 4.6 3.1 1.5 0.2 setosa \n",
"5 5.0 3.6 1.4 0.2 setosa \n",
"6 5.4 3.9 1.7 0.4 setosa "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" & sepal\\_length & sepal\\_width & petal\\_length & petal\\_width & species\\\\\n",
"\\hline\n",
"\t145 & 6.7 & 3.3 & 5.7 & 2.5 & virginica\\\\\n",
"\t146 & 6.7 & 3.0 & 5.2 & 2.3 & virginica\\\\\n",
"\t147 & 6.3 & 2.5 & 5.0 & 1.9 & virginica\\\\\n",
"\t148 & 6.5 & 3.0 & 5.2 & 2.0 & virginica\\\\\n",
"\t149 & 6.2 & 3.4 & 5.4 & 2.3 & virginica\\\\\n",
"\t150 & 5.9 & 3.0 & 5.1 & 1.8 & virginica\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| | sepal_length | sepal_width | petal_length | petal_width | species | \n",
"|---|---|---|---|---|---|\n",
"| 145 | 6.7 | 3.3 | 5.7 | 2.5 | virginica | \n",
"| 146 | 6.7 | 3.0 | 5.2 | 2.3 | virginica | \n",
"| 147 | 6.3 | 2.5 | 5.0 | 1.9 | virginica | \n",
"| 148 | 6.5 | 3.0 | 5.2 | 2.0 | virginica | \n",
"| 149 | 6.2 | 3.4 | 5.4 | 2.3 | virginica | \n",
"| 150 | 5.9 | 3.0 | 5.1 | 1.8 | virginica | \n",
"\n",
"\n"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width species \n",
"145 6.7 3.3 5.7 2.5 virginica\n",
"146 6.7 3.0 5.2 2.3 virginica\n",
"147 6.3 2.5 5.0 1.9 virginica\n",
"148 6.5 3.0 5.2 2.0 virginica\n",
"149 6.2 3.4 5.4 2.3 virginica\n",
"150 5.9 3.0 5.1 1.8 virginica"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"'data.frame':\t150 obs. of 5 variables:\n",
" $ sepal_length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...\n",
" $ sepal_width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...\n",
" $ petal_length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...\n",
" $ petal_width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...\n",
" $ species : Factor w/ 3 levels \"setosa\",\"versicolor\",..: 1 1 1 1 1 1 1 1 1 1 ...\n"
]
}
],
"source": [
"frame=read.csv(file=\"../../input/iris.csv\", header=TRUE,sep=\",\")\n",
"class(frame)\n",
"head(frame) #The first few rows.\n",
"tail(frame) #The last few rows.\n",
"str(frame) #The Structure.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 150\n",
"\\item 5\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 150\n",
"2. 5\n",
"\n",
"\n"
],
"text/plain": [
"[1] 150 5"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"150"
],
"text/latex": [
"150"
],
"text/markdown": [
"150"
],
"text/plain": [
"[1] 150"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 'sepal\\_length'\n",
"\\item 'sepal\\_width'\n",
"\\item 'petal\\_length'\n",
"\\item 'petal\\_width'\n",
"\\item 'species'\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 'sepal_length'\n",
"2. 'sepal_width'\n",
"3. 'petal_length'\n",
"4. 'petal_width'\n",
"5. 'species'\n",
"\n",
"\n"
],
"text/plain": [
"[1] \"sepal_length\" \"sepal_width\" \"petal_length\" \"petal_width\" \"species\" "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"5"
],
"text/latex": [
"5"
],
"text/markdown": [
"5"
],
"text/plain": [
"[1] 5"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
" sepal_length sepal_width petal_length petal_width \n",
" Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100 \n",
" 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300 \n",
" Median :5.800 Median :3.000 Median :4.350 Median :1.300 \n",
" Mean :5.843 Mean :3.054 Mean :3.759 Mean :1.199 \n",
" 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800 \n",
" Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500 \n",
" species \n",
" setosa :50 \n",
" versicolor:50 \n",
" virginica :50 \n",
" \n",
" \n",
" "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"FALSE"
],
"text/latex": [
"FALSE"
],
"text/markdown": [
"FALSE"
],
"text/plain": [
"[1] FALSE"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"TRUE"
],
"text/latex": [
"TRUE"
],
"text/markdown": [
"TRUE"
],
"text/plain": [
"[1] TRUE"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"'numeric'"
],
"text/latex": [
"'numeric'"
],
"text/markdown": [
"'numeric'"
],
"text/plain": [
"[1] \"numeric\""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"'factor'"
],
"text/latex": [
"'factor'"
],
"text/markdown": [
"'factor'"
],
"text/plain": [
"[1] \"factor\""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 'setosa'\n",
"\\item 'versicolor'\n",
"\\item 'virginica'\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 'setosa'\n",
"2. 'versicolor'\n",
"3. 'virginica'\n",
"\n",
"\n"
],
"text/plain": [
"[1] \"setosa\" \"versicolor\" \"virginica\" "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"dim(frame) #Dimensions as rows x columns\n",
"nrow(frame) #The number of rows\n",
"names(frame) #The column names\n",
"length(frame) #The number of columns (a data frame is a list of columns)\n",
"summary(frame) #Summary statistics for each column\n",
"is.matrix(frame) #FALSE: a data frame is a list of columns, not a matrix\n",
"is.list(frame) #TRUE: each column is one list element\n",
"class(frame$sepal_length)\n",
"class(frame$species)\n",
"levels(frame$species)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|ll}\n",
" species & sepal\\_width\\\\\n",
"\\hline\n",
"\t setosa & 3.5 \\\\\n",
"\t setosa & 3.0 \\\\\n",
"\t setosa & 3.2 \\\\\n",
"\t setosa & 3.1 \\\\\n",
"\t setosa & 3.6 \\\\\n",
"\t setosa & 3.9 \\\\\n",
"\t setosa & 3.4 \\\\\n",
"\t setosa & 3.4 \\\\\n",
"\t setosa & 2.9 \\\\\n",
"\t setosa & 3.1 \\\\\n",
"\t setosa & 3.7 \\\\\n",
"\t setosa & 3.4 \\\\\n",
"\t setosa & 3.0 \\\\\n",
"\t setosa & 3.0 \\\\\n",
"\t setosa & 4.0 \\\\\n",
"\t setosa & 4.4 \\\\\n",
"\t setosa & 3.9 \\\\\n",
"\t setosa & 3.5 \\\\\n",
"\t setosa & 3.8 \\\\\n",
"\t setosa & 3.8 \\\\\n",
"\t setosa & 3.4 \\\\\n",
"\t setosa & 3.7 \\\\\n",
"\t setosa & 3.6 \\\\\n",
"\t setosa & 3.3 \\\\\n",
"\t setosa & 3.4 \\\\\n",
"\t setosa & 3.0 \\\\\n",
"\t setosa & 3.4 \\\\\n",
"\t setosa & 3.5 \\\\\n",
"\t setosa & 3.4 \\\\\n",
"\t setosa & 3.2 \\\\\n",
"\t ⋮ & ⋮\\\\\n",
"\t virginica & 3.2 \\\\\n",
"\t virginica & 2.8 \\\\\n",
"\t virginica & 2.8 \\\\\n",
"\t virginica & 2.7 \\\\\n",
"\t virginica & 3.3 \\\\\n",
"\t virginica & 3.2 \\\\\n",
"\t virginica & 2.8 \\\\\n",
"\t virginica & 3.0 \\\\\n",
"\t virginica & 2.8 \\\\\n",
"\t virginica & 3.0 \\\\\n",
"\t virginica & 2.8 \\\\\n",
"\t virginica & 3.8 \\\\\n",
"\t virginica & 2.8 \\\\\n",
"\t virginica & 2.8 \\\\\n",
"\t virginica & 2.6 \\\\\n",
"\t virginica & 3.0 \\\\\n",
"\t virginica & 3.4 \\\\\n",
"\t virginica & 3.1 \\\\\n",
"\t virginica & 3.0 \\\\\n",
"\t virginica & 3.1 \\\\\n",
"\t virginica & 3.1 \\\\\n",
"\t virginica & 3.1 \\\\\n",
"\t virginica & 2.7 \\\\\n",
"\t virginica & 3.2 \\\\\n",
"\t virginica & 3.3 \\\\\n",
"\t virginica & 3.0 \\\\\n",
"\t virginica & 2.5 \\\\\n",
"\t virginica & 3.0 \\\\\n",
"\t virginica & 3.4 \\\\\n",
"\t virginica & 3.0 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"species | sepal_width | \n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| setosa | 3.5 | \n",
"| setosa | 3.0 | \n",
"| setosa | 3.2 | \n",
"| setosa | 3.1 | \n",
"| setosa | 3.6 | \n",
"| setosa | 3.9 | \n",
"| setosa | 3.4 | \n",
"| setosa | 3.4 | \n",
"| setosa | 2.9 | \n",
"| setosa | 3.1 | \n",
"| setosa | 3.7 | \n",
"| setosa | 3.4 | \n",
"| setosa | 3.0 | \n",
"| setosa | 3.0 | \n",
"| setosa | 4.0 | \n",
"| setosa | 4.4 | \n",
"| setosa | 3.9 | \n",
"| setosa | 3.5 | \n",
"| setosa | 3.8 | \n",
"| setosa | 3.8 | \n",
"| setosa | 3.4 | \n",
"| setosa | 3.7 | \n",
"| setosa | 3.6 | \n",
"| setosa | 3.3 | \n",
"| setosa | 3.4 | \n",
"| setosa | 3.0 | \n",
"| setosa | 3.4 | \n",
"| setosa | 3.5 | \n",
"| setosa | 3.4 | \n",
"| setosa | 3.2 | \n",
"| ⋮ | ⋮ | \n",
"| virginica | 3.2 | \n",
"| virginica | 2.8 | \n",
"| virginica | 2.8 | \n",
"| virginica | 2.7 | \n",
"| virginica | 3.3 | \n",
"| virginica | 3.2 | \n",
"| virginica | 2.8 | \n",
"| virginica | 3.0 | \n",
"| virginica | 2.8 | \n",
"| virginica | 3.0 | \n",
"| virginica | 2.8 | \n",
"| virginica | 3.8 | \n",
"| virginica | 2.8 | \n",
"| virginica | 2.8 | \n",
"| virginica | 2.6 | \n",
"| virginica | 3.0 | \n",
"| virginica | 3.4 | \n",
"| virginica | 3.1 | \n",
"| virginica | 3.0 | \n",
"| virginica | 3.1 | \n",
"| virginica | 3.1 | \n",
"| virginica | 3.1 | \n",
"| virginica | 2.7 | \n",
"| virginica | 3.2 | \n",
"| virginica | 3.3 | \n",
"| virginica | 3.0 | \n",
"| virginica | 2.5 | \n",
"| virginica | 3.0 | \n",
"| virginica | 3.4 | \n",
"| virginica | 3.0 | \n",
"\n",
"\n"
],
"text/plain": [
" species sepal_width\n",
"1 setosa 3.5 \n",
"2 setosa 3.0 \n",
"3 setosa 3.2 \n",
"4 setosa 3.1 \n",
"5 setosa 3.6 \n",
"6 setosa 3.9 \n",
"7 setosa 3.4 \n",
"8 setosa 3.4 \n",
"9 setosa 2.9 \n",
"10 setosa 3.1 \n",
"11 setosa 3.7 \n",
"12 setosa 3.4 \n",
"13 setosa 3.0 \n",
"14 setosa 3.0 \n",
"15 setosa 4.0 \n",
"16 setosa 4.4 \n",
"17 setosa 3.9 \n",
"18 setosa 3.5 \n",
"19 setosa 3.8 \n",
"20 setosa 3.8 \n",
"21 setosa 3.4 \n",
"22 setosa 3.7 \n",
"23 setosa 3.6 \n",
"24 setosa 3.3 \n",
"25 setosa 3.4 \n",
"26 setosa 3.0 \n",
"27 setosa 3.4 \n",
"28 setosa 3.5 \n",
"29 setosa 3.4 \n",
"30 setosa 3.2 \n",
"⋮ ⋮ ⋮ \n",
"121 virginica 3.2 \n",
"122 virginica 2.8 \n",
"123 virginica 2.8 \n",
"124 virginica 2.7 \n",
"125 virginica 3.3 \n",
"126 virginica 3.2 \n",
"127 virginica 2.8 \n",
"128 virginica 3.0 \n",
"129 virginica 2.8 \n",
"130 virginica 3.0 \n",
"131 virginica 2.8 \n",
"132 virginica 3.8 \n",
"133 virginica 2.8 \n",
"134 virginica 2.8 \n",
"135 virginica 2.6 \n",
"136 virginica 3.0 \n",
"137 virginica 3.4 \n",
"138 virginica 3.1 \n",
"139 virginica 3.0 \n",
"140 virginica 3.1 \n",
"141 virginica 3.1 \n",
"142 virginica 3.1 \n",
"143 virginica 2.7 \n",
"144 virginica 3.2 \n",
"145 virginica 3.3 \n",
"146 virginica 3.0 \n",
"147 virginica 2.5 \n",
"148 virginica 3.0 \n",
"149 virginica 3.4 \n",
"150 virginica 3.0 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"frame[c(\"species\",\"sepal_width\")]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllllll}\n",
" sepal\\_length & sepal\\_width & petal\\_length & petal\\_width & species & petals & petals2\\\\\n",
"\\hline\n",
"\t 5.1 & 3.5 & 1.4 & 0.2 & setosa & 0 & 0 \\\\\n",
"\t 4.9 & 3.0 & 1.4 & 0.2 & setosa & 0 & 0 \\\\\n",
"\t 4.7 & 3.2 & 1.3 & 0.2 & setosa & 0 & 0 \\\\\n",
"\t 4.6 & 3.1 & 1.5 & 0.2 & setosa & 0 & 0 \\\\\n",
"\t 5.0 & 3.6 & 1.4 & 0.2 & setosa & 0 & 0 \\\\\n",
"\t 5.4 & 3.9 & 1.7 & 0.4 & setosa & 0 & 0 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"sepal_length | sepal_width | petal_length | petal_width | species | petals | petals2 | \n",
"|---|---|---|---|---|---|\n",
"| 5.1 | 3.5 | 1.4 | 0.2 | setosa | 0 | 0 | \n",
"| 4.9 | 3.0 | 1.4 | 0.2 | setosa | 0 | 0 | \n",
"| 4.7 | 3.2 | 1.3 | 0.2 | setosa | 0 | 0 | \n",
"| 4.6 | 3.1 | 1.5 | 0.2 | setosa | 0 | 0 | \n",
"| 5.0 | 3.6 | 1.4 | 0.2 | setosa | 0 | 0 | \n",
"| 5.4 | 3.9 | 1.7 | 0.4 | setosa | 0 | 0 | \n",
"\n",
"\n"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width species petals petals2\n",
"1 5.1 3.5 1.4 0.2 setosa 0 0 \n",
"2 4.9 3.0 1.4 0.2 setosa 0 0 \n",
"3 4.7 3.2 1.3 0.2 setosa 0 0 \n",
"4 4.6 3.1 1.5 0.2 setosa 0 0 \n",
"5 5.0 3.6 1.4 0.2 setosa 0 0 \n",
"6 5.4 3.9 1.7 0.4 setosa 0 0 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"frame['petals']<-0\n",
"frame$petals2<-0\n",
"head(frame)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"mean.sepalLength.all<-mean(frame[,'sepal_length']) #Mean over all rows; no species filter has been applied yet"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Slicing a Dataframe by Column\n",
"- Remember the syntax of `df[rows,columns]` \n",
"- Using `dataframe$column` provides one way of selecting a column. \n",
"- We can also specify the index position: `dataframe[,columnIndex]`\n",
"- We can also specify the column name: `dataframe[,'columnName']`"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 5.1\n",
"\\item 4.9\n",
"\\item 4.7\n",
"\\item 4.6\n",
"\\item 5\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 5.1\n",
"2. 4.9\n",
"3. 4.7\n",
"4. 4.6\n",
"5. 5\n",
"\n",
"\n"
],
"text/plain": [
"[1] 5.1 4.9 4.7 4.6 5.0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 5.1\n",
"\\item 4.9\n",
"\\item 4.7\n",
"\\item 4.6\n",
"\\item 5\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 5.1\n",
"2. 4.9\n",
"3. 4.7\n",
"4. 4.6\n",
"5. 5\n",
"\n",
"\n"
],
"text/plain": [
"[1] 5.1 4.9 4.7 4.6 5.0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 5.1\n",
"\\item 4.9\n",
"\\item 4.7\n",
"\\item 4.6\n",
"\\item 5\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 5.1\n",
"2. 4.9\n",
"3. 4.7\n",
"4. 4.6\n",
"5. 5\n",
"\n",
"\n"
],
"text/plain": [
"[1] 5.1 4.9 4.7 4.6 5.0"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sepal_length1<-frame$sepal_length #Using Dollar Sign and the column name.\n",
"sepal_length2<- frame[,1] #Using the Index Location\n",
"sepal_length3<- frame[,'sepal_length'] #Using the column name\n",
"sepal_length4<- frame[,c('sepal_length','sepal_width')] #Two names return a data frame, not a vector\n",
"\n",
"sepal_length1[1:5] #Print the first 5 \n",
"sepal_length2[1:5]\n",
"sepal_length3[1:5]\n"
]
},
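{
"cell_type": "markdown",
"metadata": {},
"source": [
"One detail worth checking: the single-column forms listed above return a plain vector, while selecting with a vector of names returns a data frame. The sketch below uses a small made-up data frame (`df`, `x`, and `y` are hypothetical names) and the standard `drop = FALSE` argument of `[` to keep a single column as a data frame."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical data frame for illustration only.\n",
"df <- data.frame(x = c(1.5, 2.5, 3.5), y = c(10, 20, 30))\n",
"\n",
"v1 <- df$x                     # vector\n",
"v2 <- df[, 1]                  # vector (a single column is simplified by default)\n",
"d1 <- df[, 'x', drop = FALSE]  # one-column data frame\n",
"d2 <- df[, c('x', 'y')]        # two-column data frame\n",
"\n",
"is.data.frame(v1)\n",
"is.data.frame(d1)\n",
"identical(v1, v2)\n",
"names(d2)"
]
},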
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Selecting Rows\n",
"- We can select rows from a dataframe using index position: `dataframe[rowIndex,columnIndex]`. \n",
"- Use `c(row1, row2, row3)` to select out specific rows. "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"20"
],
"text/latex": [
"20"
],
"text/markdown": [
"20"
],
"text/plain": [
"[1] 20"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"3"
],
"text/latex": [
"3"
],
"text/markdown": [
"3"
],
"text/plain": [
"[1] 3"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllllll}\n",
" & sepal\\_length & sepal\\_width & petal\\_length & petal\\_width & species & petals & petals2\\\\\n",
"\\hline\n",
"\t1 & 5.1 & 3.5 & 1.4 & 0.2 & setosa & 0 & 0 \\\\\n",
"\t5 & 5.0 & 3.6 & 1.4 & 0.2 & setosa & 0 & 0 \\\\\n",
"\t6 & 5.4 & 3.9 & 1.7 & 0.4 & setosa & 0 & 0 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| | sepal_length | sepal_width | petal_length | petal_width | species | petals | petals2 | \n",
"|---|---|---|\n",
"| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa | 0 | 0 | \n",
"| 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa | 0 | 0 | \n",
"| 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa | 0 | 0 | \n",
"\n",
"\n"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width species petals petals2\n",
"1 5.1 3.5 1.4 0.2 setosa 0 0 \n",
"5 5.0 3.6 1.4 0.2 setosa 0 0 \n",
"6 5.4 3.9 1.7 0.4 setosa 0 0 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"frame2<-frame[1:20,] \n",
"frame3<-frame[c(1,5,6),] #This selects out specific rows\n",
"nrow(frame2)\n",
"nrow(frame3)\n",
"frame3"
]
},
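{
"cell_type": "markdown",
"metadata": {},
"source": [
"Row and column selections can also be combined in a single `dataframe[rowIndex,columnIndex]` call. The next cell is a minimal sketch on a made-up data frame (`df`, `id`, `score`, and `group` are hypothetical names). It also shows negative indices, which in R drop the listed rows instead of selecting them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical data frame for illustration only.\n",
"df <- data.frame(id = 1:6,\n",
"                 score = c(90, 85, 70, 60, 95, 80),\n",
"                 group = c('a', 'a', 'b', 'b', 'c', 'c'))\n",
"\n",
"first_three <- df[1:3, ]                     # rows 1 to 3, all columns\n",
"picked <- df[c(1, 5, 6), c('id', 'score')]   # specific rows, two columns\n",
"without_first <- df[-1, ]                    # every row except row 1\n",
"\n",
"nrow(first_three)\n",
"picked\n",
"nrow(without_first)"
]
},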
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Conditional Statements and Dataframes with Subset\n",
"- We can select a subset of a dataframe by putting a condition (such as an equality test) in the row position or in `subset()`.\n",
"- The result of `subset()` is also a dataframe.\n",
"- We can optionally select columns with the `select = c(col1, col2)` argument."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllllll}\n",
" sepal\\_length & sepal\\_width & petal\\_length & petal\\_width & species & petals & petals2\\\\\n",
"\\hline\n",
"\t 5.1 & 3.5 & 1.4 & 0.2 & setosa & 0 & 0 \\\\\n",
"\t 4.9 & 3.0 & 1.4 & 0.2 & setosa & 0 & 0 \\\\\n",
"\t 4.7 & 3.2 & 1.3 & 0.2 & setosa & 0 & 0 \\\\\n",
"\t 4.6 & 3.1 & 1.5 & 0.2 & setosa & 0 & 0 \\\\\n",
"\t 5.0 & 3.6 & 1.4 & 0.2 & setosa & 0 & 0 \\\\\n",
"\t 5.4 & 3.9 & 1.7 & 0.4 & setosa & 0 & 0 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"sepal_length | sepal_width | petal_length | petal_width | species | petals | petals2 | \n",
"|---|---|---|---|---|---|\n",
"| 5.1 | 3.5 | 1.4 | 0.2 | setosa | 0 | 0 | \n",
"| 4.9 | 3.0 | 1.4 | 0.2 | setosa | 0 | 0 | \n",
"| 4.7 | 3.2 | 1.3 | 0.2 | setosa | 0 | 0 | \n",
"| 4.6 | 3.1 | 1.5 | 0.2 | setosa | 0 | 0 | \n",
"| 5.0 | 3.6 | 1.4 | 0.2 | setosa | 0 | 0 | \n",
"| 5.4 | 3.9 | 1.7 | 0.4 | setosa | 0 | 0 | \n",
"\n",
"\n"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width species petals petals2\n",
"1 5.1 3.5 1.4 0.2 setosa 0 0 \n",
"2 4.9 3.0 1.4 0.2 setosa 0 0 \n",
"3 4.7 3.2 1.3 0.2 setosa 0 0 \n",
"4 4.6 3.1 1.5 0.2 setosa 0 0 \n",
"5 5.0 3.6 1.4 0.2 setosa 0 0 \n",
"6 5.4 3.9 1.7 0.4 setosa 0 0 "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"'data.frame'"
],
"text/latex": [
"'data.frame'"
],
"text/markdown": [
"'data.frame'"
],
"text/plain": [
"[1] \"data.frame\""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"50"
],
"text/latex": [
"50"
],
"text/markdown": [
"50"
],
"text/plain": [
"[1] 50"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"5.006"
],
"text/latex": [
"5.006"
],
"text/markdown": [
"5.006"
],
"text/plain": [
"[1] 5.006"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"22"
],
"text/latex": [
"22"
],
"text/markdown": [
"22"
],
"text/plain": [
"[1] 22"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllllll}\n",
" & sepal\\_length & sepal\\_width & petal\\_length & petal\\_width & species & petals & petals2\\\\\n",
"\\hline\n",
"\t1 & 5.1 & 3.5 & 1.4 & 0.2 & setosa & 0 & 0 \\\\\n",
"\t6 & 5.4 & 3.9 & 1.7 & 0.4 & setosa & 0 & 0 \\\\\n",
"\t11 & 5.4 & 3.7 & 1.5 & 0.2 & setosa & 0 & 0 \\\\\n",
"\t15 & 5.8 & 4.0 & 1.2 & 0.2 & setosa & 0 & 0 \\\\\n",
"\t16 & 5.7 & 4.4 & 1.5 & 0.4 & setosa & 0 & 0 \\\\\n",
"\t17 & 5.4 & 3.9 & 1.3 & 0.4 & setosa & 0 & 0 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| | sepal_length | sepal_width | petal_length | petal_width | species | petals | petals2 | \n",
"|---|---|---|---|---|---|\n",
"| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa | 0 | 0 | \n",
"| 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa | 0 | 0 | \n",
"| 11 | 5.4 | 3.7 | 1.5 | 0.2 | setosa | 0 | 0 | \n",
"| 15 | 5.8 | 4.0 | 1.2 | 0.2 | setosa | 0 | 0 | \n",
"| 16 | 5.7 | 4.4 | 1.5 | 0.4 | setosa | 0 | 0 | \n",
"| 17 | 5.4 | 3.9 | 1.3 | 0.4 | setosa | 0 | 0 | \n",
"\n",
"\n"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width species petals petals2\n",
"1 5.1 3.5 1.4 0.2 setosa 0 0 \n",
"6 5.4 3.9 1.7 0.4 setosa 0 0 \n",
"11 5.4 3.7 1.5 0.2 setosa 0 0 \n",
"15 5.8 4.0 1.2 0.2 setosa 0 0 \n",
"16 5.7 4.4 1.5 0.4 setosa 0 0 \n",
"17 5.4 3.9 1.3 0.4 setosa 0 0 "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/latex": [
"\\begin{tabular}{r|ll}\n",
" & sepal\\_length & species\\\\\n",
"\\hline\n",
"\t1 & 5.1 & setosa\\\\\n",
"\t6 & 5.4 & setosa\\\\\n",
"\t11 & 5.4 & setosa\\\\\n",
"\t15 & 5.8 & setosa\\\\\n",
"\t16 & 5.7 & setosa\\\\\n",
"\t17 & 5.4 & setosa\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| | sepal_length | species | \n",
"|---|---|---|---|---|---|\n",
"| 1 | 5.1 | setosa | \n",
"| 6 | 5.4 | setosa | \n",
"| 11 | 5.4 | setosa | \n",
"| 15 | 5.8 | setosa | \n",
"| 16 | 5.7 | setosa | \n",
"| 17 | 5.4 | setosa | \n",
"\n",
"\n"
],
"text/plain": [
" sepal_length species\n",
"1 5.1 setosa \n",
"6 5.4 setosa \n",
"11 5.4 setosa \n",
"15 5.8 setosa \n",
"16 5.7 setosa \n",
"17 5.4 setosa "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"setosa.df <- subset(frame, species == 'setosa')\n",
"\n",
"head(setosa.df)\n",
"class(setosa.df)\n",
"nrow(setosa.df)\n",
"mean.sepalLenth.setosa<-mean(setosa.df$sepal_length) #mean() returns a single number (a length-one numeric vector)\n",
"mean.sepalLenth.setosa\n",
"setosa.df.highseptalLength <- subset(setosa.df, sepal_length > mean.sepalLenth.setosa)\n",
"nrow(setosa.df.highseptalLength)\n",
"head(setosa.df.highseptalLength)\n",
"setosa.dfhighseptalLength2 <- subset(setosa.df, sepal_length > mean.sepalLenth.setosa, select = c(sepal_length, species))\n",
"head(setosa.dfhighseptalLength2)"
]
},
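{
"cell_type": "markdown",
"metadata": {},
"source": [
"Conditions passed to `subset()` can be combined with `&` (and) or `|` (or). The sketch below assumes a small made-up data frame rather than the iris file, so `plants`, `kind`, and `height` are hypothetical names."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical data frame for illustration only.\n",
"plants <- data.frame(kind = c('fern', 'fern', 'moss', 'moss', 'fern'),\n",
"                     height = c(30, 45, 2, 3, 50),\n",
"                     stringsAsFactors = FALSE)\n",
"\n",
"# Keep rows where both conditions hold, and only the height column.\n",
"tall_ferns <- subset(plants, kind == 'fern' & height > 40, select = c(height))\n",
"\n",
"class(tall_ferns)  # still a data.frame\n",
"nrow(tall_ferns)\n",
"tall_ferns"
]
},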
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Subsetting Rows Using Indices\n",
"- As in pandas, we use conditional expressions inside the brackets to pick out specific rows.\n",
"- See [here](http://www.ats.ucla.edu/stat/r/faq/subset_R.htm) for good coverage and examples. "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|llllll}\n",
" & sepal\\_length & sepal\\_width & petal\\_length & petal\\_width & species & petals\\\\\n",
"\\hline\n",
"\t1 & 5.1 & 3.5 & 1.4 & 0.2 & setosa & 0 \\\\\n",
"\t2 & 4.9 & 3 & 1.4 & 0.2 & setosa & 0 \\\\\n",
"\t3 & 4.7 & 3.2 & 1.3 & 0.2 & setosa & 0 \\\\\n",
"\t4 & 4.6 & 3.1 & 1.5 & 0.2 & setosa & 0 \\\\\n",
"\t5 & 5 & 3.6 & 1.4 & 0.2 & setosa & 0 \\\\\n",
"\t6 & 5.4 & 3.9 & 1.7 & 0.4 & setosa & 0 \\\\\n",
"\\end{tabular}\n"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width species petals\n",
"1 5.1 3.5 1.4 0.2 setosa 0\n",
"2 4.9 3.0 1.4 0.2 setosa 0\n",
"3 4.7 3.2 1.3 0.2 setosa 0\n",
"4 4.6 3.1 1.5 0.2 setosa 0\n",
"5 5.0 3.6 1.4 0.2 setosa 0\n",
"6 5.4 3.9 1.7 0.4 setosa 0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"'data.frame'"
],
"text/latex": [
"'data.frame'"
],
"text/markdown": [
"'data.frame'"
],
"text/plain": [
"[1] \"data.frame\""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"50"
],
"text/latex": [
"50"
],
"text/markdown": [
"50"
],
"text/plain": [
"[1] 50"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"5.006"
],
"text/latex": [
"5.006"
],
"text/markdown": [
"5.006"
],
"text/plain": [
"[1] 5.006"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"22"
],
"text/latex": [
"22"
],
"text/markdown": [
"22"
],
"text/plain": [
"[1] 22"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/latex": [
"\\begin{tabular}{r|llllll}\n",
" & sepal\\_length & sepal\\_width & petal\\_length & petal\\_width & species & petals\\\\\n",
"\\hline\n",
"\t1 & 5.1 & 3.5 & 1.4 & 0.2 & setosa & 0 \\\\\n",
"\t6 & 5.4 & 3.9 & 1.7 & 0.4 & setosa & 0 \\\\\n",
"\t11 & 5.4 & 3.7 & 1.5 & 0.2 & setosa & 0 \\\\\n",
"\t15 & 5.8 & 4 & 1.2 & 0.2 & setosa & 0 \\\\\n",
"\t16 & 5.7 & 4.4 & 1.5 & 0.4 & setosa & 0 \\\\\n",
"\t17 & 5.4 & 3.9 & 1.3 & 0.4 & setosa & 0 \\\\\n",
"\\end{tabular}\n"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width species petals\n",
"1 5.1 3.5 1.4 0.2 setosa 0\n",
"6 5.4 3.9 1.7 0.4 setosa 0\n",
"11 5.4 3.7 1.5 0.2 setosa 0\n",
"15 5.8 4.0 1.2 0.2 setosa 0\n",
"16 5.7 4.4 1.5 0.4 setosa 0\n",
"17 5.4 3.9 1.3 0.4 setosa 0"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"setosa.df <- frame[frame$species == \"setosa\",]\n",
"head(setosa.df)\n",
"class(setosa.df)\n",
"nrow(setosa.df)\n",
"mean.sepalLenth.setosa<-mean(setosa.df$sepal_length) #mean() returns a single number (a length-one numeric vector)\n",
"mean.sepalLenth.setosa\n",
"setosa.df.highseptalLength <- setosa.df[setosa.df$sepal_length > mean.sepalLenth.setosa,]\n",
"nrow(setosa.df.highseptalLength)\n",
"head(setosa.df.highseptalLength)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|llllll}\n",
" & sepal\\_length & sepal\\_width & petal\\_length & petal\\_width & species & petals\\\\\n",
"\\hline\n",
"\t1 & 5.1 & 3.5 & 1.4 & 0.2 & setosa & 0 \\\\\n",
"\t15 & 5.8 & 4.0 & 1.2 & 0.2 & setosa & 0 \\\\\n",
"\t18 & 5.1 & 3.5 & 1.4 & 0.3 & setosa & 0 \\\\\n",
"\t20 & 5.1 & 3.8 & 1.5 & 0.3 & setosa & 0 \\\\\n",
"\t22 & 5.1 & 3.7 & 1.5 & 0.4 & setosa & 0 \\\\\n",
"\t24 & 5.1 & 3.3 & 1.7 & 0.5 & setosa & 0 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| | sepal_length | sepal_width | petal_length | petal_width | species | petals | \n",
"|---|---|---|---|---|---|\n",
"| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa | 0 | \n",
"| 15 | 5.8 | 4.0 | 1.2 | 0.2 | setosa | 0 | \n",
"| 18 | 5.1 | 3.5 | 1.4 | 0.3 | setosa | 0 | \n",
"| 20 | 5.1 | 3.8 | 1.5 | 0.3 | setosa | 0 | \n",
"| 22 | 5.1 | 3.7 | 1.5 | 0.4 | setosa | 0 | \n",
"| 24 | 5.1 | 3.3 | 1.7 | 0.5 | setosa | 0 | \n",
"\n",
"\n"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width species petals\n",
"1 5.1 3.5 1.4 0.2 setosa 0 \n",
"15 5.8 4.0 1.2 0.2 setosa 0 \n",
"18 5.1 3.5 1.4 0.3 setosa 0 \n",
"20 5.1 3.8 1.5 0.3 setosa 0 \n",
"22 5.1 3.7 1.5 0.4 setosa 0 \n",
"24 5.1 3.3 1.7 0.5 setosa 0 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"specific.df <- frame[frame$sepal_length %in% c(5.1,5.8),]\n",
"head(specific.df)\n"
]
},
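{
"cell_type": "markdown",
"metadata": {},
"source": [
"A condition such as `frame$species == 'setosa'` is simply a logical vector with one `TRUE` or `FALSE` per row, and `[` keeps the rows marked `TRUE`. The sketch below shows this on a made-up data frame (`df`, `colour`, and `size` are hypothetical names) and uses the base function `which()` to list the matching row numbers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical data frame for illustration only.\n",
"df <- data.frame(colour = c('red', 'blue', 'red', 'green'),\n",
"                 size = c(5.1, 5.8, 4.9, 5.1),\n",
"                 stringsAsFactors = FALSE)\n",
"\n",
"is_red <- df$colour == 'red'  # logical vector, one value per row\n",
"is_red\n",
"which(is_red)                 # row numbers where the condition is TRUE\n",
"\n",
"red_rows <- df[is_red, ]                      # same as df[df$colour == 'red', ]\n",
"size_match <- df[df$size %in% c(5.1, 5.8), ]  # rows whose size is one of the listed values\n",
"\n",
"nrow(red_rows)\n",
"nrow(size_match)"
]
},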
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basics\n",
"\n",
"1. Load the Titanic train.csv data into an R data frame.\n",
"2. Calculate the number of rows in the data frame.\n",
"3. Calculate general descriptive statistics for the data frame.\n",
"4. Slice the data frame into 2 parts, selecting the first half of the rows.\n",
"5. Select just the passenger ID column and the column indicating whether each passenger survived."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## CREDITS\n",
"\n",
"Copyright [AnalyticsDojo](http://rpi.analyticsdojo.com) 2016\n",
"This work is licensed under the [Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/) license agreement.\n",
"Adapted from [Berkeley R Bootcamp](https://github.com/berkeley-scf/r-bootcamp-2016).\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"anaconda-cloud": {},
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}