{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"[](http://rpi.analyticsdojo.com)\n",
"
Introduction to R - Merging and Aggregating Data
\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Overview\n",
"- Merging Dataframes \n",
"- Aggregating Dataframes\n",
"- Advanced Functions\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Merging Data Frame with Vector\n",
"- Can combine vector with data frame in multiple ways. \n",
"- `data.frame(a,b)` where a & b can be vectors, matrices, or data frames. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"key | a | b | c | d |
\n",
"\n",
"\t 1 | 16.19873 | 14.41495 | 27.73225 | 15.564688 |
\n",
"\t 2 | 27.76677 | 18.54772 | 21.06688 | 15.697810 |
\n",
"\t 3 | 11.68592 | 14.91207 | 22.82086 | 15.666790 |
\n",
"\t 4 | 16.39982 | 28.30284 | 10.97550 | 19.083633 |
\n",
"\t 5 | 22.16232 | 16.82574 | 14.28676 | 20.162797 |
\n",
"\t 6 | 17.17425 | 14.36932 | 18.55487 | 13.498498 |
\n",
"\t 7 | 20.15380 | 18.00987 | 15.99028 | 14.325000 |
\n",
"\t 8 | 20.68866 | 12.83505 | 25.24119 | 24.538494 |
\n",
"\t 9 | 18.84664 | 24.01079 | 12.69775 | 8.095156 |
\n",
"\t10 | 16.29913 | 21.51270 | 15.14676 | 23.722103 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" key & a & b & c & d\\\\\n",
"\\hline\n",
"\t 1 & 16.19873 & 14.41495 & 27.73225 & 15.564688\\\\\n",
"\t 2 & 27.76677 & 18.54772 & 21.06688 & 15.697810\\\\\n",
"\t 3 & 11.68592 & 14.91207 & 22.82086 & 15.666790\\\\\n",
"\t 4 & 16.39982 & 28.30284 & 10.97550 & 19.083633\\\\\n",
"\t 5 & 22.16232 & 16.82574 & 14.28676 & 20.162797\\\\\n",
"\t 6 & 17.17425 & 14.36932 & 18.55487 & 13.498498\\\\\n",
"\t 7 & 20.15380 & 18.00987 & 15.99028 & 14.325000\\\\\n",
"\t 8 & 20.68866 & 12.83505 & 25.24119 & 24.538494\\\\\n",
"\t 9 & 18.84664 & 24.01079 & 12.69775 & 8.095156\\\\\n",
"\t 10 & 16.29913 & 21.51270 & 15.14676 & 23.722103\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"key | a | b | c | d | \n",
"|---|---|---|---|---|---|---|---|---|---|\n",
"| 1 | 16.19873 | 14.41495 | 27.73225 | 15.564688 | \n",
"| 2 | 27.76677 | 18.54772 | 21.06688 | 15.697810 | \n",
"| 3 | 11.68592 | 14.91207 | 22.82086 | 15.666790 | \n",
"| 4 | 16.39982 | 28.30284 | 10.97550 | 19.083633 | \n",
"| 5 | 22.16232 | 16.82574 | 14.28676 | 20.162797 | \n",
"| 6 | 17.17425 | 14.36932 | 18.55487 | 13.498498 | \n",
"| 7 | 20.15380 | 18.00987 | 15.99028 | 14.325000 | \n",
"| 8 | 20.68866 | 12.83505 | 25.24119 | 24.538494 | \n",
"| 9 | 18.84664 | 24.01079 | 12.69775 | 8.095156 | \n",
"| 10 | 16.29913 | 21.51270 | 15.14676 | 23.722103 | \n",
"\n",
"\n"
],
"text/plain": [
" key a b c d \n",
"1 1 16.19873 14.41495 27.73225 15.564688\n",
"2 2 27.76677 18.54772 21.06688 15.697810\n",
"3 3 11.68592 14.91207 22.82086 15.666790\n",
"4 4 16.39982 28.30284 10.97550 19.083633\n",
"5 5 22.16232 16.82574 14.28676 20.162797\n",
"6 6 17.17425 14.36932 18.55487 13.498498\n",
"7 7 20.15380 18.00987 15.99028 14.325000\n",
"8 8 20.68866 12.83505 25.24119 24.538494\n",
"9 9 18.84664 24.01079 12.69775 8.095156\n",
"10 10 16.29913 21.51270 15.14676 23.722103"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"key | e | f | g | h |
\n",
"\n",
"\t 1 | 1004.1240 | 997.4379 | 997.5697 | 1000.8540 |
\n",
"\t 2 | 1002.6933 | 998.4041 | 1009.1720 | 1010.4120 |
\n",
"\t 3 | 995.9138 | 1001.0959 | 1004.6025 | 1002.5405 |
\n",
"\t 4 | 999.5493 | 998.8054 | 1003.9649 | 1000.0133 |
\n",
"\t 5 | 1007.2373 | 1006.2580 | 1000.1882 | 992.9980 |
\n",
"\t 6 | 1000.2068 | 994.7482 | 998.2876 | 1002.7093 |
\n",
"\t 7 | 999.1622 | 998.6231 | 998.7175 | 998.0497 |
\n",
"\t 8 | 1003.1263 | 1002.7279 | 1004.1623 | 1000.5204 |
\n",
"\t 9 | 1003.1548 | 994.2030 | 1002.1614 | 999.3726 |
\n",
"\t10 | 1006.8744 | 1004.2677 | 998.8720 | 993.5726 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" key & e & f & g & h\\\\\n",
"\\hline\n",
"\t 1 & 1004.1240 & 997.4379 & 997.5697 & 1000.8540\\\\\n",
"\t 2 & 1002.6933 & 998.4041 & 1009.1720 & 1010.4120\\\\\n",
"\t 3 & 995.9138 & 1001.0959 & 1004.6025 & 1002.5405\\\\\n",
"\t 4 & 999.5493 & 998.8054 & 1003.9649 & 1000.0133\\\\\n",
"\t 5 & 1007.2373 & 1006.2580 & 1000.1882 & 992.9980\\\\\n",
"\t 6 & 1000.2068 & 994.7482 & 998.2876 & 1002.7093\\\\\n",
"\t 7 & 999.1622 & 998.6231 & 998.7175 & 998.0497\\\\\n",
"\t 8 & 1003.1263 & 1002.7279 & 1004.1623 & 1000.5204\\\\\n",
"\t 9 & 1003.1548 & 994.2030 & 1002.1614 & 999.3726\\\\\n",
"\t 10 & 1006.8744 & 1004.2677 & 998.8720 & 993.5726\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"key | e | f | g | h | \n",
"|---|---|---|---|---|---|---|---|---|---|\n",
"| 1 | 1004.1240 | 997.4379 | 997.5697 | 1000.8540 | \n",
"| 2 | 1002.6933 | 998.4041 | 1009.1720 | 1010.4120 | \n",
"| 3 | 995.9138 | 1001.0959 | 1004.6025 | 1002.5405 | \n",
"| 4 | 999.5493 | 998.8054 | 1003.9649 | 1000.0133 | \n",
"| 5 | 1007.2373 | 1006.2580 | 1000.1882 | 992.9980 | \n",
"| 6 | 1000.2068 | 994.7482 | 998.2876 | 1002.7093 | \n",
"| 7 | 999.1622 | 998.6231 | 998.7175 | 998.0497 | \n",
"| 8 | 1003.1263 | 1002.7279 | 1004.1623 | 1000.5204 | \n",
"| 9 | 1003.1548 | 994.2030 | 1002.1614 | 999.3726 | \n",
"| 10 | 1006.8744 | 1004.2677 | 998.8720 | 993.5726 | \n",
"\n",
"\n"
],
"text/plain": [
" key e f g h \n",
"1 1 1004.1240 997.4379 997.5697 1000.8540\n",
"2 2 1002.6933 998.4041 1009.1720 1010.4120\n",
"3 3 995.9138 1001.0959 1004.6025 1002.5405\n",
"4 4 999.5493 998.8054 1003.9649 1000.0133\n",
"5 5 1007.2373 1006.2580 1000.1882 992.9980\n",
"6 6 1000.2068 994.7482 998.2876 1002.7093\n",
"7 7 999.1622 998.6231 998.7175 998.0497\n",
"8 8 1003.1263 1002.7279 1004.1623 1000.5204\n",
"9 9 1003.1548 994.2030 1002.1614 999.3726\n",
"10 10 1006.8744 1004.2677 998.8720 993.5726"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"key | a | b | c | d |
\n",
"\n",
"\t 1 | 16.19873 | 14.41495 | 27.73225 | 15.564688 |
\n",
"\t 2 | 27.76677 | 18.54772 | 21.06688 | 15.697810 |
\n",
"\t 3 | 11.68592 | 14.91207 | 22.82086 | 15.666790 |
\n",
"\t 4 | 16.39982 | 28.30284 | 10.97550 | 19.083633 |
\n",
"\t 5 | 22.16232 | 16.82574 | 14.28676 | 20.162797 |
\n",
"\t 6 | 17.17425 | 14.36932 | 18.55487 | 13.498498 |
\n",
"\t 7 | 20.15380 | 18.00987 | 15.99028 | 14.325000 |
\n",
"\t 8 | 20.68866 | 12.83505 | 25.24119 | 24.538494 |
\n",
"\t 9 | 18.84664 | 24.01079 | 12.69775 | 8.095156 |
\n",
"\t10 | 16.29913 | 21.51270 | 15.14676 | 23.722103 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" key & a & b & c & d\\\\\n",
"\\hline\n",
"\t 1 & 16.19873 & 14.41495 & 27.73225 & 15.564688\\\\\n",
"\t 2 & 27.76677 & 18.54772 & 21.06688 & 15.697810\\\\\n",
"\t 3 & 11.68592 & 14.91207 & 22.82086 & 15.666790\\\\\n",
"\t 4 & 16.39982 & 28.30284 & 10.97550 & 19.083633\\\\\n",
"\t 5 & 22.16232 & 16.82574 & 14.28676 & 20.162797\\\\\n",
"\t 6 & 17.17425 & 14.36932 & 18.55487 & 13.498498\\\\\n",
"\t 7 & 20.15380 & 18.00987 & 15.99028 & 14.325000\\\\\n",
"\t 8 & 20.68866 & 12.83505 & 25.24119 & 24.538494\\\\\n",
"\t 9 & 18.84664 & 24.01079 & 12.69775 & 8.095156\\\\\n",
"\t 10 & 16.29913 & 21.51270 & 15.14676 & 23.722103\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"key | a | b | c | d | \n",
"|---|---|---|---|---|---|---|---|---|---|\n",
"| 1 | 16.19873 | 14.41495 | 27.73225 | 15.564688 | \n",
"| 2 | 27.76677 | 18.54772 | 21.06688 | 15.697810 | \n",
"| 3 | 11.68592 | 14.91207 | 22.82086 | 15.666790 | \n",
"| 4 | 16.39982 | 28.30284 | 10.97550 | 19.083633 | \n",
"| 5 | 22.16232 | 16.82574 | 14.28676 | 20.162797 | \n",
"| 6 | 17.17425 | 14.36932 | 18.55487 | 13.498498 | \n",
"| 7 | 20.15380 | 18.00987 | 15.99028 | 14.325000 | \n",
"| 8 | 20.68866 | 12.83505 | 25.24119 | 24.538494 | \n",
"| 9 | 18.84664 | 24.01079 | 12.69775 | 8.095156 | \n",
"| 10 | 16.29913 | 21.51270 | 15.14676 | 23.722103 | \n",
"\n",
"\n"
],
"text/plain": [
" key a b c d \n",
"1 1 16.19873 14.41495 27.73225 15.564688\n",
"2 2 27.76677 18.54772 21.06688 15.697810\n",
"3 3 11.68592 14.91207 22.82086 15.666790\n",
"4 4 16.39982 28.30284 10.97550 19.083633\n",
"5 5 22.16232 16.82574 14.28676 20.162797\n",
"6 6 17.17425 14.36932 18.55487 13.498498\n",
"7 7 20.15380 18.00987 15.99028 14.325000\n",
"8 8 20.68866 12.83505 25.24119 24.538494\n",
"9 9 18.84664 24.01079 12.69775 8.095156\n",
"10 10 16.29913 21.51270 15.14676 23.722103"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"key | e | f | g | h |
\n",
"\n",
"\t 1 | 1004.1240 | 997.4379 | 997.5697 | 1000.8540 |
\n",
"\t 2 | 1002.6933 | 998.4041 | 1009.1720 | 1010.4120 |
\n",
"\t 3 | 995.9138 | 1001.0959 | 1004.6025 | 1002.5405 |
\n",
"\t 4 | 999.5493 | 998.8054 | 1003.9649 | 1000.0133 |
\n",
"\t 5 | 1007.2373 | 1006.2580 | 1000.1882 | 992.9980 |
\n",
"\t 6 | 1000.2068 | 994.7482 | 998.2876 | 1002.7093 |
\n",
"\t 7 | 999.1622 | 998.6231 | 998.7175 | 998.0497 |
\n",
"\t 8 | 1003.1263 | 1002.7279 | 1004.1623 | 1000.5204 |
\n",
"\t 9 | 1003.1548 | 994.2030 | 1002.1614 | 999.3726 |
\n",
"\t10 | 1006.8744 | 1004.2677 | 998.8720 | 993.5726 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" key & e & f & g & h\\\\\n",
"\\hline\n",
"\t 1 & 1004.1240 & 997.4379 & 997.5697 & 1000.8540\\\\\n",
"\t 2 & 1002.6933 & 998.4041 & 1009.1720 & 1010.4120\\\\\n",
"\t 3 & 995.9138 & 1001.0959 & 1004.6025 & 1002.5405\\\\\n",
"\t 4 & 999.5493 & 998.8054 & 1003.9649 & 1000.0133\\\\\n",
"\t 5 & 1007.2373 & 1006.2580 & 1000.1882 & 992.9980\\\\\n",
"\t 6 & 1000.2068 & 994.7482 & 998.2876 & 1002.7093\\\\\n",
"\t 7 & 999.1622 & 998.6231 & 998.7175 & 998.0497\\\\\n",
"\t 8 & 1003.1263 & 1002.7279 & 1004.1623 & 1000.5204\\\\\n",
"\t 9 & 1003.1548 & 994.2030 & 1002.1614 & 999.3726\\\\\n",
"\t 10 & 1006.8744 & 1004.2677 & 998.8720 & 993.5726\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"key | e | f | g | h | \n",
"|---|---|---|---|---|---|---|---|---|---|\n",
"| 1 | 1004.1240 | 997.4379 | 997.5697 | 1000.8540 | \n",
"| 2 | 1002.6933 | 998.4041 | 1009.1720 | 1010.4120 | \n",
"| 3 | 995.9138 | 1001.0959 | 1004.6025 | 1002.5405 | \n",
"| 4 | 999.5493 | 998.8054 | 1003.9649 | 1000.0133 | \n",
"| 5 | 1007.2373 | 1006.2580 | 1000.1882 | 992.9980 | \n",
"| 6 | 1000.2068 | 994.7482 | 998.2876 | 1002.7093 | \n",
"| 7 | 999.1622 | 998.6231 | 998.7175 | 998.0497 | \n",
"| 8 | 1003.1263 | 1002.7279 | 1004.1623 | 1000.5204 | \n",
"| 9 | 1003.1548 | 994.2030 | 1002.1614 | 999.3726 | \n",
"| 10 | 1006.8744 | 1004.2677 | 998.8720 | 993.5726 | \n",
"\n",
"\n"
],
"text/plain": [
" key e f g h \n",
"1 1 1004.1240 997.4379 997.5697 1000.8540\n",
"2 2 1002.6933 998.4041 1009.1720 1010.4120\n",
"3 3 995.9138 1001.0959 1004.6025 1002.5405\n",
"4 4 999.5493 998.8054 1003.9649 1000.0133\n",
"5 5 1007.2373 1006.2580 1000.1882 992.9980\n",
"6 6 1000.2068 994.7482 998.2876 1002.7093\n",
"7 7 999.1622 998.6231 998.7175 998.0497\n",
"8 8 1003.1263 1002.7279 1004.1623 1000.5204\n",
"9 9 1003.1548 994.2030 1002.1614 999.3726\n",
"10 10 1006.8744 1004.2677 998.8720 993.5726"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#Below is the sample data we will be creating 2 dataframes \n",
"key=(1:10)\n",
"\n",
"#Here we are passing the row names and column names as a list. \n",
"m<- data.frame(matrix(rnorm(40, mean=20, sd=5), nrow=10, ncol=4, dimnames=list((1:10),c(\"a\",\"b\",\"c\",\"d\"))))\n",
"m2<- data.frame(matrix(rnorm(40, mean=1000, sd=5), nrow=10, ncol=4, dimnames=list((1:10),c(\"e\",\"f\",\"g\",\"h\"))))\n",
"\n",
"#This is one way of combining a vector with a dataframe. \n",
"df<- data.frame(key,m)\n",
"df2<- data.frame(key,m2)\n",
"\n",
"#This is another way way of combining a vector with a dataframe. \n",
"dfb<- cbind(key,m)\n",
"df2b<- cbind(key,m2)\n",
"\n",
"df\n",
"df2\n",
"dfb\n",
"df2b\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Merging Columns of Data Frame with another Data Frame\n",
"- Can combine data frame in multiple ways. \n",
"- `merge(a,b,by=\"key\")` where a & b are dataframes with the same keys.\n",
"- `cbind(a,b)` where a & b are dataframes with the same number of rows."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"key | a | b | c | d | e | f | g | h |
\n",
"\n",
"\t 1 | 16.278799 | 23.004297 | 19.262524 | 22.11648 | 1004.4714 | 995.4055 | 1001.5156 | 1004.8862 |
\n",
"\t 2 | 19.287252 | 19.229253 | 18.817575 | 13.67939 | 1000.7399 | 1004.6248 | 995.8950 | 1008.5242 |
\n",
"\t 3 | 18.833623 | 16.142004 | 18.454224 | 18.29241 | 994.3794 | 1002.2578 | 998.6004 | 999.7609 |
\n",
"\t 4 | 23.780847 | 10.934207 | 13.448540 | 17.83936 | 1001.4795 | 1009.0885 | 1002.5866 | 998.8287 |
\n",
"\t 5 | 13.982935 | 16.924402 | 19.037475 | 20.19748 | 993.9745 | 999.9868 | 1001.0336 | 987.3751 |
\n",
"\t 6 | 15.534589 | 23.437320 | 6.795926 | 20.19305 | 996.1284 | 1008.8440 | 1005.5196 | 1003.6926 |
\n",
"\t 7 | 16.660076 | 18.315077 | 32.107139 | 23.35534 | 994.5026 | 1004.9990 | 1004.0972 | 1005.6532 |
\n",
"\t 8 | 19.447799 | 18.278384 | 11.823108 | 13.09162 | 1007.8858 | 993.8745 | 1005.1093 | 996.8686 |
\n",
"\t 9 | 9.225069 | 24.925796 | 13.868021 | 17.06181 | 997.6026 | 1001.1045 | 991.7969 | 1000.5898 |
\n",
"\t10 | 25.809451 | 7.492747 | 18.483003 | 24.99244 | 995.6190 | 1010.2642 | 998.6192 | 998.8618 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllllllll}\n",
" key & a & b & c & d & e & f & g & h\\\\\n",
"\\hline\n",
"\t 1 & 16.278799 & 23.004297 & 19.262524 & 22.11648 & 1004.4714 & 995.4055 & 1001.5156 & 1004.8862\\\\\n",
"\t 2 & 19.287252 & 19.229253 & 18.817575 & 13.67939 & 1000.7399 & 1004.6248 & 995.8950 & 1008.5242\\\\\n",
"\t 3 & 18.833623 & 16.142004 & 18.454224 & 18.29241 & 994.3794 & 1002.2578 & 998.6004 & 999.7609\\\\\n",
"\t 4 & 23.780847 & 10.934207 & 13.448540 & 17.83936 & 1001.4795 & 1009.0885 & 1002.5866 & 998.8287\\\\\n",
"\t 5 & 13.982935 & 16.924402 & 19.037475 & 20.19748 & 993.9745 & 999.9868 & 1001.0336 & 987.3751\\\\\n",
"\t 6 & 15.534589 & 23.437320 & 6.795926 & 20.19305 & 996.1284 & 1008.8440 & 1005.5196 & 1003.6926\\\\\n",
"\t 7 & 16.660076 & 18.315077 & 32.107139 & 23.35534 & 994.5026 & 1004.9990 & 1004.0972 & 1005.6532\\\\\n",
"\t 8 & 19.447799 & 18.278384 & 11.823108 & 13.09162 & 1007.8858 & 993.8745 & 1005.1093 & 996.8686\\\\\n",
"\t 9 & 9.225069 & 24.925796 & 13.868021 & 17.06181 & 997.6026 & 1001.1045 & 991.7969 & 1000.5898\\\\\n",
"\t 10 & 25.809451 & 7.492747 & 18.483003 & 24.99244 & 995.6190 & 1010.2642 & 998.6192 & 998.8618\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"key | a | b | c | d | e | f | g | h | \n",
"|---|---|---|---|---|---|---|---|---|---|\n",
"| 1 | 16.278799 | 23.004297 | 19.262524 | 22.11648 | 1004.4714 | 995.4055 | 1001.5156 | 1004.8862 | \n",
"| 2 | 19.287252 | 19.229253 | 18.817575 | 13.67939 | 1000.7399 | 1004.6248 | 995.8950 | 1008.5242 | \n",
"| 3 | 18.833623 | 16.142004 | 18.454224 | 18.29241 | 994.3794 | 1002.2578 | 998.6004 | 999.7609 | \n",
"| 4 | 23.780847 | 10.934207 | 13.448540 | 17.83936 | 1001.4795 | 1009.0885 | 1002.5866 | 998.8287 | \n",
"| 5 | 13.982935 | 16.924402 | 19.037475 | 20.19748 | 993.9745 | 999.9868 | 1001.0336 | 987.3751 | \n",
"| 6 | 15.534589 | 23.437320 | 6.795926 | 20.19305 | 996.1284 | 1008.8440 | 1005.5196 | 1003.6926 | \n",
"| 7 | 16.660076 | 18.315077 | 32.107139 | 23.35534 | 994.5026 | 1004.9990 | 1004.0972 | 1005.6532 | \n",
"| 8 | 19.447799 | 18.278384 | 11.823108 | 13.09162 | 1007.8858 | 993.8745 | 1005.1093 | 996.8686 | \n",
"| 9 | 9.225069 | 24.925796 | 13.868021 | 17.06181 | 997.6026 | 1001.1045 | 991.7969 | 1000.5898 | \n",
"| 10 | 25.809451 | 7.492747 | 18.483003 | 24.99244 | 995.6190 | 1010.2642 | 998.6192 | 998.8618 | \n",
"\n",
"\n"
],
"text/plain": [
" key a b c d e f g \n",
"1 1 16.278799 23.004297 19.262524 22.11648 1004.4714 995.4055 1001.5156\n",
"2 2 19.287252 19.229253 18.817575 13.67939 1000.7399 1004.6248 995.8950\n",
"3 3 18.833623 16.142004 18.454224 18.29241 994.3794 1002.2578 998.6004\n",
"4 4 23.780847 10.934207 13.448540 17.83936 1001.4795 1009.0885 1002.5866\n",
"5 5 13.982935 16.924402 19.037475 20.19748 993.9745 999.9868 1001.0336\n",
"6 6 15.534589 23.437320 6.795926 20.19305 996.1284 1008.8440 1005.5196\n",
"7 7 16.660076 18.315077 32.107139 23.35534 994.5026 1004.9990 1004.0972\n",
"8 8 19.447799 18.278384 11.823108 13.09162 1007.8858 993.8745 1005.1093\n",
"9 9 9.225069 24.925796 13.868021 17.06181 997.6026 1001.1045 991.7969\n",
"10 10 25.809451 7.492747 18.483003 24.99244 995.6190 1010.2642 998.6192\n",
" h \n",
"1 1004.8862\n",
"2 1008.5242\n",
"3 999.7609\n",
"4 998.8287\n",
"5 987.3751\n",
"6 1003.6926\n",
"7 1005.6532\n",
"8 996.8686\n",
"9 1000.5898\n",
"10 998.8618"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"key | a | b | c | d | key | e | f | g | h |
\n",
"\n",
"\t 1 | 16.278799 | 23.004297 | 19.262524 | 22.11648 | 1 | 1004.4714 | 995.4055 | 1001.5156 | 1004.8862 |
\n",
"\t 2 | 19.287252 | 19.229253 | 18.817575 | 13.67939 | 2 | 1000.7399 | 1004.6248 | 995.8950 | 1008.5242 |
\n",
"\t 3 | 18.833623 | 16.142004 | 18.454224 | 18.29241 | 3 | 994.3794 | 1002.2578 | 998.6004 | 999.7609 |
\n",
"\t 4 | 23.780847 | 10.934207 | 13.448540 | 17.83936 | 4 | 1001.4795 | 1009.0885 | 1002.5866 | 998.8287 |
\n",
"\t 5 | 13.982935 | 16.924402 | 19.037475 | 20.19748 | 5 | 993.9745 | 999.9868 | 1001.0336 | 987.3751 |
\n",
"\t 6 | 15.534589 | 23.437320 | 6.795926 | 20.19305 | 6 | 996.1284 | 1008.8440 | 1005.5196 | 1003.6926 |
\n",
"\t 7 | 16.660076 | 18.315077 | 32.107139 | 23.35534 | 7 | 994.5026 | 1004.9990 | 1004.0972 | 1005.6532 |
\n",
"\t 8 | 19.447799 | 18.278384 | 11.823108 | 13.09162 | 8 | 1007.8858 | 993.8745 | 1005.1093 | 996.8686 |
\n",
"\t 9 | 9.225069 | 24.925796 | 13.868021 | 17.06181 | 9 | 997.6026 | 1001.1045 | 991.7969 | 1000.5898 |
\n",
"\t10 | 25.809451 | 7.492747 | 18.483003 | 24.99244 | 10 | 995.6190 | 1010.2642 | 998.6192 | 998.8618 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|llllllllll}\n",
" key & a & b & c & d & key & e & f & g & h\\\\\n",
"\\hline\n",
"\t 1 & 16.278799 & 23.004297 & 19.262524 & 22.11648 & 1 & 1004.4714 & 995.4055 & 1001.5156 & 1004.8862\\\\\n",
"\t 2 & 19.287252 & 19.229253 & 18.817575 & 13.67939 & 2 & 1000.7399 & 1004.6248 & 995.8950 & 1008.5242\\\\\n",
"\t 3 & 18.833623 & 16.142004 & 18.454224 & 18.29241 & 3 & 994.3794 & 1002.2578 & 998.6004 & 999.7609\\\\\n",
"\t 4 & 23.780847 & 10.934207 & 13.448540 & 17.83936 & 4 & 1001.4795 & 1009.0885 & 1002.5866 & 998.8287\\\\\n",
"\t 5 & 13.982935 & 16.924402 & 19.037475 & 20.19748 & 5 & 993.9745 & 999.9868 & 1001.0336 & 987.3751\\\\\n",
"\t 6 & 15.534589 & 23.437320 & 6.795926 & 20.19305 & 6 & 996.1284 & 1008.8440 & 1005.5196 & 1003.6926\\\\\n",
"\t 7 & 16.660076 & 18.315077 & 32.107139 & 23.35534 & 7 & 994.5026 & 1004.9990 & 1004.0972 & 1005.6532\\\\\n",
"\t 8 & 19.447799 & 18.278384 & 11.823108 & 13.09162 & 8 & 1007.8858 & 993.8745 & 1005.1093 & 996.8686\\\\\n",
"\t 9 & 9.225069 & 24.925796 & 13.868021 & 17.06181 & 9 & 997.6026 & 1001.1045 & 991.7969 & 1000.5898\\\\\n",
"\t 10 & 25.809451 & 7.492747 & 18.483003 & 24.99244 & 10 & 995.6190 & 1010.2642 & 998.6192 & 998.8618\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"key | a | b | c | d | key | e | f | g | h | \n",
"|---|---|---|---|---|---|---|---|---|---|\n",
"| 1 | 16.278799 | 23.004297 | 19.262524 | 22.11648 | 1 | 1004.4714 | 995.4055 | 1001.5156 | 1004.8862 | \n",
"| 2 | 19.287252 | 19.229253 | 18.817575 | 13.67939 | 2 | 1000.7399 | 1004.6248 | 995.8950 | 1008.5242 | \n",
"| 3 | 18.833623 | 16.142004 | 18.454224 | 18.29241 | 3 | 994.3794 | 1002.2578 | 998.6004 | 999.7609 | \n",
"| 4 | 23.780847 | 10.934207 | 13.448540 | 17.83936 | 4 | 1001.4795 | 1009.0885 | 1002.5866 | 998.8287 | \n",
"| 5 | 13.982935 | 16.924402 | 19.037475 | 20.19748 | 5 | 993.9745 | 999.9868 | 1001.0336 | 987.3751 | \n",
"| 6 | 15.534589 | 23.437320 | 6.795926 | 20.19305 | 6 | 996.1284 | 1008.8440 | 1005.5196 | 1003.6926 | \n",
"| 7 | 16.660076 | 18.315077 | 32.107139 | 23.35534 | 7 | 994.5026 | 1004.9990 | 1004.0972 | 1005.6532 | \n",
"| 8 | 19.447799 | 18.278384 | 11.823108 | 13.09162 | 8 | 1007.8858 | 993.8745 | 1005.1093 | 996.8686 | \n",
"| 9 | 9.225069 | 24.925796 | 13.868021 | 17.06181 | 9 | 997.6026 | 1001.1045 | 991.7969 | 1000.5898 | \n",
"| 10 | 25.809451 | 7.492747 | 18.483003 | 24.99244 | 10 | 995.6190 | 1010.2642 | 998.6192 | 998.8618 | \n",
"\n",
"\n"
],
"text/plain": [
" key a b c d key e f g \n",
"1 1 16.278799 23.004297 19.262524 22.11648 1 1004.4714 995.4055 1001.5156\n",
"2 2 19.287252 19.229253 18.817575 13.67939 2 1000.7399 1004.6248 995.8950\n",
"3 3 18.833623 16.142004 18.454224 18.29241 3 994.3794 1002.2578 998.6004\n",
"4 4 23.780847 10.934207 13.448540 17.83936 4 1001.4795 1009.0885 1002.5866\n",
"5 5 13.982935 16.924402 19.037475 20.19748 5 993.9745 999.9868 1001.0336\n",
"6 6 15.534589 23.437320 6.795926 20.19305 6 996.1284 1008.8440 1005.5196\n",
"7 7 16.660076 18.315077 32.107139 23.35534 7 994.5026 1004.9990 1004.0972\n",
"8 8 19.447799 18.278384 11.823108 13.09162 8 1007.8858 993.8745 1005.1093\n",
"9 9 9.225069 24.925796 13.868021 17.06181 9 997.6026 1001.1045 991.7969\n",
"10 10 25.809451 7.492747 18.483003 24.99244 10 995.6190 1010.2642 998.6192\n",
" h \n",
"1 1004.8862\n",
"2 1008.5242\n",
"3 999.7609\n",
"4 998.8287\n",
"5 987.3751\n",
"6 1003.6926\n",
"7 1005.6532\n",
"8 996.8686\n",
"9 1000.5898\n",
"10 998.8618"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"key | a | b | c | d | key.1 | e | f | g | h |
\n",
"\n",
"\t 1 | 16.278799 | 23.004297 | 19.262524 | 22.11648 | 1 | 1004.4714 | 995.4055 | 1001.5156 | 1004.8862 |
\n",
"\t 2 | 19.287252 | 19.229253 | 18.817575 | 13.67939 | 2 | 1000.7399 | 1004.6248 | 995.8950 | 1008.5242 |
\n",
"\t 3 | 18.833623 | 16.142004 | 18.454224 | 18.29241 | 3 | 994.3794 | 1002.2578 | 998.6004 | 999.7609 |
\n",
"\t 4 | 23.780847 | 10.934207 | 13.448540 | 17.83936 | 4 | 1001.4795 | 1009.0885 | 1002.5866 | 998.8287 |
\n",
"\t 5 | 13.982935 | 16.924402 | 19.037475 | 20.19748 | 5 | 993.9745 | 999.9868 | 1001.0336 | 987.3751 |
\n",
"\t 6 | 15.534589 | 23.437320 | 6.795926 | 20.19305 | 6 | 996.1284 | 1008.8440 | 1005.5196 | 1003.6926 |
\n",
"\t 7 | 16.660076 | 18.315077 | 32.107139 | 23.35534 | 7 | 994.5026 | 1004.9990 | 1004.0972 | 1005.6532 |
\n",
"\t 8 | 19.447799 | 18.278384 | 11.823108 | 13.09162 | 8 | 1007.8858 | 993.8745 | 1005.1093 | 996.8686 |
\n",
"\t 9 | 9.225069 | 24.925796 | 13.868021 | 17.06181 | 9 | 997.6026 | 1001.1045 | 991.7969 | 1000.5898 |
\n",
"\t10 | 25.809451 | 7.492747 | 18.483003 | 24.99244 | 10 | 995.6190 | 1010.2642 | 998.6192 | 998.8618 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|llllllllll}\n",
" key & a & b & c & d & key.1 & e & f & g & h\\\\\n",
"\\hline\n",
"\t 1 & 16.278799 & 23.004297 & 19.262524 & 22.11648 & 1 & 1004.4714 & 995.4055 & 1001.5156 & 1004.8862\\\\\n",
"\t 2 & 19.287252 & 19.229253 & 18.817575 & 13.67939 & 2 & 1000.7399 & 1004.6248 & 995.8950 & 1008.5242\\\\\n",
"\t 3 & 18.833623 & 16.142004 & 18.454224 & 18.29241 & 3 & 994.3794 & 1002.2578 & 998.6004 & 999.7609\\\\\n",
"\t 4 & 23.780847 & 10.934207 & 13.448540 & 17.83936 & 4 & 1001.4795 & 1009.0885 & 1002.5866 & 998.8287\\\\\n",
"\t 5 & 13.982935 & 16.924402 & 19.037475 & 20.19748 & 5 & 993.9745 & 999.9868 & 1001.0336 & 987.3751\\\\\n",
"\t 6 & 15.534589 & 23.437320 & 6.795926 & 20.19305 & 6 & 996.1284 & 1008.8440 & 1005.5196 & 1003.6926\\\\\n",
"\t 7 & 16.660076 & 18.315077 & 32.107139 & 23.35534 & 7 & 994.5026 & 1004.9990 & 1004.0972 & 1005.6532\\\\\n",
"\t 8 & 19.447799 & 18.278384 & 11.823108 & 13.09162 & 8 & 1007.8858 & 993.8745 & 1005.1093 & 996.8686\\\\\n",
"\t 9 & 9.225069 & 24.925796 & 13.868021 & 17.06181 & 9 & 997.6026 & 1001.1045 & 991.7969 & 1000.5898\\\\\n",
"\t 10 & 25.809451 & 7.492747 & 18.483003 & 24.99244 & 10 & 995.6190 & 1010.2642 & 998.6192 & 998.8618\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"key | a | b | c | d | key.1 | e | f | g | h | \n",
"|---|---|---|---|---|---|---|---|---|---|\n",
"| 1 | 16.278799 | 23.004297 | 19.262524 | 22.11648 | 1 | 1004.4714 | 995.4055 | 1001.5156 | 1004.8862 | \n",
"| 2 | 19.287252 | 19.229253 | 18.817575 | 13.67939 | 2 | 1000.7399 | 1004.6248 | 995.8950 | 1008.5242 | \n",
"| 3 | 18.833623 | 16.142004 | 18.454224 | 18.29241 | 3 | 994.3794 | 1002.2578 | 998.6004 | 999.7609 | \n",
"| 4 | 23.780847 | 10.934207 | 13.448540 | 17.83936 | 4 | 1001.4795 | 1009.0885 | 1002.5866 | 998.8287 | \n",
"| 5 | 13.982935 | 16.924402 | 19.037475 | 20.19748 | 5 | 993.9745 | 999.9868 | 1001.0336 | 987.3751 | \n",
"| 6 | 15.534589 | 23.437320 | 6.795926 | 20.19305 | 6 | 996.1284 | 1008.8440 | 1005.5196 | 1003.6926 | \n",
"| 7 | 16.660076 | 18.315077 | 32.107139 | 23.35534 | 7 | 994.5026 | 1004.9990 | 1004.0972 | 1005.6532 | \n",
"| 8 | 19.447799 | 18.278384 | 11.823108 | 13.09162 | 8 | 1007.8858 | 993.8745 | 1005.1093 | 996.8686 | \n",
"| 9 | 9.225069 | 24.925796 | 13.868021 | 17.06181 | 9 | 997.6026 | 1001.1045 | 991.7969 | 1000.5898 | \n",
"| 10 | 25.809451 | 7.492747 | 18.483003 | 24.99244 | 10 | 995.6190 | 1010.2642 | 998.6192 | 998.8618 | \n",
"\n",
"\n"
],
"text/plain": [
" key a b c d key.1 e f \n",
"1 1 16.278799 23.004297 19.262524 22.11648 1 1004.4714 995.4055\n",
"2 2 19.287252 19.229253 18.817575 13.67939 2 1000.7399 1004.6248\n",
"3 3 18.833623 16.142004 18.454224 18.29241 3 994.3794 1002.2578\n",
"4 4 23.780847 10.934207 13.448540 17.83936 4 1001.4795 1009.0885\n",
"5 5 13.982935 16.924402 19.037475 20.19748 5 993.9745 999.9868\n",
"6 6 15.534589 23.437320 6.795926 20.19305 6 996.1284 1008.8440\n",
"7 7 16.660076 18.315077 32.107139 23.35534 7 994.5026 1004.9990\n",
"8 8 19.447799 18.278384 11.823108 13.09162 8 1007.8858 993.8745\n",
"9 9 9.225069 24.925796 13.868021 17.06181 9 997.6026 1001.1045\n",
"10 10 25.809451 7.492747 18.483003 24.99244 10 995.6190 1010.2642\n",
" g h \n",
"1 1001.5156 1004.8862\n",
"2 995.8950 1008.5242\n",
"3 998.6004 999.7609\n",
"4 1002.5866 998.8287\n",
"5 1001.0336 987.3751\n",
"6 1005.5196 1003.6926\n",
"7 1004.0972 1005.6532\n",
"8 1005.1093 996.8686\n",
"9 991.7969 1000.5898\n",
"10 998.6192 998.8618"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# This manages the merge by an associated key.\n",
"df3 <- merge(df,df2,by=\"key\")\n",
"# This just does a \"column bind\" \n",
"df4<- cbind(df,df2)\n",
"df5<- data.frame(df,df2)\n",
"df3\n",
"df4\n",
"df5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Merging Rows of Data Frame with another Data Frame\n",
"- `rbind(a,b)` combines rows of data frames of a and b.\n",
"- `rbind(a,b, make.row.names=FALSE)` this will reset the index."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
" | key | a | b | c | d |
\n",
"\n",
"\t1 | 1 | 16.19873 | 14.41495 | 27.73225 | 15.564688 |
\n",
"\t2 | 2 | 27.76677 | 18.54772 | 21.06688 | 15.697810 |
\n",
"\t3 | 3 | 11.68592 | 14.91207 | 22.82086 | 15.666790 |
\n",
"\t4 | 4 | 16.39982 | 28.30284 | 10.97550 | 19.083633 |
\n",
"\t5 | 5 | 22.16232 | 16.82574 | 14.28676 | 20.162797 |
\n",
"\t6 | 6 | 17.17425 | 14.36932 | 18.55487 | 13.498498 |
\n",
"\t7 | 7 | 20.15380 | 18.00987 | 15.99028 | 14.325000 |
\n",
"\t8 | 8 | 20.68866 | 12.83505 | 25.24119 | 24.538494 |
\n",
"\t9 | 9 | 18.84664 | 24.01079 | 12.69775 | 8.095156 |
\n",
"\t10 | 10 | 16.29913 | 21.51270 | 15.14676 | 23.722103 |
\n",
"\t11 | 1 | 16.19873 | 14.41495 | 27.73225 | 15.564688 |
\n",
"\t21 | 2 | 27.76677 | 18.54772 | 21.06688 | 15.697810 |
\n",
"\t31 | 3 | 11.68592 | 14.91207 | 22.82086 | 15.666790 |
\n",
"\t41 | 4 | 16.39982 | 28.30284 | 10.97550 | 19.083633 |
\n",
"\t51 | 5 | 22.16232 | 16.82574 | 14.28676 | 20.162797 |
\n",
"\t61 | 6 | 17.17425 | 14.36932 | 18.55487 | 13.498498 |
\n",
"\t71 | 7 | 20.15380 | 18.00987 | 15.99028 | 14.325000 |
\n",
"\t81 | 8 | 20.68866 | 12.83505 | 25.24119 | 24.538494 |
\n",
"\t91 | 9 | 18.84664 | 24.01079 | 12.69775 | 8.095156 |
\n",
"\t101 | 10 | 16.29913 | 21.51270 | 15.14676 | 23.722103 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" & key & a & b & c & d\\\\\n",
"\\hline\n",
"\t1 & 1 & 16.19873 & 14.41495 & 27.73225 & 15.564688\\\\\n",
"\t2 & 2 & 27.76677 & 18.54772 & 21.06688 & 15.697810\\\\\n",
"\t3 & 3 & 11.68592 & 14.91207 & 22.82086 & 15.666790\\\\\n",
"\t4 & 4 & 16.39982 & 28.30284 & 10.97550 & 19.083633\\\\\n",
"\t5 & 5 & 22.16232 & 16.82574 & 14.28676 & 20.162797\\\\\n",
"\t6 & 6 & 17.17425 & 14.36932 & 18.55487 & 13.498498\\\\\n",
"\t7 & 7 & 20.15380 & 18.00987 & 15.99028 & 14.325000\\\\\n",
"\t8 & 8 & 20.68866 & 12.83505 & 25.24119 & 24.538494\\\\\n",
"\t9 & 9 & 18.84664 & 24.01079 & 12.69775 & 8.095156\\\\\n",
"\t10 & 10 & 16.29913 & 21.51270 & 15.14676 & 23.722103\\\\\n",
"\t11 & 1 & 16.19873 & 14.41495 & 27.73225 & 15.564688\\\\\n",
"\t21 & 2 & 27.76677 & 18.54772 & 21.06688 & 15.697810\\\\\n",
"\t31 & 3 & 11.68592 & 14.91207 & 22.82086 & 15.666790\\\\\n",
"\t41 & 4 & 16.39982 & 28.30284 & 10.97550 & 19.083633\\\\\n",
"\t51 & 5 & 22.16232 & 16.82574 & 14.28676 & 20.162797\\\\\n",
"\t61 & 6 & 17.17425 & 14.36932 & 18.55487 & 13.498498\\\\\n",
"\t71 & 7 & 20.15380 & 18.00987 & 15.99028 & 14.325000\\\\\n",
"\t81 & 8 & 20.68866 & 12.83505 & 25.24119 & 24.538494\\\\\n",
"\t91 & 9 & 18.84664 & 24.01079 & 12.69775 & 8.095156\\\\\n",
"\t101 & 10 & 16.29913 & 21.51270 & 15.14676 & 23.722103\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| | key | a | b | c | d | \n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 1 | 1 | 16.19873 | 14.41495 | 27.73225 | 15.564688 | \n",
"| 2 | 2 | 27.76677 | 18.54772 | 21.06688 | 15.697810 | \n",
"| 3 | 3 | 11.68592 | 14.91207 | 22.82086 | 15.666790 | \n",
"| 4 | 4 | 16.39982 | 28.30284 | 10.97550 | 19.083633 | \n",
"| 5 | 5 | 22.16232 | 16.82574 | 14.28676 | 20.162797 | \n",
"| 6 | 6 | 17.17425 | 14.36932 | 18.55487 | 13.498498 | \n",
"| 7 | 7 | 20.15380 | 18.00987 | 15.99028 | 14.325000 | \n",
"| 8 | 8 | 20.68866 | 12.83505 | 25.24119 | 24.538494 | \n",
"| 9 | 9 | 18.84664 | 24.01079 | 12.69775 | 8.095156 | \n",
"| 10 | 10 | 16.29913 | 21.51270 | 15.14676 | 23.722103 | \n",
"| 11 | 1 | 16.19873 | 14.41495 | 27.73225 | 15.564688 | \n",
"| 21 | 2 | 27.76677 | 18.54772 | 21.06688 | 15.697810 | \n",
"| 31 | 3 | 11.68592 | 14.91207 | 22.82086 | 15.666790 | \n",
"| 41 | 4 | 16.39982 | 28.30284 | 10.97550 | 19.083633 | \n",
"| 51 | 5 | 22.16232 | 16.82574 | 14.28676 | 20.162797 | \n",
"| 61 | 6 | 17.17425 | 14.36932 | 18.55487 | 13.498498 | \n",
"| 71 | 7 | 20.15380 | 18.00987 | 15.99028 | 14.325000 | \n",
"| 81 | 8 | 20.68866 | 12.83505 | 25.24119 | 24.538494 | \n",
"| 91 | 9 | 18.84664 | 24.01079 | 12.69775 | 8.095156 | \n",
"| 101 | 10 | 16.29913 | 21.51270 | 15.14676 | 23.722103 | \n",
"\n",
"\n"
],
"text/plain": [
" key a b c d \n",
"1 1 16.19873 14.41495 27.73225 15.564688\n",
"2 2 27.76677 18.54772 21.06688 15.697810\n",
"3 3 11.68592 14.91207 22.82086 15.666790\n",
"4 4 16.39982 28.30284 10.97550 19.083633\n",
"5 5 22.16232 16.82574 14.28676 20.162797\n",
"6 6 17.17425 14.36932 18.55487 13.498498\n",
"7 7 20.15380 18.00987 15.99028 14.325000\n",
"8 8 20.68866 12.83505 25.24119 24.538494\n",
"9 9 18.84664 24.01079 12.69775 8.095156\n",
"10 10 16.29913 21.51270 15.14676 23.722103\n",
"11 1 16.19873 14.41495 27.73225 15.564688\n",
"21 2 27.76677 18.54772 21.06688 15.697810\n",
"31 3 11.68592 14.91207 22.82086 15.666790\n",
"41 4 16.39982 28.30284 10.97550 19.083633\n",
"51 5 22.16232 16.82574 14.28676 20.162797\n",
"61 6 17.17425 14.36932 18.55487 13.498498\n",
"71 7 20.15380 18.00987 15.99028 14.325000\n",
"81 8 20.68866 12.83505 25.24119 24.538494\n",
"91 9 18.84664 24.01079 12.69775 8.095156\n",
"101 10 16.29913 21.51270 15.14676 23.722103"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"key | a | b | c | d |
\n",
"\n",
"\t 1 | 16.19873 | 14.41495 | 27.73225 | 15.564688 |
\n",
"\t 2 | 27.76677 | 18.54772 | 21.06688 | 15.697810 |
\n",
"\t 3 | 11.68592 | 14.91207 | 22.82086 | 15.666790 |
\n",
"\t 4 | 16.39982 | 28.30284 | 10.97550 | 19.083633 |
\n",
"\t 5 | 22.16232 | 16.82574 | 14.28676 | 20.162797 |
\n",
"\t 6 | 17.17425 | 14.36932 | 18.55487 | 13.498498 |
\n",
"\t 7 | 20.15380 | 18.00987 | 15.99028 | 14.325000 |
\n",
"\t 8 | 20.68866 | 12.83505 | 25.24119 | 24.538494 |
\n",
"\t 9 | 18.84664 | 24.01079 | 12.69775 | 8.095156 |
\n",
"\t10 | 16.29913 | 21.51270 | 15.14676 | 23.722103 |
\n",
"\t 1 | 16.19873 | 14.41495 | 27.73225 | 15.564688 |
\n",
"\t 2 | 27.76677 | 18.54772 | 21.06688 | 15.697810 |
\n",
"\t 3 | 11.68592 | 14.91207 | 22.82086 | 15.666790 |
\n",
"\t 4 | 16.39982 | 28.30284 | 10.97550 | 19.083633 |
\n",
"\t 5 | 22.16232 | 16.82574 | 14.28676 | 20.162797 |
\n",
"\t 6 | 17.17425 | 14.36932 | 18.55487 | 13.498498 |
\n",
"\t 7 | 20.15380 | 18.00987 | 15.99028 | 14.325000 |
\n",
"\t 8 | 20.68866 | 12.83505 | 25.24119 | 24.538494 |
\n",
"\t 9 | 18.84664 | 24.01079 | 12.69775 | 8.095156 |
\n",
"\t10 | 16.29913 | 21.51270 | 15.14676 | 23.722103 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" key & a & b & c & d\\\\\n",
"\\hline\n",
"\t 1 & 16.19873 & 14.41495 & 27.73225 & 15.564688\\\\\n",
"\t 2 & 27.76677 & 18.54772 & 21.06688 & 15.697810\\\\\n",
"\t 3 & 11.68592 & 14.91207 & 22.82086 & 15.666790\\\\\n",
"\t 4 & 16.39982 & 28.30284 & 10.97550 & 19.083633\\\\\n",
"\t 5 & 22.16232 & 16.82574 & 14.28676 & 20.162797\\\\\n",
"\t 6 & 17.17425 & 14.36932 & 18.55487 & 13.498498\\\\\n",
"\t 7 & 20.15380 & 18.00987 & 15.99028 & 14.325000\\\\\n",
"\t 8 & 20.68866 & 12.83505 & 25.24119 & 24.538494\\\\\n",
"\t 9 & 18.84664 & 24.01079 & 12.69775 & 8.095156\\\\\n",
"\t 10 & 16.29913 & 21.51270 & 15.14676 & 23.722103\\\\\n",
"\t 1 & 16.19873 & 14.41495 & 27.73225 & 15.564688\\\\\n",
"\t 2 & 27.76677 & 18.54772 & 21.06688 & 15.697810\\\\\n",
"\t 3 & 11.68592 & 14.91207 & 22.82086 & 15.666790\\\\\n",
"\t 4 & 16.39982 & 28.30284 & 10.97550 & 19.083633\\\\\n",
"\t 5 & 22.16232 & 16.82574 & 14.28676 & 20.162797\\\\\n",
"\t 6 & 17.17425 & 14.36932 & 18.55487 & 13.498498\\\\\n",
"\t 7 & 20.15380 & 18.00987 & 15.99028 & 14.325000\\\\\n",
"\t 8 & 20.68866 & 12.83505 & 25.24119 & 24.538494\\\\\n",
"\t 9 & 18.84664 & 24.01079 & 12.69775 & 8.095156\\\\\n",
"\t 10 & 16.29913 & 21.51270 & 15.14676 & 23.722103\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"key | a | b | c | d | \n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 1 | 16.19873 | 14.41495 | 27.73225 | 15.564688 | \n",
"| 2 | 27.76677 | 18.54772 | 21.06688 | 15.697810 | \n",
"| 3 | 11.68592 | 14.91207 | 22.82086 | 15.666790 | \n",
"| 4 | 16.39982 | 28.30284 | 10.97550 | 19.083633 | \n",
"| 5 | 22.16232 | 16.82574 | 14.28676 | 20.162797 | \n",
"| 6 | 17.17425 | 14.36932 | 18.55487 | 13.498498 | \n",
"| 7 | 20.15380 | 18.00987 | 15.99028 | 14.325000 | \n",
"| 8 | 20.68866 | 12.83505 | 25.24119 | 24.538494 | \n",
"| 9 | 18.84664 | 24.01079 | 12.69775 | 8.095156 | \n",
"| 10 | 16.29913 | 21.51270 | 15.14676 | 23.722103 | \n",
"| 1 | 16.19873 | 14.41495 | 27.73225 | 15.564688 | \n",
"| 2 | 27.76677 | 18.54772 | 21.06688 | 15.697810 | \n",
"| 3 | 11.68592 | 14.91207 | 22.82086 | 15.666790 | \n",
"| 4 | 16.39982 | 28.30284 | 10.97550 | 19.083633 | \n",
"| 5 | 22.16232 | 16.82574 | 14.28676 | 20.162797 | \n",
"| 6 | 17.17425 | 14.36932 | 18.55487 | 13.498498 | \n",
"| 7 | 20.15380 | 18.00987 | 15.99028 | 14.325000 | \n",
"| 8 | 20.68866 | 12.83505 | 25.24119 | 24.538494 | \n",
"| 9 | 18.84664 | 24.01079 | 12.69775 | 8.095156 | \n",
"| 10 | 16.29913 | 21.51270 | 15.14676 | 23.722103 | \n",
"\n",
"\n"
],
"text/plain": [
" key a b c d \n",
"1 1 16.19873 14.41495 27.73225 15.564688\n",
"2 2 27.76677 18.54772 21.06688 15.697810\n",
"3 3 11.68592 14.91207 22.82086 15.666790\n",
"4 4 16.39982 28.30284 10.97550 19.083633\n",
"5 5 22.16232 16.82574 14.28676 20.162797\n",
"6 6 17.17425 14.36932 18.55487 13.498498\n",
"7 7 20.15380 18.00987 15.99028 14.325000\n",
"8 8 20.68866 12.83505 25.24119 24.538494\n",
"9 9 18.84664 24.01079 12.69775 8.095156\n",
"10 10 16.29913 21.51270 15.14676 23.722103\n",
"11 1 16.19873 14.41495 27.73225 15.564688\n",
"12 2 27.76677 18.54772 21.06688 15.697810\n",
"13 3 11.68592 14.91207 22.82086 15.666790\n",
"14 4 16.39982 28.30284 10.97550 19.083633\n",
"15 5 22.16232 16.82574 14.28676 20.162797\n",
"16 6 17.17425 14.36932 18.55487 13.498498\n",
"17 7 20.15380 18.00987 15.99028 14.325000\n",
"18 8 20.68866 12.83505 25.24119 24.538494\n",
"19 9 18.84664 24.01079 12.69775 8.095156\n",
"20 10 16.29913 21.51270 15.14676 23.722103"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#Here we can combine rows with rbind. \n",
"df5<-df\n",
"#The make Row\n",
"df6<-rbind(df,df5)\n",
"df6\n",
"df7<-rbind(df,df5, make.row.names=FALSE)\n",
"df7"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"key | a | b | c | d |
\n",
"\n",
"\t 1 | 16.19873 | 14.41495 | 27.73225 | 15.564688 |
\n",
"\t 2 | 27.76677 | 18.54772 | 21.06688 | 15.697810 |
\n",
"\t 3 | 11.68592 | 14.91207 | 22.82086 | 15.666790 |
\n",
"\t 4 | 16.39982 | 28.30284 | 10.97550 | 19.083633 |
\n",
"\t 5 | 22.16232 | 16.82574 | 14.28676 | 20.162797 |
\n",
"\t 6 | 17.17425 | 14.36932 | 18.55487 | 13.498498 |
\n",
"\t 7 | 20.15380 | 18.00987 | 15.99028 | 14.325000 |
\n",
"\t 8 | 20.68866 | 12.83505 | 25.24119 | 24.538494 |
\n",
"\t 9 | 18.84664 | 24.01079 | 12.69775 | 8.095156 |
\n",
"\t10 | 16.29913 | 21.51270 | 15.14676 | 23.722103 |
\n",
"\t 1 | 16.19873 | 14.41495 | 27.73225 | 15.564688 |
\n",
"\t 2 | 27.76677 | 18.54772 | 21.06688 | 15.697810 |
\n",
"\t 3 | 11.68592 | 14.91207 | 22.82086 | 15.666790 |
\n",
"\t 4 | 16.39982 | 28.30284 | 10.97550 | 19.083633 |
\n",
"\t 5 | 22.16232 | 16.82574 | 14.28676 | 20.162797 |
\n",
"\t 6 | 17.17425 | 14.36932 | 18.55487 | 13.498498 |
\n",
"\t 7 | 20.15380 | 18.00987 | 15.99028 | 14.325000 |
\n",
"\t 8 | 20.68866 | 12.83505 | 25.24119 | 24.538494 |
\n",
"\t 9 | 18.84664 | 24.01079 | 12.69775 | 8.095156 |
\n",
"\t10 | 16.29913 | 21.51270 | 15.14676 | 23.722103 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" key & a & b & c & d\\\\\n",
"\\hline\n",
"\t 1 & 16.19873 & 14.41495 & 27.73225 & 15.564688\\\\\n",
"\t 2 & 27.76677 & 18.54772 & 21.06688 & 15.697810\\\\\n",
"\t 3 & 11.68592 & 14.91207 & 22.82086 & 15.666790\\\\\n",
"\t 4 & 16.39982 & 28.30284 & 10.97550 & 19.083633\\\\\n",
"\t 5 & 22.16232 & 16.82574 & 14.28676 & 20.162797\\\\\n",
"\t 6 & 17.17425 & 14.36932 & 18.55487 & 13.498498\\\\\n",
"\t 7 & 20.15380 & 18.00987 & 15.99028 & 14.325000\\\\\n",
"\t 8 & 20.68866 & 12.83505 & 25.24119 & 24.538494\\\\\n",
"\t 9 & 18.84664 & 24.01079 & 12.69775 & 8.095156\\\\\n",
"\t 10 & 16.29913 & 21.51270 & 15.14676 & 23.722103\\\\\n",
"\t 1 & 16.19873 & 14.41495 & 27.73225 & 15.564688\\\\\n",
"\t 2 & 27.76677 & 18.54772 & 21.06688 & 15.697810\\\\\n",
"\t 3 & 11.68592 & 14.91207 & 22.82086 & 15.666790\\\\\n",
"\t 4 & 16.39982 & 28.30284 & 10.97550 & 19.083633\\\\\n",
"\t 5 & 22.16232 & 16.82574 & 14.28676 & 20.162797\\\\\n",
"\t 6 & 17.17425 & 14.36932 & 18.55487 & 13.498498\\\\\n",
"\t 7 & 20.15380 & 18.00987 & 15.99028 & 14.325000\\\\\n",
"\t 8 & 20.68866 & 12.83505 & 25.24119 & 24.538494\\\\\n",
"\t 9 & 18.84664 & 24.01079 & 12.69775 & 8.095156\\\\\n",
"\t 10 & 16.29913 & 21.51270 & 15.14676 & 23.722103\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"key | a | b | c | d | \n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 1 | 16.19873 | 14.41495 | 27.73225 | 15.564688 | \n",
"| 2 | 27.76677 | 18.54772 | 21.06688 | 15.697810 | \n",
"| 3 | 11.68592 | 14.91207 | 22.82086 | 15.666790 | \n",
"| 4 | 16.39982 | 28.30284 | 10.97550 | 19.083633 | \n",
"| 5 | 22.16232 | 16.82574 | 14.28676 | 20.162797 | \n",
"| 6 | 17.17425 | 14.36932 | 18.55487 | 13.498498 | \n",
"| 7 | 20.15380 | 18.00987 | 15.99028 | 14.325000 | \n",
"| 8 | 20.68866 | 12.83505 | 25.24119 | 24.538494 | \n",
"| 9 | 18.84664 | 24.01079 | 12.69775 | 8.095156 | \n",
"| 10 | 16.29913 | 21.51270 | 15.14676 | 23.722103 | \n",
"| 1 | 16.19873 | 14.41495 | 27.73225 | 15.564688 | \n",
"| 2 | 27.76677 | 18.54772 | 21.06688 | 15.697810 | \n",
"| 3 | 11.68592 | 14.91207 | 22.82086 | 15.666790 | \n",
"| 4 | 16.39982 | 28.30284 | 10.97550 | 19.083633 | \n",
"| 5 | 22.16232 | 16.82574 | 14.28676 | 20.162797 | \n",
"| 6 | 17.17425 | 14.36932 | 18.55487 | 13.498498 | \n",
"| 7 | 20.15380 | 18.00987 | 15.99028 | 14.325000 | \n",
"| 8 | 20.68866 | 12.83505 | 25.24119 | 24.538494 | \n",
"| 9 | 18.84664 | 24.01079 | 12.69775 | 8.095156 | \n",
"| 10 | 16.29913 | 21.51270 | 15.14676 | 23.722103 | \n",
"\n",
"\n"
],
"text/plain": [
" key a b c d \n",
"1 1 16.19873 14.41495 27.73225 15.564688\n",
"2 2 27.76677 18.54772 21.06688 15.697810\n",
"3 3 11.68592 14.91207 22.82086 15.666790\n",
"4 4 16.39982 28.30284 10.97550 19.083633\n",
"5 5 22.16232 16.82574 14.28676 20.162797\n",
"6 6 17.17425 14.36932 18.55487 13.498498\n",
"7 7 20.15380 18.00987 15.99028 14.325000\n",
"8 8 20.68866 12.83505 25.24119 24.538494\n",
"9 9 18.84664 24.01079 12.69775 8.095156\n",
"10 10 16.29913 21.51270 15.14676 23.722103\n",
"11 1 16.19873 14.41495 27.73225 15.564688\n",
"12 2 27.76677 18.54772 21.06688 15.697810\n",
"13 3 11.68592 14.91207 22.82086 15.666790\n",
"14 4 16.39982 28.30284 10.97550 19.083633\n",
"15 5 22.16232 16.82574 14.28676 20.162797\n",
"16 6 17.17425 14.36932 18.55487 13.498498\n",
"17 7 20.15380 18.00987 15.99028 14.325000\n",
"18 8 20.68866 12.83505 25.24119 24.538494\n",
"19 9 18.84664 24.01079 12.69775 8.095156\n",
"20 10 16.29913 21.51270 15.14676 23.722103"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df7"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `aggregate` and `by`\n",
"- Aggregation is a very important function.\n",
"- Can have variables/analyses that happen at different levels.\n",
"- `by(x, by, FUN)` provides similar functionality."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"sepal_length | sepal_width | petal_length | petal_width | species |
\n",
"\n",
"\t5.1 | 3.5 | 1.4 | 0.2 | setosa |
\n",
"\t4.9 | 3.0 | 1.4 | 0.2 | setosa |
\n",
"\t4.7 | 3.2 | 1.3 | 0.2 | setosa |
\n",
"\t4.6 | 3.1 | 1.5 | 0.2 | setosa |
\n",
"\t5.0 | 3.6 | 1.4 | 0.2 | setosa |
\n",
"\t5.4 | 3.9 | 1.7 | 0.4 | setosa |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" sepal\\_length & sepal\\_width & petal\\_length & petal\\_width & species\\\\\n",
"\\hline\n",
"\t 5.1 & 3.5 & 1.4 & 0.2 & setosa\\\\\n",
"\t 4.9 & 3.0 & 1.4 & 0.2 & setosa\\\\\n",
"\t 4.7 & 3.2 & 1.3 & 0.2 & setosa\\\\\n",
"\t 4.6 & 3.1 & 1.5 & 0.2 & setosa\\\\\n",
"\t 5.0 & 3.6 & 1.4 & 0.2 & setosa\\\\\n",
"\t 5.4 & 3.9 & 1.7 & 0.4 & setosa\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"sepal_length | sepal_width | petal_length | petal_width | species | \n",
"|---|---|---|---|---|---|\n",
"| 5.1 | 3.5 | 1.4 | 0.2 | setosa | \n",
"| 4.9 | 3.0 | 1.4 | 0.2 | setosa | \n",
"| 4.7 | 3.2 | 1.3 | 0.2 | setosa | \n",
"| 4.6 | 3.1 | 1.5 | 0.2 | setosa | \n",
"| 5.0 | 3.6 | 1.4 | 0.2 | setosa | \n",
"| 5.4 | 3.9 | 1.7 | 0.4 | setosa | \n",
"\n",
"\n"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width species\n",
"1 5.1 3.5 1.4 0.2 setosa \n",
"2 4.9 3.0 1.4 0.2 setosa \n",
"3 4.7 3.2 1.3 0.2 setosa \n",
"4 4.6 3.1 1.5 0.2 setosa \n",
"5 5.0 3.6 1.4 0.2 setosa \n",
"6 5.4 3.9 1.7 0.4 setosa "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris=read.csv(file=\"../../input/iris.csv\", header=TRUE,sep=\",\")\n",
"head(iris)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" species sepal_length sepal_width petal_length petal_width\n",
"1 setosa 5.006 3.418 1.464 0.244\n",
"2 versicolor 5.936 2.770 4.260 1.326\n",
"3 virginica 6.588 2.974 5.552 2.026\n"
]
},
{
"data": {
"text/plain": [
"iris$species: setosa\n",
"sepal_length sepal_width petal_length petal_width \n",
" 5.006 3.418 1.464 0.244 \n",
"------------------------------------------------------------ \n",
"iris$species: versicolor\n",
"sepal_length sepal_width petal_length petal_width \n",
" 5.936 2.770 4.260 1.326 \n",
"------------------------------------------------------------ \n",
"iris$species: virginica\n",
"sepal_length sepal_width petal_length petal_width \n",
" 6.588 2.974 5.552 2.026 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris<-read.csv(file=\"../../input/iris.csv\", header=TRUE,sep=\",\")\n",
"\n",
"#Aggregate by Species aggregate(x, by, FUN, ...)\n",
"iris.agg<-aggregate(iris[,1:4], by=list(\"species\" = iris$species), mean)\n",
"print(iris.agg)\n",
"\n",
"#Notice this gives us the same output but structured differently. \n",
"by(iris[, 1:4], iris$species, colMeans)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `apply`(plus `lapply`/`sapply`/`tapply`/`rapply`)\n",
"- `apply` - Applying a function to **an array or matrix**, return a vector or array or list of values. `apply(X, MARGIN, FUN, ...)`\n",
"- [`lapply`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/lapply.html) - Apply a function to **each element of a list or vector**, return a **list**. \n",
"- [`sapply`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/lapply.html) - A user-friendly version if `lapply`. Apply a function to **each element of a list or vector**, return a **vector**.\n",
"- `tapply` - Apply a function to **subsets of a vector** (and the subsets are defined by some other vector, usually a factor), return a **vector**. \n",
"- `rapply` - Apply a function to **each element of a nested list structure, recursively,** return a list.\n",
"- Some functions aren't vectorized, or you may want to use a function on every row or column of a matrix/data frame, every element of a list, etc.\n",
"- For more info see this [tutorial](https://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-r/)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `apply`\n",
"- `apply` - Applying a function to **an array or matrix**, return a vector or array or list of values. `apply(X, MARGIN, FUN, ...)`\n",
"- If you are using a data frame the data types must all be the same. \n",
"- `apply(X, MARGIN, FUN, ...) where X is an array or matrix. \n",
"- `MARGIN` is a vector giving the where function should be applied. E.g., for a matrix 1 indicates rows, 2 indicates columns, c(1, 2) indicates rows and columns.\n",
"- `FUN` is any function. "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"sepal_length | sepal_width | petal_length | petal_width | species | sum | mean |
\n",
"\n",
"\t5.1 | 3.5 | 1.4 | 0.2 | setosa | 10.2 | 2.550 |
\n",
"\t4.9 | 3.0 | 1.4 | 0.2 | setosa | 9.5 | 2.375 |
\n",
"\t4.7 | 3.2 | 1.3 | 0.2 | setosa | 9.4 | 2.350 |
\n",
"\t4.6 | 3.1 | 1.5 | 0.2 | setosa | 9.4 | 2.350 |
\n",
"\t5.0 | 3.6 | 1.4 | 0.2 | setosa | 10.2 | 2.550 |
\n",
"\t5.4 | 3.9 | 1.7 | 0.4 | setosa | 11.4 | 2.850 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllllll}\n",
" sepal\\_length & sepal\\_width & petal\\_length & petal\\_width & species & sum & mean\\\\\n",
"\\hline\n",
"\t 5.1 & 3.5 & 1.4 & 0.2 & setosa & 10.2 & 2.550 \\\\\n",
"\t 4.9 & 3.0 & 1.4 & 0.2 & setosa & 9.5 & 2.375 \\\\\n",
"\t 4.7 & 3.2 & 1.3 & 0.2 & setosa & 9.4 & 2.350 \\\\\n",
"\t 4.6 & 3.1 & 1.5 & 0.2 & setosa & 9.4 & 2.350 \\\\\n",
"\t 5.0 & 3.6 & 1.4 & 0.2 & setosa & 10.2 & 2.550 \\\\\n",
"\t 5.4 & 3.9 & 1.7 & 0.4 & setosa & 11.4 & 2.850 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"sepal_length | sepal_width | petal_length | petal_width | species | sum | mean | \n",
"|---|---|---|---|---|---|\n",
"| 5.1 | 3.5 | 1.4 | 0.2 | setosa | 10.2 | 2.550 | \n",
"| 4.9 | 3.0 | 1.4 | 0.2 | setosa | 9.5 | 2.375 | \n",
"| 4.7 | 3.2 | 1.3 | 0.2 | setosa | 9.4 | 2.350 | \n",
"| 4.6 | 3.1 | 1.5 | 0.2 | setosa | 9.4 | 2.350 | \n",
"| 5.0 | 3.6 | 1.4 | 0.2 | setosa | 10.2 | 2.550 | \n",
"| 5.4 | 3.9 | 1.7 | 0.4 | setosa | 11.4 | 2.850 | \n",
"\n",
"\n"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width species sum mean \n",
"1 5.1 3.5 1.4 0.2 setosa 10.2 2.550\n",
"2 4.9 3.0 1.4 0.2 setosa 9.5 2.375\n",
"3 4.7 3.2 1.3 0.2 setosa 9.4 2.350\n",
"4 4.6 3.1 1.5 0.2 setosa 9.4 2.350\n",
"5 5.0 3.6 1.4 0.2 setosa 10.2 2.550\n",
"6 5.4 3.9 1.7 0.4 setosa 11.4 2.850"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\t- sepal_length
\n",
"\t\t- 5.84333333333333
\n",
"\t- sepal_width
\n",
"\t\t- 3.054
\n",
"\t- petal_length
\n",
"\t\t- 3.75866666666667
\n",
"\t- petal_width
\n",
"\t\t- 1.19866666666667
\n",
"
\n"
],
"text/latex": [
"\\begin{description*}\n",
"\\item[sepal\\textbackslash{}\\_length] 5.84333333333333\n",
"\\item[sepal\\textbackslash{}\\_width] 3.054\n",
"\\item[petal\\textbackslash{}\\_length] 3.75866666666667\n",
"\\item[petal\\textbackslash{}\\_width] 1.19866666666667\n",
"\\end{description*}\n"
],
"text/markdown": [
"sepal_length\n",
": 5.84333333333333sepal_width\n",
": 3.054petal_length\n",
": 3.75866666666667petal_width\n",
": 1.19866666666667\n",
"\n"
],
"text/plain": [
"sepal_length sepal_width petal_length petal_width \n",
" 5.843333 3.054000 3.758667 1.198667 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris<-read.csv(file=\"../../input/iris.csv\", header=TRUE,sep=\",\")\n",
"iris$sum<-apply(iris[1:4], 1, sum) #This provides a sum across for each row. \n",
"iris$mean<-apply(iris[1:4], 1, mean)#This provides a mean across collumns for each row. \n",
"head(iris)\n",
"apply(iris[1:4], 2, mean)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `lapply` & `sapply`\n",
"- [`lapply`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/lapply.html) - Apply a function to **each element of a list or vector**, return a **list**.\n",
"- `lapply(X, FUN, ...)`\n",
"- [`sapply`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/lapply.html) - A user-friendly version if `lapply`. Apply a function to **each element of a list or vector**, return a **vector**.\n",
"- `sapply(X, FUN, ...)`"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"'list'"
],
"text/latex": [
"'list'"
],
"text/markdown": [
"'list'"
],
"text/plain": [
"[1] \"list\""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"$count\n",
"[1] 15\n",
"\n",
"$numbers\n",
"[1] 45\n",
"\n",
"$numbers\n",
"[1] 45\n",
"\n",
"$count\n",
"[1] 15\n",
"\n"
]
}
],
"source": [
"# create a list with 2 elements\n",
"sample <- list(\"count\" = 1:5, \"numbers\" =5:10)\n",
"\n",
"# sum each and return as a list. \n",
"sample.sum<-lapply(sample, sum)\n",
"\n",
"class(sample.sum)\n",
"print(c(sample.sum, sample.sum[\"numbers\"],sample.sum[\"count\"]))\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"'integer'"
],
"text/latex": [
"'integer'"
],
"text/markdown": [
"'integer'"
],
"text/plain": [
"[1] \"integer\""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" count numbers numbers count \n",
" 15 45 45 15 15 \n"
]
}
],
"source": [
"# create a list with 2 elements\n",
"sample <- list(\"count\" = 1:5, \"numbers\" =5:10)\n",
"\n",
"# sum each and return as a list. \n",
"sample.sum<-sapply(sample, sum)\n",
"\n",
"class(sample.sum)\n",
"print(c(sample.sum, sample.sum[\"numbers\"],sample.sum[\"count\"],sample.sum[[\"count\"]]))\n",
"\n",
"#Note the differenece between #sample.sum[[\"count\"]] and sample.sum[\"count\"]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\t- 1
\n",
"\t- 4
\n",
"\t- 9
\n",
"\t- 16
\n",
"\t- 25
\n",
"
\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 1\n",
"\\item 4\n",
"\\item 9\n",
"\\item 16\n",
"\\item 25\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 1\n",
"2. 4\n",
"3. 9\n",
"4. 16\n",
"5. 25\n",
"\n",
"\n"
],
"text/plain": [
"[1] 1 4 9 16 25"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\t- 1
\n",
"\t- 4
\n",
"\t- 9
\n",
"\t- 16
\n",
"\t- 25
\n",
"\t- 36
\n",
"\t- 49
\n",
"\t- 64
\n",
"\t- 81
\n",
"\t- 100
\n",
"
\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 1\n",
"\\item 4\n",
"\\item 9\n",
"\\item 16\n",
"\\item 25\n",
"\\item 36\n",
"\\item 49\n",
"\\item 64\n",
"\\item 81\n",
"\\item 100\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 1\n",
"2. 4\n",
"3. 9\n",
"4. 16\n",
"5. 25\n",
"6. 36\n",
"7. 49\n",
"8. 64\n",
"9. 81\n",
"10. 100\n",
"\n",
"\n"
],
"text/plain": [
" [1] 1 4 9 16 25 36 49 64 81 100"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\t- 1
\n",
"\t- 4
\n",
"\t- 9
\n",
"\t- 16
\n",
"\t- 25
\n",
"\t- 36
\n",
"\t- 49
\n",
"\t- 64
\n",
"\t- 81
\n",
"\t- 100
\n",
"
\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 1\n",
"\\item 4\n",
"\\item 9\n",
"\\item 16\n",
"\\item 25\n",
"\\item 36\n",
"\\item 49\n",
"\\item 64\n",
"\\item 81\n",
"\\item 100\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 1\n",
"2. 4\n",
"3. 9\n",
"4. 16\n",
"5. 25\n",
"6. 36\n",
"7. 49\n",
"8. 64\n",
"9. 81\n",
"10. 100\n",
"\n",
"\n"
],
"text/plain": [
" [1] 1 4 9 16 25 36 49 64 81 100"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# We can also utilize simple \n",
"square<-function(x) x^2\n",
"square(1:5)\n",
"\n",
"# We can use our own function here. \n",
"sapply(1:10, square)\n",
"\n",
"#We can also specify the function directly in sapply.\n",
"sapply(1:10, function(x) x^2)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `tapply` \n",
"- `tapply` - Apply a function to subsets of a vector (and the subsets are defined by some other vector, usually a factor), return a vector.\n",
"- Can do something similar to aggregate. "
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" setosa versicolor virginica \n",
" 5.006 5.936 6.588 \n"
]
}
],
"source": [
"#Tapply example\n",
"#tapply(X, INDEX, FUN, …) \n",
"#X = a vector, INDEX = list of one or more factor, FUN = Function or operation that needs to be applied. \n",
"iris<-read.csv(file=\"../../input/iris.csv\", header=TRUE,sep=\",\")\n",
"iris.sepal_length.agg<-tapply(iris$sepal_length, iris$species, mean)\n",
"print(iris.sepal_length.agg)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## CREDITS\n",
"\n",
"\n",
"Copyright [AnalyticsDojo](http://rpi.analyticsdojo.com) 2016.\n",
"This work is licensed under the [Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/) license agreement.\n",
"This work is adopted from the Berkley R Bootcamp. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"anaconda-cloud": {
"attach-environment": true,
"environment": "py3k",
"summary": "Introduction and Overview of Pything",
"url": "https://anaconda.org/analyticsdojo/intro-python-overview"
},
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}