This is the website for Tidy Modeling with R. This book is a guide to using a new collection of software in the R programming language for model building, and it has two main goals:
First and foremost, this book provides an introduction to how to use our software to create models. We focus on a dialect of R called the tidyverse that is designed to be a better interface for common tasks using R. If you’ve never heard of or used the tidyverse, Chapter 2 provides an introduction. In this book, we demonstrate how the tidyverse can be used to produce high-quality models. The tools used to do this are referred to as the tidymodels packages.
Second, we use the tidymodels packages to encourage good methodology and statistical practice. Many models, especially complex predictive or machine learning models, can work very well on the data at hand but may fail when exposed to new data. Often, this issue is due to poor choices made during the development and/or selection of the models. Whenever possible, our software, documentation, and other materials attempt to prevent these and other pitfalls.
This book is not intended to be a comprehensive reference on modeling techniques; we suggest other resources to learn such nuances. For general background on the most common type of model, the linear model, we suggest Fox (2008). For predictive models, Kuhn and Johnson (2013) is a good resource. Also, Kuhn and Johnson (2020) is referenced heavily here, mostly because it is freely available online. For machine learning methods, Goodfellow, Bengio, and Courville (2016) is an excellent (but formal) source of information. In some cases, we describe the models used in this text, but in a way that is less mathematical and, hopefully, more intuitive.
Investigating and analyzing data are an important part of the modeling process, and an excellent resource on this topic is Wickham and Grolemund (2016).
We do not assume that readers have extensive experience in model building and statistics. Some statistical knowledge is required, such as random sampling, variance, correlation, basic linear regression, and other topics that are usually found in a basic undergraduate statistics or data analysis course.
Tidy Modeling with R is currently a work in progress, and this website is updated as we write it. Be aware that, until it is finalized, the content and/or structure of the book may change.
This openness also allows users to contribute if they wish. Most often, this comes in the form of correcting typos, grammar, and other aspects of our work that could use improvement. Instructions for making contributions can be found in the contributing.md file. Also, be aware that this effort has a code of conduct, which can be found at
The tidymodels packages are fairly young in the software lifecycle. We will do our best to maintain backwards compatibility and, at the completion of this work, will archive and tag the specific versions of software that were used to produce it.
This book was written in RStudio using bookdown. The tmwr.org website is hosted via Netlify and is automatically built by GitHub Actions after every push. The complete source is available on GitHub. We generated all plots in this book using ggplot2 and its black and white theme (theme_bw()). This version of the book was built with R version 4.0.5 (2021-03-31), pandoc version 2.7.3, and the following packages:
| Package | Version | Source |
|:--|:--|:--|
| finetune | 0.0.1.9000 | GitHub (tidymodels/finetune) |
| nlme | 3.1-152 | CRAN (R 4.0.5) |
| nnet | 7.3-15 | CRAN (R 4.0.5) |
| rpart | 4.1-15 | CRAN (R 4.0.5) |
| tidymodels | 0.1.3.9000 | GitHub (tidymodels/tidymodels) |
| tidyposterior | 0.1.0.9000 | GitHub (tidymodels/tidyposterior) |
This site contains supplemental materials for Stat 1201, mainly: 1) clarifications on which sections we cover in the textbook (Devore, Probability and Statistics for Engineering and the Sciences, 9th edition), 2) R code, and 3) links to helpful resources online. It is not in any way a substitute for materials available in CourseWorks.
If you find additional online resources that are helpful to this class, please create an issue or send me an email and I’ll add them to this resource. Let me know as well if you find any typos or other mistakes.
Note that while you’re encouraged to look ahead, be sure to circle back to those sections when they’re covered in class since content may be added or modified slightly.
General study tips
The website for the book Make It Stick offers a summary of experimentally tested study strategies. The tl;dr is:
- working out problems is better than reviewing your notes or the textbook
- doing mixed reviews is better than focusing on one type of problem at a time
- learning is hard work; if it seems too easy, your study strategy might not be the most effective
- making mistakes and learning from them is a useful strategy (don’t wait until you’ve mastered all of the examples to try a problem)
You’ve likely heard a lot of these ideas before, but it’s worth really thinking about them and putting them into practice.
As you’re reading the textbook or working on a problem set, keep a list of questions. Challenge yourself by thinking about how the problem would differ if you changed the setup.
Try creating your own questions and solving them.
Try solving problems in multiple ways.
Learn from a variety of sources: class, textbook, Cartoon Guide, etc. If you find differences, ask.
You will need to install two applications, R and RStudio:
- R – the programming language itself – is available here:
- RStudio – an integrated development environment (IDE) which makes it much easier to use R. It is optional but highly recommended. This is the app you will open to use R. Choose the free version of RStudio Desktop:
Getting Started with R: Working in the Console
The first step in getting started is getting comfortable working in the RStudio console. It works like a calculator in the sense that your work is not saved. Do the following:
Quick review of material covered in the video, plus additional examples
Working in the console pane is similar to using a calculator: each line of code is executed when you press Enter. Note that your work is not saved with this approach.
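A few lines you might type at the console prompt to see this calculator-like behavior. The numbers are arbitrary; each result is printed as soon as its line is run, and nothing is kept once you close R.

```r
# Arithmetic typed at the console prompt (>) is evaluated immediately
2 + 3 * 4      # multiplication before addition: 14
(2 + 3) * 4    # parentheses change the order: 20
10 / 4         # ordinary division: 2.5
10 %/% 4       # integer division: 2
10 %% 4        # remainder (modulo): 2
2^10           # exponentiation: 1024
sqrt(16)       # built-in functions work the same way: 4
```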
Assigning a variable
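A minimal sketch of assignment at the console; the names `x` and `y` are arbitrary choices for illustration. The key point is that `<-` stores a value under a name, and a stored value does not change unless you assign to that name again.

```r
# `<-` stores a value under a name (`=` also works at the top level,
# but `<-` is the conventional assignment operator in R)
x <- 5
y <- x * 2     # y is computed from the current x, so y is 10
x <- x + 1     # reassigning overwrites the old value; x is now 6
x              # typing a name prints its value: 6
y              # still 10: y does not update when x changes
ls()           # lists the names currently defined in the workspace
```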
Drawing a stem and leaf plot
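Base R's `stem()` function (from the graphics package, which is loaded by default) prints a stem-and-leaf display directly in the console. The `scores` vector below is made-up data used only for illustration; `stem()` chooses the stem unit automatically, and its `scale` argument controls how long the display is.

```r
# Hypothetical exam scores, invented for this example
scores <- c(62, 67, 71, 73, 75, 78, 81, 84, 84, 88, 90, 95)

stem(scores)              # default display, printed in the console
stem(scores, scale = 2)   # scale > 1 stretches the display over more lines
```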
Working with vectors:
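Some basic vector operations, using hypothetical height data (the `heights` name and values are invented for illustration). Note that R indexes from 1, and that arithmetic and comparisons act on every element at once.

```r
# c() combines values into a vector (hypothetical heights in cm)
heights <- c(160, 172, 181, 165, 158)

length(heights)            # number of elements: 5
heights[2]                 # indexing starts at 1: 172
heights[c(1, 3)]           # several elements at once: 160 181
heights[heights > 165]     # logical indexing keeps matching values: 172 181
heights / 100              # element-wise arithmetic (converts to metres)
mean(heights)              # 167.2
sd(heights)                # sample standard deviation
1:5                        # shorthand for the sequence 1 2 3 4 5
```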
Read and try the examples in Chapter 1 of Introduction to R.
Creating Graphs, Saving your work
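A short sketch of drawing a graph and saving it to a file, using simulated data (the seed, sample size, and file name are arbitrary assumptions). `pdf()` is used here because it is available on every platform; `png()` follows the same open, draw, close pattern. In RStudio you can also save a plot from the Plots pane with the Export button.

```r
# Simulated data for illustration: 100 draws from a normal distribution
set.seed(1)                          # makes the random draws reproducible
x <- rnorm(100, mean = 50, sd = 10)

# Draw a histogram on the current device (the Plots pane in RStudio)
hist(x, main = "Histogram of x", xlab = "x")

# To save a plot: open a file device, draw, then close the device
out_file <- file.path(tempdir(), "histogram.pdf")  # illustrative location
pdf(out_file, width = 6, height = 4)               # size in inches
hist(x, main = "Histogram of x", xlab = "x")
dev.off()                            # closing the device writes the file
file.exists(out_file)                # TRUE once the file has been written
```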
Saving code as an .R file
(Also covered in video above)
Saving with this method saves only the code, not the output. Below are two methods for creating .html documents that contain both code and output:
Convert .R file to .html
(Also covered in video above)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.