Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with archaeology data in R.
This is an introduction to R designed for participants with no programming experience. These lessons can be taught in a day (~ 6 hours). They start with some basic information about R syntax and the RStudio interface, then move through how to import CSV files, the structure of data frames, how to deal with factors, how to add/remove rows and columns, how to calculate summary statistics from a data frame, and a brief introduction to plotting. The last lesson demonstrates how to work with databases directly from R.
This lesson assumes no prior knowledge of R or RStudio and no programming experience.
Data Carpentry’s teaching is hands-on, and to follow this lesson learners must have R and RStudio installed on their computers. They also need to be able to install a number of R packages, create directories, and download files.
To avoid troubleshooting during the lesson, learners should follow the instructions below to download and install everything beforehand. If they are using their own computers this should be no problem, but if the computer is managed by their organization’s IT department they might need help from an IT administrator.
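R and RStudio are two separate pieces of software: R is the programming language, and RStudio is an integrated development environment (IDE) that makes using R easier.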
If you don’t already have R and RStudio installed, follow the instructions for your operating system below. You have to install R before you install RStudio.
Windows: run the .exe file that was just downloaded.
MacOS: select the .pkg file for the latest R version.
Linux: you could install R with your package manager (for Debian/Ubuntu run sudo apt-get install r-base, and for Fedora sudo yum install R), but we don’t recommend this approach as the versions it provides are usually out of date. In any case, make sure you have at least R 3.3.1. RStudio can then be installed from the downloaded package (for Debian/Ubuntu, run sudo dpkg -i rstudio-x.yy.zzz-amd64.deb at the terminal).
If you already have R and RStudio installed, first check if your R version is up to date:
Type sessionInfo() into the console. If your R version is 4.0.0 or later, you don’t need to update R for this lesson. If your version of R is older than that, download and install the latest version of R from the R project website for Windows, for MacOS, or for Linux. If you have several versions of R installed, you can choose the one RStudio uses under Tools > Global Options > General > Basic. On Windows, there is a package called installr that can help you with upgrading your R version and migrating your package library.
To update RStudio to the latest version, open RStudio and click on Help > Check for Updates. If a new version is available, follow the instructions on screen. By default, RStudio will also automatically notify you of new versions every once in a while.
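As a minimal sketch of the R version check described above, the comparison against 4.0.0 can also be done directly in the console. The variable name r_is_recent is only illustrative and is not part of the lesson.

```r
# Print the version string of the running R session
print(R.version.string)

# getRversion() returns a version object that can be compared with a string
r_is_recent <- getRversion() >= "4.0.0"

if (r_is_recent) {
  message("R is 4.0.0 or later: no update needed for this lesson.")
} else {
  message("R is older than 4.0.0: please install the latest version.")
}
```

Comparing with getRversion() avoids reading the version number by eye out of the longer sessionInfo() output.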
During the course we will need a number of R packages. Packages contain useful R code written by other people. We will use the packages tidyverse, hexbin, patchwork, and RSQLite.
To install these packages, open RStudio and copy and paste the following command into the console window (look for a blinking cursor on the bottom left), then press Enter (Windows and Linux) or Return (MacOS) to execute the command.
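A command along these lines should work, assuming the four packages named above are the complete list. The variable name pkgs and the explicit repos argument are illustrative choices, not taken from the lesson.

```r
# The four packages named in this lesson
pkgs <- c("tidyverse", "hexbin", "patchwork", "RSQLite")

# Download and install them from CRAN (needs an internet connection).
# Naming the CRAN mirror explicitly lets the command also run outside RStudio.
install.packages(pkgs, repos = "https://cloud.r-project.org")
```

Installing tidyverse pulls in many dependencies, so this step can take several minutes.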
Alternatively, you can install the packages using RStudio’s graphical user interface by going to Tools > Install Packages and typing the names of the packages separated by commas.
R tries to download and install the packages on your machine. When the installation has finished, you can try to load the packages by pasting the following code into the console:
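A minimal version of that loading code, assuming the four packages listed earlier, is one library() call per package:

```r
# Load each package; library() stops with an error if a package is missing
library(tidyverse)
library(hexbin)
library(patchwork)
library(RSQLite)
```

Startup messages, such as the list of packages attached by tidyverse, are normal and do not indicate a problem.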
If you do not see an error like there is no package called ‘...’ you are good to go!
Generally, it is recommended to keep your R version and all packages up to date, because new versions bring improvements and important bug fixes. To update the packages that you have installed, click Update in the Packages tab in the bottom right panel of RStudio, or go to Tools > Check for Package Updates...
Sometimes, package updates introduce changes that break your old code, which can be very frustrating. To avoid this problem, you can use a package called renv. It locks the package versions you have used for a given project and makes it straightforward to reinstall those exact package versions in a new environment, for example after updating your R version or on another computer. However, the details are outside the scope of this lesson.
We will download the data directly from R during the lessons. However, if you are expecting problems with the network, it may be better to download the data beforehand and store it on your machine.
The data files for the lesson can be downloaded manually here: https://doi.org/10.5281/zenodo.6478180
The list of contributors to this lesson is available here.
Page built on: 📆 2022-05-18 ‒ 🕢 12:38:06
Data Carpentry, 2014-2021.