R Quick Start Guide for the Fish Scientist

Montana Cooperative Fishery Research Unit

Authors

Michelle Briggs

Robert Eckelbecker

Katie Furey

Christopher Guy

Drew MacDonald

Tori Ogolin

Hannah Stapleton

Cody Vender

Published

September 18, 2023

What is this Document?

We wrote this document for incoming students who have little or no experience with data management, R, and RStudio. We think of it as a beginner’s guide or quick start guide. We remember struggling with some of the most basic aspects of file management and R, and we hope to help you avoid those hurdles. Because this is a quick start guide, we keep the text in each section to a minimum. This document is not a comprehensive guide to using R, RStudio, or analyzing fisheries data.

This is a living document, and we hope to keep updating it as we receive comments and new ideas.

Load R and RStudio

We use R with RStudio. RStudio is an integrated development environment for R and Python. You can use R without RStudio, but you will find it very frustrating unless you are a serious coder. You can download R for your operating system from the CRAN site https://cran.r-project.org/. We think it is a good idea to check the CRAN site periodically for R updates and to keep both R and RStudio up to date. You can download RStudio Desktop from the RStudio website https://www.rstudio.com/products/rstudio/.

Once R and RStudio are installed, you need to do a couple of things in RStudio. First, we recommend that you do not have R save the workspace. In RStudio, go to Preferences (Tools, then Global Options, on Windows) and, in the General pane under Workspace, uncheck “Restore .RData into workspace at startup” and select Never for “Save workspace to .RData on exit”. If you do not do this, all your current variables and functions are saved to .RData, and when you reopen R from that working directory the workspace is loaded with those variables and functions. We prefer to have a clean workspace when we reopen R and RStudio so we know exactly what we are starting with each time. It avoids confusion when you have many variables and functions.

Second, you should customize RStudio to your liking. In Preferences, the Appearance pane lets you select an RStudio theme, editor font, editor font size, and editor theme. In the Pane Layout pane you can choose how the panels are arranged. There may be a time when you want to add a column, which you can do by clicking the “Add Column” button. For example, we sometimes find it helpful to have two Source panes.

RStudio Projects

We think it is a good habit to create RStudio Projects for your analyses. RStudio Projects help with workflow: they keep all the files associated with a project together (using relative file paths) and allow for version control using Git (see Version Control), to name a few benefits. Having all the files for a project “connected” is important when you are working on complex research projects and when you need to leave a project for a while and come back; the RStudio Project has everything in one location for you. In addition, RStudio Projects make sharing R scripts among colleagues easy. If you send a colleague an RStudio Project and the data files, they can run the code as you did and obtain the same results. In the past we would set the working directory with ‘setwd’, and if that path did not exist on a colleague’s computer they could not run the code, which was frustrating. We advise avoiding the ‘setwd’ approach to file management. We think RStudio Projects should be the default for all R users.
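As a minimal sketch of the relative-path idea, assuming the Project folder contains a data folder holding the lake_trout_catch_net.csv file used later in this guide, a path can be written relative to the Project root instead of being hard-coded with setwd. The data_path name is ours, for illustration only.

```r
# Build a path relative to the Project root; no setwd() needed.
# file.path() joins folder and file names with "/" on every operating system.
data_path <- file.path("data", "lake_trout_catch_net.csv")
data_path

# TRUE only if the file is actually in the data folder of the current Project
file.exists(data_path)
```

Because the path never mentions a drive letter or user name, the same line should work unchanged on a colleague's computer once they open the Project.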

You can have as many Projects as you want, but we usually have one Project for a research project or for a major analysis within a research project. Setting up an RStudio Project is very simple. First, make a folder on your computer or on SharePoint (e.g., OneDrive) where you want the Project to be stored. Second, in RStudio go to File; the second line down should say “New Project.” Click on New Project and follow the prompts. That is it. The Project file contains information that helps maintain all the associated files. Keep in mind that an RStudio Project is not an R script; it is simply a “file management” file.

Version Control

You may want to skip this section if you are new to R. When using RStudio Projects you can set up version control by going to Tools, clicking on Version Control, and following the prompts. See https://nceas.github.io/oss-lessons/version-control/4-getting-started-with-git-in-RStudio.html#version_control_using_rstudio for more information. You can also link to your own GitHub site, but we don’t have that figured out yet.

Quarto (was R Markdown)

This HTML file was created using Quarto, which is the “new” version of R Markdown. R Markdown will continue to be supported by RStudio, but any new features will only be in Quarto. See https://quarto.org to learn more about Quarto. We use Quarto for all our analyses even if we do not initially plan to render the file to an HTML, PDF, or Word document. It keeps that option available if we later want to share the output with a colleague in document form. The coding is the same; you simply write your code in chunks. You can then add narrative above or below the chunks to explain the output. There are a lot of advantages to using Quarto.

The screenshot below shows the YAML header in Quarto, which sits between the “---” lines. The header is required, and its spacing is very important. If your Quarto file won’t render, it is most likely because your YAML has a typo. You can also see that we have an outline; that is because we are using headers, as indicated by the “##”. This makes navigating the file very simple because you can click on a header to jump to that section.

We think an HTML output with a table of contents makes for great reporting, as in this document.

You may want to check the Render on Save button to save you a couple clicks.

The screenshot below shows a chunk, which contains the code and any options specific to that chunk; the options are preceded by “#|”. You can add a chunk at any time by clicking on the green square with the plus symbol in the upper right corner of the Source pane.

Packages and Libraries

What in the heck are packages and libraries? A package is a collection of R functions, code, and sometimes sample data that someone put together to help you wrangle data, conduct analyses, or make graphs, to name a few uses. This is the real beauty of R: it is a worldwide community of people helping each other, which is probably why R is so popular. There were about 16,000 packages available as of November 2020. One package we use every time is called “tidyverse”. It is a collection of packages and their functions that we find very useful for data analyses and graphing. To obtain an R package, go to the Tools menu in RStudio and click on “Install Packages.” Type in the name of the package you want to install, and you should see the package being installed in the Console window.
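The same installation can be done from the Console. The sketch below is one way to do it, assuming the package names used in this guide. The check-before-install pattern and the pkgs, installed, and missing_pkgs names are our own illustration, and the install line is left commented out so nothing is downloaded unless you choose to run it.

```r
# Packages we want available (names taken from this guide)
pkgs <- c("tidyverse", "FSA")

# requireNamespace() returns TRUE when a package is installed and can be loaded
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]
missing_pkgs

# Install only what is missing (uncomment to run; needs an internet connection)
# install.packages(missing_pkgs)
```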

Now that you have a package installed, you need to tell R that you want to use it in your analysis. You do this by loading the package with library() in your R session (e.g., “library(tidyverse)”). If you close R and reopen it, you need to reload your packages because we are not saving the R session (see the second paragraph in Load R and RStudio). Below we load the packages we use in this guide, including tidyverse and FSA. Note that loading tidyverse also loads additional packages and that there are some conflicts with other packages. For the FSA package you also get information about citing the package. You should always cite the packages you use in your publications and reports. Finally, it is a good idea to update your packages, which is easily done under the Tools menu in RStudio.

library(psych) # some good summary stats functions
library(tidyverse) # has our favorite graphing package in it: ggplot2
library(FSA) # fisheries stock assessment package by Dr. Derek Ogle
library(DescTools) # some good summary stats functions
library(confintr) # calculates confidence intervals for many parameters
library(kableExtra) # for making tables

Important

Take note of the objects that are masked by packages. The order in which you load the libraries changes what is masked. You can call an object from a specific package by using package::object (e.g., dplyr::filter).
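A small sketch of both points, using only packages that ship with R so it runs anywhere; the x and masked names are ours. The same pattern applies to functions from tidyverse or FSA once those packages are loaded.

```r
x <- c(2, 4, 4, 5)

# The package::function form always reaches the version from that package,
# no matter which other packages are attached or in what order.
stats::median(x)  # 4

# List the object names that are defined in more than one attached package,
# grouped by where on the search path they were found
masked <- conflicts(detail = TRUE)
names(masked)
```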

File Structure, File and Variable Naming

We recommend setting up your directory so that the data, figures, and perhaps scripts are in folders within your working directory. It is also a good idea to pick a standard for folder and file naming. We think this is good practice beyond naming files for R; we use the same naming method for folders and Word documents. We like to use snake case (fish_data_050922) because we find it easier to read than camel case (fishData050922) or Pascal case (FishData050922). Notice that none of these styles uses spaces. It is good practice in coding to avoid putting spaces in names. We also like to add the date that we worked on the file at the end, which helps us with version control. There are much better ways to keep track of versions, such as Git, but we have yet to master that with R and RStudio (although see Version Control). Disk space is rarely an issue anymore and our data files are often not that large, so having multiple versions with a date at the end is rarely a problem.
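The naming pattern can also be generated in R. This is only a sketch: the folder names, the work_date value, and the file_name object are illustrative assumptions, and the date is fixed so the result matches the fish_data_050922 example (in practice you might use Sys.Date() instead).

```r
# Folders suggested above, created inside the working directory if absent
# (commented out so nothing is written unless you choose to run it)
# for (folder in c("data", "figures", "scripts")) dir.create(folder, showWarnings = FALSE)

# Snake case name with the working date (month, day, two-digit year) at the end
work_date <- as.Date("2022-05-09")
file_name <- paste0("fish_data_", format(work_date, "%m%d%y"), ".csv")
file_name  # "fish_data_050922.csv"
```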

Data File Structure in Excel

If you are using Excel to input your “raw” data, then you need to be careful about how you create the spreadsheet. You want a very simple spreadsheet. That is, avoid complex headers, complex names, and any analyses within the spreadsheet. If you do not keep it simple, you will likely encounter errors when loading the file into R. The spreadsheet below is what ours typically look like. Notice the snake case for the variable names and that everything is lower case. This helps avoid typos and errors in our code. One of the largest time sinks when working with someone else’s data is formatting the spreadsheet to work in R. We also try to have only one sheet (tab) per Excel file. You can load multiple sheets into R one at a time, but it can make tracking files a bit more difficult.
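If you inherit a spreadsheet whose headers do not follow this pattern, the names can be tidied after import rather than by hand. Below is a minimal base R sketch; the header names are made up and to_snake_case is a hypothetical helper written for this example, not a function from any package.

```r
# Hypothetical helper: convert messy column headers to lower snake case
to_snake_case <- function(x) {
  x <- tolower(x)                  # everything lower case
  x <- gsub("[^a-z0-9]+", "_", x)  # runs of spaces or punctuation become one underscore
  gsub("^_+|_+$", "", x)           # drop leading or trailing underscores
}

messy <- c("Total Catch", "Mesh (inch)", "North_South")
to_snake_case(messy)  # "total_catch" "mesh_inch" "north_south"
```

With a data frame called df, the same helper could be applied with names(df) <- to_snake_case(names(df)).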

Importing Data

We think the first thing most people want to do is import data they already have in an Excel spreadsheet and start working on it. For example, “I have a data spreadsheet in Excel and I want to bring it into R and calculate the median and quantiles, how do I do that?” One of the most frustrating things for us when we were first learning R was figuring out how to get data into R. There are several ways, but we think the most foolproof is to save the Excel file as a .csv and then use read.csv. You can also load directly from an Excel file (using the readxl package), which has become less problematic over time as, we believe, the package has improved. Below we show how to import some data using both methods.

You must download our files and place them in a folder called “data” in the same location as the RStudio Project. It should look like this in the Files tab (the Excel files are in the data folder).

Now that the data are in the data folder let’s load them into R.

library(readxl)

fish_data_csv <- read.csv("data/lake_trout_catch_net.csv")

fish_data_xlsx <- read_xlsx("data/lake_trout_catch_net.xlsx", na = "NA") # some rows contain NA (data not available), so we need na = "NA"

It really is that simple. If you look in the Environment tab in RStudio, you should see two data frames listed under Data.
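To close the loop on the question posed above (the median and quantiles), here is a self-contained sketch. It writes a tiny made-up data frame to a temporary .csv so it runs without our files; the values and the toy, tmp, and toy_in names are illustrative only.

```r
# A tiny stand-in data set, saved as a .csv in a temporary location
toy <- data.frame(net = 1:5, total_catch = c(2, 38, 8, 6, 5))
tmp <- tempfile(fileext = ".csv")
write.csv(toy, tmp, row.names = FALSE)

# Read it back in exactly as you would read a file from the data folder
toy_in <- read.csv(tmp)

median(toy_in$total_catch)                                 # 6
quantile(toy_in$total_catch, probs = c(0.25, 0.5, 0.75))   # 5, 6, 8
```

For the real data, the same two calls could be pointed at a column of fish_data_csv, adding na.rm = TRUE for columns that contain missing values.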

Coding

Check out https://style.tidyverse.org/ for tips on good coding style. The styler and lintr packages also support the style guide presented there.

Data Wrangling

Common data wrangling

The beauty and the frustration of R is that you can do the same analyses (e.g., data wrangling and graphing) with different code structures and packages. The examples we give below are what work for us and may not be the most efficient. You may have different ways to conduct an analysis, and that is fine because all the analytical steps are recorded in your script. We are always learning new ways to analyze and graph data, which is fun for us. Please reach out if you find a mistake or a more efficient method for an example we present below.

Each of the grey boxes in the html document represents a chunk, that is, a chunk of code that we thought was worth separating out to make it more readable. Within the chunks you will notice text preceded by a “#”, which tells R to ignore that text. The text following the “#” is what we use to annotate our code so we can easily recall what we did. Think of it as notes. Below we will get the structure of the fish_data_csv data frame and calculate the quartiles and min-max values for all of its variables. This is always our first step because we can see how R assigns data types to our variables (i.e., character, numeric (decimal and whole numbers), or integer (whole numbers only)) and look for mistakes in our data by evaluating the min-max values.
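A short sketch of comments and data types, using made-up values. The typed object and its two columns are hypothetical and only show how a single stray entry changes the type R assigns to a whole column.

```r
# Everything after "#" on a line is a note to ourselves; R ignores it
class("South")  # "character"
class(63.5)     # "numeric"
class(10L)      # "integer" (the L suffix marks a whole number)
class(10)       # "numeric": without the L, R stores 10 as a decimal-capable number

# One non-numeric entry ("3a") makes the whole lift column character,
# while net, which holds only numbers, is read as numeric
typed <- read.csv(text = "lift,net\n1,1.0\n2,6.5\n3a,7")
str(typed)
```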

Checking the Data

Recall above we loaded two files, one from a .csv and called it fish_data_csv and one from a .xlsx and called it fish_data_xlsx. Let’s start playing with the fish_data_csv file. You can see the files in the Environment tab.

Tip

If you click on the rectangle icon to the right it will open the file in R in a new tab.

Below are a few things we always do to check the data before we start any analyses. The annotations within the chunk briefly describe what each line of code is doing.

str(fish_data_csv) # check out the structure and data types
'data.frame':   2351 obs. of  19 variables:
 $ lift                 : chr  "1" "1" "1" "1" ...
 $ net                  : num  1 2 3 4 5 6 6.5 7 7.5 8 ...
 $ date                 : chr  "8/23/10" "8/23/10" "8/23/10" "8/23/10" ...
 $ year                 : int  10 10 10 10 10 10 10 10 10 10 ...
 $ week                 : int  1 1 1 1 1 1 1 1 1 1 ...
 $ north_South          : chr  "South" "South" "South" "South" ...
 $ temp                 : num  64 64 64 64 64 64 64 64 64 64 ...
 $ morning_evening_night: chr  "morning" "morning" "morning" "morning" ...
 $ set_time             : chr  "4:40" "4:40" "4:40" "4:40" ...
 $ pull_time            : chr  "8:35" "8:35" "8:35" "8:35" ...
 $ duration             : num  NA NA NA NA NA NA NA NA NA NA ...
 $ mesh_inch            : num  2.5 2 2.25 2 2.5 1.5 1.5 1.75 1.75 2 ...
 $ mesh                 : num  63.5 50.8 57.1 50.8 63.5 ...
 $ depth                : num  101 96 92 80.5 84 73 83 68 83 84.5 ...
 $ lat                  : chr  "N47.55714" "N47.55502" "N47.92443" "N47.92434" ...
 $ long                 : chr  "W113.51997" "W113.51975" "W113.86772" "W113.86467" ...
 $ total_catch          : int  2 38 8 6 5 2 7 14 120 154 ...
 $ catch_200            : int  2 38 8 6 5 0 0 14 120 154 ...
 $ catch_effort         : num  NA NA NA NA NA NA NA NA NA NA ...
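# The str() output above shows date stored as character, year as a two-digit
# number, and some lat and long values carrying a letter prefix. One possible
# clean-up is sketched below on a few values copied from the output in this
# section. The choices made here (adding 2000 to year, dropping the N and W
# letters, and making longitudes negative for the western hemisphere) are our
# assumptions about what you might want, not steps from the original analysis,
# and checks is a hypothetical object name.

```r
checks <- data.frame(
  date = c("8/23/10", "9/2/16"),
  year = c(10L, 16L),
  lat  = c("N47.55714", "47.92604"),
  long = c("W113.51997", "113.86982")
)

checks$date <- as.Date(checks$date, format = "%m/%d/%y")             # text to Date
checks$year <- checks$year + 2000L                                   # 10 becomes 2010
checks$lat  <- as.numeric(sub("^[A-Za-z]", "", checks$lat))          # drop a leading letter
checks$long <- -abs(as.numeric(sub("^[A-Za-z]", "", checks$long)))   # west is negative
str(checks)
```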
head(fish_data_csv, n = 10) # shows first 10 lines, you can pick your own n
   lift net    date year week north_South temp morning_evening_night set_time
1     1 1.0 8/23/10   10    1       South   64               morning     4:40
2     1 2.0 8/23/10   10    1       South   64               morning     4:40
3     1 3.0 8/23/10   10    1       South   64               morning     4:40
4     1 4.0 8/23/10   10    1       South   64               morning     4:40
5     1 5.0 8/23/10   10    1       South   64               morning     4:40
6     1 6.0 8/23/10   10    1       South   64               morning     4:40
7     1 6.5 8/23/10   10    1       South   64               morning     4:40
8     1 7.0 8/23/10   10    1       South   64               morning     4:40
9     1 7.5 8/23/10   10    1       South   64               morning     4:40
10    1 8.0 8/23/10   10    1       South   64               morning     4:40
   pull_time duration mesh_inch  mesh depth       lat       long total_catch
1       8:35       NA      2.50 63.50 101.0 N47.55714 W113.51997           2
2       8:35       NA      2.00 50.80  96.0 N47.55502 W113.51975          38
3       8:35       NA      2.25 57.15  92.0 N47.92443 W113.86772           8
4       8:35       NA      2.00 50.80  80.5 N47.92434 W113.86467           6
5       8:35       NA      2.50 63.50  84.0 N47.92251 W113.86462           5
6       8:35       NA      1.50 38.10  73.0 N47.92295 W113.86238           2
7       8:35       NA      1.50 38.10  83.0 N47.92062 W113.86237           7
8       8:35       NA      1.75 44.45  68.0 N47.92151 W113.86058          14
9       8:35       NA      1.75 44.45  83.0 N47.92033 W113.85909         120
10      8:35       NA      2.00 50.80  84.5 N47.91946 W113.85773         154
   catch_200 catch_effort
1          2           NA
2         38           NA
3          8           NA
4          6           NA
5          5           NA
6          0           NA
7          0           NA
8         14           NA
9        120           NA
10       154           NA
tail(fish_data_csv, n = 10) # shows last 10 lines, you can pick your own n
     lift net   date year week north_South temp morning_evening_night set_time
2342   22 267 9/2/16   16    3       South   62             overnight    17:30
2343   22 268 9/2/16   16    3       South   62             overnight    17:30
2344   22 269 9/2/16   16    3       South   62             overnight    17:30
2345   22 270 9/2/16   16    3       South   62             overnight    17:30
2346   22 271 9/2/16   16    3       South   62             overnight    17:30
2347   22 272 9/2/16   16    3       South   62             overnight    17:30
2348   22 273 9/2/16   16    3       South   62             overnight    17:30
2349   22 274 9/2/16   16    3       South   62             overnight    17:30
2350   22 275 9/2/16   16    3       South   62             overnight    17:30
2351   22 276 9/2/16   16    3       South   62             overnight    17:30
     pull_time duration mesh_inch  mesh depth      lat      long total_catch
2342      9:47    16.28      2.25 57.15    80 47.92604 113.86982          25
2343     10:00    16.50      1.50 38.10    86 47.92611 113.86729           5
2344     10:10    16.67      2.50 63.50    76 47.92419 113.86733          14
2345     10:21    16.85      2.00 50.80    88 47.92559  113.8643          23
2346     10:32    17.03      1.75 44.45    80 47.92386 113.86494          12
2347     10:42    17.20      1.50 38.10    81 47.92359 113.86312          16
2348     10:53    17.38      2.25 57.15    86 47.92379 113.86153           8
2349     11:04    17.57      2.00 50.80    83 47.92251 113.86059          26
2350     11:16    17.77      2.75 69.85    85 47.92141 113.85925          21
2351     11:26    17.93      1.75 44.45    81 47.92054 113.85809          15
     catch_200 catch_effort
2342        25         1.54
2343         4         0.24
2344        14         0.84
2345        23         1.36
2346        12         0.70
2347         9         0.52
2348         8         0.46
2349        26         1.48
2350        21         1.18
2351        15         0.84
summary(fish_data_csv) # Quartiles from base R
     lift                net            date                year      
 Length:2351        Min.   :  1.0   Length:2351        Min.   :10.00  
 Class :character   1st Qu.: 84.0   Class :character   1st Qu.:11.00  
 Mode  :character   Median :169.0   Mode  :character   Median :13.00  
                    Mean   :172.2                      Mean   :12.83  
                    3rd Qu.:252.0                      3rd Qu.:14.00  
                    Max.   :399.0                      Max.   :16.00  
                                                                      
      week       north_South             temp       morning_evening_night
 Min.   :1.000   Length:2351        Min.   :58.00   Length:2351          
 1st Qu.:1.000   Class :character   1st Qu.:62.00   Class :character     
 Median :2.000   Mode  :character   Median :64.00   Mode  :character     
 Mean   :1.997                      Mean   :63.83                        
 3rd Qu.:3.000                      3rd Qu.:66.00                        
 Max.   :3.000                      Max.   :73.00                        
                                    NA's   :12                           
   set_time          pull_time            duration        mesh_inch    
 Length:2351        Length:2351        Min.   : 2.000   Min.   :1.500  
 Class :character   Class :character   1st Qu.: 3.580   1st Qu.:1.750  
 Mode  :character   Mode  :character   Median : 4.742   Median :2.000  
                                       Mean   : 7.004   Mean   :2.155  
                                       3rd Qu.:10.080   3rd Qu.:2.250  
                                       Max.   :26.980   Max.   :5.000  
                                       NA's   :19       NA's   :1      
      mesh            depth           lat                long          
 Min.   : 38.10   Min.   : 34.0   Length:2351        Length:2351       
 1st Qu.: 44.45   1st Qu.: 87.0   Class :character   Class :character  
 Median : 50.80   Median :100.0   Mode  :character   Mode  :character  
 Mean   : 54.74   Mean   :103.1                                        
 3rd Qu.: 57.20   3rd Qu.:121.0                                        
 Max.   :127.00   Max.   :170.0                                        
 NA's   :1        NA's   :12                                           
  total_catch       catch_200       catch_effort   
 Min.   :  0.00   Min.   :  0.00   Min.   : 0.000  
 1st Qu.:  4.00   1st Qu.:  3.00   1st Qu.: 0.540  
 Median : 11.00   Median : 11.00   Median : 1.720  
 Mean   : 22.04   Mean   : 21.09   Mean   : 3.776  
 3rd Qu.: 26.00   3rd Qu.: 25.00   3rd Qu.: 4.605  
 Max.   :554.00   Max.   :378.00   Max.   :57.310  
                  NA's   :1        NA's   :19      
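# summary() reports the NA count for each column separately, as in the output
# above. If you want all the counts in one place, one option is
# colSums(is.na()). The sketch below uses a small made-up data frame (toy, a
# hypothetical name) so it runs on its own.

```r
toy <- data.frame(
  temp     = c(64, NA, 62),
  duration = c(NA, NA, 16.28),
  depth    = c(101, 96, 80)
)

# Number of missing values in each column
colSums(is.na(toy))  # temp 1, duration 2, depth 0

# Min-max check that skips the missing values
range(toy$temp, na.rm = TRUE)  # 62 64
```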
describe(fish_data_csv) # summary stats using the psych package
                       vars    n    mean     sd  median trimmed    mad  min
lift*                     1 2351   15.97   9.12   15.00   15.96  11.86  1.0
net                       2 2351  172.22 103.61  169.00  169.27 124.54  1.0
date*                     3 2351   60.07  33.54   61.00   60.13  41.51  1.0
year                      4 2351   12.83   1.92   13.00   12.79   2.97 10.0
week                      5 2351    2.00   0.81    2.00    2.00   1.48  1.0
north_South*              6 2351    1.52   0.50    2.00    1.53   0.00  1.0
temp                      7 2339   63.83   2.71   64.00   63.77   2.97 58.0
morning_evening_night*    8 2351    2.74   1.66    3.00    2.67   2.97  1.0
set_time*                 9 2346   35.03  17.98   30.00   35.28  22.24  1.0
pull_time*               10 2346  346.73 196.22  333.50  352.88 252.78  1.0
duration                 11 2332    7.00   4.45    4.74    6.35   2.61  2.0
mesh_inch                12 2350    2.16   0.67    2.00    2.05   0.37  1.5
mesh                     13 2350   54.74  16.95   50.80   52.03   9.41 38.1
depth                    14 2339  103.06  20.13  100.00  103.31  23.72 34.0
lat*                     15 2350  981.54 566.81  984.00  978.26 722.77  1.0
long*                    16 2350 1011.39 580.61 1010.50 1008.53 736.11  1.0
total_catch              17 2351   22.04  34.51   11.00   15.13  13.34  0.0
catch_200                18 2350   21.09  32.09   11.00   14.50  13.34  0.0
catch_effort             19 2332    3.78   5.76    1.72    2.54   2.16  0.0
                           max   range  skew kurtosis    se
lift*                    32.00   31.00  0.03    -1.22  0.19
net                     399.00  398.00  0.18    -1.01  2.14
date*                   117.00  116.00 -0.05    -1.16  0.69
year                     16.00    6.00  0.13    -1.14  0.04
week                      3.00    2.00  0.00    -1.46  0.02
north_South*              2.00    1.00 -0.10    -1.99  0.01
temp                     73.00   15.00  0.27     0.14  0.06
morning_evening_night*    6.00    5.00  0.26    -1.47  0.03
set_time*                69.00   68.00  0.06    -1.11  0.37
pull_time*              663.00  662.00 -0.16    -1.22  4.05
duration                 26.98   24.98  1.13     0.72  0.09
mesh_inch                 5.00    3.50  2.47     7.64  0.01
mesh                    127.00   88.90  2.47     7.64  0.35
depth                   170.00  136.00 -0.08    -0.76  0.42
lat*                   1991.00 1990.00  0.03    -1.15 11.69
long*                  2047.00 2046.00  0.04    -1.15 11.98
total_catch             554.00  554.00  5.15    45.68  0.71
catch_200               378.00  378.00  4.32    29.24  0.66
catch_effort             57.31   57.31  3.52    17.27  0.12
describeBy(fish_data_csv, group = fish_data_csv$year) # summary stats by group (year) using the psych package

 Descriptive statistics by group 
group: 10
                       vars   n    mean     sd  median trimmed    mad    min
lift*                     1 317   15.90   8.89   16.00   15.90  10.38    1.0
net                       2 317  155.46  90.43  155.00  155.42 115.64    1.0
date*                     3 317   90.12  21.12   94.00   91.42  26.69   53.0
year                      4 317   10.00   0.00   10.00   10.00   0.00   10.0
week                      5 317    2.01   0.79    2.00    2.01   1.48    1.0
north_South*              6 317    1.45   0.50    1.00    1.44   0.00    1.0
temp                      7 317   61.10   2.07   60.50   61.05   2.22   58.0
morning_evening_night*    8 317    2.62   1.53    3.00    2.53   2.97    1.0
set_time*                 9 317   34.02  18.52   37.00   34.30  23.72    2.0
pull_time*               10 317  290.73 189.65  266.00  285.55 219.42    2.0
duration                 11 303    8.04   4.85    6.32    7.33   3.63    2.1
mesh_inch                12 317    2.32   0.76    2.00    2.20   0.37    1.5
mesh                     13 317   59.01  19.24   50.80   55.98   9.41   38.1
depth                    14 317  106.23  22.02  104.00  106.76  28.17   53.0
lat*                     15 317 1681.23 209.48 1760.00 1686.38 240.18 1338.0
long*                    16 317 1727.78 207.68 1794.00 1732.83 249.08 1384.0
total_catch              17 317   31.61  53.46   12.00   19.16  14.83    0.0
catch_200                18 317   29.51  52.47   10.00   17.31  13.34    0.0
catch_effort             19 303    3.47   5.16    1.46    2.34   1.82    0.0
                          max range  skew kurtosis    se
lift*                    31.0  30.0 -0.02    -1.14  0.50
net                     311.0 310.0  0.00    -1.21  5.08
date*                   116.0  63.0 -0.47    -1.24  1.19
year                     10.0   0.0   NaN      NaN  0.00
week                      3.0   2.0 -0.02    -1.40  0.04
north_South*              2.0   1.0  0.21    -1.96  0.03
temp                     65.0   7.0  0.33    -1.14  0.12
morning_evening_night*    5.0   4.0  0.33    -1.23  0.09
set_time*                69.0  67.0 -0.11    -1.41  1.04
pull_time*              619.0 617.0  0.32    -1.26 10.65
duration                 23.4  21.3  1.21     0.81  0.28
mesh_inch                 5.0   3.5  1.67     2.92  0.04
mesh                    127.0  88.9  1.67     2.92  1.08
depth                   140.0  87.0 -0.10    -1.18  1.24
lat*                   1991.0 653.0 -0.26    -1.47 11.77
long*                  2038.0 654.0 -0.23    -1.49 11.66
total_catch             378.0 378.0  3.61    16.05  3.00
catch_200               378.0 378.0  3.74    17.49  2.95
catch_effort             30.0  30.0  2.61     7.71  0.30
------------------------------------------------------------ 
group: 11
                       vars   n    mean     sd  median trimmed    mad     min
lift*                     1 399   16.05   8.78   16.00   16.06  10.38    1.00
net                       2 399  200.00 115.33  200.00  200.00 148.26    1.00
date*                     3 399   88.12  22.89   90.00   89.20  34.10   49.00
year                      4 399   11.00   0.00   11.00   11.00   0.00   11.00
week                      5 399    2.01   0.78    2.00    2.02   1.48    1.00
north_South*              6 399    1.50   0.50    2.00    1.50   0.00    1.00
temp                      7 399   63.97   2.79   65.00   64.01   2.97   59.00
morning_evening_night*    8 399    2.60   1.67    3.00    2.50   2.97    1.00
set_time*                 9 399   29.93  21.67   26.00   29.21  31.13    3.00
pull_time*               10 399  410.35 179.63  424.00  421.65 237.22   62.00
duration                 11 399    7.95   2.87    9.20    8.08   2.37    2.15
mesh_inch                12 399    2.23   0.66    2.00    2.13   0.37    1.50
mesh                     13 399   56.58  16.74   50.80   54.14   9.41   38.10
depth                    14 392  101.61  19.36  100.00  101.54  22.24   53.00
lat*                     15 399 1652.75 190.36 1655.00 1652.64 194.22    1.00
long*                    16 399 1706.35 178.04 1702.00 1702.77 194.22 1382.00
total_catch              17 399   12.94  18.79    6.00    9.26   7.41    0.00
catch_200                18 399   12.67  18.73    6.00    8.95   7.41    0.00
catch_effort             19 399    2.00   3.54    0.77    1.26   1.01    0.00
                           max   range  skew kurtosis   se
lift*                    31.00   30.00 -0.03    -1.15 0.44
net                     399.00  398.00  0.00    -1.21 5.77
date*                   117.00   68.00 -0.30    -1.35 1.15
year                     11.00    0.00   NaN      NaN 0.00
week                      3.00    2.00 -0.02    -1.38 0.04
north_South*              2.00    1.00  0.00    -2.00 0.03
temp                     69.00   10.00 -0.22    -0.94 0.14
morning_evening_night*    5.00    4.00  0.39    -1.46 0.08
set_time*                65.00   62.00  0.15    -1.56 1.08
pull_time*              663.00  601.00 -0.27    -0.97 8.99
duration                 12.28   10.13 -0.44    -1.39 0.14
mesh_inch                 5.00    3.50  1.85     4.20 0.03
mesh                    127.00   88.90  1.85     4.20 0.84
depth                   141.00   88.00  0.01    -0.65 0.98
lat*                   1990.00 1989.00 -1.45    12.64 9.53
long*                  2047.00  665.00  0.19    -0.83 8.91
total_catch             148.00  148.00  3.34    15.87 0.94
catch_200               148.00  148.00  3.40    16.26 0.94
catch_effort             29.24   29.24  4.33    24.11 0.18
------------------------------------------------------------ 
group: 12
                       vars   n   mean     sd median trimmed    mad  min
lift*                     1 374  16.82   8.83  17.00   17.00  11.12  1.0
net                       2 374 192.55 111.39 195.50  192.82 145.29  1.0
date*                     3 374  47.35  27.89  46.00   46.48  39.29  7.0
year                      4 374  12.00   0.00  12.00   12.00   0.00 12.0
week                      5 374   1.97   0.82   2.00    1.96   1.48  1.0
north_South*              6 374   1.52   0.50   2.00    1.53   0.00  1.0
temp                      7 374  63.86   2.66  63.50   63.49   2.22 59.0
morning_evening_night*    8 374   2.51   1.69   1.00    2.39   0.00  1.0
set_time*                 9 373  34.88  14.13  32.00   33.79  17.79 19.0
pull_time*               10 373 349.13 179.14 309.00  356.14 220.91  1.0
duration                 11 373   5.71   3.34   4.07    5.33   1.66  2.0
mesh_inch                12 373   2.03   0.38   2.00    2.02   0.37  1.5
mesh                     13 373  51.59   9.60  50.80   51.23   9.49 38.1
depth                    14 374 106.78  18.69 107.00  107.36  20.76 55.0
lat*                     15 373 733.94 365.89 709.00  745.35 447.75 12.0
long*                    16 373 760.32 359.88 714.00  764.42 446.26 30.0
total_catch              17 374  27.84  45.23  14.00   19.15  16.31  0.0
catch_200                18 373  26.03  36.02  14.00   18.57  16.31  0.0
catch_effort             19 373   5.43   7.58   2.67    3.80   3.22  0.0
                           max   range  skew kurtosis    se
lift*                    31.00   30.00 -0.13    -1.18  0.46
net                     382.00  381.00 -0.03    -1.23  5.76
date*                    96.00   89.00  0.15    -1.29  1.44
year                     12.00    0.00   NaN      NaN  0.00
week                      3.00    2.00  0.06    -1.51  0.04
north_South*              2.00    1.00 -0.10    -2.00  0.03
temp                     73.00   14.00  1.59     3.67  0.14
morning_evening_night*    5.00    4.00  0.48    -1.44  0.09
set_time*                63.00   44.00  0.43    -1.21  0.73
pull_time*              619.00  618.00 -0.10    -1.14  9.28
duration                 13.67   11.67  0.93    -0.78  0.17
mesh_inch                 2.75    1.25  0.24    -0.98  0.02
mesh                     69.90   31.80  0.24    -0.98  0.50
depth                   137.00   82.00 -0.22    -0.89  0.97
lat*                   1325.00 1313.00 -0.14    -1.16 18.95
long*                  1369.00 1339.00 -0.03    -1.18 18.63
total_catch             554.00  554.00  5.79    52.77  2.34
catch_200               288.00  288.00  3.26    14.53  1.86
catch_effort             46.58   46.58  2.79     9.19  0.39
------------------------------------------------------------ 
group: 13
                       vars   n   mean     sd median trimmed    mad   min
lift*                     1 347  16.03   8.95  16.00   16.04  10.38  1.00
net                       2 347 174.00 100.31 174.00  174.00 128.99  1.00
date*                     3 347  44.61  26.91  43.00   44.02  38.55  4.00
year                      4 347  13.00   0.00  13.00   13.00   0.00 13.00
week                      5 347   1.97   0.81   2.00    1.96   1.48  1.00
north_South*              6 347   1.55   0.50   2.00    1.57   0.00  1.00
temp                      7 347  65.93   2.20  66.00   65.87   2.97 62.00
morning_evening_night*    8 347   2.73   1.80   3.00    2.66   2.97  1.00
set_time*                 9 347  33.50  16.73  27.00   34.13   5.93  1.00
pull_time*               10 347 380.16 175.87 394.00  396.73 174.95  5.00
duration                 11 347   5.93   3.30   4.25    5.56   1.73  2.35
mesh_inch                12 347   2.00   0.37   2.00    1.97   0.37  1.50
mesh                     13 347  50.78   9.36  50.80   50.14   9.41 38.10
depth                    14 347 105.04  19.13 102.00  105.16  23.72 58.00
lat*                     15 347 671.26 389.32 710.00  672.87 518.91  5.00
long*                    16 347 694.35 391.63 719.00  693.66 510.01  5.00
total_catch              17 347  20.32  24.11  14.00   15.91  13.34  0.00
catch_200                18 347  19.66  23.56  13.00   15.25  13.34  0.00
catch_effort             19 347   4.32   6.42   2.08    2.95   2.24  0.00
                           max   range  skew kurtosis    se
lift*                    31.00   30.00 -0.01    -1.15  0.48
net                     347.00  346.00  0.00    -1.21  5.39
date*                    92.00   88.00  0.15    -1.28  1.44
year                     13.00    0.00   NaN      NaN  0.00
week                      3.00    2.00  0.06    -1.48  0.04
north_South*              2.00    1.00 -0.21    -1.96  0.03
temp                     71.00    9.00  0.24    -0.72  0.12
morning_evening_night*    5.00    4.00  0.27    -1.72  0.10
set_time*                62.00   61.00 -0.10    -0.46  0.90
pull_time*              619.00  614.00 -0.57    -0.51  9.44
duration                 15.47   13.12  0.89    -0.49  0.18
mesh_inch                 2.75    1.25  0.45    -0.65  0.02
mesh                     69.85   31.75  0.45    -0.65  0.50
depth                   137.00   79.00 -0.02    -1.14  1.03
lat*                   1316.00 1311.00 -0.09    -1.37 20.90
long*                  1367.00 1362.00 -0.02    -1.30 21.02
total_catch             222.00  222.00  3.31    17.27  1.29
catch_200               212.00  212.00  3.22    16.07  1.26
catch_effort             52.40   52.40  3.69    17.96  0.34
------------------------------------------------------------ 
group: 14
                       vars   n   mean     sd median trimmed    mad   min
lift*                     1 382  16.56   8.91  17.00   16.65  10.38  1.00
net                       2 382 189.89 109.84 189.50  189.76 140.85  1.00
date*                     3 382  41.69  25.78  44.00   41.34  38.55  2.00
year                      4 382  14.00   0.00  14.00   14.00   0.00 14.00
week                      5 382   2.01   0.81   2.00    2.02   1.48  1.00
north_South*              6 382   1.52   0.50   2.00    1.52   0.00  1.00
temp                      7 382  63.26   2.10  64.00   63.40   2.97 58.00
morning_evening_night*    8 382   2.36   1.42   3.00    2.20   2.97  1.00
set_time*                 9 378  38.57  18.40  45.00   39.33  25.20  8.00
pull_time*               10 378 314.96 196.01 298.50  315.47 279.47  4.00
duration                 11 378   5.31   3.00   4.15    4.86   1.42  2.07
mesh_inch                12 382   2.22   0.86   2.00    2.04   0.37  1.50
mesh                     13 382  56.40  21.84  50.80   51.78   9.41 38.10
depth                    14 377 100.09  21.27  98.00  100.29  23.72 46.00
lat*                     15 382 676.63 410.54 737.50  679.51 555.23  6.00
long*                    16 382 706.81 399.51 748.00  711.41 487.78  6.00
total_catch              17 382  18.32  23.85  11.00   13.81  13.34  0.00
catch_200                18 382  18.02  23.68  11.00   13.52  12.60  0.00
catch_effort             19 378   3.99   5.03   2.02    3.01   2.51  0.00
                           max   range  skew kurtosis    se
lift*                    32.00   31.00 -0.07    -1.16  0.46
net                     379.00  378.00  0.01    -1.21  5.62
date*                    86.00   84.00  0.06    -1.29  1.32
year                     14.00    0.00   NaN      NaN  0.00
week                      3.00    2.00 -0.02    -1.50  0.04
north_South*              2.00    1.00 -0.07    -2.00  0.03
temp                     66.00    8.00 -0.29    -0.54  0.11
morning_evening_night*    5.00    4.00  0.55    -0.90  0.07
set_time*                68.00   60.00 -0.12    -1.41  0.95
pull_time*              619.00  615.00  0.03    -1.22 10.08
duration                 17.88   15.81  1.53     1.99  0.15
mesh_inch                 5.00    3.50  2.36     5.12  0.04
mesh                    127.00   88.90  2.36     5.12  1.12
depth                   170.00  124.00  0.01    -0.75  1.10
lat*                   1334.00 1328.00 -0.11    -1.30 21.00
long*                  1374.00 1368.00 -0.13    -1.17 20.44
total_catch             233.00  233.00  3.50    20.35  1.22
catch_200               233.00  233.00  3.55    20.95  1.21
catch_effort             37.85   37.85  2.43     8.21  0.26
------------------------------------------------------------ 
group: 15
                       vars   n   mean     sd median trimmed    mad   min
lift*                     1 255  14.80   9.84  13.00   14.51  13.34  1.00
net                       2 255 128.00  73.76 128.00  128.00  94.89  1.00
date*                     3 255  43.25  27.09  39.00   42.25  37.06  3.00
year                      4 255  15.00   0.00  15.00   15.00   0.00 15.00
week                      5 255   2.01   0.81   2.00    2.01   1.48  1.00
north_South*              6 255   1.59   0.49   2.00    1.61   0.00  1.00
temp                      7 255  63.25   1.76  62.00   63.07   0.00 61.00
morning_evening_night*    8 255   3.00   1.54   3.00    3.00   2.97  1.00
set_time*                 9 255  39.59  17.06  27.00   39.39  10.38 17.00
pull_time*               10 255 327.26 215.94 310.00  331.15 309.86  5.00
duration                 11 255   7.70   6.30   4.23    6.89   1.70  2.27
mesh_inch                12 255   2.00   0.37   2.00    1.97   0.37  1.50
mesh                     13 255  50.82   9.39  50.80   50.15   9.41 38.10
depth                    14 255 102.43  17.67  98.00  101.75  19.27 65.00
lat*                     15 255 614.78 397.87 489.00  604.05 512.98 17.00
long*                    16 255 601.19 441.04 495.00  588.77 603.42  7.00
total_catch              17 255  22.48  29.18  13.00   16.91  14.83  0.00
catch_200                18 255  22.11  28.80  13.00   16.63  13.34  0.00
catch_effort             19 255   3.82   5.34   2.18    2.71   2.42  0.00
                           max   range  skew kurtosis    se
lift*                    31.00   30.00  0.34    -1.29  0.62
net                     255.00  254.00  0.00    -1.21  4.62
date*                   100.00   97.00  0.24    -0.97  1.70
year                     15.00    0.00   NaN      NaN  0.00
week                      3.00    2.00 -0.02    -1.47  0.05
north_South*              2.00    1.00 -0.36    -1.88  0.03
temp                     69.00    8.00  0.89    -0.04  0.11
morning_evening_night*    5.00    4.00  0.00    -1.31  0.10
set_time*                62.00   45.00  0.34    -1.71  1.07
pull_time*              618.00  613.00 -0.09    -1.51 13.52
duration                 26.98   24.71  1.13    -0.01  0.39
mesh_inch                 2.75    1.25  0.48    -0.63  0.02
mesh                     69.85   31.75  0.48    -0.63  0.59
depth                   136.00   71.00  0.37    -1.03  1.11
lat*                   1321.00 1304.00  0.23    -1.43 24.92
long*                  1353.00 1346.00  0.20    -1.54 27.62
total_catch             282.00  282.00  3.80    24.81  1.83
catch_200               281.00  281.00  3.88    25.86  1.80
catch_effort             37.98   37.98  3.39    14.92  0.33
------------------------------------------------------------ 
group: 16
                       vars   n   mean     sd median trimmed    mad   min
lift*                     1 277  14.98   9.95  13.00   14.75  13.34  1.00
net                       2 277 138.06  80.02 138.00  138.00 102.30  1.00
date*                     3 277  62.62  36.48  87.00   64.66  29.65  1.00
year                      4 277  16.00   0.00  16.00   16.00   0.00 16.00
week                      5 277   2.01   0.82   2.00    2.01   1.48  1.00
north_South*              6 277   1.56   0.50   2.00    1.58   0.00  1.00
temp                      7 265  65.42   2.14  65.00   65.31   2.97 62.00
morning_evening_night*    8 277   3.71   1.62   4.00    3.81   1.48  1.00
set_time*                 9 277  36.58  15.65  31.00   35.52   5.93 20.00
pull_time*               10 277 335.35 222.01 377.00  340.99 296.52  5.00
duration                 11 277   9.26   6.05   4.67    9.10   3.34  2.25
mesh_inch                12 277   2.27   0.93   2.00    2.06   0.37  1.50
mesh                     13 277  57.75  23.63  50.80   52.25   9.41 38.10
depth                    14 277  98.62  20.98  98.00   99.50  22.24 34.00
lat*                     15 277 694.20 360.18 704.00  696.79 413.65  2.00
long*                    16 277 723.40 376.16 740.00  728.36 440.33  1.00
total_catch              17 277  23.26  31.66  11.00   16.45  14.83  0.00
catch_200                18 277  22.04  29.85  10.00   15.65  13.34  0.00
catch_effort             19 277   3.42   5.86   1.34    2.22   1.81  0.00
                           max   range  skew kurtosis    se
lift*                    31.00   30.00  0.28    -1.35  0.60
net                     276.00  275.00  0.00    -1.22  4.81
date*                   107.00  106.00 -0.29    -1.57  2.19
year                     16.00    0.00   NaN      NaN  0.00
week                      3.00    2.00 -0.02    -1.51  0.05
north_South*              2.00    1.00 -0.25    -1.94  0.03
temp                     70.00    8.00  0.29    -0.74  0.13
morning_evening_night*    6.00    5.00 -0.55    -1.12  0.10
set_time*                62.00   42.00  0.85    -0.95  0.94
pull_time*              618.00  613.00 -0.20    -1.56 13.34
duration                 17.93   15.68  0.13    -1.88  0.36
mesh_inch                 5.00    3.50  2.15     3.76  0.06
mesh                    127.00   88.90  2.15     3.76  1.42
depth                   136.00  102.00 -0.31    -0.38  1.26
lat*                   1337.00 1335.00 -0.02    -0.96 21.64
long*                  1381.00 1380.00 -0.07    -0.94 22.60
total_catch             192.00  192.00  2.49     7.21  1.90
catch_200               192.00  192.00  2.46     7.17  1.79
catch_effort             57.31   57.31  4.56    30.58  0.35
Desc(fish_data_csv$total_catch) # more summary statistics from the DescTools package, here for total_catch only; naming only the dataframe describes every variable but produces a lot of output
------------------------------------------------------------------------------ 
fish_data_csv$total_catch (integer)

  length       n    NAs  unique     0s   mean  meanCI'
   2'351   2'351      0     153    184  22.04   20.65
          100.0%   0.0%           7.8%          23.44
                                                     
     .05     .10    .25  median    .75    .90     .95
    0.00    1.00   4.00   11.00  26.00  50.00   81.00
                                                     
   range      sd  vcoef     mad    IQR   skew    kurt
  554.00   34.51   1.57   13.34  22.00   5.15   45.68
                                                     
lowest : 0 (184), 1 (147), 2 (117), 3 (128), 4 (97)
highest: 311, 346, 362, 378, 554

heap(?): remarkable frequency (7.8%) for the mode(s) (= 0)

' 95%-CI (classic)

More basic summary information

mean(fish_data_csv$total_catch) # calculate the mean of total_catch; the dollar sign ($) pulls a variable (column) out of a dataframe
[1] 22.04424
max(fish_data_csv$total_catch) # calculate maximum value
[1] 554
min(fish_data_csv$total_catch) # calculate minimum value
[1] 0
var(fish_data_csv$total_catch) # variance
[1] 1191.002
sqrt(var(fish_data_csv$total_catch)) # square root of variance of total catch = standard deviation
[1] 34.5109
sd(fish_data_csv$total_catch) # standard deviation directly; same result as the line above
[1] 34.5109
ci_mean(fish_data_csv$total_catch) # calculates 95% ci (default) using confintr package

    Two-sided 95% t confidence interval for the population mean

Sample estimate: 22.04424 
Confidence interval:
    2.5%    97.5% 
20.64851 23.43997 
ci_mean(fish_data_csv$total_catch, probs = c(0.1, 0.90)) # calculates 80% ci using confintr package

    Two-sided 80% t confidence interval for the population mean

Sample estimate: 22.04424 
Confidence interval:
     10%      90% 
21.13183 22.95664 
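
To see where the 95% interval comes from, here is a minimal sketch that rebuilds it by hand from the summary values printed above (n = 2351 with no NAs, mean = 22.04424, sd = 34.5109). The object names are ours, chosen for illustration, and the last decimals may differ slightly from ci_mean() because the inputs here are rounded.

```r
# Summary values copied from the output above
n     <- 2351      # number of observations (total_catch has no NAs)
x_bar <- 22.04424  # sample mean
s     <- 34.5109   # sample standard deviation

se     <- s / sqrt(n)            # standard error of the mean
t_crit <- qt(0.975, df = n - 1)  # t quantile for a two-sided 95% interval
ci_95  <- c(lower = x_bar - t_crit * se, upper = x_bar + t_crit * se)
round(ci_95, 2)
```

The result should land very close to the 20.65 and 23.44 limits reported by ci_mean(). Changing 0.975 to 0.90 gives the 80% interval.
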
ci_median(fish_data_csv$total_catch) # calculates the 95% ci for the median using the confintr package; I use the median a fair amount in analyses

    Two-sided 95% binomial confidence interval for the population median

Sample estimate: 11 
Confidence interval:
 2.5% 97.5% 
   10    12 
ci_IQR(fish_data_csv$total_catch, boot_type = "basic") # calculates the 95% ci for the interquartile range (IQR is the range between the 25th and 75th percentiles)

    Two-sided 95% bootstrap confidence interval for the population IQR
    based on 9999 bootstrap replications and the basic method

Sample estimate: 22 
Confidence interval:
 2.5% 97.5% 
 19.5  23.0 
unique(fish_data_csv$date) # show me the unique dates
  [1] "8/23/10" "8/24/10" "8/25/10" "8/26/10" "8/27/10" "8/29/10" "8/30/10"
  [8] "8/31/10" "9/1/10"  "9/2/10"  "9/3/10"  "9/6/10"  "9/7/10"  "9/8/10" 
 [15] "9/9/10"  "9/10/10" "8/22/11" "8/23/11" "8/24/11" "8/25/11" "8/26/11"
 [22] "8/28/11" "8/29/11" "8/30/11" "8/31/11" "9/1/11"  "9/2/11"  "9/5/11" 
 [29] "9/6/11"  "9/7/11"  "9/8/11"  "9/9/11"  "8/12/12" "8/13/12" "8/14/12"
 [36] "8/15/12" "8/16/12" "8/19/12" "8/20/12" "8/21/12" "8/22/12" "8/23/12"
 [43] "8/26/12" "8/27/12" "8/28/12" "8/29/12" "8/30/12" "8/17/12" "8/24/12"
 [50] "8/31/12" "8/11/13" "8/12/13" "8/13/13" "8/14/13" "8/15/13" "8/16/13"
 [57] "8/18/13" "8/19/13" "8/20/13" "8/21/13" "8/22/13" "8/23/13" "8/26/13"
 [64] "8/27/13" "8/28/13" "8/29/13" "8/30/13" "8/10/14" "8/11/14" "8/12/14"
 [71] "8/13/14" "8/14/14" "8/15/14" "8/17/14" "8/18/14" "8/19/14" "8/20/14"
 [78] "8/21/14" "8/22/14" "8/24/14" "8/25/14" "8/26/14" "8/27/14" "8/28/14"
 [85] "8/29/14" "8/9/15"  "8/10/15" "8/11/15" "8/12/15" "8/13/15" "8/14/15"
 [92] "8/16/15" "8/17/15" "8/18/15" "8/19/15" "8/20/15" "8/24/15" "8/25/15"
 [99] "8/26/15" "8/27/15" "8/28/15" "8/1/16"  "8/2/16"  "8/3/16"  "8/4/16" 
[106] "8/5/16"  "8/14/16" "8/15/16" "8/16/16" "8/17/16" "8/18/16" "8/19/16"
[113] "8/29/16" "8/30/16" "8/31/16" "9/1/16"  "9/2/16" 
length(unique(fish_data_csv$date)) # functions can be nested; here we find the number of unique dates
[1] 117
pairs(~ year + mesh + catch_effort, data = fish_data_csv) # scatterplot matrix, another way to quickly look at some of the data; pairs() draws the plot and returns nothing, so there is no need to assign it to an object

How to make bins
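
One common way to make bins in base R is cut(). The sketch below is only an illustration: the catch values, the break points, and the labels are made up and are not taken from fish_data_csv.

```r
# Made-up catch values standing in for a variable such as total_catch
total_catch <- c(0, 3, 11, 26, 50, 81, 554)

# cut() assigns each value to an interval
# right = FALSE makes each bin include its lower break: [0,10), [10,50), ...
catch_bin <- cut(total_catch,
                 breaks = c(0, 10, 50, 100, Inf),
                 labels = c("0-9", "10-49", "50-99", "100+"),
                 right = FALSE)

table(catch_bin) # number of observations in each bin
```

The result is a factor, so it can be used directly for grouping or plotting.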

Merge data sets
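
A minimal sketch of merging two dataframes with base R merge(), assuming both share a key column. The two small dataframes and their values are made up for illustration.

```r
# Two small made-up dataframes that share the key column "net"
catch <- data.frame(net = c(1, 2, 3), total_catch = c(2, 38, 8))
sites <- data.frame(net = c(1, 2, 4), depth = c(101, 96, 80.5))

# Default merge keeps only the nets found in both dataframes
both <- merge(catch, sites, by = "net")

# all.x = TRUE keeps every row of catch and fills unmatched columns with NA
all_catch <- merge(catch, sites, by = "net", all.x = TRUE)

both
all_catch
```

It is worth checking the number of rows before and after a merge, because unmatched or duplicated keys change the row count.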

Missing values (NAs)

Missing values can cause problems, as the example below shows. It is always good to know which variables have missing values and how many. I use the summary output above to count the NAs for each variable. Missing values are different from zeros, so make sure you know the difference in your data. Again, you can use the summary output from above to check each variable for NAs versus zeros.

mean(fish_data_csv$duration) # returns NA because the variable duration contains NAs
[1] NA
head(fish_data_csv$duration, 100) # see the NAs
  [1]    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
 [13]    NA    NA  2.83  3.00  3.30  3.55  3.83  4.40  4.65 14.88 15.12 15.38
 [25] 15.60 15.83 16.12  5.18  5.43  5.65  5.85  6.07  6.28  6.52  6.75  3.47
 [37]  3.73  4.08  4.52  4.77  4.97  5.33  5.83 15.58 16.12 16.67 17.45 18.20
 [49] 18.53 14.67 15.10 16.18 16.75 17.13 17.77 18.88 19.30  3.67  3.90  4.15
 [61]  4.40  4.60  4.80  5.02  5.22  5.45  5.67  5.98  6.27  6.50  6.72  3.00
 [73]  3.30  3.57  3.90  4.18  4.48 10.35 10.78 11.20 11.58 11.90 12.27  6.75
 [85]  7.18  7.53  7.75  8.02  8.50  5.37  5.58  5.80  6.08  6.32  6.52  6.80
 [97]  6.80  3.35  3.57  3.73
mean(fish_data_csv$duration, na.rm = TRUE) # recalculate mean by removing NAs
[1] 7.003782
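
A quick way to count NAs and zeros for every variable at once is sketched below. The small dataframe is made up so the example runs on its own; the same two lines should work on a dataframe of numeric variables.

```r
# Small made-up dataframe with a few missing values and one zero
toy <- data.frame(duration    = c(NA, 2.83, 3.00, NA),
                  total_catch = c(2, 0, 8, 6),
                  depth       = c(101, 96, NA, 80.5))

na_count   <- colSums(is.na(toy))             # number of NAs in each column
zero_count <- colSums(toy == 0, na.rm = TRUE) # number of zeros in each column, ignoring NAs

na_count
zero_count
```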

Changing the data type of a variable

Sometimes when you import data from a .csv or .xlsx the data type assigned by R isn’t what you want it to be. For example, you may want to change a numeric variable to a factor.

str(fish_data_csv)
'data.frame':   2351 obs. of  19 variables:
 $ lift                 : chr  "1" "1" "1" "1" ...
 $ net                  : num  1 2 3 4 5 6 6.5 7 7.5 8 ...
 $ date                 : chr  "8/23/10" "8/23/10" "8/23/10" "8/23/10" ...
 $ year                 : int  10 10 10 10 10 10 10 10 10 10 ...
 $ week                 : int  1 1 1 1 1 1 1 1 1 1 ...
 $ north_South          : chr  "South" "South" "South" "South" ...
 $ temp                 : num  64 64 64 64 64 64 64 64 64 64 ...
 $ morning_evening_night: chr  "morning" "morning" "morning" "morning" ...
 $ set_time             : chr  "4:40" "4:40" "4:40" "4:40" ...
 $ pull_time            : chr  "8:35" "8:35" "8:35" "8:35" ...
 $ duration             : num  NA NA NA NA NA NA NA NA NA NA ...
 $ mesh_inch            : num  2.5 2 2.25 2 2.5 1.5 1.5 1.75 1.75 2 ...
 $ mesh                 : num  63.5 50.8 57.1 50.8 63.5 ...
 $ depth                : num  101 96 92 80.5 84 73 83 68 83 84.5 ...
 $ lat                  : chr  "N47.55714" "N47.55502" "N47.92443" "N47.92434" ...
 $ long                 : chr  "W113.51997" "W113.51975" "W113.86772" "W113.86467" ...
 $ total_catch          : int  2 38 8 6 5 2 7 14 120 154 ...
 $ catch_200            : int  2 38 8 6 5 0 0 14 120 154 ...
 $ catch_effort         : num  NA NA NA NA NA NA NA NA NA NA ...

Notice that mesh is numeric. For some analyses we may want it as a factor.

fish_data_csv$mesh_factor <- as.factor(fish_data_csv$mesh) # here is a new variable as a factor
str(fish_data_csv)
'data.frame':   2351 obs. of  20 variables:
 $ lift                 : chr  "1" "1" "1" "1" ...
 $ net                  : num  1 2 3 4 5 6 6.5 7 7.5 8 ...
 $ date                 : chr  "8/23/10" "8/23/10" "8/23/10" "8/23/10" ...
 $ year                 : int  10 10 10 10 10 10 10 10 10 10 ...
 $ week                 : int  1 1 1 1 1 1 1 1 1 1 ...
 $ north_South          : chr  "South" "South" "South" "South" ...
 $ temp                 : num  64 64 64 64 64 64 64 64 64 64 ...
 $ morning_evening_night: chr  "morning" "morning" "morning" "morning" ...
 $ set_time             : chr  "4:40" "4:40" "4:40" "4:40" ...
 $ pull_time            : chr  "8:35" "8:35" "8:35" "8:35" ...
 $ duration             : num  NA NA NA NA NA NA NA NA NA NA ...
 $ mesh_inch            : num  2.5 2 2.25 2 2.5 1.5 1.5 1.75 1.75 2 ...
 $ mesh                 : num  63.5 50.8 57.1 50.8 63.5 ...
 $ depth                : num  101 96 92 80.5 84 73 83 68 83 84.5 ...
 $ lat                  : chr  "N47.55714" "N47.55502" "N47.92443" "N47.92434" ...
 $ long                 : chr  "W113.51997" "W113.51975" "W113.86772" "W113.86467" ...
 $ total_catch          : int  2 38 8 6 5 2 7 14 120 154 ...
 $ catch_200            : int  2 38 8 6 5 0 0 14 120 154 ...
 $ catch_effort         : num  NA NA NA NA NA NA NA NA NA NA ...
 $ mesh_factor          : Factor w/ 14 levels "38.1","44.45",..: 7 4 5 4 7 1 1 2 2 4 ...

Now let’s change it back to numeric.

fish_data_csv$mesh_numeric <- as.numeric(as.character(fish_data_csv$mesh_factor)) # factor to character to numeric

str(fish_data_csv)
'data.frame':   2351 obs. of  21 variables:
 $ lift                 : chr  "1" "1" "1" "1" ...
 $ net                  : num  1 2 3 4 5 6 6.5 7 7.5 8 ...
 $ date                 : chr  "8/23/10" "8/23/10" "8/23/10" "8/23/10" ...
 $ year                 : int  10 10 10 10 10 10 10 10 10 10 ...
 $ week                 : int  1 1 1 1 1 1 1 1 1 1 ...
 $ north_South          : chr  "South" "South" "South" "South" ...
 $ temp                 : num  64 64 64 64 64 64 64 64 64 64 ...
 $ morning_evening_night: chr  "morning" "morning" "morning" "morning" ...
 $ set_time             : chr  "4:40" "4:40" "4:40" "4:40" ...
 $ pull_time            : chr  "8:35" "8:35" "8:35" "8:35" ...
 $ duration             : num  NA NA NA NA NA NA NA NA NA NA ...
 $ mesh_inch            : num  2.5 2 2.25 2 2.5 1.5 1.5 1.75 1.75 2 ...
 $ mesh                 : num  63.5 50.8 57.1 50.8 63.5 ...
 $ depth                : num  101 96 92 80.5 84 73 83 68 83 84.5 ...
 $ lat                  : chr  "N47.55714" "N47.55502" "N47.92443" "N47.92434" ...
 $ long                 : chr  "W113.51997" "W113.51975" "W113.86772" "W113.86467" ...
 $ total_catch          : int  2 38 8 6 5 2 7 14 120 154 ...
 $ catch_200            : int  2 38 8 6 5 0 0 14 120 154 ...
 $ catch_effort         : num  NA NA NA NA NA NA NA NA NA NA ...
 $ mesh_factor          : Factor w/ 14 levels "38.1","44.45",..: 7 4 5 4 7 1 1 2 2 4 ...
 $ mesh_numeric         : num  63.5 50.8 57.1 50.8 63.5 ...
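
The as.character() step matters. Here is a small sketch with made-up mesh values showing what happens when it is left out: as.numeric() on a factor returns the internal level codes, not the original numbers.

```r
mesh_factor <- factor(c(63.5, 50.8, 38.1)) # levels are sorted: "38.1", "50.8", "63.5"

codes  <- as.numeric(mesh_factor)               # level codes 3 2 1, not the mesh sizes
values <- as.numeric(as.character(mesh_factor)) # original values 63.5 50.8 38.1

codes
values
```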

Summarize data with tidyverse

Tidyverse is a collection of packages (e.g., dplyr and ggplot2) for data wrangling and graphing that you can install and load together. We have found that students new to R find the tidyverse “language” a bit easier to learn than base R. Piping with %>% (the keyboard shortcut is cmd+shift+M on Mac and ctrl+shift+M on Windows) is a nice way to chain several actions together. Think of the symbol %>% as the word “then.” Notice that the last line of a piped chunk does not end with %>%. R version 4.1 and later also has a native pipe operator, |>. If you prefer |>, you can set RStudio to insert it with the keyboard shortcut in Global Options.

In the simple example below we create a new dataframe that takes the fish_data_csv dataframe, “then” keeps only the rows for year 10 (2010), then selects only the variables year, mesh, and catch_effort. You will see a new dataframe in the Environment tab called fish_data_csv_10 with three variables (i.e., year, mesh, and catch_effort) and 317 observations. Click on the dataframe icon to open the new dataframe and see that it only has year 10.

fish_data_csv_10 <- fish_data_csv %>% 
  filter(year==10) %>% #filter, here we filter for year 10 (2010)
  select(year, mesh, catch_effort) #we select only these variables, helps make the dataframe more manageable
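
To make the “then” reading concrete, here is a small sketch that writes the same two steps three ways. The toy dataframe and its values are made up, and the native pipe line assumes R 4.1 or later.

```r
library(dplyr) # provides %>%, filter(), and select()

toy <- data.frame(year = c(10, 10, 11), mesh = c(63.5, 50.8, 57.1)) # made-up values

nested <- select(filter(toy, year == 10), mesh)       # nested calls, read from the inside out
piped  <- toy %>% filter(year == 10) %>% select(mesh) # same steps, read top to bottom
native <- toy |> filter(year == 10) |> select(mesh)   # same steps with the native pipe

identical(nested, piped) # the two forms give the same dataframe
```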

Now let’s do more. For the new dataframe fish_data_csv_filter_catch_effort we keep years 10, 11, and 12 and the 2.0-inch mesh; select the variables year, mesh_inch, catch_effort, total_catch, and catch_200; remove rows where catch_effort is NA; rename the variable catch_effort to cpue; and create a new variable called ratio that is catch_200 divided by total_catch.

fish_data_csv_filter_catch_effort <- fish_data_csv %>% # create new data frame from the fish_data_csv dataframe
  filter(year %in% c(10, 11, 12) & mesh_inch == 2.00) %>% # filter for years and mesh size
  select(year, mesh_inch, catch_effort, total_catch, catch_200) %>% # select variables we are interested in
  filter(!is.na(catch_effort)) %>% # filter out NAs because they will cause a problem when we calculate a mean
  rename(cpue = catch_effort) %>% # rename a variable to match with fisheries science terminology
  mutate(ratio = catch_200 / total_catch) # create a new variable called ratio that is the catch of fish >= 200 mm divided by the total catch

head(fish_data_csv_filter_catch_effort, n = 5) # shows first 5 lines of new dataframe
  year mesh_inch cpue total_catch catch_200 ratio
1   10         2 8.09          31        31     1
2   10         2 8.18          36        36     1
3   10         2 3.96          59        59     1
4   10         2 3.54          56        56     1
5   10         2 4.60          25        25     1
tail(fish_data_csv_filter_catch_effort, n = 5) # shows last 5 lines of new dataframe
    year mesh_inch cpue total_catch catch_200     ratio
274   12         2 1.76          23        23 1.0000000
275   12         2 0.95          17        13 0.7647059
276   12         2 0.75           9         8 0.8888889
277   12         2 0.46           5         5 1.0000000
278   12         2 1.02          12        12 1.0000000
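
One thing to watch with a ratio like this: the Desc() output earlier shows 184 nets with a total_catch of zero, and in R 0 / 0 returns NaN. The sketch below uses made-up values to show the behavior and one possible way to handle it. Recording the ratio as NA is our assumption for illustration, not a rule.

```r
total_catch <- c(31, 17, 0) # made-up values; the last net caught nothing
catch_200   <- c(31, 13, 0)

ratio <- catch_200 / total_catch # the last value is 0 / 0, which gives NaN

# Record the ratio as missing when nothing was caught
ratio_safe <- ifelse(total_catch == 0, NA_real_, catch_200 / total_catch)

mean(ratio_safe, na.rm = TRUE) # mean ratio over nets that caught fish
```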

Now let’s make a summary table from a filtered data frame.

fish_data_csv_summary <- fish_data_csv %>%
  filter(year %in% c(10, 11, 12, 13) & mesh_inch %in% c(2.00, 2.50)) %>%
  group_by(year, mesh_inch) %>%
  select(year, mesh_inch, catch_effort) %>% # select variables we are interested in
  filter(!is.na(catch_effort)) %>% # filter out NAs because they will cause a problem when we calculate a mean
  rename(cpue = catch_effort) %>% # rename a variable to match with fisheries science terminology
  summarise(min_cpue = min(cpue), max_cpue = max(cpue), mean_cpue = mean(cpue), sd_cpue = sd(cpue), median_cpue = median(cpue))


fish_data_csv_summary %>%
  mutate(across(where(is.numeric), ~ round(.x, digits = 1))) %>% # round numeric variables to one decimal place
  kbl(caption = "Summary Data by Year and Mesh") %>% # making a table with the kableExtra package
  kable_classic(full_width = FALSE) # table type
Summary Data by Year and Mesh
year  mesh_inch  min_cpue  max_cpue  mean_cpue  sd_cpue  median_cpue
10    2.0        0         30.0      6.6        6.9      4.2
10    2.5        0         23.0      2.4        4.2      1.3
11    2.0        0         22.2      2.6        3.0      1.8
11    2.5        0          4.6      0.6        0.9      0.3
12    2.0        0         33.4      6.3        7.4      3.2
12    2.5        0         14.5      2.2        3.0      1.1
13    2.0        0         52.4      4.5        6.7      2.5
13    2.5        0          7.6      2.7        2.5      1.6

Selecting specific values from a dataframe

fish_data_csv[1, 2] # grabs the value in the first row and second column of the fish_data_csv dataframe
[1] 1
fish_data_csv[1, ] # grabs the first row
  lift net    date year week north_South temp morning_evening_night set_time
1    1   1 8/23/10   10    1       South   64               morning     4:40
  pull_time duration mesh_inch mesh depth       lat       long total_catch
1      8:35       NA       2.5 63.5   101 N47.55714 W113.51997           2
  catch_200 catch_effort mesh_factor mesh_numeric
1         2           NA        63.5         63.5
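
The same square brackets also accept column names and logical conditions, which is often safer than counting column positions. A short sketch with a made-up dataframe:

```r
toy <- data.frame(net         = c(1, 2, 3),
                  year        = c(10, 10, 11),
                  total_catch = c(2, 38, 8)) # made-up values

by_name   <- toy[2, "total_catch"] # row 2 of the column named total_catch
by_dollar <- toy$total_catch[2]    # same value using the dollar sign
year_10   <- toy[toy$year == 10, ] # all rows where year equals 10
big_catch <- toy[toy$total_catch > 5, c("net", "total_catch")] # rows with catch > 5, two named columns

by_name
year_10
big_catch
```
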
fish_data_csv[, 2] # grabs the second column
   [1]   1.0   2.0   3.0   4.0   5.0   6.0   6.5   7.0   7.5   8.0   9.0  10.0
  [13]  11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0
  [25]  23.0  24.0  25.0  26.0  27.0  28.0  29.0  30.0  31.0  32.0  33.0  34.0
  [37]  35.0  36.0  37.0  38.0  39.0  40.0  41.0  42.0  43.0  44.0  45.0  46.0
  [49]  47.0  48.0  49.0  50.0  51.0  52.0  53.0  54.0  55.0  56.0  57.0  58.0
  [61]  59.0  60.0  61.0  62.0  63.0  64.0  65.0  66.0  67.0  68.0  69.0  70.0
  [73]  71.0  72.0  73.0  74.0  75.0  76.0  77.0  78.0  79.0  80.0  81.0  82.0
  [85]  83.0  84.0  85.0  86.0  87.0  88.0  89.0  90.0  91.0  92.0  93.0  94.0
  [97]  94.5  95.0  96.0  97.0  98.0  99.0 100.0 101.0 102.0 103.0 104.0 105.0
 [109] 106.0 107.0 108.0 109.0 110.0 111.0 112.0 113.0 114.0 115.0 116.0 117.0
 [121] 118.0 119.0 119.5 120.0 121.0 122.0 123.0 124.0 125.0 126.0 127.0 128.0
 [133] 129.0 130.0 131.0 132.0 133.0 134.0 135.0 136.0 137.0 138.0 139.0 140.0
 [145] 141.0 142.0 143.0 144.0 145.0 146.0 147.0 148.0 149.0 150.0 151.0 152.0
 [157] 153.0 154.0 155.0 156.0 157.0 158.0 159.0 160.0 161.0 162.0 163.0 164.0
 [169] 165.0 166.0 167.0 168.0 169.0 170.0 171.0 172.0 173.0 174.0 175.0 176.0
 [181] 177.0 178.0 179.0 180.0 181.0 182.0 183.0 184.0 185.0 186.0 187.0 188.0
 [193] 189.0 190.0 191.0 192.0 193.0 194.0 195.0 196.0 197.0 198.0 199.0 200.0
 [205] 201.0 202.0 203.0 204.0 205.0 206.0 207.0 208.0 209.0 210.0 211.0 212.0
 [217] 213.0 214.0 215.0 216.0 217.0 218.0 219.0 220.0 221.0 222.0 223.0 224.0
 [229] 225.0 226.0 227.0 228.0 229.0 230.0 231.0 232.0 232.5 233.0 234.0 235.0
 [241] 236.0 237.0 238.0 239.0 240.0 241.0 242.0 243.0 244.0 245.0 246.0 247.0
 [253] 248.0 249.0 250.0 251.0 252.0 253.0 254.0 255.0 256.0 257.0 258.0 259.0
 [265] 260.0 261.0 262.0 263.0 264.0 265.0 266.0 267.0 268.0 269.0 270.0 271.0
 [277] 272.0 273.0 274.0 275.0 276.0 277.0 278.0 279.0 280.0 281.0 282.0 283.0
 [289] 284.0 285.0 286.0 287.0 288.0 289.0 290.0 291.0 292.0 293.0 294.0 295.0
 [301] 296.0 297.0 298.0 299.0 300.0 301.0 302.0 303.0 304.0 304.5 305.0 306.0
 [313] 307.0 308.0 309.0 310.0 311.0   1.0   2.0   3.0   4.0   5.0   6.0   7.0
 [325]   8.0   9.0  10.0  11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0
 [337]  20.0  21.0  22.0  23.0  24.0  25.0  26.0  27.0  28.0  29.0  30.0  31.0
 [349]  32.0  33.0  34.0  35.0  36.0  37.0  38.0  39.0  40.0  41.0  42.0  43.0
 [361]  44.0  45.0  46.0  47.0  48.0  49.0  50.0  51.0  52.0  53.0  54.0  55.0
 [373]  56.0  57.0  58.0  59.0  60.0  61.0  62.0  63.0  64.0  65.0  66.0  67.0
 [385]  68.0  69.0  70.0  71.0  72.0  73.0  74.0  75.0  76.0  77.0  78.0  79.0
 [397]  80.0  81.0  82.0  83.0  84.0  85.0  86.0  87.0  88.0  89.0  90.0  91.0
 [409]  92.0  93.0  94.0  95.0  96.0  97.0  98.0  99.0 100.0 101.0 102.0 103.0
 [421] 104.0 105.0 106.0 107.0 108.0 109.0 110.0 111.0 112.0 113.0 114.0 115.0
 [433] 116.0 117.0 118.0 119.0 120.0 121.0 122.0 123.0 124.0 125.0 126.0 127.0
 [445] 128.0 129.0 130.0 131.0 132.0 133.0 134.0 135.0 136.0 137.0 138.0 139.0
 [457] 140.0 141.0 142.0 143.0 144.0 145.0 146.0 147.0 148.0 149.0 150.0 151.0
 [469] 152.0 153.0 154.0 155.0 156.0 157.0 158.0 159.0 160.0 161.0 162.0 163.0
 [481] 164.0 165.0 166.0 167.0 168.0 169.0 170.0 171.0 172.0 173.0 174.0 175.0
 [493] 176.0 177.0 178.0 179.0 180.0 181.0 182.0 183.0 184.0 185.0 186.0 187.0
 [505] 188.0 189.0 190.0 191.0 192.0 193.0 194.0 195.0 196.0 197.0 198.0 199.0
 [517] 200.0 201.0 202.0 203.0 204.0 205.0 206.0 207.0 208.0 209.0 210.0 211.0
 [529] 212.0 213.0 214.0 215.0 216.0 217.0 218.0 219.0 220.0 221.0 222.0 223.0
 [541] 224.0 225.0 226.0 227.0 228.0 229.0 230.0 231.0 232.0 233.0 234.0 235.0
 [553] 236.0 237.0 238.0 239.0 240.0 241.0 242.0 243.0 244.0 245.0 246.0 247.0
 [565] 248.0 249.0 250.0 251.0 252.0 253.0 254.0 255.0 256.0 257.0 258.0 259.0
 [577] 260.0 261.0 262.0 263.0 264.0 265.0 266.0 267.0 268.0 269.0 270.0 271.0
 [589] 272.0 273.0 274.0 275.0 276.0 277.0 278.0 279.0 280.0 281.0 282.0 283.0
 [601] 284.0 285.0 286.0 287.0 288.0 289.0 290.0 291.0 292.0 293.0 294.0 295.0
 [613] 296.0 297.0 298.0 299.0 300.0 301.0 302.0 303.0 304.0 305.0 306.0 307.0
 [625] 308.0 309.0 310.0 311.0 312.0 313.0 314.0 315.0 316.0 317.0 318.0 319.0
 [637] 320.0 321.0 322.0 323.0 324.0 325.0 326.0 327.0 328.0 329.0 330.0 331.0
 [649] 332.0 333.0 334.0 335.0 336.0 337.0 338.0 339.0 340.0 341.0 342.0 343.0
 [661] 344.0 345.0 346.0 347.0 348.0 349.0 350.0 351.0 352.0 353.0 354.0 355.0
 [673] 356.0 357.0 358.0 359.0 360.0 361.0 362.0 363.0 364.0 365.0 366.0 367.0
 [685] 368.0 369.0 370.0 371.0 372.0 373.0 374.0 375.0 376.0 377.0 378.0 379.0
 [697] 380.0 381.0 382.0 383.0 384.0 385.0 386.0 387.0 388.0 389.0 390.0 391.0
 [709] 392.0 393.0 394.0 395.0 396.0 397.0 398.0 399.0   1.0   2.0   3.0   4.0
 [721]   5.0   6.0   7.0   8.0   9.0  10.0  11.0  12.0  13.0  14.0  15.0  16.0
 [733]  33.0  34.0  35.0  36.0  37.0  38.0  39.0  40.0  41.0  42.0  43.0  44.0
 [745]  45.0  46.0  47.0  48.0  65.0  66.0  67.0  68.0  69.0  70.0  71.0  72.0
 [757]  73.0  74.0  75.0  76.0  77.0  78.0  79.0  80.0  96.0  97.0  98.0  99.0
 [769] 100.0 101.0 102.0 103.0 104.0 105.0 106.0 107.0 120.0 121.0 122.0 123.0
 [781] 124.0 125.0 126.0 127.0 128.0 129.0 130.0 141.0 142.0 143.0 144.0 145.0
 [793] 146.0 147.0 148.0 149.0 150.0 151.0 152.0 165.0 166.0 167.0 168.0 169.0
 [805] 170.0 171.0 172.0 173.0 174.0 175.0 176.0 192.0 193.0 194.0 195.0 196.0
 [817] 197.0 198.0 199.0 200.0 201.0 202.0 203.0 216.0 217.0 218.0 219.0 220.0
 [829] 221.0 222.0 223.0 224.0 225.0 226.0 227.0 240.0 241.0 242.0 243.0 244.0
 [841] 245.0 246.0 247.0 248.0 249.0 250.0 251.0 264.0 265.0 266.0 267.0 268.0
 [853] 269.0 270.0 271.0 272.0 273.0 274.0 275.0 288.0 289.0 290.0 291.0 292.0
 [865] 293.0 294.0 295.0 296.0 297.0 298.0 299.0 312.0 313.0 314.0 315.0 316.0
 [877] 317.0 318.0 319.0 320.0 321.0 322.0 335.0 336.0 337.0 338.0 339.0 340.0
 [889] 341.0 342.0 343.0 344.0 345.0 346.0 359.0 360.0 361.0 362.0 363.0 364.0
 [901] 365.0 366.0 367.0 368.0 369.0 370.0  17.0  18.0  19.0  20.0  21.0  22.0
 [913]  23.0  24.0  25.0  26.0  27.0  28.0  29.0  30.0  31.0  32.0  49.0  50.0
 [925]  51.0  52.0  53.0  54.0  55.0  56.0  57.0  58.0  59.0  60.0  61.0  62.0
 [937]  63.0  64.0  81.0  82.0  83.0  84.0  85.0  86.0  86.5  87.0  88.0  89.0
 [949]  90.0  91.0  92.0  93.0  94.0  95.0 108.0 109.0 110.0 111.0 112.0 113.0
 [961] 114.0 115.0 116.0 117.0 118.0 119.0 131.0 153.0 154.0 155.0 156.0 157.0
 [973] 158.0 159.0 160.0 161.0 162.0 163.0 164.0 252.0 253.0 254.0 255.0 256.0
 [985] 257.0 258.0 259.0 260.0 261.0 262.0 263.0 177.0 178.0 179.0 180.0 181.0
 [997] 182.0 183.0 184.0 185.0 186.0 187.0 188.0 189.0 190.0 191.0 204.0 205.0
[1009] 206.0 207.0 208.0 209.0 210.0 211.0 212.0 213.0 214.0 215.0 228.0 229.0
[1021] 230.0 231.0 232.0 233.0 234.0 235.0 236.0 237.0 238.0 239.0 276.0 277.0
[1033] 278.0 279.0 280.0 281.0 282.0 283.0 284.0 285.0 286.0 287.0 300.0 301.0
[1045] 302.0 303.0 304.0 305.0 306.0 307.0 308.0 309.0 310.0 311.0 323.0 324.0
[1057] 325.0 326.0 327.0 328.0 329.0 330.0 331.0 332.0 333.0 334.0 347.0 348.0
[1069] 349.0 350.0 351.0 352.0 353.0 354.0 355.0 356.0 357.0 358.0 371.0 372.0
[1081] 373.0 374.0 375.0 376.0 377.0 378.0 379.0 380.0 381.0 382.0   1.0   2.0
[1093]   3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0  11.0  12.0  13.0  14.0
[1105]  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0  25.0  26.0
[1117]  27.0  28.0  29.0  30.0  31.0  32.0  33.0  34.0  35.0  36.0  37.0  38.0
[1129]  39.0  40.0  41.0  42.0  43.0  44.0  45.0  46.0  47.0  48.0  49.0  50.0
[1141]  51.0  52.0  53.0  54.0  55.0  56.0  57.0  58.0  59.0  60.0  61.0  62.0
[1153]  63.0  64.0  65.0  66.0  67.0  68.0  69.0  70.0  71.0  72.0  73.0  74.0
[1165]  75.0  76.0  77.0  78.0  79.0  80.0  81.0  82.0  83.0  84.0  85.0  86.0
[1177]  87.0  88.0  89.0  90.0  91.0  92.0  93.0  94.0  95.0  96.0  97.0  98.0
[1189]  99.0 100.0 101.0 102.0 103.0 104.0 105.0 106.0 107.0 108.0 109.0 110.0
[1201] 111.0 112.0 113.0 114.0 115.0 116.0 117.0 118.0 119.0 120.0 121.0 122.0
[1213] 123.0 124.0 125.0 126.0 127.0 128.0 129.0 130.0 131.0 132.0 133.0 134.0
[1225] 135.0 136.0 137.0 138.0 139.0 140.0 141.0 142.0 143.0 144.0 145.0 146.0
[1237] 147.0 148.0 149.0 150.0 151.0 152.0 153.0 154.0 155.0 156.0 157.0 158.0
[1249] 159.0 160.0 161.0 162.0 163.0 164.0 165.0 166.0 167.0 168.0 169.0 170.0
[1261] 171.0 172.0 173.0 174.0 175.0 176.0 177.0 178.0 179.0 180.0 181.0 182.0
[1273] 183.0 184.0 185.0 186.0 187.0 188.0 189.0 190.0 191.0 192.0 193.0 194.0
[1285] 195.0 196.0 197.0 198.0 199.0 200.0 201.0 202.0 203.0 204.0 205.0 206.0
[1297] 207.0 208.0 209.0 210.0 211.0 212.0 213.0 214.0 215.0 216.0 217.0 218.0
[1309] 219.0 220.0 221.0 222.0 223.0 224.0 225.0 226.0 227.0 228.0 229.0 230.0
[1321] 231.0 232.0 233.0 234.0 235.0 236.0 237.0 238.0 239.0 240.0 241.0 242.0
[1333] 243.0 244.0 245.0 246.0 247.0 248.0 249.0 250.0 251.0 252.0 253.0 254.0
[1345] 255.0 256.0 257.0 258.0 259.0 260.0 261.0 262.0 263.0 264.0 265.0 266.0
[1357] 267.0 268.0 269.0 270.0 271.0 272.0 273.0 274.0 275.0 276.0 277.0 278.0
[1369] 279.0 280.0 281.0 282.0 283.0 284.0 285.0 286.0 287.0 288.0 289.0 290.0
[1381] 291.0 292.0 293.0 294.0 295.0 296.0 297.0 298.0 299.0 300.0 301.0 302.0
[1393] 303.0 304.0 305.0 306.0 307.0 308.0 309.0 310.0 311.0 312.0 313.0 314.0
[1405] 315.0 316.0 317.0 318.0 319.0 320.0 321.0 322.0 323.0 324.0 325.0 326.0
[1417] 327.0 328.0 329.0 330.0 331.0 332.0 333.0 334.0 335.0 336.0 337.0 338.0
[1429] 339.0 340.0 341.0 342.0 343.0 344.0 345.0 346.0 347.0   1.0   2.0   3.0
[1441]   4.0   5.0   6.0   7.0   8.0   9.0  10.0  11.0  12.0  13.0  14.0  15.0
[1453]  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0  25.0  26.0  27.0
[1465]  28.0  29.0  30.0  31.0  32.0  33.0  34.0  35.0  36.0  37.0  38.0  39.0
[1477]  40.0  41.0  42.0  43.0  44.0  45.0  46.0  47.0  48.0  49.0  50.0  51.0
[1489]  52.0  53.0  54.0  55.0  56.0  57.0  58.0  59.0  60.0  60.1  61.1  62.1
[1501]  63.0  64.0  65.0  66.0  67.0  68.0  69.0  70.0  71.0  72.0  73.0  74.0
[1513]  75.0  76.0  77.0  78.0  79.0  80.0  81.0  82.0  83.0  84.0  85.0  86.0
[1525]  87.0  88.0  89.0  90.0  91.0  92.0  93.0  94.0  95.0  96.0  97.0  98.0
[1537]  98.1  99.1 100.1 101.1 102.0 103.0 104.0 105.0 106.0 107.0 108.0 109.0
[1549] 110.0 111.0 112.0 113.0 114.1 115.1 116.0 117.0 118.0 119.0 120.0 120.1
[1561] 121.0 122.0 123.0 124.0 125.0 127.0 128.0 129.0 130.0 131.0 132.0 133.0
[1573] 134.0 135.1 136.1 137.0 138.0 139.0 140.0 141.0 142.0 143.0 144.0 145.0
[1585] 146.0 147.0 148.0 149.0 150.0 151.0 152.0 153.0 154.0 155.0 156.0 157.0
[1597] 158.0 159.0 160.0 161.1 162.1 163.0 164.0 165.0 166.0 167.0 168.0 169.0
[1609] 170.0 171.0 172.0 173.0 174.0 175.0 176.0 177.0 178.0 179.0 180.0 181.0
[1621] 182.0 183.0 184.0 185.0 186.0 187.1 188.1 189.0 190.0 191.0 192.0 193.0
[1633] 194.0 195.0 196.0 197.0 198.0 199.0 200.0 201.0 202.0 203.0 204.0 205.0
[1645] 206.0 207.0 208.0 209.0 210.0 211.0 212.0 213.1 214.1 215.0 216.0 217.0
[1657] 218.0 219.0 220.0 221.0 222.0 223.0 224.0 225.0 226.0 227.0 228.0 229.0
[1669] 230.0 231.0 232.0 233.0 234.0 235.0 236.0 237.0 238.1 239.1 240.0 241.0
[1681] 242.0 243.0 244.0 245.0 246.0 247.0 248.0 249.0 250.0 251.0 252.0 253.0
[1693] 254.0 255.0 256.0 257.0 258.0 259.0 260.0 261.0 262.0 263.0 264.1 265.1
[1705] 266.0 267.0 268.0 269.0 270.0 271.0 272.0 273.0 274.0 275.0 276.0 277.0
[1717] 278.0 279.0 280.0 281.0 282.0 283.0 284.0 285.0 286.0 287.0 288.0 289.0
[1729] 290.0 291.0 292.0 293.0 294.0 295.0 296.0 297.0 298.0 299.0 300.0 301.0
[1741] 302.1 303.0 304.0 305.0 306.0 307.0 308.0 309.0 310.0 311.0 312.0 313.0
[1753] 314.0 315.0 316.0 317.0 318.0 319.0 320.0 321.0 322.0 323.0 324.0 325.0
[1765] 326.0 327.1 328.1 329.0 330.0 331.0 332.0 333.0 334.0 335.0 336.0 337.0
[1777] 338.0 339.0 340.0 341.0 342.0 343.0 344.0 345.0 346.0 347.0 348.0 349.0
[1789] 350.0 351.0 352.0 353.1 354.1 355.0 356.0 357.0 358.0 359.0 360.0 361.0
[1801] 362.0 363.0 364.0 365.0 366.0 367.1 368.1 369.0 370.0 371.0 372.0 373.0
[1813] 373.1 374.0 375.0 376.0 377.0 378.0 379.0   1.0   2.0   3.0   4.0   5.0
[1825]   6.0   7.0   8.0   9.0  10.0  11.0  12.0  13.0  14.0  15.0  16.0  17.0
[1837]  18.0  19.0  20.0  21.0  22.0  23.0  24.0  25.0  26.0  27.0  28.0  29.0
[1849]  30.0  31.0  32.0  33.0  34.0  35.0  36.0  37.0  38.0  39.0  40.0  41.0
[1861]  42.0  43.0  44.0  45.0  46.0  47.0  48.0  49.0  50.0  51.0  52.0  53.0
[1873]  54.0  55.0  56.0  57.0  58.0  59.0  60.0  61.0  62.0  63.0  64.0  65.0
[1885]  66.0  67.0  68.0  69.0  70.0  71.0  72.0  73.0  74.0  75.0  76.0  77.0
[1897]  78.0  79.0  80.0  81.0  82.0  83.0  84.0  85.0  86.0  87.0  88.0  89.0
[1909]  90.0  91.0  92.0  93.0  94.0  95.0  96.0  97.0  98.0  99.0 100.0 101.0
[1921] 102.0 103.0 104.0 105.0 106.0 107.0 108.0 109.0 110.0 111.0 112.0 113.0
[1933] 114.0 115.0 116.0 117.0 118.0 119.0 120.0 121.0 122.0 123.0 124.0 125.0
[1945] 126.0 127.0 128.0 129.0 130.0 131.0 132.0 133.0 134.0 135.0 136.0 137.0
[1957] 138.0 139.0 140.0 141.0 142.0 143.0 144.0 145.0 146.0 147.0 148.0 149.0
[1969] 150.0 151.0 152.0 153.0 154.0 155.0 156.0 157.0 158.0 159.0 160.0 161.0
[1981] 162.0 163.0 164.0 165.0 166.0 167.0 168.0 169.0 170.0 171.0 172.0 173.0
[1993] 174.0 175.0 176.0 177.0 178.0 179.0 180.0 181.0 182.0 183.0 184.0 185.0
[2005] 186.0 187.0 188.0 189.0 190.0 191.0 192.0 193.0 194.0 195.0 196.0 197.0
[2017] 198.0 199.0 200.0 201.0 202.0 203.0 204.0 205.0 206.0 207.0 208.0 209.0
[2029] 210.0 211.0 212.0 213.0 214.0 215.0 216.0 217.0 218.0 219.0 220.0 221.0
[2041] 222.0 223.0 224.0 225.0 226.0 227.0 228.0 229.0 230.0 231.0 232.0 233.0
[2053] 234.0 235.0 236.0 237.0 238.0 239.0 240.0 241.0 242.0 243.0 244.0 245.0
[2065] 246.0 247.0 248.0 249.0 250.0 251.0 252.0 253.0 254.0 255.0   1.0   2.0
[2077]   3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0  11.0  12.0  13.0  14.0
[2089]  15.0  15.5  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0  25.0
[2101]  26.0  27.0  28.0  29.0  30.0  31.0  32.0  33.0  34.0  35.0  36.0  37.0
[2113]  38.0  39.0  40.0  41.0  42.0  43.0  44.0  45.0  46.0  47.0  48.0  49.0
[2125]  50.0  51.0  52.0  53.0  54.0  55.0  56.0  57.0  58.0  59.0  60.0  61.0
[2137]  62.0  63.0  64.0  65.0  66.0  67.0  68.0  69.0  70.0  71.0  72.0  73.0
[2149]  74.0  75.0  76.0  77.0  78.0  79.0  80.0  81.0  82.0  83.0  84.0  85.0
[2161]  86.0  87.0  88.0  89.0  90.0  91.0  92.0  93.0  94.0  95.0  96.0  97.0
[2173]  98.0  99.0 100.0 101.0 102.0 103.0 104.0 105.0 106.0 107.0 108.0 109.0
[2185] 110.0 111.0 112.0 113.0 114.0 115.0 116.0 117.0 118.0 119.0 120.0 121.0
[2197] 122.0 123.0 124.0 125.0 126.0 127.0 128.0 129.0 130.0 131.0 132.0 133.0
[2209] 134.0 135.0 136.0 137.0 138.0 139.0 140.0 141.0 142.0 143.0 144.0 145.0
[2221] 146.0 147.0 148.0 149.0 150.0 151.0 152.0 153.0 154.0 155.0 156.0 157.0
[2233] 158.0 159.0 160.0 161.0 162.0 163.0 164.0 165.0 166.0 167.0 168.0 169.0
[2245] 170.0 171.0 172.0 173.0 174.0 175.0 176.0 177.0 178.0 179.0 180.0 181.0
[2257] 182.0 183.0 184.0 185.0 186.0 187.0 188.0 189.0 190.0 191.0 192.0 193.0
[2269] 194.0 195.0 196.0 197.0 198.0 199.0 200.0 201.0 202.0 203.0 204.0 205.0
[2281] 206.0 207.0 208.0 209.0 210.0 211.0 212.0 213.0 214.0 215.0 216.0 217.0
[2293] 218.0 219.0 220.0 221.0 222.0 223.0 224.0 225.0 226.0 227.0 228.0 229.0
[2305] 230.0 231.0 232.0 233.0 234.0 235.0 236.0 237.0 238.0 239.0 240.0 241.0
[2317] 242.0 243.0 244.0 245.0 246.0 247.0 248.0 249.0 250.0 251.0 252.0 253.0
[2329] 254.0 255.0 256.0 257.0 258.0 259.0 260.0 261.0 262.0 263.0 264.0 265.0
[2341] 266.0 267.0 268.0 269.0 270.0 271.0 272.0 273.0 274.0 275.0 276.0
fish_data_csv[1:4, 1:5] # grabs rows 1-4 and columns 1-5
  lift net    date year week
1    1   1 8/23/10   10    1
2    1   2 8/23/10   10    1
3    1   3 8/23/10   10    1
4    1   4 8/23/10   10    1

Working with Dates

How to calculate net set times for overnight sets

Random subset

Pooling variable

Functions

For Loops

Graphing

You can make wonderful publication figures in R. You can use base R or ggplot2 (part of the tidyverse package) to make figures. As with data wrangling, we find that beginners to R learn the ggplot2 syntax more easily for making figures. Thus, the code below uses the ggplot2 syntax.

There are times when you simply want to inspect the data. In those cases we don’t worry about all the formatting for a publication quality figure.

Below is code for a boxplot using ggplot. Notice the boxplot is not separated by year. That is because year is an integer and not a factor. A boxplot wants the x variable as a factor. So, we need to change the data type for year.

cpue_boxplot <- ggplot(data = fish_data_csv, aes(x=year, y=catch_effort)) +
  geom_boxplot()

cpue_boxplot

We change year to a factor and we get a better plot. Now we have a boxplot that is perfect for looking over the data. That said, we don’t believe it is publication quality. The defaults in ggplot2 are not suitable for publication.

fish_data_csv$year_factor <- as.factor(fish_data_csv$year)

cpue_boxplot_2 <- ggplot(data = fish_data_csv, aes(x = year_factor, y = catch_effort)) +
  geom_boxplot()

cpue_boxplot_2

The code below makes the figure publication quality. It looks intimidating for a single figure, but it lets us control every aspect of the figure. You can store all the lines below ylab in an object named mytheme, for example, and then you would not need to repeat that code for each figure.

For a detailed explanation of the code below, see: https://doi.org/10.1002/fsh.10272

pub_boxplot_cpue <- ggplot(data = fish_data_csv, aes(y=catch_effort, x=year_factor, fill = year_factor)) +
  geom_point (aes(color = year_factor), position = position_jitter(width = .35), size = 1.5, alpha = .4) +
  geom_boxplot (width = .8, size = .5, outlier.shape = NA, alpha = 0, notch = FALSE) +
  stat_boxplot (geom ='errorbar', width = 0.2) +
  stat_summary (fun = mean, geom = "point", shape = 21, size = 3, fill = "black") +
  scale_color_manual (values = c("#8c510a", "#d8b365", "#A8925E", "#c7eae5", "#5ab4ac", "#01665e", "red")) +
  scale_y_continuous (limits = c(-0.01,60), expand = c(0,0),breaks=seq(0,60,5)) +
  scale_x_discrete (labels = c("2010", "2011", "2012", "2013", "2014", "2015", "2016")) +
  ylab("Catch per unit effort") + xlab("Year")  +
  theme_bw() + 
  theme (axis.title.y = element_text(size = 20, vjust = 4, colour = "black"), 
         axis.title.x = element_text(size = 20, vjust = -2, colour = "black"), 
         panel.border = element_blank(),
         legend.position = "none",
         plot.margin = unit(c(1.5, 1.5, 1.5, 1.5), "cm"),
         plot.title = element_text(size = 18, hjust = 0.5, vjust = 8),
         panel.grid.major = element_blank(), 
         panel.grid.minor = element_blank(),
         axis.ticks.x = element_line(size = 0.8),
         axis.ticks.y = element_line(size = 0.8),
         axis.ticks.length = unit(0.2,"cm"),
         axis.text.x = element_text(colour = "black",size = 18, angle = 0, vjust = -1, hjust = 0.5),
         axis.text.y = element_text(colour = "black",size = 18),
         axis.line = element_line(colour = "black", size = 0.8, lineend = "square"))
ggsave("figures/pub_boxplot_cpue.jpeg", plot = pub_boxplot_cpue, width=11, height=8.5,dpi=300) 
pub_boxplot_cpue

Figure 1. Catch per effort by year for lake trout in Swan Lake.

Important

Note the warning messages in the console. The 19 rows were removed because catch_effort has 19 NAs. Always check to make sure the number removed makes sense with the data you are plotting.
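One way to make that check is to count the NAs in the plotted column yourself and compare the count with the number in the warning. The sketch below uses a small made-up data frame (toy_data is a hypothetical stand-in for fish_data_csv), so the counts are for illustration only.

```r
# Toy data frame standing in for fish_data_csv (values are made up)
toy_data <- data.frame(
  year = c(2010, 2010, 2011, 2011, 2012),
  catch_effort = c(1.2, NA, 0.8, NA, 2.5)
)

# Count the missing values in the column being plotted
n_missing <- sum(is.na(toy_data$catch_effort))
n_missing

# Count the rows that have a value and can be drawn
n_plotted <- sum(!is.na(toy_data$catch_effort))
n_plotted
```

If the warning reports more rows than the NA count, the extra rows may fall outside the limits set in scale_y_continuous, because ggplot2 also drops values outside the scale limits by default.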

pub_bivariate_cpue_depth <- ggplot(data = fish_data_csv, aes(y=catch_effort, x=depth)) +
  geom_point (size = 1.5, alpha = .5, shape = 2) +
  scale_y_continuous (limits = c(-0.01,60), expand = c(0,0),breaks=seq(0,60,5)) +
  scale_x_continuous (limits = c(0, 200), expand = c(0,0), breaks = seq(0,200,20)) +
  ylab("Catch per unit effort") + xlab("Depth (ft)")  +
  theme_bw() + 
  theme (axis.title.y = element_text(size = 20, vjust = 4, colour = "black"), 
         axis.title.x = element_text(size = 20, vjust = -2, colour = "black"), 
         panel.border = element_blank(),
         legend.position = "none",
         plot.margin = unit(c(1.5, 1.5, 1.5, 1.5), "cm"),
         plot.title = element_text(size = 18, hjust = 0.5, vjust = 8),
         panel.grid.major = element_blank(), 
         panel.grid.minor = element_blank(),
         axis.ticks.x = element_line(size = 0.8),
         axis.ticks.y = element_line(size = 0.8),
         axis.ticks.length = unit(0.2,"cm"),
         axis.text.x = element_text(colour = "black",size = 18, angle = 0, vjust = -1, hjust = 0.5),
         axis.text.y = element_text(colour = "black",size = 18),
         axis.line = element_line(colour = "black", size = 0.8, lineend = "square"))
ggsave("figures/pub_bivariate_cpue_depth.jpeg", plot = pub_bivariate_cpue_depth, width=11, height=8.5,dpi=600) 
pub_bivariate_cpue_depth

Figure 2. Catch per effort by depth for lake trout in Swan Lake.
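Both figures repeat the same theme code. As a minimal sketch of the mytheme idea mentioned earlier, the shared settings can be stored once and added to any plot. The object name mytheme comes from the text. The toy data frame and the shortened list of settings are our assumptions for illustration, and the remaining theme lines from the figures can be added the same way.

```r
library(ggplot2)

# Hypothetical reusable theme: a shortened set of the settings used in the figures
mytheme <- theme_bw() +
  theme(axis.title.y = element_text(size = 20, vjust = 4, colour = "black"),
        axis.title.x = element_text(size = 20, vjust = -2, colour = "black"),
        panel.border = element_blank(),
        legend.position = "none",
        plot.margin = unit(c(1.5, 1.5, 1.5, 1.5), "cm"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text.x = element_text(colour = "black", size = 18),
        axis.text.y = element_text(colour = "black", size = 18))

# Toy data standing in for fish_data_csv (values are made up)
toy_data <- data.frame(depth = c(20, 60, 100, 140),
                       catch_effort = c(5, 12, 8, 3))

# Add the stored theme as the last layer instead of repeating the theme code
toy_plot <- ggplot(data = toy_data, aes(x = depth, y = catch_effort)) +
  geom_point() +
  ylab("Catch per unit effort") + xlab("Depth (ft)") +
  mytheme

toy_plot
```

With the theme stored in one object, a change to the font size or margins only has to be made in one place.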

Common graphing issues

Reporting values

Often R will output a value far beyond the level of precision at which the parameter was measured. For example, you measure a fish length to the nearest millimeter (e.g., 120 mm) in the field, but when you calculate a mean for a population R outputs the value to the millionths place or beyond. Thus, be careful about how you report output from R. We think it is best to report the value at the same level of precision as originally collected.
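As a small illustration, the base R function round() can bring a mean back to the precision of the original measurements. The lengths below are made-up values, and the object names are ours.

```r
# Hypothetical fish lengths measured to the nearest millimeter
length_mm <- c(120, 133, 127, 141, 118, 125, 130)

# The mean carries more decimal places than the field measurements support
mean_length <- mean(length_mm)
mean_length

# Round to the nearest millimeter to match the precision of the data
mean_length_reported <- round(mean_length, digits = 0)
mean_length_reported  # 128
```

Rounding only at the reporting step, and not before other calculations, avoids carrying rounding error into later results.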

Excellent Websites

Quarto

Mangiafico: Stats Ecology Handbook

R Companion Handbook of Biological Statistics

Statistics for Ecologists

Big Book of R

Tidyverse guide

Fundamentals of Data Visualization

Beautiful plots in R

Data to Viz

R charts

ColorBrewer 2.0

Adobe color wheel

Our world in data

Modern R with tidyverse

RStudio Cheatsheets

https://www.rstudio.com/resources/cheatsheets/

Keyboard Shortcuts

Comment large blocks (Mac); highlight the section, then press shift+cmd+C. Use the same shortcut to remove comments from large sections.

%>% symbol (Mac); shift+cmd+M