library(psych)      # some good summary stats functions
library(tidyverse)  # has my favorite graphing package in it: ggplot2
library(FSA)        # fisheries stock assessment package by Dr. Derek Ogle
library(DescTools)  # some good summary stats functions
library(confintr)   # calculates confidence intervals for many parameters
library(kableExtra) # for making tables
What is this Document?
We wrote this document for incoming students who have little or no experience with data management, R, and RStudio. Think of it as a beginner's guide or quick-start guide. We remember struggling with some of the most basic aspects of file management and R, and we hope to help you avoid those hurdles. Because we see this as a quick-start guide, we keep the text for each section to a minimum. This document is not a comprehensive guide to using R, RStudio, or analyzing fisheries data.
This is a living document, and we hope to continue updating it as we receive comments and new ideas.
Load R and RStudio
We use R with RStudio. RStudio is an integrated development environment for R and Python. You can use R without RStudio, but you will find it very frustrating unless you are a serious coder. You can download R from the CRAN site for your operating system: https://cran.r-project.org/. We think it is a good idea to check CRAN periodically for R updates and keep your R and RStudio installations up to date. You can download RStudio Desktop from the RStudio website: https://www.rstudio.com/products/rstudio/.
Once R and RStudio are installed you need to do a couple of things in RStudio. First, you do not want R to save the workspace. In RStudio, go to Preferences, and in the General pane under Workspace uncheck "Restore .RData into workspace at startup" and select Never for "Save workspace to .RData on exit". If you do not do this, all your current variables and functions are saved to .RData, and when you reopen R from that working directory the workspace is reloaded with those variables and functions. We prefer a clean workspace when we reopen R and RStudio so we know exactly what we are starting with each time; it avoids confusion when you have many variables and functions.
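If you want the same "clean slate" effect from the Console at any time, base R can list and clear the workspace by hand; a small sketch:

```r
ls()                   # lists the objects currently in your workspace
rm(list = ls())        # removes them all, giving you a clean slate
length(ls(new.env()))  # a brand-new environment is empty, so this is 0
```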
Second, you should customize RStudio to your liking. In Preferences, the Appearance pane lets you select an RStudio theme, editor font, font size, and editor theme. The Pane Layout pane lets you arrange the panels. There may be a time when you want to add a column, which you can do by clicking the "Add Column" button; for example, we sometimes find it helpful to have two Source panes.
RStudio Projects
We think it is a good habit to create RStudio Projects for your analyses. RStudio Projects help with workflow, keep all the files associated with a project together (using relative file paths), and allow version control with Git (see Version Control), to name a few benefits. Having all the files for a project "connected" is important when you are working on complex research projects and when you need to leave a project for a while and come back; the RStudio Project keeps everything in one location for you. In addition, RStudio Projects make sharing R scripts among colleagues easy. That is, if you send a colleague an RStudio Project and the data files, they can run the code as you did and obtain exactly the same results. In the past we set the working directory with setwd(), and if that path was not the same on a colleague's computer they couldn't run the code, which was frustrating. Avoid the setwd() approach to file management. RStudio Projects really help with workflow, and we think they should be the default for all R users.
You can have as many Projects as you want, but we usually have one Project per research project or major analysis within a research project. Setting up an RStudio Project is very simple. First, make a folder on your computer or SharePoint (e.g., OneDrive) where you want the Project to be stored. Second, in RStudio go to File, click "New Project" (the second item down), and follow the prompts; that is it. The Project file contains information that helps maintain all the associated files. Keep in mind an RStudio Project is not an R script; it is simply a "file management" file.
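The payoff of a Project is relative paths: inside a Project, paths start at the Project folder, so the same code runs on any computer. A sketch (the file names here are made-up examples):

```r
# Fragile: an absolute path that exists only on one computer
# fish <- read.csv("C:/Users/me/Documents/thesis/data/fish_data.csv")

# Portable: a path relative to the Project folder
# fish <- read.csv("data/fish_data.csv")

# file.path() builds such paths without worrying about slashes
file.path("data", "fish_data.csv")  # "data/fish_data.csv"
```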
Version Control
You may want to skip this section if you are new to R. When using RStudio Projects you can set up version control by going to Tools, clicking Version Control, and following the prompts. See here for more information: https://nceas.github.io/oss-lessons/version-control/4-getting-started-with-git-in-RStudio.html#version_control_using_rstudio You can also link to your own GitHub site, but we don't have that figured out yet.
Quarto (was R Markdown)
This HTML file was created using Quarto, which is the "new" version of R Markdown. R Markdown will continue to be supported by RStudio, but any new features will only appear in Quarto. To learn more about Quarto see https://quarto.org. We use Quarto for all our analyses even when we do not initially plan to render to an HTML, PDF, or Word document; it keeps that option available if we later want to share the output with a colleague in document form. The coding is the same: you write your code in chunks, then add narrative above or below the chunks to explain the output. There are a lot of advantages to using Quarto.
The screenshot below shows the YAML header, which sits between the "---" lines in a Quarto file; it is required, and spacing is very important. If your Quarto file won't render, it is most likely because your YAML has a typo. You can also see we have an outline; that is because we are using headers, indicated by "##". This makes navigating the file very simple because you can click a header to jump to that section.
We think an HTML output with a table of contents makes for great reporting, as in this document.
The screenshot below shows a chunk, which contains the code and any options specific to that chunk, each preceded by "#|". You can add a chunk at any time by clicking the green square with the plus symbol in the upper right corner of the Source pane.
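Because the "#|" option lines are ordinary R comments to R itself, a chunk body is just R code with the options stacked on top; a sketch (the label is a made-up example):

```r
#| label: summary-stats
#| echo: true
summary(cars)  # cars is a small data set built into R, handy for examples
```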
Packages and Libraries
What in the heck are packages and libraries? Packages are collections of R functions, code, and sometimes sample data that someone put together to help you wrangle data, conduct analyses, or make graphs, to name a few uses. This is the real beauty of R: it is a worldwide community of people helping each other, and it is probably why R is so popular. There were about 16,000 packages available as of November 2020. One package we use every time is "tidyverse", a collection of R functions and packages that we find very useful for data analysis and graphing. To obtain an R package, go to the Tools tab in RStudio and click "Install Packages." Type in the name of the package you want to install and you should see the package being installed in the Console window.
Now that you have a package installed, you need to tell R you want to use it in your analysis. You do this by loading the library in your R session (e.g., library(tidyverse)). If you close and reopen R, you need to reload your packages because we are not saving the R session (see the second paragraph in [R and RStudio]). Below we loaded two packages (tidyverse and FSA). Note that loading tidyverse also loads additional packages, and there are some conflicts with functions from other packages. The FSA package prints additional information about citing it; you should always cite the packages you use in your publications and reports. Finally, it is a good idea to update your packages, which is easily done under the Tools tab in RStudio.
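The same install/load/cite cycle can be done from the Console; a hedged sketch (the install and load lines are commented so nothing is installed by accident):

```r
# Install once; this is what Tools > Install Packages runs for you:
# install.packages("FSA")

# Load at the start of every session:
# library(FSA)

# Ask how to cite; with no argument this cites R itself,
# citation("FSA") would cite a package once it is installed:
citation()
```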
File Structure, File and Variable Naming
We recommend setting up your directory so that the data, figures, and perhaps scripts are in folders within your working directory. It is also a good idea to pick a standard for folder and file naming. We think this is good practice beyond naming files for R; we use the same naming method for folders and Word documents. We like to use snake case (fish_data_050922) because we find it easier to read than camel case (fishData050922) or pascal case (FishData050922). What the naming styles share is the lack of spaces; it is good practice in coding to avoid putting spaces in names. We also like to add the date we worked on the file at the end, which helps us with version control. There are much better ways to keep track of versions, such as Git, but we have yet to master that with R and RStudio (although see Version Control). Disk space is rarely an issue anymore, and our data files are often not that large, so having multiple versions with a date at the end is rarely problematic.
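The same snake_case habit applies to object names inside R. Names with spaces are legal only when wrapped in backticks, which is a pain to type; a short illustration (the names are made-up examples):

```r
fish_data_2010 <- c(2, 38, 8)       # snake_case: easy to read and to type
# `fish data 2010` <- c(2, 38, 8)   # spaces force backticks everywhere -- avoid
mean(fish_data_2010)                # 16
```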
Data File Structure in Excel
If you are using Excel to enter your "raw" data, you need to be careful about how you create the spreadsheet. Keep it very simple: avoid complex headers, complex names, and any analyses within the spreadsheet. If you do not keep it simple, you will likely encounter errors when loading the file into R. The spreadsheet below is what ours typically look like. Notice the snake case variable names and that everything is lower case; this helps avoid typos and errors in our code. We usually try to have only one sheet (tab) per Excel file. You can have multiple sheets and load each sheet into R, but it makes tracking files a bit more difficult. One of the largest time sinks when working with someone else's data is reformatting the spreadsheet to work in R.
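If you inherit a spreadsheet with messy headers, you can also clean them in R rather than editing Excel by hand; a base-R sketch (the header strings and the clean_header() helper are made-up examples):

```r
# turn arbitrary column headers into lower-case snake_case
clean_header <- function(x) {
  x <- tolower(trimws(x))          # lower case, strip outer whitespace
  x <- gsub("[^a-z0-9]+", "_", x)  # runs of other characters become "_"
  gsub("^_+|_+$", "", x)           # drop leading/trailing underscores
}

clean_header(c("Total Catch", "Mesh (inch)", "North/South"))
# "total_catch" "mesh_inch" "north_south"
```

You could apply it with names(fish_data_csv) <- clean_header(names(fish_data_csv)).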
Importing Data
We think the first thing most people want to do is import data they have in an Excel spreadsheet and start working on it. For example, "I have a data spreadsheet in Excel and I want to bring it into R and calculate the median and quantiles; how do I do that?" One of the most frustrating things when we were first learning R was figuring out how the heck to get data into R. There are several ways, but we think the most foolproof is to save the Excel file as a .csv and use read.csv(). You can also load directly from an Excel file (using the readxl package), which has become less problematic over time as that package has improved. Below we show how to import some data using both methods.
You must download our files and place them in a folder called "data" in the same location as the R Project. It should look like this in the Files tab (the Excel files are in the data folder).
Now that the data are in the data folder let’s load them into R.
library(readxl)
fish_data_csv <- read.csv("data/lake_trout_catch_net.csv")
fish_data_xlsx <- read_xlsx("data/lake_trout_catch_net.xlsx", na = "NA") # the data contain NA (not available) in some rows, so we need na = "NA"
It really is that simple. If you look in the Environment tab in RStudio you should see two data frames.
Coding
Check out https://style.tidyverse.org/ for tips on good coding style. The styler and lintr packages support the style guide presented at that link.
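As a small taste of what that style guide recommends: spaces after commas and around "<-", and one step per line. Both versions below run identically; only the readability differs:

```r
# hard to read
x<-c(1,2,3);y<-mean(x)

# tidyverse style
x <- c(1, 2, 3)
y <- mean(x)
y  # 2
```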
Data Wrangling
Common data wrangling
The beauty and the frustration of R is that you can do the same analyses (e.g., data wrangling and graphing) with different code structures and packages. The examples we give below are what work for us and may not be the most efficient. You may have different ways to conduct an analysis, and that is fine, because the beauty of R is that all the analytical steps are recorded in your script. We are always learning new ways to analyze and graph data, which is fun. Please reach out if you find a mistake or a more efficient method for an example we present below.
Each of the grey boxes in the HTML document represents a chunk, that is, a chunk of code we thought was worth separating out to make it more readable. You will notice text preceded by a "#" in the boxes; that tells R to ignore the text. The text following the "#" is what we use to annotate our code so we can easily recall what we did; think of it as notes. Below we get the structure of the fish_data_csv data frame and calculate the quantile and min-max values for all the variables. This is always our first step because we can see how R assigns data types to the variables (i.e., character; numeric, which contains decimal and whole numbers; or integer, which contains only whole numbers) and look for mistakes in the data by evaluating the min-max values.
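The three data types mentioned above are easy to see on small examples:

```r
class("south")  # "character" -- text
class(63.5)     # "numeric"   -- decimal and whole numbers
class(5L)       # "integer"   -- whole numbers only (the L marks an integer)
```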
Checking the Data
Recall above we loaded two files, one from a .csv and called it fish_data_csv and one from a .xlsx and called it fish_data_xlsx. Let’s start playing with the fish_data_csv file. You can see the files in the Environment tab.
Below are a few things we always do to check the data before starting any analyses. The annotations within the chunk briefly describe what each line of code is doing.
str(fish_data_csv) # check out the structure and data types
'data.frame': 2351 obs. of 19 variables:
$ lift : chr "1" "1" "1" "1" ...
$ net : num 1 2 3 4 5 6 6.5 7 7.5 8 ...
$ date : chr "8/23/10" "8/23/10" "8/23/10" "8/23/10" ...
$ year : int 10 10 10 10 10 10 10 10 10 10 ...
$ week : int 1 1 1 1 1 1 1 1 1 1 ...
$ north_South : chr "South" "South" "South" "South" ...
$ temp : num 64 64 64 64 64 64 64 64 64 64 ...
$ morning_evening_night: chr "morning" "morning" "morning" "morning" ...
$ set_time : chr "4:40" "4:40" "4:40" "4:40" ...
$ pull_time : chr "8:35" "8:35" "8:35" "8:35" ...
$ duration : num NA NA NA NA NA NA NA NA NA NA ...
$ mesh_inch : num 2.5 2 2.25 2 2.5 1.5 1.5 1.75 1.75 2 ...
$ mesh : num 63.5 50.8 57.1 50.8 63.5 ...
$ depth : num 101 96 92 80.5 84 73 83 68 83 84.5 ...
$ lat : chr "N47.55714" "N47.55502" "N47.92443" "N47.92434" ...
$ long : chr "W113.51997" "W113.51975" "W113.86772" "W113.86467" ...
$ total_catch : int 2 38 8 6 5 2 7 14 120 154 ...
$ catch_200 : int 2 38 8 6 5 0 0 14 120 154 ...
$ catch_effort : num NA NA NA NA NA NA NA NA NA NA ...
head(fish_data_csv, n = 10) # shows first 10 lines, you can pick your own n
lift net date year week north_South temp morning_evening_night set_time
1 1 1.0 8/23/10 10 1 South 64 morning 4:40
2 1 2.0 8/23/10 10 1 South 64 morning 4:40
3 1 3.0 8/23/10 10 1 South 64 morning 4:40
4 1 4.0 8/23/10 10 1 South 64 morning 4:40
5 1 5.0 8/23/10 10 1 South 64 morning 4:40
6 1 6.0 8/23/10 10 1 South 64 morning 4:40
7 1 6.5 8/23/10 10 1 South 64 morning 4:40
8 1 7.0 8/23/10 10 1 South 64 morning 4:40
9 1 7.5 8/23/10 10 1 South 64 morning 4:40
10 1 8.0 8/23/10 10 1 South 64 morning 4:40
pull_time duration mesh_inch mesh depth lat long total_catch
1 8:35 NA 2.50 63.50 101.0 N47.55714 W113.51997 2
2 8:35 NA 2.00 50.80 96.0 N47.55502 W113.51975 38
3 8:35 NA 2.25 57.15 92.0 N47.92443 W113.86772 8
4 8:35 NA 2.00 50.80 80.5 N47.92434 W113.86467 6
5 8:35 NA 2.50 63.50 84.0 N47.92251 W113.86462 5
6 8:35 NA 1.50 38.10 73.0 N47.92295 W113.86238 2
7 8:35 NA 1.50 38.10 83.0 N47.92062 W113.86237 7
8 8:35 NA 1.75 44.45 68.0 N47.92151 W113.86058 14
9 8:35 NA 1.75 44.45 83.0 N47.92033 W113.85909 120
10 8:35 NA 2.00 50.80 84.5 N47.91946 W113.85773 154
catch_200 catch_effort
1 2 NA
2 38 NA
3 8 NA
4 6 NA
5 5 NA
6 0 NA
7 0 NA
8 14 NA
9 120 NA
10 154 NA
tail(fish_data_csv, n = 10) # shows last 10 lines, you can pick your own n
lift net date year week north_South temp morning_evening_night set_time
2342 22 267 9/2/16 16 3 South 62 overnight 17:30
2343 22 268 9/2/16 16 3 South 62 overnight 17:30
2344 22 269 9/2/16 16 3 South 62 overnight 17:30
2345 22 270 9/2/16 16 3 South 62 overnight 17:30
2346 22 271 9/2/16 16 3 South 62 overnight 17:30
2347 22 272 9/2/16 16 3 South 62 overnight 17:30
2348 22 273 9/2/16 16 3 South 62 overnight 17:30
2349 22 274 9/2/16 16 3 South 62 overnight 17:30
2350 22 275 9/2/16 16 3 South 62 overnight 17:30
2351 22 276 9/2/16 16 3 South 62 overnight 17:30
pull_time duration mesh_inch mesh depth lat long total_catch
2342 9:47 16.28 2.25 57.15 80 47.92604 113.86982 25
2343 10:00 16.50 1.50 38.10 86 47.92611 113.86729 5
2344 10:10 16.67 2.50 63.50 76 47.92419 113.86733 14
2345 10:21 16.85 2.00 50.80 88 47.92559 113.8643 23
2346 10:32 17.03 1.75 44.45 80 47.92386 113.86494 12
2347 10:42 17.20 1.50 38.10 81 47.92359 113.86312 16
2348 10:53 17.38 2.25 57.15 86 47.92379 113.86153 8
2349 11:04 17.57 2.00 50.80 83 47.92251 113.86059 26
2350 11:16 17.77 2.75 69.85 85 47.92141 113.85925 21
2351 11:26 17.93 1.75 44.45 81 47.92054 113.85809 15
catch_200 catch_effort
2342 25 1.54
2343 4 0.24
2344 14 0.84
2345 23 1.36
2346 12 0.70
2347 9 0.52
2348 8 0.46
2349 26 1.48
2350 21 1.18
2351 15 0.84
summary(fish_data_csv) # Quartiles from base R
lift net date year
Length:2351 Min. : 1.0 Length:2351 Min. :10.00
Class :character 1st Qu.: 84.0 Class :character 1st Qu.:11.00
Mode :character Median :169.0 Mode :character Median :13.00
Mean :172.2 Mean :12.83
3rd Qu.:252.0 3rd Qu.:14.00
Max. :399.0 Max. :16.00
week north_South temp morning_evening_night
Min. :1.000 Length:2351 Min. :58.00 Length:2351
1st Qu.:1.000 Class :character 1st Qu.:62.00 Class :character
Median :2.000 Mode :character Median :64.00 Mode :character
Mean :1.997 Mean :63.83
3rd Qu.:3.000 3rd Qu.:66.00
Max. :3.000 Max. :73.00
NA's :12
set_time pull_time duration mesh_inch
Length:2351 Length:2351 Min. : 2.000 Min. :1.500
Class :character Class :character 1st Qu.: 3.580 1st Qu.:1.750
Mode :character Mode :character Median : 4.742 Median :2.000
Mean : 7.004 Mean :2.155
3rd Qu.:10.080 3rd Qu.:2.250
Max. :26.980 Max. :5.000
NA's :19 NA's :1
mesh depth lat long
Min. : 38.10 Min. : 34.0 Length:2351 Length:2351
1st Qu.: 44.45 1st Qu.: 87.0 Class :character Class :character
Median : 50.80 Median :100.0 Mode :character Mode :character
Mean : 54.74 Mean :103.1
3rd Qu.: 57.20 3rd Qu.:121.0
Max. :127.00 Max. :170.0
NA's :1 NA's :12
total_catch catch_200 catch_effort
Min. : 0.00 Min. : 0.00 Min. : 0.000
1st Qu.: 4.00 1st Qu.: 3.00 1st Qu.: 0.540
Median : 11.00 Median : 11.00 Median : 1.720
Mean : 22.04 Mean : 21.09 Mean : 3.776
3rd Qu.: 26.00 3rd Qu.: 25.00 3rd Qu.: 4.605
Max. :554.00 Max. :378.00 Max. :57.310
NA's :1 NA's :19
describe(fish_data_csv) # summary stats using the psych package
vars n mean sd median trimmed mad min
lift* 1 2351 15.97 9.12 15.00 15.96 11.86 1.0
net 2 2351 172.22 103.61 169.00 169.27 124.54 1.0
date* 3 2351 60.07 33.54 61.00 60.13 41.51 1.0
year 4 2351 12.83 1.92 13.00 12.79 2.97 10.0
week 5 2351 2.00 0.81 2.00 2.00 1.48 1.0
north_South* 6 2351 1.52 0.50 2.00 1.53 0.00 1.0
temp 7 2339 63.83 2.71 64.00 63.77 2.97 58.0
morning_evening_night* 8 2351 2.74 1.66 3.00 2.67 2.97 1.0
set_time* 9 2346 35.03 17.98 30.00 35.28 22.24 1.0
pull_time* 10 2346 346.73 196.22 333.50 352.88 252.78 1.0
duration 11 2332 7.00 4.45 4.74 6.35 2.61 2.0
mesh_inch 12 2350 2.16 0.67 2.00 2.05 0.37 1.5
mesh 13 2350 54.74 16.95 50.80 52.03 9.41 38.1
depth 14 2339 103.06 20.13 100.00 103.31 23.72 34.0
lat* 15 2350 981.54 566.81 984.00 978.26 722.77 1.0
long* 16 2350 1011.39 580.61 1010.50 1008.53 736.11 1.0
total_catch 17 2351 22.04 34.51 11.00 15.13 13.34 0.0
catch_200 18 2350 21.09 32.09 11.00 14.50 13.34 0.0
catch_effort 19 2332 3.78 5.76 1.72 2.54 2.16 0.0
max range skew kurtosis se
lift* 32.00 31.00 0.03 -1.22 0.19
net 399.00 398.00 0.18 -1.01 2.14
date* 117.00 116.00 -0.05 -1.16 0.69
year 16.00 6.00 0.13 -1.14 0.04
week 3.00 2.00 0.00 -1.46 0.02
north_South* 2.00 1.00 -0.10 -1.99 0.01
temp 73.00 15.00 0.27 0.14 0.06
morning_evening_night* 6.00 5.00 0.26 -1.47 0.03
set_time* 69.00 68.00 0.06 -1.11 0.37
pull_time* 663.00 662.00 -0.16 -1.22 4.05
duration 26.98 24.98 1.13 0.72 0.09
mesh_inch 5.00 3.50 2.47 7.64 0.01
mesh 127.00 88.90 2.47 7.64 0.35
depth 170.00 136.00 -0.08 -0.76 0.42
lat* 1991.00 1990.00 0.03 -1.15 11.69
long* 2047.00 2046.00 0.04 -1.15 11.98
total_catch 554.00 554.00 5.15 45.68 0.71
catch_200 378.00 378.00 4.32 29.24 0.66
catch_effort 57.31 57.31 3.52 17.27 0.12
describeBy(fish_data_csv, group = fish_data_csv$year) # summary stats by group (year) using the psych package
Descriptive statistics by group
group: 10
vars n mean sd median trimmed mad min
lift* 1 317 15.90 8.89 16.00 15.90 10.38 1.0
net 2 317 155.46 90.43 155.00 155.42 115.64 1.0
date* 3 317 90.12 21.12 94.00 91.42 26.69 53.0
year 4 317 10.00 0.00 10.00 10.00 0.00 10.0
week 5 317 2.01 0.79 2.00 2.01 1.48 1.0
north_South* 6 317 1.45 0.50 1.00 1.44 0.00 1.0
temp 7 317 61.10 2.07 60.50 61.05 2.22 58.0
morning_evening_night* 8 317 2.62 1.53 3.00 2.53 2.97 1.0
set_time* 9 317 34.02 18.52 37.00 34.30 23.72 2.0
pull_time* 10 317 290.73 189.65 266.00 285.55 219.42 2.0
duration 11 303 8.04 4.85 6.32 7.33 3.63 2.1
mesh_inch 12 317 2.32 0.76 2.00 2.20 0.37 1.5
mesh 13 317 59.01 19.24 50.80 55.98 9.41 38.1
depth 14 317 106.23 22.02 104.00 106.76 28.17 53.0
lat* 15 317 1681.23 209.48 1760.00 1686.38 240.18 1338.0
long* 16 317 1727.78 207.68 1794.00 1732.83 249.08 1384.0
total_catch 17 317 31.61 53.46 12.00 19.16 14.83 0.0
catch_200 18 317 29.51 52.47 10.00 17.31 13.34 0.0
catch_effort 19 303 3.47 5.16 1.46 2.34 1.82 0.0
max range skew kurtosis se
lift* 31.0 30.0 -0.02 -1.14 0.50
net 311.0 310.0 0.00 -1.21 5.08
date* 116.0 63.0 -0.47 -1.24 1.19
year 10.0 0.0 NaN NaN 0.00
week 3.0 2.0 -0.02 -1.40 0.04
north_South* 2.0 1.0 0.21 -1.96 0.03
temp 65.0 7.0 0.33 -1.14 0.12
morning_evening_night* 5.0 4.0 0.33 -1.23 0.09
set_time* 69.0 67.0 -0.11 -1.41 1.04
pull_time* 619.0 617.0 0.32 -1.26 10.65
duration 23.4 21.3 1.21 0.81 0.28
mesh_inch 5.0 3.5 1.67 2.92 0.04
mesh 127.0 88.9 1.67 2.92 1.08
depth 140.0 87.0 -0.10 -1.18 1.24
lat* 1991.0 653.0 -0.26 -1.47 11.77
long* 2038.0 654.0 -0.23 -1.49 11.66
total_catch 378.0 378.0 3.61 16.05 3.00
catch_200 378.0 378.0 3.74 17.49 2.95
catch_effort 30.0 30.0 2.61 7.71 0.30
------------------------------------------------------------
group: 11
vars n mean sd median trimmed mad min
lift* 1 399 16.05 8.78 16.00 16.06 10.38 1.00
net 2 399 200.00 115.33 200.00 200.00 148.26 1.00
date* 3 399 88.12 22.89 90.00 89.20 34.10 49.00
year 4 399 11.00 0.00 11.00 11.00 0.00 11.00
week 5 399 2.01 0.78 2.00 2.02 1.48 1.00
north_South* 6 399 1.50 0.50 2.00 1.50 0.00 1.00
temp 7 399 63.97 2.79 65.00 64.01 2.97 59.00
morning_evening_night* 8 399 2.60 1.67 3.00 2.50 2.97 1.00
set_time* 9 399 29.93 21.67 26.00 29.21 31.13 3.00
pull_time* 10 399 410.35 179.63 424.00 421.65 237.22 62.00
duration 11 399 7.95 2.87 9.20 8.08 2.37 2.15
mesh_inch 12 399 2.23 0.66 2.00 2.13 0.37 1.50
mesh 13 399 56.58 16.74 50.80 54.14 9.41 38.10
depth 14 392 101.61 19.36 100.00 101.54 22.24 53.00
lat* 15 399 1652.75 190.36 1655.00 1652.64 194.22 1.00
long* 16 399 1706.35 178.04 1702.00 1702.77 194.22 1382.00
total_catch 17 399 12.94 18.79 6.00 9.26 7.41 0.00
catch_200 18 399 12.67 18.73 6.00 8.95 7.41 0.00
catch_effort 19 399 2.00 3.54 0.77 1.26 1.01 0.00
max range skew kurtosis se
lift* 31.00 30.00 -0.03 -1.15 0.44
net 399.00 398.00 0.00 -1.21 5.77
date* 117.00 68.00 -0.30 -1.35 1.15
year 11.00 0.00 NaN NaN 0.00
week 3.00 2.00 -0.02 -1.38 0.04
north_South* 2.00 1.00 0.00 -2.00 0.03
temp 69.00 10.00 -0.22 -0.94 0.14
morning_evening_night* 5.00 4.00 0.39 -1.46 0.08
set_time* 65.00 62.00 0.15 -1.56 1.08
pull_time* 663.00 601.00 -0.27 -0.97 8.99
duration 12.28 10.13 -0.44 -1.39 0.14
mesh_inch 5.00 3.50 1.85 4.20 0.03
mesh 127.00 88.90 1.85 4.20 0.84
depth 141.00 88.00 0.01 -0.65 0.98
lat* 1990.00 1989.00 -1.45 12.64 9.53
long* 2047.00 665.00 0.19 -0.83 8.91
total_catch 148.00 148.00 3.34 15.87 0.94
catch_200 148.00 148.00 3.40 16.26 0.94
catch_effort 29.24 29.24 4.33 24.11 0.18
------------------------------------------------------------
group: 12
vars n mean sd median trimmed mad min
lift* 1 374 16.82 8.83 17.00 17.00 11.12 1.0
net 2 374 192.55 111.39 195.50 192.82 145.29 1.0
date* 3 374 47.35 27.89 46.00 46.48 39.29 7.0
year 4 374 12.00 0.00 12.00 12.00 0.00 12.0
week 5 374 1.97 0.82 2.00 1.96 1.48 1.0
north_South* 6 374 1.52 0.50 2.00 1.53 0.00 1.0
temp 7 374 63.86 2.66 63.50 63.49 2.22 59.0
morning_evening_night* 8 374 2.51 1.69 1.00 2.39 0.00 1.0
set_time* 9 373 34.88 14.13 32.00 33.79 17.79 19.0
pull_time* 10 373 349.13 179.14 309.00 356.14 220.91 1.0
duration 11 373 5.71 3.34 4.07 5.33 1.66 2.0
mesh_inch 12 373 2.03 0.38 2.00 2.02 0.37 1.5
mesh 13 373 51.59 9.60 50.80 51.23 9.49 38.1
depth 14 374 106.78 18.69 107.00 107.36 20.76 55.0
lat* 15 373 733.94 365.89 709.00 745.35 447.75 12.0
long* 16 373 760.32 359.88 714.00 764.42 446.26 30.0
total_catch 17 374 27.84 45.23 14.00 19.15 16.31 0.0
catch_200 18 373 26.03 36.02 14.00 18.57 16.31 0.0
catch_effort 19 373 5.43 7.58 2.67 3.80 3.22 0.0
max range skew kurtosis se
lift* 31.00 30.00 -0.13 -1.18 0.46
net 382.00 381.00 -0.03 -1.23 5.76
date* 96.00 89.00 0.15 -1.29 1.44
year 12.00 0.00 NaN NaN 0.00
week 3.00 2.00 0.06 -1.51 0.04
north_South* 2.00 1.00 -0.10 -2.00 0.03
temp 73.00 14.00 1.59 3.67 0.14
morning_evening_night* 5.00 4.00 0.48 -1.44 0.09
set_time* 63.00 44.00 0.43 -1.21 0.73
pull_time* 619.00 618.00 -0.10 -1.14 9.28
duration 13.67 11.67 0.93 -0.78 0.17
mesh_inch 2.75 1.25 0.24 -0.98 0.02
mesh 69.90 31.80 0.24 -0.98 0.50
depth 137.00 82.00 -0.22 -0.89 0.97
lat* 1325.00 1313.00 -0.14 -1.16 18.95
long* 1369.00 1339.00 -0.03 -1.18 18.63
total_catch 554.00 554.00 5.79 52.77 2.34
catch_200 288.00 288.00 3.26 14.53 1.86
catch_effort 46.58 46.58 2.79 9.19 0.39
------------------------------------------------------------
group: 13
vars n mean sd median trimmed mad min
lift* 1 347 16.03 8.95 16.00 16.04 10.38 1.00
net 2 347 174.00 100.31 174.00 174.00 128.99 1.00
date* 3 347 44.61 26.91 43.00 44.02 38.55 4.00
year 4 347 13.00 0.00 13.00 13.00 0.00 13.00
week 5 347 1.97 0.81 2.00 1.96 1.48 1.00
north_South* 6 347 1.55 0.50 2.00 1.57 0.00 1.00
temp 7 347 65.93 2.20 66.00 65.87 2.97 62.00
morning_evening_night* 8 347 2.73 1.80 3.00 2.66 2.97 1.00
set_time* 9 347 33.50 16.73 27.00 34.13 5.93 1.00
pull_time* 10 347 380.16 175.87 394.00 396.73 174.95 5.00
duration 11 347 5.93 3.30 4.25 5.56 1.73 2.35
mesh_inch 12 347 2.00 0.37 2.00 1.97 0.37 1.50
mesh 13 347 50.78 9.36 50.80 50.14 9.41 38.10
depth 14 347 105.04 19.13 102.00 105.16 23.72 58.00
lat* 15 347 671.26 389.32 710.00 672.87 518.91 5.00
long* 16 347 694.35 391.63 719.00 693.66 510.01 5.00
total_catch 17 347 20.32 24.11 14.00 15.91 13.34 0.00
catch_200 18 347 19.66 23.56 13.00 15.25 13.34 0.00
catch_effort 19 347 4.32 6.42 2.08 2.95 2.24 0.00
max range skew kurtosis se
lift* 31.00 30.00 -0.01 -1.15 0.48
net 347.00 346.00 0.00 -1.21 5.39
date* 92.00 88.00 0.15 -1.28 1.44
year 13.00 0.00 NaN NaN 0.00
week 3.00 2.00 0.06 -1.48 0.04
north_South* 2.00 1.00 -0.21 -1.96 0.03
temp 71.00 9.00 0.24 -0.72 0.12
morning_evening_night* 5.00 4.00 0.27 -1.72 0.10
set_time* 62.00 61.00 -0.10 -0.46 0.90
pull_time* 619.00 614.00 -0.57 -0.51 9.44
duration 15.47 13.12 0.89 -0.49 0.18
mesh_inch 2.75 1.25 0.45 -0.65 0.02
mesh 69.85 31.75 0.45 -0.65 0.50
depth 137.00 79.00 -0.02 -1.14 1.03
lat* 1316.00 1311.00 -0.09 -1.37 20.90
long* 1367.00 1362.00 -0.02 -1.30 21.02
total_catch 222.00 222.00 3.31 17.27 1.29
catch_200 212.00 212.00 3.22 16.07 1.26
catch_effort 52.40 52.40 3.69 17.96 0.34
------------------------------------------------------------
group: 14
vars n mean sd median trimmed mad min
lift* 1 382 16.56 8.91 17.00 16.65 10.38 1.00
net 2 382 189.89 109.84 189.50 189.76 140.85 1.00
date* 3 382 41.69 25.78 44.00 41.34 38.55 2.00
year 4 382 14.00 0.00 14.00 14.00 0.00 14.00
week 5 382 2.01 0.81 2.00 2.02 1.48 1.00
north_South* 6 382 1.52 0.50 2.00 1.52 0.00 1.00
temp 7 382 63.26 2.10 64.00 63.40 2.97 58.00
morning_evening_night* 8 382 2.36 1.42 3.00 2.20 2.97 1.00
set_time* 9 378 38.57 18.40 45.00 39.33 25.20 8.00
pull_time* 10 378 314.96 196.01 298.50 315.47 279.47 4.00
duration 11 378 5.31 3.00 4.15 4.86 1.42 2.07
mesh_inch 12 382 2.22 0.86 2.00 2.04 0.37 1.50
mesh 13 382 56.40 21.84 50.80 51.78 9.41 38.10
depth 14 377 100.09 21.27 98.00 100.29 23.72 46.00
lat* 15 382 676.63 410.54 737.50 679.51 555.23 6.00
long* 16 382 706.81 399.51 748.00 711.41 487.78 6.00
total_catch 17 382 18.32 23.85 11.00 13.81 13.34 0.00
catch_200 18 382 18.02 23.68 11.00 13.52 12.60 0.00
catch_effort 19 378 3.99 5.03 2.02 3.01 2.51 0.00
max range skew kurtosis se
lift* 32.00 31.00 -0.07 -1.16 0.46
net 379.00 378.00 0.01 -1.21 5.62
date* 86.00 84.00 0.06 -1.29 1.32
year 14.00 0.00 NaN NaN 0.00
week 3.00 2.00 -0.02 -1.50 0.04
north_South* 2.00 1.00 -0.07 -2.00 0.03
temp 66.00 8.00 -0.29 -0.54 0.11
morning_evening_night* 5.00 4.00 0.55 -0.90 0.07
set_time* 68.00 60.00 -0.12 -1.41 0.95
pull_time* 619.00 615.00 0.03 -1.22 10.08
duration 17.88 15.81 1.53 1.99 0.15
mesh_inch 5.00 3.50 2.36 5.12 0.04
mesh 127.00 88.90 2.36 5.12 1.12
depth 170.00 124.00 0.01 -0.75 1.10
lat* 1334.00 1328.00 -0.11 -1.30 21.00
long* 1374.00 1368.00 -0.13 -1.17 20.44
total_catch 233.00 233.00 3.50 20.35 1.22
catch_200 233.00 233.00 3.55 20.95 1.21
catch_effort 37.85 37.85 2.43 8.21 0.26
------------------------------------------------------------
group: 15
vars n mean sd median trimmed mad min
lift* 1 255 14.80 9.84 13.00 14.51 13.34 1.00
net 2 255 128.00 73.76 128.00 128.00 94.89 1.00
date* 3 255 43.25 27.09 39.00 42.25 37.06 3.00
year 4 255 15.00 0.00 15.00 15.00 0.00 15.00
week 5 255 2.01 0.81 2.00 2.01 1.48 1.00
north_South* 6 255 1.59 0.49 2.00 1.61 0.00 1.00
temp 7 255 63.25 1.76 62.00 63.07 0.00 61.00
morning_evening_night* 8 255 3.00 1.54 3.00 3.00 2.97 1.00
set_time* 9 255 39.59 17.06 27.00 39.39 10.38 17.00
pull_time* 10 255 327.26 215.94 310.00 331.15 309.86 5.00
duration 11 255 7.70 6.30 4.23 6.89 1.70 2.27
mesh_inch 12 255 2.00 0.37 2.00 1.97 0.37 1.50
mesh 13 255 50.82 9.39 50.80 50.15 9.41 38.10
depth 14 255 102.43 17.67 98.00 101.75 19.27 65.00
lat* 15 255 614.78 397.87 489.00 604.05 512.98 17.00
long* 16 255 601.19 441.04 495.00 588.77 603.42 7.00
total_catch 17 255 22.48 29.18 13.00 16.91 14.83 0.00
catch_200 18 255 22.11 28.80 13.00 16.63 13.34 0.00
catch_effort 19 255 3.82 5.34 2.18 2.71 2.42 0.00
max range skew kurtosis se
lift* 31.00 30.00 0.34 -1.29 0.62
net 255.00 254.00 0.00 -1.21 4.62
date* 100.00 97.00 0.24 -0.97 1.70
year 15.00 0.00 NaN NaN 0.00
week 3.00 2.00 -0.02 -1.47 0.05
north_South* 2.00 1.00 -0.36 -1.88 0.03
temp 69.00 8.00 0.89 -0.04 0.11
morning_evening_night* 5.00 4.00 0.00 -1.31 0.10
set_time* 62.00 45.00 0.34 -1.71 1.07
pull_time* 618.00 613.00 -0.09 -1.51 13.52
duration 26.98 24.71 1.13 -0.01 0.39
mesh_inch 2.75 1.25 0.48 -0.63 0.02
mesh 69.85 31.75 0.48 -0.63 0.59
depth 136.00 71.00 0.37 -1.03 1.11
lat* 1321.00 1304.00 0.23 -1.43 24.92
long* 1353.00 1346.00 0.20 -1.54 27.62
total_catch 282.00 282.00 3.80 24.81 1.83
catch_200 281.00 281.00 3.88 25.86 1.80
catch_effort 37.98 37.98 3.39 14.92 0.33
------------------------------------------------------------
group: 16
vars n mean sd median trimmed mad min
lift* 1 277 14.98 9.95 13.00 14.75 13.34 1.00
net 2 277 138.06 80.02 138.00 138.00 102.30 1.00
date* 3 277 62.62 36.48 87.00 64.66 29.65 1.00
year 4 277 16.00 0.00 16.00 16.00 0.00 16.00
week 5 277 2.01 0.82 2.00 2.01 1.48 1.00
north_South* 6 277 1.56 0.50 2.00 1.58 0.00 1.00
temp 7 265 65.42 2.14 65.00 65.31 2.97 62.00
morning_evening_night* 8 277 3.71 1.62 4.00 3.81 1.48 1.00
set_time* 9 277 36.58 15.65 31.00 35.52 5.93 20.00
pull_time* 10 277 335.35 222.01 377.00 340.99 296.52 5.00
duration 11 277 9.26 6.05 4.67 9.10 3.34 2.25
mesh_inch 12 277 2.27 0.93 2.00 2.06 0.37 1.50
mesh 13 277 57.75 23.63 50.80 52.25 9.41 38.10
depth 14 277 98.62 20.98 98.00 99.50 22.24 34.00
lat* 15 277 694.20 360.18 704.00 696.79 413.65 2.00
long* 16 277 723.40 376.16 740.00 728.36 440.33 1.00
total_catch 17 277 23.26 31.66 11.00 16.45 14.83 0.00
catch_200 18 277 22.04 29.85 10.00 15.65 13.34 0.00
catch_effort 19 277 3.42 5.86 1.34 2.22 1.81 0.00
max range skew kurtosis se
lift* 31.00 30.00 0.28 -1.35 0.60
net 276.00 275.00 0.00 -1.22 4.81
date* 107.00 106.00 -0.29 -1.57 2.19
year 16.00 0.00 NaN NaN 0.00
week 3.00 2.00 -0.02 -1.51 0.05
north_South* 2.00 1.00 -0.25 -1.94 0.03
temp 70.00 8.00 0.29 -0.74 0.13
morning_evening_night* 6.00 5.00 -0.55 -1.12 0.10
set_time* 62.00 42.00 0.85 -0.95 0.94
pull_time* 618.00 613.00 -0.20 -1.56 13.34
duration 17.93 15.68 0.13 -1.88 0.36
mesh_inch 5.00 3.50 2.15 3.76 0.06
mesh 127.00 88.90 2.15 3.76 1.42
depth 136.00 102.00 -0.31 -0.38 1.26
lat* 1337.00 1335.00 -0.02 -0.96 21.64
long* 1381.00 1380.00 -0.07 -0.94 22.60
total_catch 192.00 192.00 2.49 7.21 1.90
catch_200 192.00 192.00 2.46 7.17 1.79
catch_effort 57.31 57.31 4.56 30.58 0.35
Desc(fish_data_csv$total_catch) # more summary stats from the DescTools package for only total_catch. You can run it for the whole dataframe by naming only the dataframe, but that produces a lot of output
------------------------------------------------------------------------------
fish_data_csv$total_catch (integer)
length n NAs unique 0s mean meanCI'
2'351 2'351 0 153 184 22.04 20.65
100.0% 0.0% 7.8% 23.44
.05 .10 .25 median .75 .90 .95
0.00 1.00 4.00 11.00 26.00 50.00 81.00
range sd vcoef mad IQR skew kurt
554.00 34.51 1.57 13.34 22.00 5.15 45.68
lowest : 0 (184), 1 (147), 2 (117), 3 (128), 4 (97)
highest: 311, 346, 362, 378, 554
heap(?): remarkable frequency (7.8%) for the mode(s) (= 0)
' 95%-CI (classic)
More basic summary information
mean(fish_data_csv$total_catch) # calculate a mean from the dataframe fish_data_csv; the dollar sign pulls a variable (column) out of a dataframe
[1] 22.04424
max(fish_data_csv$total_catch) # calculate maximum value
[1] 554
min(fish_data_csv$total_catch) # calculate minimum value
[1] 0
var(fish_data_csv$total_catch) # variance
[1] 1191.002
sqrt(var(fish_data_csv$total_catch)) # square root of variance of total catch = standard deviation
[1] 34.5109
sd(fish_data_csv$total_catch) # same as SD of course
[1] 34.5109
ci_mean(fish_data_csv$total_catch) # calculates 95% ci (default) using confintr package
Two-sided 95% t confidence interval for the population mean
Sample estimate: 22.04424
Confidence interval:
2.5% 97.5%
20.64851 23.43997
ci_mean(fish_data_csv$total_catch, probs = c(0.1, 0.90)) # calculates 80% ci using confintr package
Two-sided 80% t confidence interval for the population mean
Sample estimate: 22.04424
Confidence interval:
10% 90%
21.13183 22.95664
ci_median(fish_data_csv$total_catch) # calculates the 95% ci for the median using the confintr package; we use the median a fair amount in analyses
Two-sided 95% binomial confidence interval for the population median
Sample estimate: 11
Confidence interval:
2.5% 97.5%
10 12
ci_IQR(fish_data_csv$total_catch, boot_type = "basic") # calculates the 95% ci for the interquartile range (IQR is the range between the 25th and 75th quartiles)
Two-sided 95% bootstrap confidence interval for the population IQR
based on 9999 bootstrap replications and the basic method
Sample estimate: 22
Confidence interval:
2.5% 97.5%
19.5 23.0
unique(fish_data_csv$date) # show me the unique dates
[1] "8/23/10" "8/24/10" "8/25/10" "8/26/10" "8/27/10" "8/29/10" "8/30/10"
[8] "8/31/10" "9/1/10" "9/2/10" "9/3/10" "9/6/10" "9/7/10" "9/8/10"
[15] "9/9/10" "9/10/10" "8/22/11" "8/23/11" "8/24/11" "8/25/11" "8/26/11"
[22] "8/28/11" "8/29/11" "8/30/11" "8/31/11" "9/1/11" "9/2/11" "9/5/11"
[29] "9/6/11" "9/7/11" "9/8/11" "9/9/11" "8/12/12" "8/13/12" "8/14/12"
[36] "8/15/12" "8/16/12" "8/19/12" "8/20/12" "8/21/12" "8/22/12" "8/23/12"
[43] "8/26/12" "8/27/12" "8/28/12" "8/29/12" "8/30/12" "8/17/12" "8/24/12"
[50] "8/31/12" "8/11/13" "8/12/13" "8/13/13" "8/14/13" "8/15/13" "8/16/13"
[57] "8/18/13" "8/19/13" "8/20/13" "8/21/13" "8/22/13" "8/23/13" "8/26/13"
[64] "8/27/13" "8/28/13" "8/29/13" "8/30/13" "8/10/14" "8/11/14" "8/12/14"
[71] "8/13/14" "8/14/14" "8/15/14" "8/17/14" "8/18/14" "8/19/14" "8/20/14"
[78] "8/21/14" "8/22/14" "8/24/14" "8/25/14" "8/26/14" "8/27/14" "8/28/14"
[85] "8/29/14" "8/9/15" "8/10/15" "8/11/15" "8/12/15" "8/13/15" "8/14/15"
[92] "8/16/15" "8/17/15" "8/18/15" "8/19/15" "8/20/15" "8/24/15" "8/25/15"
[99] "8/26/15" "8/27/15" "8/28/15" "8/1/16" "8/2/16" "8/3/16" "8/4/16"
[106] "8/5/16" "8/14/16" "8/15/16" "8/16/16" "8/17/16" "8/18/16" "8/19/16"
[113] "8/29/16" "8/30/16" "8/31/16" "9/1/16" "9/2/16"
length(unique(fish_data_csv$date)) # nesting multiple functions is fine; find the number of unique dates
[1] 117
matrix_plot <- pairs(~ year + mesh + catch_effort, data = fish_data_csv) # another way to quickly look at some of the data
How to make bins
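As a minimal sketch, base R's cut() can bin a continuous variable into categories; the depth breaks and labels below are arbitrary examples, not recommendations:

```r
# bin depth (ft) into categories; breaks and labels are arbitrary examples
fish_data_csv$depth_bin <- cut(fish_data_csv$depth,
                               breaks = c(0, 50, 100, 150),
                               labels = c("shallow", "mid", "deep"))
table(fish_data_csv$depth_bin) # counts of observations per bin
```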
Merge data sets
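As a minimal sketch, dplyr's left_join() merges two dataframes on a shared key column. The net_info lookup table below is hypothetical:

```r
# hypothetical lookup table of net characteristics keyed by net number
net_info <- data.frame(net = c(1, 2, 3),
                       net_length_ft = c(150, 150, 300))
# left_join keeps every row of fish_data_csv and adds matching net_info columns;
# nets not in net_info get NA for net_length_ft
fish_data_merged <- left_join(fish_data_csv, net_info, by = "net")
```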
Missing values (NAs)
Missing values can cause problems, as shown below. It is always good to know which variables have missing values and how many; the summary output above shows the number of NAs for each variable. Note that missing values are different from zeros, so make sure you know which is which in your data.
mean(fish_data_csv$duration) # show that you get NA because you have NAs in the variable duration
[1] NA
head(fish_data_csv$duration, 100) # see the NAs
[1] NA NA NA NA NA NA NA NA NA NA NA NA
[13] NA NA 2.83 3.00 3.30 3.55 3.83 4.40 4.65 14.88 15.12 15.38
[25] 15.60 15.83 16.12 5.18 5.43 5.65 5.85 6.07 6.28 6.52 6.75 3.47
[37] 3.73 4.08 4.52 4.77 4.97 5.33 5.83 15.58 16.12 16.67 17.45 18.20
[49] 18.53 14.67 15.10 16.18 16.75 17.13 17.77 18.88 19.30 3.67 3.90 4.15
[61] 4.40 4.60 4.80 5.02 5.22 5.45 5.67 5.98 6.27 6.50 6.72 3.00
[73] 3.30 3.57 3.90 4.18 4.48 10.35 10.78 11.20 11.58 11.90 12.27 6.75
[85] 7.18 7.53 7.75 8.02 8.50 5.37 5.58 5.80 6.08 6.32 6.52 6.80
[97] 6.80 3.35 3.57 3.73
mean(fish_data_csv$duration, na.rm = TRUE) # recalculate mean by removing NAs
[1] 7.003782
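Before running analyses, it is handy to count the NAs in every variable at once. For example:

```r
colSums(is.na(fish_data_csv))       # number of NAs per variable
sum(is.na(fish_data_csv$duration))  # NAs in a single variable
mean(is.na(fish_data_csv$duration)) # proportion of duration values that are NA
```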
Changing the data type of a variable
Sometimes when you import data from a .csv or .xlsx the data type assigned by R isn’t what you want it to be. For example, you may want to change a numeric variable to a factor.
str(fish_data_csv)
'data.frame': 2351 obs. of 19 variables:
$ lift : chr "1" "1" "1" "1" ...
$ net : num 1 2 3 4 5 6 6.5 7 7.5 8 ...
$ date : chr "8/23/10" "8/23/10" "8/23/10" "8/23/10" ...
$ year : int 10 10 10 10 10 10 10 10 10 10 ...
$ week : int 1 1 1 1 1 1 1 1 1 1 ...
$ north_South : chr "South" "South" "South" "South" ...
$ temp : num 64 64 64 64 64 64 64 64 64 64 ...
$ morning_evening_night: chr "morning" "morning" "morning" "morning" ...
$ set_time : chr "4:40" "4:40" "4:40" "4:40" ...
$ pull_time : chr "8:35" "8:35" "8:35" "8:35" ...
$ duration : num NA NA NA NA NA NA NA NA NA NA ...
$ mesh_inch : num 2.5 2 2.25 2 2.5 1.5 1.5 1.75 1.75 2 ...
$ mesh : num 63.5 50.8 57.1 50.8 63.5 ...
$ depth : num 101 96 92 80.5 84 73 83 68 83 84.5 ...
$ lat : chr "N47.55714" "N47.55502" "N47.92443" "N47.92434" ...
$ long : chr "W113.51997" "W113.51975" "W113.86772" "W113.86467" ...
$ total_catch : int 2 38 8 6 5 2 7 14 120 154 ...
$ catch_200 : int 2 38 8 6 5 0 0 14 120 154 ...
$ catch_effort : num NA NA NA NA NA NA NA NA NA NA ...
See that mesh size is numeric and perhaps we would like it as a factor for some analyses.
fish_data_csv$mesh_factor <- as.factor(fish_data_csv$mesh) # here is a new variable as a factor
str(fish_data_csv)
'data.frame': 2351 obs. of 20 variables:
$ lift : chr "1" "1" "1" "1" ...
$ net : num 1 2 3 4 5 6 6.5 7 7.5 8 ...
$ date : chr "8/23/10" "8/23/10" "8/23/10" "8/23/10" ...
$ year : int 10 10 10 10 10 10 10 10 10 10 ...
$ week : int 1 1 1 1 1 1 1 1 1 1 ...
$ north_South : chr "South" "South" "South" "South" ...
$ temp : num 64 64 64 64 64 64 64 64 64 64 ...
$ morning_evening_night: chr "morning" "morning" "morning" "morning" ...
$ set_time : chr "4:40" "4:40" "4:40" "4:40" ...
$ pull_time : chr "8:35" "8:35" "8:35" "8:35" ...
$ duration : num NA NA NA NA NA NA NA NA NA NA ...
$ mesh_inch : num 2.5 2 2.25 2 2.5 1.5 1.5 1.75 1.75 2 ...
$ mesh : num 63.5 50.8 57.1 50.8 63.5 ...
$ depth : num 101 96 92 80.5 84 73 83 68 83 84.5 ...
$ lat : chr "N47.55714" "N47.55502" "N47.92443" "N47.92434" ...
$ long : chr "W113.51997" "W113.51975" "W113.86772" "W113.86467" ...
$ total_catch : int 2 38 8 6 5 2 7 14 120 154 ...
$ catch_200 : int 2 38 8 6 5 0 0 14 120 154 ...
$ catch_effort : num NA NA NA NA NA NA NA NA NA NA ...
$ mesh_factor : Factor w/ 14 levels "38.1","44.45",..: 7 4 5 4 7 1 1 2 2 4 ...
Now let’s change it back to numeric.
fish_data_csv$mesh_numeric <- as.numeric(as.character(fish_data_csv$mesh_factor)) # factor to character to numeric
str(fish_data_csv)
'data.frame': 2351 obs. of 21 variables:
$ lift : chr "1" "1" "1" "1" ...
$ net : num 1 2 3 4 5 6 6.5 7 7.5 8 ...
$ date : chr "8/23/10" "8/23/10" "8/23/10" "8/23/10" ...
$ year : int 10 10 10 10 10 10 10 10 10 10 ...
$ week : int 1 1 1 1 1 1 1 1 1 1 ...
$ north_South : chr "South" "South" "South" "South" ...
$ temp : num 64 64 64 64 64 64 64 64 64 64 ...
$ morning_evening_night: chr "morning" "morning" "morning" "morning" ...
$ set_time : chr "4:40" "4:40" "4:40" "4:40" ...
$ pull_time : chr "8:35" "8:35" "8:35" "8:35" ...
$ duration : num NA NA NA NA NA NA NA NA NA NA ...
$ mesh_inch : num 2.5 2 2.25 2 2.5 1.5 1.5 1.75 1.75 2 ...
$ mesh : num 63.5 50.8 57.1 50.8 63.5 ...
$ depth : num 101 96 92 80.5 84 73 83 68 83 84.5 ...
$ lat : chr "N47.55714" "N47.55502" "N47.92443" "N47.92434" ...
$ long : chr "W113.51997" "W113.51975" "W113.86772" "W113.86467" ...
$ total_catch : int 2 38 8 6 5 2 7 14 120 154 ...
$ catch_200 : int 2 38 8 6 5 0 0 14 120 154 ...
$ catch_effort : num NA NA NA NA NA NA NA NA NA NA ...
$ mesh_factor : Factor w/ 14 levels "38.1","44.45",..: 7 4 5 4 7 1 1 2 2 4 ...
$ mesh_numeric : num 63.5 50.8 57.1 50.8 63.5 ...
Summarize data with tidyverse
The tidyverse is a collection of packages (e.g., dplyr and ggplot2) for data wrangling and graphing. We have found that students new to R find the tidyverse “language” a bit easier to learn than base R. Piping (%>%; keyboard shortcut cmd+shift+M on Mac, ctrl+shift+M on Windows) is a nice way to chain several actions together. Think of the %>% symbol as the word “then.” Notice the last line of a pipe does not end with %>%. Recent versions of R also have a native pipe operator, |>; you can tell RStudio to insert it instead of %>% in Global Options if you prefer.
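For example, the two lines below do the same thing; the piped version reads left to right as “take fish_data_csv, then filter, then count the rows”:

```r
nrow(filter(fish_data_csv, year == 10))         # nested, base-style call
fish_data_csv %>% filter(year == 10) %>% nrow() # same result written with pipes
```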
In the simple example below we create a new dataframe: take the fish_data_csv dataframe, “then” filter for year 10 (2010), then select only the variables year, mesh, and catch_effort. You will see a new dataframe in the Environment tab called fish_data_csv_10 with three variables (i.e., year, mesh, and catch_effort) and 317 observations. Click on the dataframe icon to open the new dataframe and confirm it only contains year 10.
fish_data_csv_10 <- fish_data_csv %>%
  filter(year == 10) %>% # filter, here we keep only year 10 (2010)
  select(year, mesh, catch_effort) # we select only these variables, which makes the dataframe more manageable
Now let’s do more. For the new dataframe fish_data_csv_filter_catch_effort we filter for years 10, 11, and 12; filter for a mesh size of 2.0 inches; select the variables year, mesh_inch, catch_effort, total_catch, and catch_200; filter out NAs in catch_effort; rename the variable catch_effort to cpue; and create a new variable called ratio that is catch_200 divided by total_catch.
fish_data_csv_filter_catch_effort <- fish_data_csv %>% # create new data frame from the fish_data_csv dataframe
  filter(year %in% c(10, 11, 12) & mesh_inch == 2.00) %>% # filter for years and mesh size
select(year, mesh_inch, catch_effort, total_catch, catch_200) %>% # select variables we are interested in
filter(!is.na(catch_effort)) %>% # filter out NAs because they will cause a problem when we calculate a mean
rename(cpue = catch_effort) %>% # rename a variable to match with fisheries science terminology
mutate(ratio = catch_200 / total_catch) # create a new variable called ratio that is the catch of fish >= 200 mm divided by the total catch
head(fish_data_csv_filter_catch_effort, n = 5) # shows first 5 lines of new dataframe
year mesh_inch cpue total_catch catch_200 ratio
1 10 2 8.09 31 31 1
2 10 2 8.18 36 36 1
3 10 2 3.96 59 59 1
4 10 2 3.54 56 56 1
5 10 2 4.60 25 25 1
tail(fish_data_csv_filter_catch_effort, n = 5) # shows last 5 lines of new dataframe
year mesh_inch cpue total_catch catch_200 ratio
274 12 2 1.76 23 23 1.0000000
275 12 2 0.95 17 13 0.7647059
276 12 2 0.75 9 8 0.8888889
277 12 2 0.46 5 5 1.0000000
278 12 2 1.02 12 12 1.0000000
Now let’s make a summary table from a filtered data frame.
fish_data_csv_summary <- fish_data_csv %>%
  filter(year %in% c(10, 11, 12, 13) & mesh_inch %in% c(2.00, 2.50)) %>%
group_by(year, mesh_inch) %>%
select(year, mesh_inch, catch_effort) %>% # select variables we are interested in
filter(!is.na(catch_effort)) %>% # filter out NAs because they will cause a problem when we calculate a mean
rename(cpue = catch_effort) %>% # rename a variable to match with fisheries science terminology
summarise(min_cpue = min(cpue), max_cpue = max(cpue), mean_cpue = mean(cpue), sd_cpue = sd(cpue), median_cpue = median(cpue))
fish_data_csv_summary %>%
  mutate_if(is.numeric, round, digits = 1) %>% # changing level of precision
kbl(caption = "Summary Data by Year and Mesh") %>% # making a table
kable_classic(full_width = F) # table type
year | mesh_inch | min_cpue | max_cpue | mean_cpue | sd_cpue | median_cpue |
---|---|---|---|---|---|---|
10 | 2.0 | 0 | 30.0 | 6.6 | 6.9 | 4.2 |
10 | 2.5 | 0 | 23.0 | 2.4 | 4.2 | 1.3 |
11 | 2.0 | 0 | 22.2 | 2.6 | 3.0 | 1.8 |
11 | 2.5 | 0 | 4.6 | 0.6 | 0.9 | 0.3 |
12 | 2.0 | 0 | 33.4 | 6.3 | 7.4 | 3.2 |
12 | 2.5 | 0 | 14.5 | 2.2 | 3.0 | 1.1 |
13 | 2.0 | 0 | 52.4 | 4.5 | 6.7 | 2.5 |
13 | 2.5 | 0 | 7.6 | 2.7 | 2.5 | 1.6 |
Selecting specific values from a dataframe
fish_data_csv[1, 2] # grabs value in first row and second column from the fish_data_csv dataframe
[1] 1
fish_data_csv[1, ] # grabs first row
lift net date year week north_South temp morning_evening_night set_time
1 1 1 8/23/10 10 1 South 64 morning 4:40
pull_time duration mesh_inch mesh depth lat long total_catch
1 8:35 NA 2.5 63.5 101 N47.55714 W113.51997 2
catch_200 catch_effort mesh_factor mesh_numeric
1 2 NA 63.5 63.5
fish_data_csv[, 2] # grabs second column
[1] 1.0 2.0 3.0 4.0 5.0 6.0 6.5 7.0 7.5 8.0 9.0 10.0
[13] 11.0 12.0 13.0 14.0 15.0 16.0 17.0 18.0 19.0 20.0 21.0 22.0
... [output truncated; printing a whole column displays all 2,351 values]
fish_data_csv[1:4, 1:5] # grabs rows 1-4 and columns 1-5
lift net date year week
1 1 1 8/23/10 10 1
2 1 2 8/23/10 10 1
3 1 3 8/23/10 10 1
4 1 4 8/23/10 10 1
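You can also select rows with a logical condition rather than row numbers. For example:

```r
# rows where total_catch exceeds 100; the comma keeps all columns
big_catches <- fish_data_csv[fish_data_csv$total_catch > 100, ]
nrow(big_catches) # how many rows met the condition
```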
Working with Dates
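As a starting sketch, the date variable imports as character (e.g., "8/23/10") and can be converted to a Date with as.Date() and a format string:

```r
# convert m/d/yy character strings to Date; %y is the two-digit year
fish_data_csv$date2 <- as.Date(fish_data_csv$date, format = "%m/%d/%y")
range(fish_data_csv$date2) # earliest and latest sampling dates
```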
How to calculate net set times for overnight sets
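A sketch of the overnight-set problem: set_time and pull_time are clock times (e.g., "4:40"), so a pull the next morning looks “earlier” than the set. One approach is to add 24 hours whenever the raw difference is negative. The helper below is an illustration, not our final method:

```r
# illustrative only: soak time in hours from "H:MM" clock-time strings
soak_hours <- function(set, pull) {
  to_hours <- function(x) { # convert "H:MM" to decimal hours
    parts <- strsplit(x, ":")
    sapply(parts, function(p) as.numeric(p[1]) + as.numeric(p[2]) / 60)
  }
  diff_h <- to_hours(pull) - to_hours(set)
  ifelse(diff_h < 0, diff_h + 24, diff_h) # negative means the set went overnight
}
soak_hours("20:30", "6:15") # set 8:30 pm, pulled 6:15 am next day = 9.75 hours
```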
Random subset
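As a minimal sketch, dplyr's slice_sample() (sample_n() in older dplyr versions) draws a random subset of rows; set.seed() makes the draw reproducible:

```r
set.seed(42) # make the random draw reproducible
fish_sub <- fish_data_csv %>% slice_sample(n = 100) # 100 random rows
# slice_sample(prop = 0.1) would instead take a random 10% of the rows
```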
Pooling variable
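As a minimal sketch, dplyr's case_when() can pool many values of a variable into fewer groups; the 2-inch cutoff below is an arbitrary example:

```r
# pool mesh sizes into two groups; the 2.00-inch cutoff is an arbitrary example
fish_data_csv <- fish_data_csv %>%
  mutate(mesh_group = case_when(mesh_inch <= 2.00 ~ "small",
                                mesh_inch > 2.00 ~ "large"))
table(fish_data_csv$mesh_group) # check how many rows fell in each pooled group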
Functions
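As a minimal sketch, you define your own function with function(); for example, a simple catch-per-hour helper:

```r
# simple user-defined function: catch per hour of soak time
catch_per_hour <- function(catch, hours) {
  catch / hours
}
catch_per_hour(30, 12) # 30 fish over a 12-hour set = 2.5 fish/hour
```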
For Loops
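As a minimal sketch, a for loop repeats a block of code once per element; here, printing the mean total catch for each year in the data:

```r
for (yr in sort(unique(fish_data_csv$year))) {
  yr_mean <- mean(fish_data_csv$total_catch[fish_data_csv$year == yr])
  cat("year", yr, "mean total catch:", round(yr_mean, 1), "\n")
}
```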
Graphing
You can make wonderful publication figures in R, using either base R or ggplot2 (part of the tidyverse). As with data wrangling, we find beginners pick up the ggplot2 syntax more easily, so the code below uses ggplot2.
There are times when you simply want to inspect the data. In those cases we don’t worry about all the formatting needed for a publication-quality figure.
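For quick inspection, a one-line plot with the ggplot2 defaults is often enough; for example, a histogram to check the shape of a variable:

```r
# quick look at the distribution of total catch; defaults are fine for inspection
ggplot(fish_data_csv, aes(x = total_catch)) + geom_histogram(bins = 30)
```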
Below is code for a boxplot using ggplot2. Notice the boxplot is not separated by year; that is because year is an integer, not a factor. A boxplot wants the x variable to be a factor, so we need to change the data type of year.
cpue_boxplot <- ggplot(data = fish_data_csv, aes(x = year, y = catch_effort)) +
  geom_boxplot()
cpue_boxplot
We change year to a factor and get a better plot. Now we have a boxplot that is perfect for looking over the data; however, we don’t consider it publication quality. The ggplot2 defaults are not suitable for publication.
fish_data_csv$year_factor <- as.factor(fish_data_csv$year)
cpue_boxplot_2 <- ggplot(data = fish_data_csv, aes(x = year_factor, y = catch_effort)) +
  geom_boxplot()
cpue_boxplot_2
The code below makes the figure publication quality. It looks intimidating for a single figure, but it lets us control every aspect of the figure. You could, for example, store all the lines below ylab() in an object named mytheme; then you would not need to repeat that code for each figure.
For a detailed explanation of the code below see: https://doi.org/10.1002/fsh.10272
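For example, recurring theme settings can be stored once and reused (mytheme is an arbitrary name, and the settings below are only a subset of a full publication theme):

```r
# store recurring theme settings once; add mytheme to any plot with +
mytheme <- theme_bw() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.position = "none",
        axis.text = element_text(colour = "black", size = 18),
        axis.title = element_text(size = 20, colour = "black"))
# usage: ggplot(data = fish_data_csv, aes(x = year_factor, y = catch_effort)) +
#   geom_boxplot() + mytheme
```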
pub_boxplot_cpue <- ggplot(data = fish_data_csv, aes(y = catch_effort, x = year_factor, fill = year_factor)) +
  geom_point(aes(color = year_factor), position = position_jitter(width = .35), size = 1.5, alpha = .4) +
geom_boxplot (width = .8, size = .5, outlier.shape = NA, alpha = 0, notch = FALSE) +
stat_boxplot (geom ='errorbar', width = 0.2) +
stat_summary (fun = mean, geom = "point", shape = 21, size = 3, fill = "black") +
scale_color_manual (values = c("#8c510a", "#d8b365", "#A8925E", "#c7eae5", "#5ab4ac", "#01665e", "red")) +
scale_y_continuous (limits = c(-0.01,60), expand = c(0,0),breaks=seq(0,60,5)) +
scale_x_discrete (labels = c("2010", "2011", "2012", "2013", "2014", "2015", "2016")) +
ylab("Catch per unit effort") + xlab("Year") +
theme_bw() +
theme (axis.title.y = element_text(size = 20, vjust = 4, colour = "black"),
axis.title.x = element_text(size = 20, vjust = -2, colour = "black"),
panel.border = element_blank(),
legend.position = "none",
plot.margin = unit(c(1.5, 1.5, 1.5, 1.5), "cm"),
plot.title = element_text(size = 18, hjust = 0.5, vjust = 8),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.ticks.x = element_line(size = 0.8),
axis.ticks.y = element_line(size = 0.8),
axis.ticks.length = unit(0.2,"cm"),
axis.text.x = element_text(colour = "black",size = 18, angle = 0, vjust = -1, hjust = 0.5),
axis.text.y = element_text(colour = "black",size = 18),
axis.line = element_line(colour = "black", size = 0.8, lineend = "square"))
ggsave("figures/pub_boxplot_cpue.jpeg", width=11, height=8.5,dpi=300)
pub_boxplot_cpue
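As mentioned above, the repeated theme code can be stored once and reused for every figure. A minimal sketch of that idea, using a made-up data frame standing in for fish_data_csv and the hypothetical object name mytheme:

```r
library(ggplot2)

# Hypothetical example data standing in for fish_data_csv
fish_data_csv <- data.frame(
  year_factor  = factor(rep(2010:2012, each = 5)),
  catch_effort = runif(15, min = 0, max = 60)
)

# Save the theme settings once; mytheme is a name we chose
mytheme <- theme_bw() +
  theme(axis.title.y = element_text(size = 20, vjust = 4, colour = "black"),
        axis.title.x = element_text(size = 20, vjust = -2, colour = "black"),
        panel.border = element_blank(),
        legend.position = "none",
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())

# Reuse mytheme instead of repeating the theme() code in each figure
p <- ggplot(data = fish_data_csv, aes(x = year_factor, y = catch_effort)) +
  geom_boxplot() +
  ylab("Catch per unit effort") + xlab("Year") +
  mytheme
```

Any figure built afterward only needs `+ mytheme` at the end, which keeps the plotting code short and the styling consistent across figures.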
pub_bivariate_cpue_depth <- ggplot(data = fish_data_csv, aes(y = catch_effort, x = depth)) +
  geom_point(size = 1.5, alpha = .5, shape = 2) +
scale_y_continuous (limits = c(-0.01,60), expand = c(0,0),breaks=seq(0,60,5)) +
scale_x_continuous (limits = c(0, 200), expand = c(0,0), breaks = seq(0,200,20)) +
ylab("Catch per unit effort") + xlab("Depth (ft)") +
theme_bw() +
theme (axis.title.y = element_text(size = 20, vjust = 4, colour = "black"),
axis.title.x = element_text(size = 20, vjust = -2, colour = "black"),
panel.border = element_blank(),
legend.position = "none",
plot.margin = unit(c(1.5, 1.5, 1.5, 1.5), "cm"),
plot.title = element_text(size = 18, hjust = 0.5, vjust = 8),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.ticks.x = element_line(size = 0.8),
axis.ticks.y = element_line(size = 0.8),
axis.ticks.length = unit(0.2,"cm"),
axis.text.x = element_text(colour = "black",size = 18, angle = 0, vjust = -1, hjust = 0.5),
axis.text.y = element_text(colour = "black",size = 18),
axis.line = element_line(colour = "black", size = 0.8, lineend = "square"))
ggsave("figures/pub_bivariate_cpue_depth.jpeg", width = 11, height = 8.5, dpi = 600)
pub_bivariate_cpue_depth
Common graphing issues
Reporting values
Often R will output a value far beyond the level of precision at which the parameter was collected. For example, you measure fish length to the nearest millimeter (e.g., 120 mm) in the field, but when you calculate a mean for a population, R outputs the value to the one-millionth place or beyond. Thus, be careful about how you report output from R. We think it is best to report a value at the same level of precision as it was originally collected.
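As a quick sketch of controlling reported precision in base R: round() rounds to a number of decimal places and signif() to a number of significant digits.

```r
# Fish lengths measured to the nearest millimeter
lengths_mm <- c(120, 134, 127, 151, 142)

mean(lengths_mm)            # 134.8 -- more precision than was measured
round(mean(lengths_mm), 0)  # 135, matches the original measurement precision
signif(mean(lengths_mm), 3) # 135, three significant digits
```

Rounding only when you report a value (not during intermediate calculations) avoids accumulating rounding error.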
Excellent Websites
Mangiafico: Stats Ecology Handbook
An R Companion for the Handbook of Biological Statistics
RStudio Cheatsheets
Keyboard Shortcuts
Comment/uncomment large blocks (Mac): highlight the section, then Shift+Cmd+C; the same shortcut removes comments from large sections.
Insert the pipe %>% (Mac): Shift+Cmd+M