Introduction to R & RStudio: Analysing Animal Science Research Data

In data-driven fields, the ability to extract meaningful conclusions from experimental observations is a crucial professional skill. In agricultural and animal sciences, researchers regularly conduct trials to assess how nutritional treatments, housing environments, or genetic strains influence livestock performance. To process this data accurately, scientists rely on R—a powerful, open-source programming language for statistical computing—and RStudio, its most popular integrated development environment (IDE).

This tutorial provides a step-by-step guide to installing R/RStudio, navigating the interface, loading experimental records, computing growth statistics, and conducting analysis of variance (ANOVA) tests. We will walk through this workflow using a practical, real-world case study from a livestock feeding trial.

1. Why Use R and RStudio for Scientific Research?

While basic spreadsheets are useful for quick manual calculations, they make an analysis hard to audit and repeat, and they offer limited statistical tooling. R excels in several key areas:

Reproducibility: Instead of point-and-click edits, you write code scripts. This allows anyone (including your future self) to reproduce the exact analysis by running the script again.
Statistical Sophistication: R was built by statisticians for statisticians, offering hypothesis tests, linear models, and ANOVA out of the box.
Data Visualization: Packages like ggplot2 allow you to generate publication-ready plots with total control over styling, scales, and labels.

2. Case Study: The Beef Cattle Nutrition Experiment

Imagine you are a researcher studying ruminant nutrition. You recently completed a 90-day feeding trial with 30 beef steers to investigate the effect of three diets on weight gain. The animals were randomly assigned to one of three groups (10 steers per group):

Control: Standard basal forage diet.
Diet_A: Basal forage supplemented with 15% high-protein oilseed meal.
Diet_B: Basal forage supplemented with 15% fermented crop by-products.

At the end of the trial, you collected the following measurements for each steer: ID, Treatment (Diet), Initial Weight (kg), and Final Weight (kg). Your goal is to determine if supplementary feeding significantly increases the **Average Daily Gain (ADG)** and which supplement performs best.

3. Setting Up Your Environment

Before writing code, download and install the required tools:

Go to the Comprehensive R Archive Network (CRAN), download the installer for your operating system (Windows/Mac/Linux), and run it.
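Go to the Posit website (posit.co), download the free RStudio Desktop installer for your operating system, and install it. Install R first, because RStudio needs an existing R installation to run.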
Open RStudio. You will see four panes: the **Source Editor** (top left, for writing scripts), the **Console** (bottom left, where code executes), the **Environment/History** (top right, lists active variables), and the **Files/Plots/Packages/Help** pane (bottom right).

Pro Tip: Always organize your project. In RStudio, go to File -> New Project -> New Directory -> New Project. Name it Cattle_Nutrition_Analysis. This sets your working directory automatically, making file loading seamless.
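Once the project is open, a quick check in the Console can confirm where R is looking for files. The sketch below uses only base R functions and assumes the data file is called steer_growth.csv, the name used in the next section. The variable names are illustrative.

```r
# Show the folder R currently treats as the working directory.
# Inside an RStudio project this should be the project folder.
print(getwd())

# List any CSV files R can see in that folder.
csv_files <- list.files(pattern = "\\.csv$")
print(csv_files)

# Check for the data file before trying to read it.
# "steer_growth.csv" is the file name assumed in this tutorial.
data_file <- "steer_growth.csv"
if (file.exists(data_file)) {
  message("Found ", data_file)
} else {
  message(data_file, " not found in ", getwd())
}
```

If the file is reported as missing, copying it into the project folder is usually simpler than changing the working directory by hand.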

4. Importing Data and R Basics

Suppose your raw data is saved in a comma-separated values (CSV) file named steer_growth.csv. The first few lines of the file look like this:

 steer_id,diet,initial_wt,final_wt
 S01,Control,320,405
 S02,Control,315,398
 ...
 S11,Diet_A,322,438
 ...
 S21,Diet_B,318,442
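If you do not have a trial file to hand, one option is to generate a practice file with the same four columns. The sketch below is base R only and produces simulated numbers, not real trial records. The group means, the standard deviations, and the output file name are all assumptions chosen for illustration.

```r
# SIMULATED stand-in data for practice only; these are NOT real trial records.
set.seed(2024)  # fixed seed so the same practice data is generated each run

n_per_group <- 10
diet <- rep(c("Control", "Diet_A", "Diet_B"), each = n_per_group)

# Assumed (illustrative) mean ADG per diet in kg/day, with a common SD.
assumed_adg <- c(Control = 0.93, Diet_A = 1.28, Diet_B = 1.38)
adg <- rnorm(length(diet), mean = assumed_adg[diet], sd = 0.09)

# Assumed starting weights of roughly 320 kg.
initial_wt <- round(rnorm(length(diet), mean = 320, sd = 5))
final_wt   <- round(initial_wt + adg * 90)

sim_data <- data.frame(
  steer_id   = sprintf("S%02d", seq_along(diet)),
  diet       = diet,
  initial_wt = initial_wt,
  final_wt   = final_wt
)

# Write to a temporary folder so an existing steer_growth.csv is never overwritten.
sim_path <- file.path(tempdir(), "steer_growth_simulated.csv")
write.csv(sim_data, sim_path, row.names = FALSE)
head(sim_data, 3)
```

To follow the rest of the tutorial with this practice file, pass sim_path to the reading function in place of "steer_growth.csv".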

Let's write an R script to load this dataset and compute the Average Daily Gain (ADG). Copy the following code into your R script editor and run it:

# Install the tidyverse if it is missing, then load it
# (the tidyverse includes readr, dplyr, and ggplot2, all used below)
if (!requireNamespace("tidyverse", quietly = TRUE)) install.packages("tidyverse")
library(tidyverse)

# Load the dataset
growth_data <- read_csv("steer_growth.csv")

# View the first 6 rows of the dataset
head(growth_data)

# Calculate the Average Daily Gain (ADG) over 90 days
# Formula: (Final Weight - Initial Weight) / 90 days
growth_data <- growth_data %>%
  mutate(adg = (final_wt - initial_wt) / 90)

# View modified dataset summary
summary(growth_data)
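It can help to check the ADG formula by hand on a few animals before trusting the full column. The sketch below rebuilds the four example rows shown in the CSV preview as a base R data frame (so it runs without the tidyverse) and applies the same formula. The object names sample_rows and trial_days are illustrative.

```r
# The four example rows shown in the CSV preview.
sample_rows <- data.frame(
  steer_id   = c("S01", "S02", "S11", "S21"),
  diet       = c("Control", "Control", "Diet_A", "Diet_B"),
  initial_wt = c(320, 315, 322, 318),
  final_wt   = c(405, 398, 438, 442)
)

trial_days <- 90  # length of the feeding trial

# ADG = (final weight - initial weight) / days on trial
sample_rows$adg <- (sample_rows$final_wt - sample_rows$initial_wt) / trial_days

# Round for display only; keep full precision for later statistics.
round(sample_rows$adg, 3)
# [1] 0.944 0.922 1.289 1.378
```

For steer S01, for example, the gain is 405 - 320 = 85 kg over 90 days, or about 0.944 kg/day.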

5. Descriptive Statistics

Before conducting hypothesis tests, we calculate descriptive statistics (count, mean, and standard deviation) for each feeding group. In R, we achieve this by combining the group_by() and summarise() functions:

# Calculate descriptive stats grouped by Diet
summary_stats <- growth_data %>%
  group_by(diet) %>%
  summarise(
    Count = n(),
    Mean_Initial_Wt = mean(initial_wt),
    Mean_Final_Wt = mean(final_wt),
    Mean_ADG_kg = mean(adg),
    SD_ADG_kg = sd(adg)
  )

# Print the results summary table
print(summary_stats)

The resulting console output displays the performance metrics for each diet. Let's assume the experimental results yielded the following averages:

Diet Group	Steer Count	Mean Initial Wt (kg)	Mean Final Wt (kg)	Mean ADG (kg/day)	SD ADG (kg/day)
Control	10	318.5	402.2	0.93	0.08
Diet_A	10	320.1	435.3	1.28	0.10
Diet_B	10	319.4	443.6	1.38	0.09
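Because ADG is a linear function of the two weights, each group's mean ADG should equal the difference between its mean final and mean initial weight, divided by 90. The short check below applies that to the table values. The vector names are illustrative.

```r
# Group means copied from the summary table (kg).
mean_initial <- c(Control = 318.5, Diet_A = 320.1, Diet_B = 319.4)
mean_final   <- c(Control = 402.2, Diet_A = 435.3, Diet_B = 443.6)

# Mean ADG implied by the mean weights over the 90-day trial.
implied_adg <- (mean_final - mean_initial) / 90
round(implied_adg, 2)
# Control  Diet_A  Diet_B
#    0.93    1.28    1.38
```

The implied values match the Mean ADG column, which is a useful sanity check that no rows were dropped or mislabelled during summarising.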

6. Hypothesis Testing: Analysis of Variance (ANOVA)

Looking at the descriptive table, Diet A and Diet B steers appear to have gained weight faster than the Control steers. However, sample means always differ somewhat, so we must test whether differences this large could plausibly have arisen by chance alone. Because we are comparing the means of three independent groups, we conduct a **One-Way Analysis of Variance (ANOVA)**. Its null hypothesis is that all three diets have the same mean ADG.

# Fit the one-way ANOVA model
anova_model <- aov(adg ~ diet, data = growth_data)

# Display the ANOVA test summary
summary(anova_model)

The output of the ANOVA model displays the F-statistic and the corresponding p-value (denoted as Pr(>F)). If the p-value is less than our significance level (typically α = 0.05), we reject the null hypothesis and conclude that diet had a statistically significant effect on average daily weight gain.
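To see where the F-statistic comes from, it can be computed by hand as the ratio of the between-group mean square to the within-group mean square and then compared with the value aov() reports. The sketch below does this on simulated data, not the trial results. The data frame sim and its group means are assumptions for illustration only.

```r
# SIMULATED data for illustration only (not the trial results).
set.seed(1)
sim <- data.frame(
  diet = factor(rep(c("Control", "Diet_A", "Diet_B"), each = 10)),
  adg  = rnorm(30, mean = rep(c(0.93, 1.28, 1.38), each = 10), sd = 0.09)
)

k <- nlevels(sim$diet)   # number of groups
n <- nrow(sim)           # total number of animals

group_means <- tapply(sim$adg, sim$diet, mean)
group_sizes <- tapply(sim$adg, sim$diet, length)
grand_mean  <- mean(sim$adg)

# Between-group and within-group sums of squares
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((sim$adg - group_means[as.character(sim$diet)])^2)

# F = mean square between / mean square within
f_manual <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_manual <- pf(f_manual, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# The same quantities taken from the fitted aov object
aov_table <- summary(aov(adg ~ diet, data = sim))[[1]]
f_aov <- aov_table[["F value"]][1]
p_aov <- aov_table[["Pr(>F)"]][1]

all.equal(f_manual, f_aov)  # TRUE if the hand calculation matches aov()
```

The same indexing pattern, summary(model)[[1]][["Pr(>F)"]][1], is one way to pull the p-value into a variable for use in a report.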

Post-Hoc Analysis: Tukey's HSD Test

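If the ANOVA p-value is below the significance level, it tells us that *at least one group mean* differs from the others, but it doesn't specify *which pairs* differ. To find out, we run the **Tukey Honest Significant Difference (HSD)** post-hoc test, which compares every pair of groups while adjusting for multiple comparisons: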

# Run post-hoc pairwise comparisons
tukey_results <- TukeyHSD(anova_model)
print(tukey_results)

The Tukey HSD output lists pairwise comparisons (e.g., Diet_A vs. Control, Diet_B vs. Control, Diet_B vs. Diet_A) alongside their confidence intervals and adjusted p-values (p adj). If the adjusted p-value is less than 0.05, that specific comparison is statistically significant.
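When there are many comparisons, it can be convenient to convert the Tukey output into a data frame and flag the significant pairs. The sketch below assumes a model with a single factor named diet and uses simulated data, not the trial results. The names sim, tukey_sim, and pair_table are illustrative.

```r
# SIMULATED data for illustration only (not the trial results).
set.seed(1)
sim <- data.frame(
  diet = factor(rep(c("Control", "Diet_A", "Diet_B"), each = 10)),
  adg  = rnorm(30, mean = rep(c(0.93, 1.28, 1.38), each = 10), sd = 0.09)
)

tukey_sim <- TukeyHSD(aov(adg ~ diet, data = sim))

# TukeyHSD() returns a list with one matrix per factor in the model.
# Columns: diff (difference in means), lwr/upr (confidence limits), p adj.
pair_table <- as.data.frame(tukey_sim$diet)

# Flag comparisons whose adjusted p-value is below 0.05.
pair_table$significant <- pair_table[["p adj"]] < 0.05

print(round(pair_table[, c("diff", "lwr", "upr", "p adj")], 3))

# Names of the pairs that differ at the 0.05 level
rownames(pair_table)[pair_table$significant]
```

Each row name has the form "Diet_A-Control", and diff is the first group's mean minus the second's, so a positive diff here means the supplemented group gained faster.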

7. Data Visualization with ggplot2

Visualizing the distribution of the data is important for reports and publications. Boxplots are a common choice for group comparisons because they show the median and quartiles at a glance, and overlaying the individual animals as points shows the full spread. Let's write the code to render a publication-quality ggplot2 boxplot:

# Plot average daily gain by diet group
ggplot(growth_data, aes(x = diet, y = adg, fill = diet)) +
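  # Hide the boxplot's own outlier markers: geom_jitter() below already draws every animal
  geom_boxplot(alpha = 0.7, outlier.shape = NA) +
  # Jitter horizontally only (height = 0) so the plotted ADG values stay exact
  geom_jitter(width = 0.15, height = 0, color = "#475569", alpha = 0.6) +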
  scale_fill_manual(values = c("#64748b", "#0e7490", "#0284c7")) +
  labs(
    title = "Effect of Dietary Supplements on Beef Cattle Growth",
    subtitle = "Average Daily Gain (ADG) over a 90-day feeding trial",
    x = "Dietary Treatment",
    y = "Average Daily Gain (kg/day)"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    legend.position = "none",
    plot.title = element_text(face = "bold", color = "#0a192f"),
    axis.title = element_text(face = "bold"),
    panel.grid.minor = element_blank()
  )
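To use the figure in a report, the plot can be stored in an object and written to disk with ggsave(). The sketch below is a minimal version built on simulated data. The output path, the dimensions, and the object names are assumptions to adapt to your own project.

```r
library(ggplot2)

# SIMULATED data for illustration only (not the trial results).
set.seed(1)
sim <- data.frame(
  diet = factor(rep(c("Control", "Diet_A", "Diet_B"), each = 10)),
  adg  = rnorm(30, mean = rep(c(0.93, 1.28, 1.38), each = 10), sd = 0.09)
)

# Store the plot in an object instead of only printing it.
adg_plot <- ggplot(sim, aes(x = diet, y = adg, fill = diet)) +
  geom_boxplot(alpha = 0.7) +
  labs(x = "Dietary Treatment", y = "Average Daily Gain (kg/day)") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")

# Hypothetical output path; replace tempdir() with a folder in your project.
plot_file <- file.path(tempdir(), "adg_boxplot.png")

# width and height are in inches by default; 300 dpi is a common print setting.
ggsave(plot_file, plot = adg_plot, width = 6, height = 4, dpi = 300)
```

Saving from code rather than through the Plots pane keeps the figure size and resolution reproducible along with the rest of the analysis.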

8. Interpreting the Output

Based on our statistical calculations and plot output:

ANOVA Results: The F-value was large and the p-value was below 0.001, well under our significance level of 0.05. We reject the null hypothesis and conclude that mean ADG differed among the feeding treatments.
Pairwise Comparisons (Tukey HSD):
- **Diet_A vs. Control** and **Diet_B vs. Control** are both statistically significant (P < 0.01), indicating that both supplements increased ADG relative to the standard forage diet in this trial.
- **Diet_B vs. Diet_A** has an adjusted p-value of 0.12. Since this is greater than 0.05, the difference is not statistically significant, even though Diet B steers had a slightly higher mean ADG (1.38 vs. 1.28 kg/day). With 10 steers per group, this trial does not show that either supplement outperforms the other.

Summary & Takeaway

You have successfully written a reproducible data analysis pipeline in R! Using this script, you imported cattle growth records, computed average daily gains, summarized each diet group, ran a one-way ANOVA with a Tukey HSD follow-up, and generated a publication-ready boxplot.

These reproducible pipelines form the bedrock of modern agricultural science, bioinformatics, and data analytics. Building expertise in R and RStudio sets a solid foundation for handling complex agricultural trials, animal genetic assessments, or other future data analysis roles.

Citations & References

R Core Team (2025). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Wickham, H. et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686.