Data Frame Tips

Data frames are among the most commonly used data structures in R, so you will be working with them a lot. Here are some useful tricks for manipulating data frames from within R. Once you get used to these, you’ll wonder why you spent all those years with your spreadsheet program.

Practice using the newt data from week 2:
femalenewtsampledata and malenewtsampledata.

Open each file in Microsoft Excel and save it as a comma-delimited text file (.csv) before importing it into R. Some of the commands below may wrap onto multiple lines depending on the width of your web browser, but each command should be entered on a single line in R. Most commands contain a “<-”; if you see “<-” twice, you are looking at two commands. If you copy and paste the commands, check that the quotation marks arrive as plain straight quotes ("), because R does not accept curly quotes around strings.

# Load the data:

male_newts <- read.csv("MaleNewtSampleData.csv")

female_newts <- read.csv("FemaleNewtSampleData.csv")
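If you want to try the commands in this post before you have the files, you can build a tiny stand-in data frame right in R. This is a minimal sketch: the values below are made up for illustration, and only the column names (MaleID, svl, numberofmates, Tank) come from the newt data. It relies on the text argument that read.csv() passes on to read.table(), which reads CSV content from a string instead of a file.

```r
# Made-up CSV content; only the column names match the newt data
toy_csv <- "MaleID,svl,numberofmates,Tank
CTM001,72,0,9
CTM002,78,2,12
CTM003,81,1,9"

# read.csv() reads from a string when given text= instead of a file name
toy_newts <- read.csv(text = toy_csv)

str(toy_newts)   # should report 3 observations of 4 variables
```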

# Keep just svl, tht and numberofmates:
keepers <- c("svl", "tht", "numberofmates")
males_new <- male_newts[keepers]

# Keep just the first three columns:
males_new <- male_newts[1:3]
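Both commands above use single brackets with no comma, which treats the data frame like a list of columns, so the result is always another data frame. Matrix-style indexing with a comma behaves differently when you ask for just one column: it drops down to a plain vector. A small sketch with made-up values shows the difference; only the column names are borrowed from the newt data.

```r
# Hypothetical toy data frame (made-up values)
toy_newts <- data.frame(svl = c(72, 78, 81),
                        tht = c(3.1, 3.4, 3.0),
                        numberofmates = c(0, 2, 1),
                        Tank = c(9, 12, 9))

by_name     <- toy_newts[c("svl", "tht", "numberofmates")]  # columns chosen by name
by_position <- toy_newts[1:3]                               # the same columns, chosen by position
identical(by_name, by_position)   # TRUE, because these happen to be the first three columns

one_col_df  <- toy_newts["svl"]    # single brackets: still a data frame
one_col_vec <- toy_newts[, "svl"]  # comma-style indexing: simplified to a numeric vector
class(one_col_df)                  # "data.frame"
class(one_col_vec)                 # "numeric"
```

If you prefer the comma style but want to keep a data frame, add drop = FALSE, as in toy_newts[, "svl", drop = FALSE].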

# Add a column with a transformed variable (relative fitness: each male's offspring count divided by the mean count):
male_newts$relative_fitness <- male_newts$numberofoffspring / mean(male_newts$numberofoffspring)
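Dividing by the mean rescales the counts so that the average relative fitness is exactly 1: a male with twice the average number of offspring gets 2, and one with half the average gets 0.5. Here is a quick check with made-up counts. One caveat: if the column contains any NA values, mean() returns NA and the whole new column becomes NA unless you add na.rm = TRUE.

```r
# Made-up offspring counts for four males
offspring <- c(0, 4, 6, 10)

relative_fitness <- offspring / mean(offspring)  # mean(offspring) is 5
relative_fitness         # 0.0 0.8 1.2 2.0
mean(relative_fitness)   # 1, by construction

# With a missing value, skip NAs when computing the mean
offspring_na <- c(0, 4, NA, 6, 10)
rel_na <- offspring_na / mean(offspring_na, na.rm = TRUE)
rel_na                   # 0.0 0.8  NA 1.2 2.0
```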

# Exclude variables by name:
losers <- names(male_newts) %in% c("MaleID", "FemNum", "Tank")
males_new <- male_newts[!losers]
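The %in% trick works in two steps: names(male_newts) %in% c(...) produces one TRUE or FALSE per column (TRUE for the columns you want to drop), and ! flips those values so that single-bracket indexing keeps everything else. The sketch below uses a made-up four-column data frame and also shows setdiff(), an equivalent way to write the same selection.

```r
# Hypothetical toy data frame (made-up values)
toy_newts <- data.frame(MaleID = c("CTM001", "CTM002"),
                        FemNum = c(3, 5),
                        Tank = c(9, 12),
                        svl = c(72, 78))

losers <- names(toy_newts) %in% c("MaleID", "FemNum", "Tank")
losers                # TRUE TRUE TRUE FALSE: one logical per column
kept <- toy_newts[!losers]
names(kept)           # "svl": ! flips the logicals, so only svl is kept

# Equivalent: compute the names to keep with setdiff()
kept2 <- toy_newts[setdiff(names(toy_newts), c("MaleID", "FemNum", "Tank"))]
```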

# Exclude variables by column number:
males_new <- male_newts[c(-1, -3)] # Excludes columns 1 and 3
males_new <- male_newts[-(1:5)] # Excludes columns 1 through 5
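Negative numbers tell R which positions to leave out. Two details trip people up. If you write a range with a single minus sign, put parentheses around it: -(1:3) drops columns 1 to 3, while -1:3 is the sequence -1, 0, 1, 2, 3, because the minus applies only to the 1. And R refuses to mix positive and negative positions in one index, so using -1:3 as an index is an error. A toy data frame with five made-up columns named a to e illustrates both.

```r
# Hypothetical toy data frame with five columns (made-up values)
toy_newts <- data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8, e = 9:10)

names(toy_newts[c(-1, -3)])   # "b" "d" "e"
names(toy_newts[-(1:3)])      # "d" "e"
names(toy_newts[c(-1:-3)])    # "d" "e" (-1:-3 is -1, -2, -3)

-1:3                          # -1 0 1 2 3: the minus applies to the 1 only

# Mixing positive and negative positions is an error
result <- tryCatch(toy_newts[-1:3], error = function(e) "error")
result                        # "error"
```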

# Drop a column by name (this removes the column from males_new itself):
males_new$cs <- NULL
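Assigning NULL removes a column from the data frame you assign to, rather than creating a new data frame. The same idea works for several columns at once with single brackets. Here is a small sketch; the column names cs, tht and legs come from the newt data, but the values are made up.

```r
# Hypothetical toy data frame (made-up values)
toy_newts <- data.frame(svl = c(72, 78), cs = c(1, 2), tht = c(3.1, 3.4), legs = c(4, 4))

toy_newts$cs <- NULL                  # drop one column
after_one <- names(toy_newts)
after_one                             # "svl" "tht" "legs"

toy_newts[c("tht", "legs")] <- NULL   # drop several columns at once
names(toy_newts)                      # "svl"
```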

# Selecting rows of data:
males_new <- male_newts[1:5,] #select first 5 rows
males_new <- male_newts[12:17,] #select rows 12-17
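Inside [ , ] the numbers before the comma pick rows and the numbers after it pick columns; leaving a side blank means "all". Forgetting the comma is an easy mistake, because male_newts[1:5] is still valid R but selects the first five columns instead of rows. head() is a shortcut for the first few rows. The toy data below is made up for illustration.

```r
# Hypothetical 20-row toy data frame (made-up values)
toy_newts <- data.frame(svl = 61:80, numberofmates = rep(0:3, times = 5))

first_five <- toy_newts[1:5, ]              # rows 1-5, all columns
all.equal(first_five, head(toy_newts, 5))   # TRUE: head() gives the same rows

dim(toy_newts[12:17, ])   # 6 2: rows 12 through 17, both columns
dim(toy_newts[1:2])       # 20 2: no comma, so 1:2 picks COLUMNS
```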

# Use subset() to select rows that meet a condition:
males_new <- subset(male_newts, numberofmates > 1) # select all rows with numberofmates > 1

males_new <- subset(male_newts, svl >= 75 & svl <= 80) # select males with svl from 75 to 80, inclusive

males_new <- subset(male_newts, svl <= 75 | svl >= 80) # select males with svl of 75 or less, or 80 or more

males_new <- subset(male_newts, svl < 75 & Tank == 12) # select males from Tank 12 with svl under 75

males_new <- subset(male_newts, Tank == 9, select = c(MaleID, PITtag)) # Get just the ID and PITtag for males from Tank 9

males_new <- subset(male_newts, numberofmates == 2, select = svl:legs) # Get columns svl through legs for males with 2 mates

males_new <- subset(male_newts, MaleID == "CTM071", select = MaleID:numberofmates) # Keep columns MaleID through numberofmates for male "CTM071" only
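The same row selections can be written with brackets, for example male_newts[male_newts$svl < 75 & male_newts$Tank == 12, ], but the two approaches treat missing values differently: subset() drops rows where the condition is NA, while bracket indexing returns them as rows full of NAs. Wrapping the condition in which() makes the bracket version behave like subset(). The sketch below uses a made-up data frame in which one svl value is missing, and also shows that select= accepts ranges such as svl:Tank and negative selections such as -c(MaleID, Tank) to drop columns.

```r
# Hypothetical toy data frame; one svl value is missing (made-up values)
toy_newts <- data.frame(MaleID = c("CTM001", "CTM002", "CTM003", "CTM004"),
                        svl = c(72, NA, 78, 83),
                        Tank = c(12, 12, 9, 12),
                        numberofmates = c(1, 2, 0, 2))

# subset() silently drops rows where the condition is NA
via_subset <- subset(toy_newts, svl < 75 & Tank == 12)
nrow(via_subset)     # 1 (only CTM001)

# Plain bracket indexing keeps them, as rows of NAs
via_brackets <- toy_newts[toy_newts$svl < 75 & toy_newts$Tank == 12, ]
nrow(via_brackets)   # 2 (CTM001 plus a row of NAs)

# which() converts the condition to row numbers and discards the NAs
via_which <- toy_newts[which(toy_newts$svl < 75 & toy_newts$Tank == 12), ]
nrow(via_which)      # 1

# select= accepts column ranges and negative (drop) selections
names(subset(toy_newts, numberofmates == 2, select = svl:Tank))          # "svl" "Tank"
names(subset(toy_newts, numberofmates == 2, select = -c(MaleID, Tank)))  # "svl" "numberofmates"
```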

#Randomly choose some rows from my data:
newt_sample <- male_newts[sample(1:nrow(male_newts), 10, replace=FALSE),] # Randomly samples 10 different rows
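sample() draws a different set of rows every time you run it. If you need the same "random" rows again (for example, so someone else can reproduce your analysis), call set.seed() first. When sample() is given a single number n, it samples from 1:n, so sample(nrow(male_newts), 10) is a shorter equivalent of the command above. In this sketch the data are made up and the seed value 42 is arbitrary.

```r
# Hypothetical 30-row toy data frame (made-up values)
toy_newts <- data.frame(MaleID = sprintf("CTM%03d", 1:30), svl = 61:90)

set.seed(42)                          # fix the random number stream; 42 is arbitrary
idx <- sample(nrow(toy_newts), 10)    # 10 distinct row numbers (replace = FALSE is the default)
toy_sample <- toy_newts[idx, ]

anyDuplicated(toy_sample$MaleID)      # 0: no row was drawn twice

set.seed(42)                          # the same seed again...
rerun <- sample(nrow(toy_newts), 10)
identical(rerun, idx)                 # TRUE: ...gives the same rows
```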

#Create a new data frame with males and females, with svl and numberofmates:
sex <- c(rep("male", nrow(male_newts)), rep("female", nrow(female_newts)))
svl <- c(male_newts$svl, female_newts$svl)
numberofmates <- c(male_newts$numberofmates, female_newts$numberofmates)
new_table <- data.frame(sex, svl, numberofmates)
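Another way to build the same table is to add a sex column to each data frame, keep only the columns the two share, and stack them with rbind(), which matches columns by name. This avoids keeping several loose vectors in sync by hand. The sketch below uses small made-up data frames (the extra Tank column in the male data is there only to show why the shared columns are picked out first) and checks that both approaches give the same result.

```r
# Hypothetical toy male and female data (made-up values)
toy_males   <- data.frame(svl = c(72, 78), numberofmates = c(1, 2), Tank = c(9, 12))
toy_females <- data.frame(svl = c(80, 85, 77), numberofmates = c(1, 0, 3))

# Approach from the post: build the columns, then the data frame
sex <- c(rep("male", nrow(toy_males)), rep("female", nrow(toy_females)))
toy_table <- data.frame(sex,
                        svl = c(toy_males$svl, toy_females$svl),
                        numberofmates = c(toy_males$numberofmates, toy_females$numberofmates))

# Alternative: tag each sex, keep the shared columns, and stack with rbind()
cols <- c("svl", "numberofmates")
stacked <- rbind(data.frame(sex = "male", toy_males[cols]),
                 data.frame(sex = "female", toy_females[cols]))

all.equal(toy_table, stacked)   # TRUE
```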

# Save the data frame to a file (row.names = FALSE leaves out the column of row numbers):
write.csv(new_table, file = "new_table.csv", row.names = FALSE)
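By default write.csv() also writes the row names (1, 2, 3 and so on) as an unnamed first column, which comes back as an extra column called X when you read the file in again. Setting row.names = FALSE avoids that. This sketch writes a made-up table to a temporary file (so nothing is left in your working directory) and reads it back both ways.

```r
# Hypothetical small table (made-up values)
toy_table <- data.frame(sex = c("male", "female"), svl = c(72, 80), numberofmates = c(1, 3))

path <- tempfile(fileext = ".csv")   # temporary file path

write.csv(toy_table, file = path)    # default: row names written as an extra first column
with_rownames <- names(read.csv(path))
with_rownames                        # "X" "sex" "svl" "numberofmates"

write.csv(toy_table, file = path, row.names = FALSE)   # omit the row-name column
reread <- read.csv(path)
all.equal(reread, toy_table)         # TRUE
```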
