I have the following dataset. My goal is to understand the counts at the two types of location, broken down by mammals and birds.
df1:
Location Type Cat Mouse Dog Chicken Turkey Horse
1 1 0 0 1 0 1
1 0 0 1 0 1 0
2 1 1 1 1 1 1
2 0 1 0 0 0 0
1 1 1 0 0 1 0
I would like it to read as
df2:
Location Type M M M B B M
1 1 0 0 1 0 1
1 0 0 1 0 1 0
2 1 1 1 1 1 1
2 0 1 0 0 0 0
1 1 1 0 0 1 0
"M" stands for mammal and "B" stands for bird.
I tried entering the data manually into a .csv file and reading it into R, but the file was read as
df2:
Location Type M M1 M2 B B1 M3
1 1 0 0 1 0 1
1 0 0 1 0 1 0
2 1 1 1 1 1 1
2 0 1 0 0 0 0
1 1 1 0 0 1 0
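I am not sure why each "M" or "B" column gets its own number. How can I prevent this?
Or
In another data frame, shown below, I have also classified the animals as mammals and birds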
dfanimal:
Name of Animal Mammal/Bird
Cat Mammal
Dog Mammal
Mouse Mammal
Chicken Bird
Turkey Bird
Horse Mammal
Is there a way to use the data frames df1 and dfanimal directly?
Thank you very much for your help.
Answer 0 (score: 1)
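After changing the column names by hand, you can pass check.names = FALSE when importing the csv. Duplicate column names are not recommended in a data frame, so by default R adds these suffixes to make the names unique.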
df1 <- read.csv('location/of/file.csv', check.names = FALSE)
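To see the effect without a file on disk, here is a minimal sketch that feeds a small inline CSV to read.csv through its text argument. The CSV content and the object names csv_text, renamed and kept are made up for illustration, and the exact suffix format may differ from the one shown in the question depending on how the file was read.

```r
# Hypothetical CSV content with repeated header labels, supplied inline
csv_text <- "Location,M,M,M,B,B,M
1,1,0,0,1,0,1
1,0,0,1,0,1,0"

# Default behaviour (check.names = TRUE): duplicated names are made unique
renamed <- read.csv(text = csv_text)

# check.names = FALSE keeps the header exactly as written in the file
kept <- read.csv(text = csv_text, check.names = FALSE)

names(renamed)
names(kept)
```

Keep in mind that with duplicated names, kept$M or kept[["M"]] only returns the first matching column, so positional indexing is the safer way to reach the others.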
If you want to change the column names using df_animal, we can use match
names(df1)[-1] <- substr(df_animal$Mammal.Bird[match(names(df1)[-1],
df_animal$Name_of_Animal)], 1, 1)
df1
# Location M M M B B M
#1 1 1 0 0 1 0 1
#2 1 0 0 1 0 1 0
#3 2 1 1 1 1 1 1
#4 2 0 1 0 0 0 0
#5 1 1 1 0 0 1 0
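If the end goal is a count per location rather than renamed columns, the same match lookup can be used as a grouping key while the original names stay unique. This is only a sketch, assuming the aim is to total the 0/1 entries per class within each Location; the names animal_cols, class_of_col, class_totals and counts are introduced here for illustration, and the data is rebuilt inline (as plain character columns) so the block runs on its own.

```r
# Same values as the data section of this answer, rebuilt inline
df1 <- data.frame(
  Location = c(1L, 1L, 2L, 2L, 1L),
  Cat      = c(1L, 0L, 1L, 0L, 1L),
  Mouse    = c(0L, 0L, 1L, 1L, 1L),
  Dog      = c(0L, 1L, 1L, 0L, 0L),
  Chicken  = c(1L, 0L, 1L, 0L, 0L),
  Turkey   = c(0L, 1L, 1L, 0L, 1L),
  Horse    = c(1L, 0L, 1L, 0L, 0L)
)
df_animal <- data.frame(
  Name_of_Animal = c("Cat", "Dog", "Mouse", "Chicken", "Turkey", "Horse"),
  Mammal.Bird    = c("Mammal", "Mammal", "Mammal", "Bird", "Bird", "Mammal")
)

# Look up the class of every animal column, in the column order of df1
animal_cols  <- names(df1)[-1]
class_of_col <- as.character(
  df_animal$Mammal.Bird[match(animal_cols, df_animal$Name_of_Animal)]
)

# Split the animal columns by class and sum each group row by row
class_totals <- sapply(split.default(df1[animal_cols], class_of_col), rowSums)

# Add up the row totals within each Location
counts <- aggregate(class_totals, by = list(Location = df1$Location), FUN = sum)
counts
```

Grouping by class instead of renaming avoids duplicated column names altogether, which keeps later subsetting by name unambiguous.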
Data
df1 <- structure(list(Location = c(1L, 1L, 2L, 2L, 1L), Cat = c(1L,
0L, 1L, 0L, 1L), Mouse = c(0L, 0L, 1L, 1L, 1L), Dog = c(0L, 1L,
1L, 0L, 0L), Chicken = c(1L, 0L, 1L, 0L, 0L), Turkey = c(0L,
1L, 1L, 0L, 1L), Horse = c(1L, 0L, 1L, 0L, 0L)), class = "data.frame",
row.names = c(NA, -5L))
df_animal <- structure(list(Name_of_Animal = structure(c(1L, 3L, 5L, 2L, 6L,
4L), .Label = c("Cat", "Chicken", "Dog", "Horse", "Mouse", "Turkey"
), class = "factor"), Mammal.Bird = structure(c(2L, 2L, 2L, 1L,
1L, 2L), .Label = c("Bird", "Mammal"), class = "factor")), class = "data.frame",
row.names = c(NA, -6L))