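I'm new to StackOverflow and to R, so please bear with me. I have a lot of experience programming in SAS, but I'm trying to learn R. I usually use SAS and R to transform large datasets, and I have a species-by-study-site matrix like this: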
Species Status Role Site1 Site2 Site3 ... Site25
A_a     S      P    0     0     0     ... 1
A_b     SO     X    1     25    0     ... 0
B_a     S      P    0     2     1     ... 1
B_b     S      X    0     1     0     ... 0
...
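I want to transform this table into a long format with two new variables, "Site" and "Count", where "Site" holds the site column name and "Count" holds the count recorded at that site: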
Species Status Role Site   Count
A_a     S      P    Site1  0
A_a     S      P    Site2  0
A_a     S      P    Site3  0
A_a     S      P    Site25 1
A_b     SO     X    Site1  1
A_b     SO     X    Site2  25
A_b     SO     X    Site3  0
A_b     SO     X    Site25 0
...
B_b     S      X    Site25 0
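I think this goes beyond a simple t() function. I've looked at the reshape and reshape2 packages, but I'm a bit lost on how to proceed. Has anyone dealt with a situation like this and could help with the code? Thanks, JimH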
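To make the question reproducible, the wide table above can be read straight into R with `read.table(text = ...)`. This is a sketch that includes only the four species and the site columns actually shown (Site1, Site2, Site3, Site25); Site4 through Site24 are elided in the question and are deliberately not filled in here.

```r
# Read the wide species-by-site table from the question into a data frame.
# Only the columns shown in the question are included; Site4..Site24 are elided there.
wide_txt <- "
Species Status Role Site1 Site2 Site3 Site25
A_a     S      P    0     0     0     1
A_b     SO     X    1     25    0     0
B_a     S      P    0     2     1     1
B_b     S      X    0     1     0     0
"

# header = TRUE takes the first non-blank line as column names;
# blank lines (like the leading newline) are skipped by default.
df <- read.table(text = wide_txt, header = TRUE, stringsAsFactors = FALSE)
str(df)
```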
Answer 0 (score: 1)
You can do this with dplyr and tidyr:
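install.packages(c("tidyr", "dplyr"), dependencies = TRUE)
library(dplyr)
library(tidyr)
# Stack every SiteN column into a key column "Site" and a value column "Count",
# then sort by species so each species' rows sit together
df %>% gather(Site, Count, starts_with("Site")) %>% arrange(Species)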
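In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`, which performs the same wide-to-long reshape with explicitly named arguments. A minimal sketch, assuming tidyr 1.0.0 or newer and rebuilding the four-row sample data by hand so the block runs on its own:

```r
library(dplyr)
library(tidyr)

# Hand-built copy of the sample data (Site1..Site3 only)
df <- data.frame(
  Species = c("A_a", "A_b", "B_a", "B_b"),
  Status  = c("S", "SO", "S", "S"),
  Role    = c("P", "X", "P", "X"),
  Site1   = c(0L, 1L, 0L, 0L),
  Site2   = c(0L, 25L, 2L, 1L),
  Site3   = c(0L, 0L, 1L, 0L)
)

long <- df %>%
  pivot_longer(
    cols      = starts_with("Site"),  # every SiteN column, however many there are
    names_to  = "Site",               # former column names go into "Site"
    values_to = "Count"               # cell values go into "Count"
  ) %>%
  arrange(Species)

long
```

Because `starts_with("Site")` picks up the columns by name, the same call should work unchanged on the full Site1 to Site25 table.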
Answer 1 (score: 1)
Or, a bit more old-school, in base R (I realize the code could be more concise; feel free to optimize):
df <- structure(list(Species = structure(1:4, .Label = c("A_a", "A_b",
"B_a", "B_b"), class = "factor"), Status = structure(c(1L, 2L,
1L, 1L), .Label = c("S", "SO"), class = "factor"), Role = structure(c(1L,
2L, 1L, 2L), .Label = c("P", "X"), class = "factor"), Site1 = c(0L,
1L, 0L, 0L), Site2 = c(0L, 25L, 2L, 1L), Site3 = c(0L, 0L, 1L,
0L)), .Names = c("Species", "Status", "Role", "Site1", "Site2",
"Site3"), class = "data.frame", row.names = c(NA, -4L))
df
#> Species Status Role Site1 Site2 Site3
#> 1 A_a S P 0 0 0
#> 2 A_b SO X 1 25 0
#> 3 B_a S P 0 2 1
#> 4 B_b S X 0 1 0
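# Wide to long: each SiteN column becomes one row per species
reshape(df,
varying = c("Site1", "Site2", "Site3"),  # wide columns to stack
v.names = "Count",                       # name of the new value column
timevar = "Site",                        # new column holding the original column name
times = c("Site1", "Site2", "Site3"),    # labels written into "Site"
new.row.names = seq_len(3 * nrow(df)),   # plain row names 1, 2, ..., one per output row
direction = "long")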
#> Species Status Role Site Count id
#> 1 A_a S P Site1 0 1
#> 2 A_b SO X Site1 1 2
#> 3 B_a S P Site1 0 3
#> 4 B_b S X Site1 0 4
#> 5 A_a S P Site2 0 1
#> 6 A_b SO X Site2 25 2
#> 7 B_a S P Site2 2 3
#> 8 B_b S X Site2 1 4
#> 9 A_a S P Site3 0 1
#> 10 A_b SO X Site3 0 2
#> 11 B_a S P Site3 1 3
#> 12 B_b S X Site3 0 4
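The call above lists Site1 to Site3 by hand. For data with 25 site columns, here is a sketch that builds the column list with `grep()` and sizes `new.row.names` to the actual number of output rows. It uses the same hand-built sample data and assumes that no non-site column name starts with "Site":

```r
# Hand-built copy of the sample data (Site1..Site3 only)
df <- data.frame(
  Species = c("A_a", "A_b", "B_a", "B_b"),
  Status  = c("S", "SO", "S", "S"),
  Role    = c("P", "X", "P", "X"),
  Site1   = c(0L, 1L, 0L, 0L),
  Site2   = c(0L, 25L, 2L, 1L),
  Site3   = c(0L, 0L, 1L, 0L)
)

# Every column whose name starts with "Site" (Site1 ... Site25 in the real data)
site_cols <- grep("^Site", names(df), value = TRUE)

long <- reshape(df,
                varying       = site_cols,
                v.names       = "Count",
                timevar       = "Site",
                times         = site_cols,
                new.row.names = seq_len(nrow(df) * length(site_cols)),
                direction     = "long")

# reshape() emits all Site1 rows first; reorder so each species' sites are together
long <- long[order(long$Species), ]
rownames(long) <- NULL
long
```

The final `order()` step is only there to match the row order in the question's desired output. `order()` is stable, so within each species the sites keep their Site1, Site2, ... sequence.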