Table conversion: from a species list to presence/absence

Time: 2015-07-21 11:28:22

Tags: r

I am looking for a way to convert this table:

    V1      V2      V3      V4      V5      V6
1:  SP1     SP2     SP3     NA      NA      NA
2:  SP1     SP3     SP6     NA      NA      NA
3:  SP3     SP5     SP7     SP8     SP9     SP10
4:  SP4     SP5     SP6     SP7     NA      NA

into this one (presence/absence of each species):

    SP1   SP2   SP3   SP4   SP5   SP6   SP7   SP8   SP9   SP10
1:   1     1     1     0     0     0     0     0     0     0
2:   1     0     1     0     0     1     0     0     0     0
3:   0     0     1     0     1     0     1     1     1     1
4:   0     0     0     1     1     1     1     0     0     0

Note that the first table has many rows and many species (I don't know how many).

Any ideas?

2 answers:

Answer 0: (score: 6)

Try

library(qdapTools)
res1 <- mtabulate(as.data.frame(t(df1)))

Or

library(reshape2)
res2 <- table(melt(as.matrix(df1), na.rm=TRUE)[,-2])
res2New <- res2[,paste0('SP',1:10)]
res2New
#     value
# Var1 SP1 SP2 SP3 SP4 SP5 SP6 SP7 SP8 SP9 SP10
#   1   1   1   1   0   0   0   0   0   0    0
#   2   1   0   1   0   0   1   0   0   0    0
#   3   0   0   1   0   1   0   1   1   1    1
#   4   0   0   0   1   1   1   1   0   0    0

If we need to convert the result to a 'data.frame':

 as.data.frame.matrix(res2New)

Data

 df1 <- structure(list(V1 = c("SP1", "SP1", "SP3", "SP4"), V2 = c("SP2", 
 "SP3", "SP5", "SP5"), V3 = c("SP3", "SP6", "SP7", "SP6"), V4 = c(NA, 
 NA, "SP8", "SP7"), V5 = c(NA, NA, "SP9", NA), V6 = c(NA, NA, 
 "SP10", NA)), .Names = c("V1", "V2", "V3", "V4", "V5", "V6"), 
 class = "data.frame", row.names = c(NA, -4L))
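
The column selection above hardcodes `paste0('SP',1:10)`, while the question says the number of species is not known in advance. As a minimal base R sketch (not part of the original answer), the species names could be collected from the data and ordered by their numeric suffix. It assumes every name has the form `SP<number>`; the names `species` and `pa` are illustrative only.

```r
# Same data as df1 above, repeated so the sketch is self-contained
df1 <- data.frame(
  V1 = c("SP1", "SP1", "SP3", "SP4"),
  V2 = c("SP2", "SP3", "SP5", "SP5"),
  V3 = c("SP3", "SP6", "SP7", "SP6"),
  V4 = c(NA, NA, "SP8", "SP7"),
  V5 = c(NA, NA, "SP9", NA),
  V6 = c(NA, NA, "SP10", NA),
  stringsAsFactors = FALSE
)

# Collect every species that occurs anywhere in the table, ignoring NA
all_values <- unlist(df1, use.names = FALSE)
species <- unique(all_values[!is.na(all_values)])

# Order by the numeric suffix so that SP10 follows SP9
# (assumption: all names look like "SP<number>")
species <- species[order(as.integer(sub("^SP", "", species)))]

# One row per input row, one 0/1 column per species
pa <- t(apply(df1, 1, function(row) as.integer(species %in% row)))
colnames(pa) <- species
pa
```

Because `%in%` only tests membership, a species listed twice in the same row would still give 1 rather than 2.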

Answer 1: (score: 3)

An answer using reshape2:

library(reshape2)
data <- read.table(text="V1 V2 V3 V4 V5 V6
                   1: SP1 SP2 SP3 NA NA NA 
                   2: SP1 SP3 SP6 NA NA NA 
                   3: SP3 SP5 SP7 SP8 SP9 SP10 
                   4: SP4 SP5 SP6 SP7 NA NA")

#identify lines
data$line <- 1:nrow(data)
#turn data into long format
melt_data <- melt(data,id.var="line", variable.name="column",
                  value.name="species")
#rearrange levels species as otherwise SP10 comes after SP1
melt_data$species_fact <- factor(melt_data$species, 
                                    levels=paste0("SP",1:10))

#turn into - different- wide format for result
result <- dcast(data=melt_data[!is.na(melt_data$species_fact),],
                formula=line~species_fact,value.var="species_fact",
                fun.aggregate=length)
result

which yields

> result
  line SP1 SP2 SP3 SP4 SP5 SP6 SP7 SP8 SP9 SP10
1    1   1   1   1   0   0   0   0   0   0    0
2    2   1   0   1   0   0   1   0   0   0    0
3    3   0   0   1   0   1   0   1   1   1    1
4    4   0   0   0   1   1   1   1   0   0    0
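
One caveat: `fun.aggregate=length` (like `table`) counts occurrences, so a species that appears twice in the same row would give 2 rather than 1. The sample data has no such duplicates, so the following is only a hedged sketch on a small made-up matrix `m` (hypothetical, not from the question) showing how counts could be reduced to strict presence/absence.

```r
# Hypothetical input in which SP1 occurs twice in the first row
m <- matrix(c("SP1", "SP2", "SP1",
              "SP3", NA,    NA),
            nrow = 2, byrow = TRUE)

# Count occurrences per row and species, skipping NA cells
keep <- !is.na(m)
counts <- table(row(m)[keep], m[keep])

# Reduce counts to 0/1: anything seen at least once becomes 1
pa <- (unclass(counts) > 0) * 1L
pa
```

The same `> 0` comparison could be applied to the species columns of `result` above (that is, all columns except `line`).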