R: problem using table() with mdply

Asked: 2014-10-08 10:00:55

Tags: r plyr

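I want to count how often each value occurs in every row of a data frame. I am trying to do this with table() together with mdply. On my test data this works fine.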

library(plyr)
Data_frame_s1 <- data.frame(Gene_symbol=c("geneA","geneB","geneC"), cell1=c("state_1", "state_2" ,"unclassified"))
Data_frame_s2 <- data.frame(Gene_symbol=c("geneA","geneB","geneC"), cell2=c("state_1", "state_2", "unclassified"))
Data_frame_s3 <- data.frame(Gene_symbol=c("geneA","geneB","geneC"), cell3=c("state_2", "unclassified", "unclassified"))
Data_frame_s4 <- data.frame(Gene_symbol=c("geneA","geneB","geneC"), cell4=c("unclassified", "unclassified", "unclassified"))
temp = list(Data_frame_s1, Data_frame_s2, Data_frame_s3, Data_frame_s4)
Data_frame <- join_all(temp, by="Gene_symbol", type="full")
# Count how often each value occurs in a row, ignoring the Gene_symbol column
add_table <- function(row) table(unname(unlist(row[-1])))
Data_frame <- adply(Data_frame, 1, add_table)
Data_frame

This gives me

  Gene_symbol        cell1        cell2        cell3        cell4 state_1
1       geneA      state_1      state_1      state_2 unclassified       2
2       geneB      state_2      state_2 unclassified unclassified       0
3       geneC unclassified unclassified unclassified unclassified       0
  state_2 unclassified
1       1            1
2       2            2
3       0            4

as desired.

However, when I do the same with my real data, I get this error:

Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor) : 
  Results do not have equal lengths
Calls: adply -> ldply -> list_to_dataframe
Execution halted
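
One possible cause, offered as an assumption because the real data is not shown: plyr reports "Results do not have equal lengths" when the function returns vectors of different lengths for different rows. table() on character values only reports the values that actually occur, whereas table() on factors that share one level set reports every level, zeros included. The base-R sketch below uses made-up data (chr_df, fct_df and count_row are illustrative names) to show how the result length can vary by row in the first case and stay constant in the second.

```r
# Hypothetical data: the same kind of values stored as character vs. factor columns.
states <- c("state_1", "state_2", "unclassified")

chr_df <- data.frame(
  cell1 = c("state_1", "unclassified"),
  cell2 = c("state_2", "unclassified"),
  stringsAsFactors = FALSE
)

fct_df <- data.frame(
  cell1 = factor(c("state_1", "unclassified"), levels = states),
  cell2 = factor(c("state_2", "unclassified"), levels = states)
)

# Same counting logic as add_table, applied to row i of a data frame.
count_row <- function(df, i) table(unname(unlist(df[i, ])))

# Number of counts returned for each row.
chr_lengths <- sapply(seq_len(nrow(chr_df)), function(i) length(count_row(chr_df, i)))
fct_lengths <- sapply(seq_len(nrow(fct_df)), function(i) length(count_row(fct_df, i)))

print(chr_lengths)  # differs between rows for character columns
print(fct_lengths)  # identical for every row when the factors share levels
```

If the real columns are character vectors rather than factors, unequal lengths like these would be enough to trigger the error; whether that is the case here is a guess.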

Strangely, I can work around the problem by saving the data to a file, loading it again and then calling mdply. So

Data_frame <- join_all(temp, by="Gene_symbol", type="full")
write.table(Data_frame, "test.txt")
Data_frame <- read.table("test.txt", header=TRUE)
Data_frame <- adply(Data_frame, 1, add_table)
Data_frame

gives me the expected result for the real data.
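
The round trip through write.table() and read.table() may have helped because read.table() converted strings to factors by default in R versions before 4.0.0; that is only a guess about the real data. A sketch that does not depend on how the columns are stored is to convert each row's values to a factor with an explicit level set before counting, so every row yields the same number of counts. The names state_levels and add_table_fixed are illustrative, and the level set is assumed to be known in advance.

```r
# Assumed complete set of possible values (hypothetical; adjust to the real data).
state_levels <- c("state_1", "state_2", "unclassified")

# Count values in a one-row data frame, skipping the first (Gene_symbol) column.
# Always returns one count per entry of state_levels, zeros included.
add_table_fixed <- function(row) {
  values <- unlist(lapply(row[-1], as.character), use.names = FALSE)
  counts <- table(factor(values, levels = state_levels))
  setNames(as.integer(counts), state_levels)
}

# Test data from the question, deliberately stored as character columns.
df <- data.frame(
  Gene_symbol = c("geneA", "geneB", "geneC"),
  cell1 = c("state_1", "state_2", "unclassified"),
  cell2 = c("state_1", "state_2", "unclassified"),
  cell3 = c("state_2", "unclassified", "unclassified"),
  cell4 = c("unclassified", "unclassified", "unclassified"),
  stringsAsFactors = FALSE
)

# Apply row by row in base R and bind the counts to the original columns.
counts <- t(sapply(seq_len(nrow(df)), function(i) add_table_fixed(df[i, ])))
result <- cbind(df, counts)
print(result)
```

A function of this shape could presumably be passed to adply(Data_frame, 1, ...) in place of add_table; that has not been run against the real data.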

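Does anyone have an explanation for this behaviour? I know the problem is hard to reproduce: as I said, I cannot reproduce it myself with any test data, and it even disappears when I save and reload the real data. But maybe someone gets an idea from what I have described above.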

0 answers:

No answers yet.