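I want to count how often each element occurs in every row of a data frame. I am trying to do this by using table together with adply. On my test data this works fine.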
library(plyr)
Data_frame_s1 <- data.frame(Gene_symbol=c("geneA","geneB","geneC"), cell1=c("state_1", "state_2", "unclassified"))
Data_frame_s2 <- data.frame(Gene_symbol=c("geneA","geneB","geneC"), cell2=c("state_1", "state_2", "unclassified"))
Data_frame_s3 <- data.frame(Gene_symbol=c("geneA","geneB","geneC"), cell3=c("state_2", "unclassified", "unclassified"))
Data_frame_s4 <- data.frame(Gene_symbol=c("geneA","geneB","geneC"), cell4=c("unclassified", "unclassified", "unclassified"))
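# Merge the per-cell data frames on Gene_symbol
temp <- list(Data_frame_s1, Data_frame_s2, Data_frame_s3, Data_frame_s4)
Data_frame <- join_all(temp, by="Gene_symbol", type="full")
# Count how often each value occurs in a row, ignoring the Gene_symbol column
add_table <- function(row) table(unname(unlist(row[-1])))
Data_frame <- adply(Data_frame, 1, add_table)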
Data_frame
This gives me
  Gene_symbol        cell1        cell2        cell3        cell4 state_1 state_2 unclassified
1       geneA      state_1      state_1      state_2 unclassified       2       1            1
2       geneB      state_2      state_2 unclassified unclassified       0       2            2
3       geneC unclassified unclassified unclassified unclassified       0       0            4
as desired.
However, when I do the same with my real data, I get this error:
Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor) :
Results do not have equal lengths
Calls: adply -> ldply -> list_to_dataframe
Execution halted
Strangely, I can work around this by saving the data to a file, loading it again, and then calling adply. So
Data_frame <- join_all(temp, by="Gene_symbol", type="full")
write.table(Data_frame, "test.txt")
Data_frame <- read.table("test.txt", header=TRUE)
Data_frame <- adply(Data_frame, 1, add_table)
Data_frame
gives me the expected result on the real data.
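Does anyone have an explanation for this behavior? I know the problem is hard to reproduce: as I said, I cannot replicate it with any test data, and it even disappears once I save and reload the data. But maybe someone gets an idea from what I have described above.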
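One guess I have not been able to confirm: the columns in my real data may be character vectors rather than factors. On a character row, table only reports the values that actually occur, so rows can give results of different lengths; on factor columns, unlist keeps the union of all levels, so every row gives the same length. The sketch below illustrates this in base R only. The names states, df_chr and count_states are made up for this illustration, and fixing the levels explicitly is an assumed workaround, not something I have verified against the real data.

```r
# Hypothetical illustration: 'states', 'df_chr' and 'count_states'
# are names invented for this sketch, not part of the real data.
states <- c("state_1", "state_2", "unclassified")

# Character columns (no factors), as the real data might be stored.
df_chr <- data.frame(Gene_symbol = c("geneA", "geneC"),
                     cell1 = c("state_1", "unclassified"),
                     cell2 = c("state_2", "unclassified"),
                     stringsAsFactors = FALSE)

# table() on a character row only lists values present in that row,
# so the result length differs from row to row.
lengths_chr <- sapply(seq_len(nrow(df_chr)),
                      function(i) length(table(unlist(df_chr[i, -1]))))

# Assumed workaround: convert to a factor with a fixed set of levels,
# so every row yields one count per level (zeros included).
count_states <- function(row) {
  table(factor(unlist(row[-1]), levels = states))
}
lengths_fix <- sapply(seq_len(nrow(df_chr)),
                      function(i) length(count_states(df_chr[i, ])))
```

If this guess is right, passing count_states to adply in place of add_table should avoid the unequal-lengths error without the round trip through a file, assuming states lists every value that can occur.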