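I have an SNP file and want to count how often each value occurs in each column. When I try to write the resulting list to a table, I get the error "arguments imply differing number of rows". I would like a solution that lets me write the list to a table. Please help.

Input file: an image of the input file was attached. The input file contains 830 rows and 210 columns.

#1 R code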
require(gdata)
library(plyr)
df = read.xls ("jTest_file.xlsx", sheet = 1, header = TRUE)
combine = c()
for(i in 1:v){
vec = count(df[,i])
colnames(vec) <- c (colnames(df[i]),"freq")
combine = c(combine,vec)
}
write.table(combine,file="test_output.xls",sep="\t",quote=FALSE,row.names =FALSE)
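However, the input contains some blank values, so I replaced the blanks with XX in the hope of keeping the row counts equal, but that does not work either.

#2 R code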
require(gdata)
library(plyr)
df = read.xls ("jTest_file.xlsx", sheet = 1, header = TRUE)
combine = c()
for(i in 1:v){
data=sub("^$", "XX", df[,i])
vec = count(data)
colnames(vec) <- c (colnames(df[i]),"freq")
combine = c(combine,vec)
}
write.table(combine,file="test_output.xls",sep="\t",quote=FALSE,row.names =FALSE)
Answer 0 (score: 0)
These counts can be done much more cleanly with the dplyr and tidyr packages.
Since you did not provide sample data, I will make some first:
#Make sample data
li = lapply(1:10, function(X) {
sample(x = c("A", "C", "G", "T"), size = 10,
replace = TRUE)
})
df = data.frame(li, stringsAsFactors = FALSE)
names(df) = paste("X", 1:10, sep = "")
head(df, 3)
# X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# 1 T G C T C A T T C T
# 2 A A A G G G T G C A
# 3 C C A T A A C A T G
Now the actual answer, computing the counts:
library(tidyr)
library(dplyr)
df_long = gather(df, var, value)
df_groups = group_by(df_long, var, value)
df_counts = summarise(df_groups, count = n())
df_wide = spread(df_counts, value, count, fill = 0)
df_wide
# Source: local data frame [10 x 5]
# Groups: var [10]
#
# var A C G T
# * <chr> <dbl> <dbl> <dbl> <dbl>
# 1 X1 3 4 0 3
# 2 X10 5 0 2 3
# 3 X2 3 2 2 3
# 4 X3 4 3 1 2
# 5 X4 2 1 4 3
# 6 X5 2 3 3 2
# 7 X6 4 2 1 3
# 8 X7 2 4 2 2
# 9 X8 2 3 2 3
# 10 X9 2 2 2 4
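In recent tidyr releases, gather() and spread() are marked as superseded in favour of pivot_longer() and pivot_wider(). The sketch below shows what the same counting step might look like with the newer functions, plus the write.table() call the question asked for. It assumes tidyr 1.0.0 or later and a reasonably current dplyr; the seed, the sample data, and the output path are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Illustrative sample data: 10 columns of 10 random bases each
set.seed(1)
df <- as.data.frame(
  replicate(10, sample(c("A", "C", "G", "T"), size = 10, replace = TRUE)),
  stringsAsFactors = FALSE
)
names(df) <- paste0("X", 1:10)

df_wide <- df %>%
  # One row per (column name, cell value) pair
  pivot_longer(cols = everything(), names_to = "var", values_to = "value") %>%
  # dplyr::count is called explicitly because plyr also exports count()
  dplyr::count(var, value, name = "count") %>%
  # One column per distinct value; missing combinations become 0
  pivot_wider(names_from = value, values_from = count,
              values_fill = list(count = 0L))

# Every row has the same set of columns, so write.table() succeeds
out_file <- file.path(tempdir(), "test_output.tsv")  # assumed output location
write.table(df_wide, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Because each row of df_wide holds the counts for one input column, the result is rectangular regardless of which values happen to occur in which column.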
I suggest you inspect each intermediate result (df_long, df_groups, df_counts, df_wide). That will show you how the data is transformed at each step.
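For completeness, here is a base R sketch of why the loop in the question likely fails and one way around it without extra packages. Calling c() on a data frame flattens it into a list of its columns, so the accumulated object ends up as a list whose elements have different lengths; write.table() then tries to turn that list into a data frame and stops with the "differing number of rows" error. The data frame, the column names snp1 and snp2, the helper count_one, and the output path are all hypothetical stand-ins for the real SNP file.

```r
# Hypothetical stand-in for the SNP data, including one blank cell
df <- data.frame(
  snp1 = c("AA", "AG", "GG", "AA", ""),
  snp2 = c("CC", "CC", "CT", "CC", "CC"),
  stringsAsFactors = FALSE
)

# Reproduce the problem: c() turns each count table into loose columns
broken <- c()
for (name in names(df)) {
  broken <- c(broken, as.data.frame(table(df[[name]])))
}
# The columns have lengths 4, 4, 2, 2, so building a data frame fails
err <- tryCatch(data.frame(broken), error = function(e) conditionMessage(e))
print(err)

# One possible fix: keep the counts in long format and stack them by row
count_one <- function(name) {
  values <- df[[name]]
  values[is.na(values) | values == ""] <- "XX"  # label blanks, as in the question
  tab <- table(values)
  data.frame(column = name,
             value = names(tab),
             freq = as.vector(tab),
             stringsAsFactors = FALSE)
}
counts <- do.call(rbind, lapply(names(df), count_one))
print(counts)

# A single rectangular data frame can be written without errors
out_file <- file.path(tempdir(), "snp_counts.tsv")  # assumed output location
write.table(counts, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Replacing blanks with XX does not help in the original loop because the number of distinct values still differs from column to column; the long format avoids the issue by giving every count its own row.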