I have a list that looks like this:
head(h)
[[1]]
[1] "gene=dnaA" "locus_tag=CD630_00010" "location=1..1320"
[[2]]
character(0)
[[3]]
[1] "locus_tag=CD630_05950" "location=719777..720313"
[[4]]
[1] "gene=dnrA" "locus_tag=CD630_00010" "location=50..1320"
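I'm having trouble manipulating this list to create a data.frame with three columns. For rows that lack gene information, I want to list them as "gene=unnamed",
and I want to drop the empty rows entirely, ending up with a matrix like this: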
[,1] [,2] [,3]
[1,] "gene=dnaA" "locus_tag=CD630_00010" "location=1..1320"
[2,] "gene=unnamed" "locus_tag=CD630_05950" "location=719777..720313"
[3,] "gene=dnrA" "locus_tag=CD630_00010" "location=50..1320"
This is what I have so far, but it fails on the rows where the gene entry is missing. Any suggestions?
h <- data.frame(h[lapply(h,length)>0])
h <- t(h)
rownames(h) <- NULL
Answer 0 (score: 1)
There are many ways to bind lists of unequal length. See bind_rows from dplyr or rbind.fill from plyr. It can also be done in base R.
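As an illustration of the base R route, here is a minimal sketch (not necessarily the code this answer originally contained). It assumes that the only field that can be missing is the gene entry, and that a missing gene should be filled with the placeholder "gene=unnamed" as requested in the question. The sample data simply mirrors the list shown there.

```r
# Example data mirroring the list in the question
h <- list(
  c("gene=dnaA", "locus_tag=CD630_00010", "location=1..1320"),
  character(0),
  c("locus_tag=CD630_05950", "location=719777..720313"),
  c("gene=dnrA", "locus_tag=CD630_00010", "location=50..1320")
)

# Drop the empty elements
h <- h[lengths(h) > 0]

# Prepend the placeholder to any element that has no "gene=" entry
# (assumption: gene is the only field that can be absent)
h <- lapply(h, function(x) {
  if (any(startsWith(x, "gene="))) x else c("gene=unnamed", x)
})

# Stack the now equal-length vectors into a character matrix,
# one row per list element
m <- do.call(rbind, h)
m
```

Because every element has the same length after padding, do.call(rbind, h) can stack them directly, and as.data.frame(m) would turn the result into a data.frame if one is needed.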
Answer 1 (score: 1)
# Data
l <- list(c("gene=dnaA", "locus_tag=CD630_00010", "location=1..1320"),
          character(0),
          c("locus_tag=CD630_05950", "location=719777..720313"),
          c("gene=dnrA", "locus_tag=CD630_00010", "location=50..1320"))
# Manipulation
n <- sapply(l, length)
seq.max <- seq_len(max(n))
# Pad every element with NA up to the maximum length (one row per element)
df <- t(sapply(l, "[", i = seq.max))
# Move the NAs to the front of each row so a missing gene lands in column 1
df <- t(apply(df, 1, function(x) {
  c(x[is.na(x)], x[!is.na(x)])}))
# Drop rows that are entirely NA (the empty list elements)
df <- df[rowSums(!is.na(df)) > 0, ]
# Label the remaining missing entries
df[is.na(df)] <- "gene=unnamed"
Output:
[,1] [,2] [,3]
[1,] "gene=dnaA" "locus_tag=CD630_00010" "location=1..1320"
[2,] "gene=unnamed" "locus_tag=CD630_05950" "location=719777..720313"
[3,] "gene=dnrA" "locus_tag=CD630_00010" "location=50..1320"
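The approach above relies on position: it assumes the missing entry is always the gene and therefore belongs in the first column. If that might not hold, one alternative is to look each field up by its key. The following is only a sketch under that assumption; get_field is a hypothetical helper introduced here for illustration, and the column names are taken from the keys in the question's data.

```r
# Example data mirroring the list in the question
l <- list(
  c("gene=dnaA", "locus_tag=CD630_00010", "location=1..1320"),
  character(0),
  c("locus_tag=CD630_05950", "location=719777..720313"),
  c("gene=dnrA", "locus_tag=CD630_00010", "location=50..1320")
)

# Drop the empty elements
l <- l[lengths(l) > 0]

# Hypothetical helper: return the value stored under `key` in one element,
# or `default` when that key is absent
get_field <- function(x, key, default = NA_character_) {
  hit <- x[startsWith(x, paste0(key, "="))]
  if (length(hit) == 0) default else sub("^[^=]*=", "", hit[1])
}

# Build one column per key; only gene gets the "unnamed" placeholder
df <- data.frame(
  gene      = vapply(l, get_field, character(1), key = "gene", default = "unnamed"),
  locus_tag = vapply(l, get_field, character(1), key = "locus_tag"),
  location  = vapply(l, get_field, character(1), key = "location"),
  stringsAsFactors = FALSE
)
df
```

Unlike the matrix above, this stores only the values (for example "dnaA" rather than "gene=dnaA"), with the keys as column names, which is usually easier to work with downstream.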