Question

I have a list that looks like this:

head(h)
[[1]]
[1] "gene=dnaA"             "locus_tag=CD630_00010" "location=1..1320"     

[[2]]
character(0)

[[3]]
[1] "locus_tag=CD630_05950"   "location=719777..720313"

[[4]]
[1] "gene=dnrA"             "locus_tag=CD630_00010" "location=50..1320"

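I am having trouble turning this list into a three-column data.frame. For rows that lack gene information, I want to fill in "gene=unnamed", and I want the empty rows dropped entirely, giving a matrix like this: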

     [,1]           [,2]                    [,3]
[1,] "gene=dnaA"    "locus_tag=CD630_00010" "location=1..1320"
[2,] "gene=unnamed" "locus_tag=CD630_05950" "location=719777..720313"
[3,] "gene=dnrA"    "locus_tag=CD630_00010" "location=50..1320"

This is what I have so far, but I get errors because of the missing entries in the gene column. Any suggestions?

  h <- data.frame(h[lapply(h,length)>0])
  h <- t(h)
  rownames(h) <- NULL

Answer 1

There are many ways to bind lists of unequal length. See `bind_rows` from dplyr or `rbind.fill` from plyr. It can also be done in base R.
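A minimal base R sketch of the padding idea, assuming the list `h` from the question (rebuilt by hand here). It uses the common `length<-` idiom and is not necessarily the code this answer originally showed.

```r
# Sample data rebuilt from the question
h <- list(c("gene=dnaA", "locus_tag=CD630_00010", "location=1..1320"),
          character(0),
          c("locus_tag=CD630_05950", "location=719777..720313"),
          c("gene=dnrA", "locus_tag=CD630_00010", "location=50..1320"))

h <- h[lengths(h) > 0]        # drop the empty entries
n <- max(lengths(h))          # width of the longest entry

# Assigning a longer length pads a vector with NA at the end
m <- t(sapply(h, function(x) {
  length(x) <- n
  x
}))
```

Note that the padding goes at the end of each row, so the short row has its NA in the third column rather than in the gene column. The values still need to be shifted or matched by key before filling in "gene=unnamed".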

Answer 2

# Data

l <- list(c("gene=dnaA", "locus_tag=CD630_00010", "location=1..1320"),
          character(0),
          c("locus_tag=CD630_05950", "location=719777..720313"),
          c("gene=dnrA", "locus_tag=CD630_00010", "location=50..1320"))

# Manipulation

n <- sapply(l, length)                  # length of each list element
seq.max <- seq_len(max(n))              # indices 1..3 of the longest element
df <- t(sapply(l, "[", i = seq.max))    # pad short elements with NA, one row each
# Move the NAs to the front of each row (assumes the missing field is the gene)
df <- t(apply(df, 1, function(x) {
  c(x[is.na(x)], x[!is.na(x)])}))
df <- df[rowSums(!is.na(df)) > 0, ]     # drop rows that are entirely NA
df[is.na(df)] <- "gene=unnamed"

Output:

     [,1]           [,2]                    [,3]
[1,] "gene=dnaA"    "locus_tag=CD630_00010" "location=1..1320"
[2,] "gene=unnamed" "locus_tag=CD630_05950" "location=719777..720313"
[3,] "gene=dnrA"    "locus_tag=CD630_00010" "location=50..1320"
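The code above moves every NA into the first column, so it only fits when the gene is the field that is missing. A hedged sketch of a key-based variant follows: it places each entry by the text before its "=" and then converts the matrix to the data.frame the question asked for. The fixed `keys` order and the "<key>=unnamed" filler for fields other than gene are assumptions for illustration, not part of the original answer.

```r
# Same sample data as in the answer above
l <- list(c("gene=dnaA", "locus_tag=CD630_00010", "location=1..1320"),
          character(0),
          c("locus_tag=CD630_05950", "location=719777..720313"),
          c("gene=dnrA", "locus_tag=CD630_00010", "location=50..1320"))

keys <- c("gene", "locus_tag", "location")   # assumed fixed column order

l <- l[lengths(l) > 0]                       # drop empty entries

rows <- lapply(l, function(x) {
  k <- sub("=.*$", "", x)                    # text before the first "="
  out <- unname(setNames(x, k)[keys])        # reorder by key; absent keys become NA
  miss <- is.na(out)
  out[miss] <- paste0(keys[miss], "=unnamed")
  out
})

m <- do.call(rbind, rows)                    # character matrix, one row per entry
df <- as.data.frame(m, stringsAsFactors = FALSE)
```

Matching by key costs a few more lines, but a row that lacks, say, its location would still end up with its gene and locus tag in the right columns.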

R: Create a data frame from a list with missing values.
