
Time: 2016-02-07 23:25:18

Tags: r tidyr

I have a dataset that looks like this:


1 ATOM      1  N   ILE A  12      67.611  47.640  52.312  1.00 12.44           N  
2 ATOM      2  CA  ILE A  12      66.381  47.660  51.520  1.00 25.25           C  

It has a single column named col1. I want to split it into 12 columns, and I am using the following command:

try=separate(subset,col1,c("name","S.No","Atom Name","Residue Name","Symbol","Residue Number","X-cor","Y-cor","Z-cor","Uk1","Uk2","Symbol"), sep= " ")

But I keep getting the following warning, which I don't understand:


Warning message: Too many values at 3929 locations: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...

It gives me the following output:

name S.No Atom Name Residue Name Symbol Residue Number X-cor Y-cor Z-cor Uk1 Uk2 Symbol

1 ATOM                                                       1           N            ILE

2 ATOM                                                       2          CA     ILE      A


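I assume your dataset is named subset. Try data.frame(do.call(rbind, unlist(apply(subset, 1, function(x) strsplit(x, "\\s+")), recursive=FALSE))). For each row of the data.frame, you split it on whitespace, which is the strsplit(x, "\\s+") part. The rest basically puts the result back into a data.frame.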
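The base R approach described above can be sketched as follows. This is a minimal illustration, not the answerer's exact code: the data frame name pdb is hypothetical, the two rows are copied from the question, and trimws() is added as a precaution against leading or trailing spaces. It assumes every row splits into the same number of pieces, since rbind() needs rows of equal length.

```r
# Hypothetical sample: two rows copied from the question, held in column col1
pdb <- data.frame(
  col1 = c(
    "ATOM      1  N   ILE A  12      67.611  47.640  52.312  1.00 12.44           N",
    "ATOM      2  CA  ILE A  12      66.381  47.660  51.520  1.00 25.25           C"
  ),
  stringsAsFactors = FALSE
)

# Split each row on runs of whitespace; the result is a list of character vectors
pieces <- strsplit(trimws(pdb$col1), "\\s+")

# Stack the vectors row by row, then convert the matrix to a data.frame
wide <- data.frame(do.call(rbind, pieces), stringsAsFactors = FALSE)

print(dim(wide))
```

The columns come out with default names (X1, X2, and so on), so they would still need renaming afterwards.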


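Just figured it out: in your code, simply replace sep= " " with sep= "\\s+". The pattern \\s+ matches one or more whitespace characters, whereas your " " matches exactly one space, so every run of several spaces is split into extra empty fields and you get the "too many values" warning.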
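A minimal sketch of that fix with tidyr, under a few assumptions: the data frame name pdb is hypothetical, the two rows are copied from the question, and the fifth column is renamed to "Chain" here only to avoid having two columns called "Symbol".

```r
library(tidyr)

# Hypothetical sample: two rows copied from the question
pdb <- data.frame(
  col1 = c(
    "ATOM      1  N   ILE A  12      67.611  47.640  52.312  1.00 12.44           N",
    "ATOM      2  CA  ILE A  12      66.381  47.660  51.520  1.00 25.25           C"
  ),
  stringsAsFactors = FALSE
)

# Twelve target names; "Chain" is an assumed label replacing the duplicate "Symbol"
cols <- c("name", "S.No", "Atom Name", "Residue Name", "Chain", "Residue Number",
          "X-cor", "Y-cor", "Z-cor", "Uk1", "Uk2", "Symbol")

# Trim stray leading/trailing spaces so they do not create extra empty fields
pdb$col1 <- trimws(pdb$col1)

# sep = "\\s+" treats each run of whitespace as a single delimiter
out <- separate(pdb, col1, into = cols, sep = "\\s+")

print(out)
```

All resulting columns are character; passing convert = TRUE to separate() is one way to have the numeric fields converted.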

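Solution: do not pass "sep" if you want to split on ".". The default separator in separate() matches any run of non-alphanumeric characters, so it already splits on the dot.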



> df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))
> df %>% separate(x, c("A", "B"))
     A    B
1 <NA> <NA>
2    a    b
3    a    d
4    b    c

#Reason for warning: as a regular expression, "." matches any single character, not a literal dot.

> x="Sepal.Width"
> strsplit(x,split=".")
[[1]]
 [1] "" "" "" "" "" "" "" "" "" "" ""

> str_detect(x,".")
[1] TRUE
> str_replace(x,".","_")
[1] "_epal.Width"
> str_replace_all(x,".","_")
[1] "___________"
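If a literal dot is what you want to match, the pattern has to be escaped or marked as fixed. The sketch below shows one way to do that in base R and in separate(); the variable names parts, renamed, and res are only illustrative.

```r
library(tidyr)

x <- "Sepal.Width"

# fixed = TRUE treats "." as a literal character instead of a regex wildcard
parts <- strsplit(x, ".", fixed = TRUE)[[1]]
renamed <- gsub(".", "_", x, fixed = TRUE)

df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))

# Escaping the dot gives separate() a regex that matches only a literal "."
res <- separate(df, x, c("A", "B"), sep = "\\.")

print(parts)
print(renamed)
print(res)
```

Either form avoids the wildcard behaviour shown in the console output above, where every character of the string was treated as a separator.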