Problem separating strings in R

Date: 2016-12-22 15:44:21

Tags: r string split

I am working with several strings like the following:

Col1
--------------------------
554 - partial-completion_3
4011 - structure painted
5459 - 1 int mam-corrosion issue
996 - cast iron countershock

My goal is to split each string into two parts, for example:

Col1_Id   Col2_Desc
--------------------------
554       partial-completion_3
4011      structure painted
5459      1 int mam-corrosion issue
996       cast iron countershock

I tried the separate function:

library(tidyr)
df_sep <- df %>%
  separate(Col1, c("Col1_Id", "Col2_Desc"), "-")

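This works only when the string contains a single -. When there are two, as in

       `5459 - 1 int mam-corrosion issue`

separate drops the part of the description after the second -, so the result looks like this:

       `5459 - 1 int mam`

That is not what I expect. I expect the output below: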

    Col1_Id   Col2_Desc
    --------------------------
    554       partial-completion_3
    4011      structure painted
    5459      1 int mam-corrosion issue
    996       cast iron countershock

Any hints or suggestions would be greatly appreciated.

2 answers:

Answer 0 (score: 3)

We can use sub to replace the first - with a , and then read the result with read.csv:

read.csv(text = sub("-", ",", df1$Col1), header = FALSE,
         col.names = c("Col1_Id", "Col2_Desc"), stringsAsFactors = FALSE,
         strip.white = TRUE)  # trim the spaces left around the separator
#  Col1_Id                 Col2_Desc
#1     554      partial-completion_3
#2    4011         structure painted
#3    5459 1 int mam-corrosion issue
#4     996    cast iron countershock
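
The approach above relies on sub replacing only the first match, while gsub replaces every match. A minimal base R sketch of that difference follows, using one string from the question. The variable names are illustrative and not part of the original answer.

```r
# One sample string from the question.
x <- "5459 - 1 int mam-corrosion issue"

# sub() replaces only the first hyphen, so the hyphen inside the
# description survives.
first_only <- sub("-", ",", x, fixed = TRUE)

# gsub() replaces every hyphen, which would create a third field.
all_hyphens <- gsub("-", ",", x, fixed = TRUE)

# Including the surrounding whitespace in the pattern removes the
# spaces on both sides of the separator in the same step.
trimmed <- sub("[[:space:]]*-[[:space:]]*", ",", x)

print(first_only)   # "5459 , 1 int mam-corrosion issue"
print(all_hyphens)  # "5459 , 1 int mam,corrosion issue"
print(trimmed)      # "5459,1 int mam-corrosion issue"
```

Note that this route assumes the descriptions contain no commas, since a comma in the description would be read as another field separator.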

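With separate, there is an extra argument that can be used to solve this. Setting extra = "merge" merges any additional pieces into the last column instead of dropping them: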

library(tidyr)
separate(df1, Col1, into = c("Col1_Id", "Col2_Desc"), extra="merge")
#  Col1_Id                 Col2_Desc
#1     554      partial-completion_3
#2    4011         structure painted
#3    5459 1 int mam-corrosion issue
#4     996    cast iron countershock
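
The call above does not pass sep, so it uses separate's documented default, the regular expression "[^[:alnum:]]+" (any run of non-alphanumeric characters). The base R sketch below only inspects what that pattern matches in two of the sample strings. It does not call tidyr, and the variable names are illustrative.

```r
# Two sample strings from the question.
x <- c("554 - partial-completion_3",
       "5459 - 1 int mam-corrosion issue")

# The documented default value of `sep` in tidyr::separate().
default_sep <- "[^[:alnum:]]+"

# The first separator found in each string is the " - " after the id.
first_match <- regmatches(x, regexpr(default_sep, x))

# Every string contains further runs of non-alphanumeric characters
# (spaces, "-", "_"), which is why extra = "merge" is needed to keep
# the description in one piece.
n_matches <- lengths(regmatches(x, gregexpr(default_sep, x)))

print(first_match)  # " - " " - "
print(n_matches)    # 3 5
```

Passing an explicit sep = " - " together with extra = "merge" may be a safer choice when the id itself could contain punctuation.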

Data

df1 <- structure(list(Col1 = c("554 - partial-completion_3", "4011 - structure painted", 
"5459 - 1 int mam-corrosion issue", "996 - cast iron countershock"
)), .Names = "Col1", class = "data.frame", row.names = c(NA, 
-4L))

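Answer 1 (score: 0)

A base R alternative: strsplit splits the column into a list, and rbind.data.frame then builds a data.frame from it. setNames is used to conveniently set the column names in the same call. The separator " - " includes the surrounding spaces, so the hyphen inside a description such as mam-corrosion is not treated as a split point.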

setNames(do.call(rbind.data.frame, strsplit(df1$Col1, split=" - ")),
         c("Col1_Id", "Col2_Desc"))

  Col1_Id                 Col2_Desc
1     554      partial-completion_3
2    4011         structure painted
3    5459 1 int mam-corrosion issue
4     996    cast iron countershock
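
strsplit splits at every occurrence of " - ", so a description that itself contained that separator would yield more than two pieces. The following sketch is a variation on the approach above, not part of the original answer. It splits only at the first " - " by combining regexpr with regmatches(..., invert = TRUE). The fifth string is a hypothetical row added for illustration, and the code assumes every string contains the separator at least once.

```r
# Sample strings from the question, plus one hypothetical row whose
# description itself contains " - ".
x <- c("554 - partial-completion_3",
       "4011 - structure painted",
       "5459 - 1 int mam-corrosion issue",
       "996 - cast iron countershock",
       "12 - left - right")  # hypothetical, not in the original data

# regexpr() locates only the first match in each string; invert = TRUE
# returns the text before and after that match.
parts <- regmatches(x, regexpr(" - ", x, fixed = TRUE), invert = TRUE)

# Stack the two-element vectors into a matrix, then into a data.frame.
res <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(res) <- c("Col1_Id", "Col2_Desc")
res$Col1_Id <- as.integer(res$Col1_Id)

print(res)
```

In the hypothetical fifth row the description stays intact as "left - right", because only the first separator is used as the split point.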