I am working with several strings like the ones below:
Col1
--------------------------
554 - partial-completion_3
4011 - structure painted
5459 - 1 int mam-corrosion issue
996 - cast iron countershock
My goal is to split each string into two parts, like this:
Col1_Id Col2_Desc
--------------------------
554 partial-completion_3
4011 structure painted
5459 1 int mam-corrosion issue
996 cast iron countershock
I tried using the `separate` function:
df_sep = df %>%
separate(Col1, c("Col1_ID", "Col2_Desc"), "-")
This works only when the string contains a single `-`. If there are two, as in the string
`5459 - 1 int mam-corrosion issue`
then `separate` drops the part of the description after the second `-`, and the output looks like this:
`5459 - 1 int mam`
That is not what I expected. I expected the output below:
Col1_Id Col2_Desc
--------------------------
554 partial-completion_3
4011 structure painted
5459 1 int mam-corrosion issue
996 cast iron countershock
Any hints or suggestions are much appreciated.
Answer 0 (score: 3)
We can use `sub` to replace the first `-` with a `,` and then read the result with `read.csv`:
read.csv(text= sub("-", ",", df1$Col1), header=FALSE,
col.names=c("Col1_Id", "Col2_Desc"), stringsAsFactors=FALSE)
# Col1_Id Col2_Desc
#1 554 partial-completion_3
#2 4011 structure painted
#3 5459 1 int mam-corrosion issue
#4 996 cast iron countershock
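As a minimal sketch of why this works: `sub` replaces only the first match in each string, whereas `gsub` would replace every hyphen. The two-row `df1` below is a shortened copy of the sample data, and `strip.white = TRUE` is an addition that is not in the answer above; it is assumed here to be wanted so that the spaces around the inserted comma do not end up in `Col2_Desc`.

```r
# Shortened copy of the sample data (two of the four rows)
df1 <- data.frame(Col1 = c("554 - partial-completion_3",
                           "5459 - 1 int mam-corrosion issue"),
                  stringsAsFactors = FALSE)

# sub() replaces only the first hyphen; gsub() replaces all of them
first_only  <- sub("-", ",", df1$Col1)
all_hyphens <- gsub("-", ",", df1$Col1)

# strip.white = TRUE (an addition, not in the answer) trims the spaces
# left around the comma in the character column
res <- read.csv(text = first_only, header = FALSE,
                col.names = c("Col1_Id", "Col2_Desc"),
                stringsAsFactors = FALSE, strip.white = TRUE)
```

Note that this approach assumes the descriptions themselves contain no commas; otherwise `read.csv` would see extra fields.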
In the case of `separate`, there is an `extra` argument that can be used to handle this:
library(tidyr)
separate(df1, Col1, into = c("Col1_Id", "Col2_Desc"), extra="merge")
# Col1_Id Col2_Desc
#1 554 partial-completion_3
#2 4011 structure painted
#3 5459 1 int mam-corrosion issue
#4 996 cast iron countershock
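A possible variant, shown only as a sketch: passing an explicit `sep = " - "` makes the split point visible instead of relying on the default separator of `separate`, and `convert = TRUE` is assumed to be wanted so that the id column becomes an integer. The two-row `df1` is a shortened copy of the sample data.

```r
library(tidyr)

# Shortened copy of the sample data (two of the four rows)
df1 <- data.frame(Col1 = c("554 - partial-completion_3",
                           "5459 - 1 int mam-corrosion issue"),
                  stringsAsFactors = FALSE)

# Split on the literal " - "; extra = "merge" keeps anything after the
# first split in the last column; convert = TRUE turns the ids into integers
res <- separate(df1, Col1, into = c("Col1_Id", "Col2_Desc"),
                sep = " - ", extra = "merge", convert = TRUE)
```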
# Sample data used in the examples above
df1 <- structure(list(Col1 = c("554 - partial-completion_3", "4011 - structure painted",
"5459 - 1 int mam-corrosion issue", "996 - cast iron countershock"
)), .Names = "Col1", class = "data.frame", row.names = c(NA,
-4L))
Answer 1 (score: 0)
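A base R alternative: `strsplit` splits the column into a list, and `rbind.data.frame` then builds a data.frame from it. `setNames` is used to conveniently set the column names in the same line.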
setNames(do.call(rbind.data.frame, strsplit(df1$Col1, split=" - ")),
c("Col1_Id", "Col2_Desc"))
Col1_Id Col2_Desc
1 554 partial-completion_3
2 4011 structure painted
3 5459 1 int mam-corrosion issue
4 996 cast iron countershock
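The `strsplit` call above splits on every occurrence of `" - "`, so it relies on the description never containing that sequence. A rough base R sketch that splits only at the first `" - "` could use `regexpr` and `substr`. The second input string is a made-up example, not from the question, and the code assumes every string contains `" - "` at least once.

```r
# First string is from the question; the second is hypothetical and
# contains the separator twice
x <- c("554 - partial-completion_3", "100 - left - right")

# Position of the first literal " - " in each string
pos <- regexpr(" - ", x, fixed = TRUE)
start <- as.integer(pos)
len <- attr(pos, "match.length")

# Everything before the separator is the id, everything after it the description
res <- data.frame(Col1_Id = substr(x, 1, start - 1),
                  Col2_Desc = substr(x, start + len, nchar(x)),
                  stringsAsFactors = FALSE)
```

Here `Col1_Id` stays a character column; wrap it in `as.integer()` if a numeric id is needed.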