Question

Hey, so I have a data frame whose head() prints like this:

# A tibble: 6 × 1
                                   id.make.model.year
                                             <chr>
1  27550?????AM General?????DJ Po Vehicle 2WD?????1984
2  28426?????AM General?????DJ Po Vehicle 2WD?????1984
3   27549?????AM General?????FJ8c Post Office?????1984
4   28425?????AM General?????FJ8c Post Office?????1984
5 1032?????AM General?????Post Office DJ5 2WD?????1985
6 1033?????AM General?????Post Office DJ8 2WD?????1985

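There is only one column. I want to split it into four columns, one for each of the four parts of the column name (id, make, model, year). I tried using separate():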

A %>% 
  separate(id.make.model.year,into=c("id","make"),sep="?????")

and

A %>% 
  separate(id.make.model.year,into=c("id","make"),sep="\\?????")

but both return the following error:

Error in stringi::stri_split_regex(value, sep, n_max) : Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX)

One more attempt:

A %>% 
  separate(id.make.model.year,into=c("id","make"),sep="[?????]")

returns

# A tibble: 33,439 × 2
      id  make
*  <chr> <chr>
1  27550      
2  28426      
3  27549      
4  28425      
5   1032      
6   1033      
7   3347      
8  13309      
9  13310      
10 13311      
# ... with 33,429 more rows
Warning message:
Too many values at 33439 locations: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...

I also tried leaving out sep, but then every space was counted as a separator too.

What is the correct way to do this? Thanks in advance.

Answer 1

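The regex that matches a single question mark is \? or [?]. However, even if you write five of them, [?????] still matches only one character, because [...] defines a character class. Just as [aaaaa] matches a single letter a, not five.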

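So to catch the run of five, I think you need \?{5} or [?]{5} (or \?\?\?\?\? or [?][?][?][?][?]). Inside an R string the backslash has to be doubled, e.g. sep = "\\?{5}".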
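To see the character-class point concretely, here is a small base R check. The sample string is copied from the question's output; gregexpr() and regmatches() are standard base R functions, and the variable names are just for illustration. It counts how many matches each pattern finds:

```r
# One row copied from the question's printed head()
x <- "27550?????AM General?????DJ Po Vehicle 2WD?????1984"

# "[?????]" is a character class: each match is a single "?"
n_class <- lengths(regmatches(x, gregexpr("[?????]", x)))

# "\\?{5}" matches a run of exactly five literal "?" characters
n_run <- lengths(regmatches(x, gregexpr("\\?{5}", x)))

n_class  # one match per question mark (15 in this string)
n_run    # one match per five-mark delimiter (3 in this string)
```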

I can't confirm this until you post your data using dput().
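Applied to the question's separate() call, a minimal sketch might look like the following. It assumes tidyr is installed and uses only two rows copied from the printed head(), since the full data was never posted:

```r
library(tidyr)

# Two rows copied from the question's head() output (the full data was not posted)
A <- data.frame(id.make.model.year = c(
  "27550?????AM General?????DJ Po Vehicle 2WD?????1984",
  "1032?????AM General?????Post Office DJ5 2WD?????1985"),
  stringsAsFactors = FALSE)

# sep is a regular expression; "\\?{5}" means exactly five literal question marks
B <- separate(A, id.make.model.year,
              into = c("id", "make", "model", "year"),
              sep = "\\?{5}")
B
```

All four new columns come back as character; adding convert = TRUE to separate() asks tidyr to guess better types. In tidyr 1.3.0 and later, separate() is superseded by separate_wider_delim(), whose delim argument is treated as a fixed string by default, so delim = "?????" should work there without escaping.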

Answer 2

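Here is a solution using the splitstackshape and data.table packages. You split the column with cSplit(). Since you want four columns, specify direction = "wide" in the function. Once the four columns are created, you need to change the column names. I used strsplit() to split the original column name and create the four names you want.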

library(splitstackshape)
library(data.table)

mydf <- data.frame(id.make.model.year = c("27550?????AM General?????DJ Po Vehicle 2WD?????1984",
                                          "28426?????AM General?????DJ Po Vehicle 2WD?????1984"),
                   stringsAsFactors = FALSE)

temp <- cSplit(mydf, splitCols = "id.make.model.year", sep = "?????", direction = "wide")
setnames(temp, unlist(strsplit(names(mydf), "[.]")))


#      id       make             model year
#1: 27550 AM General DJ Po Vehicle 2WD 1984
#2: 28426 AM General DJ Po Vehicle 2WD 1984
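For comparison, a base R sketch that avoids extra packages (an alternative, not part of the answer above). strsplit() with fixed = TRUE matches "?????" literally, so no regex escaping is needed. It assumes every row contains exactly four fields:

```r
# Same two-row sample as above
mydf <- data.frame(id.make.model.year = c(
  "27550?????AM General?????DJ Po Vehicle 2WD?????1984",
  "28426?????AM General?????DJ Po Vehicle 2WD?????1984"),
  stringsAsFactors = FALSE)

# fixed = TRUE splits on the literal string "?????"
parts <- strsplit(mydf$id.make.model.year, "?????", fixed = TRUE)

# Stack the pieces row-wise into a character matrix, then a data frame
out <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)

# Reuse the original column name, split on ".", as the new names
names(out) <- strsplit(names(mydf), ".", fixed = TRUE)[[1]]
out
```

Unlike the cSplit() output, every column here stays character; wrap id and year in as.integer() if numeric values are needed.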

How to separate data on a five-question-mark delimiter using separate()?
