Question

I have a data frame called reasons in which some cells contain numbers in parentheses. The format looks like this.

concern                          notaware           scenery
(2) chat community (4) more      
(1) didn't know                  (1) beautiful      (3) stunning
(3) often                                           (1) always

Reproducible version:

structure(list(concern = c("(2) chat community (4) more", "(1) didn't know", 
"(3) often"), notaware = c("", "(1) beautiful", ""), scenery = c("", 
"(3) stunning", "(1) always")), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame"))

I want a new data frame containing only the parentheses and the numbers inside them:

concern                          notaware           scenery
(2) (4)
(1)                              (1)                (3)
(3)                                                 (1)

I realize there is a similar question here, but its data isn't in columns:

Extracting data into new columns using R

and this one, which doesn't seem to apply to data frames:

Extract info inside all parenthesis in R

Based on the questions I found, I tried to put together a workaround. I tried

reasons %>% mutate(concern1 = str_match(concern, pattern = "\\(.*?\\)"))

which left the data frame unchanged.

I also tried

reasons$concern1 <- sub(regmatches(reasons$concern, gregexpr(pat, reasons$concern, perl=TRUE)))

which gives

Error in sub(regmatches(UltraCodes$concern, gregexpr(pat, 
UltraCodes$concern,  : 
argument "x" is missing, with no default

I also looked at this one. I know it is a duplicate of the second question, but it made more sense to me.

Using R to parse and return text in parenthesis

I used

pat <- "(?<=\\()([^()]*)(?=\\))"
concern1 <- regmatches(reasons$concern, gregexpr(pat, reasons$concern, 
perl=TRUE))

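This gave me a list with names, types, and values. The values are '2' rather than (2), but they are still what I want.

So I thought I could build several lists and try to put them into a data frame, creating a notaware1 list from the notaware column and so on. I had a feeling the blank values would cause an error when I tried.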

reasons1 <-data.frame(concern1, notaware1)
reasons1 <-as.data.frame(concern1, notaware1)

which gives me

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = 
TRUE,  : 
arguments imply differing number of rows: 0, 1, 2

I don't really understand this, because all my lists are the same length. I feel I'm misunderstanding something basic here.
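For context, here is a minimal sketch of where that error probably comes from, assuming the lists were built with regmatches as above. Each list has one element per row, but those elements hold 0, 1, or 2 matches, and data.frame() treats every list element as its own column. The toy vectors mirror the concern and notaware columns. The names match_counts, attempt, and failed are only illustrative.

```r
# Toy inputs mirroring two columns from the question
concern  <- c("(2) chat community (4) more", "(1) didn't know", "(3) often")
notaware <- c("", "(1) beautiful", "")
pat <- "(?<=\\()([^()]*)(?=\\))"

# One list element per row; each element holds every match found in that row
concern1  <- regmatches(concern,  gregexpr(pat, concern,  perl = TRUE))
notaware1 <- regmatches(notaware, gregexpr(pat, notaware, perl = TRUE))

# Both lists have 3 elements, but the elements themselves differ in length
match_counts <- list(concern = lengths(concern1), notaware = lengths(notaware1))

# data.frame() turns each list element into a column; zero-length elements
# cannot be recycled to match the others, so the call fails
attempt <- try(data.frame(concern1, notaware1), silent = TRUE)
failed  <- inherits(attempt, "try-error")
```

So the lists do have the same length. It is the vectors inside them that do not.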

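Next, I thought I could work around it by exporting the lists to CSV, but the answers I found seemed to want me to convert the list to a data frame first, which is exactly my problem.

Then I found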

reasons$concern3 <-paste(concern1)

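which did add the list to the data frame, and I can repeat this for all the lists.

But it's a bit messy: blanks now appear as character(0), a single parenthesized number appears as a bare digit, and two parenthesized numbers appear as c("2", "9"), so my columns now look like this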

concern                          adventure          scenery
c("2", "9")                      character(0)       character(0)
1                                1                  3
3                                1                  character(0)

I can tidy some of this up in a CSV file, though.

Is there a simpler way?

Answer 1

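What we do here is loop over the data.frame column by column and use str_extract_all from the stringr package to extract every number in parentheses.

Because several values can be extracted from a single cell, we need str_extract_all with the simplify = TRUE argument, which returns a character matrix for each column (its rows correspond to the rows of df, with one column per match found).

Then we use apply to go through these matrices and bind each row into a single character string (separated by a space here, but you can change that). Now we have just one vector per column, so they can be stitched neatly into a data.frame.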
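The steps described above can be sketched roughly as follows. This is a reconstruction under assumptions, not necessarily the answer's original code: the pattern "\\(\\d+\\)", the helper name extract_parens, and the use of lapply with as.data.frame for the final stitching are all illustrative choices.

```r
library(stringr)

# Sample data from the question
reasons <- data.frame(
  concern  = c("(2) chat community (4) more", "(1) didn't know", "(3) often"),
  notaware = c("", "(1) beautiful", ""),
  scenery  = c("", "(3) stunning", "(1) always"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one space-separated string of "(n)" groups per cell
extract_parens <- function(x) {
  # simplify = TRUE gives a character matrix: one row per cell, one column per match
  m <- str_extract_all(x, "\\(\\d+\\)", simplify = TRUE)
  # Collapse each row into one string; trimws() drops padding left by empty slots
  trimws(apply(m, 1, paste, collapse = " "))
}

# Apply the helper to every column and rebuild a data frame
result <- as.data.frame(lapply(reasons, extract_parens), stringsAsFactors = FALSE)
```

Changing the collapse argument changes the separator between groups within a cell.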


Answer 2

Is this what you are looking for?

 data.frame(gsub("[^()0-9]","",as.matrix(dat)))

  concern notaware scenery
1  (2)(4)                 
2     (1)      (1)     (3)
3     (3)              (1)

Edit

 data.frame(gsub("(?<!\\))(?:\\w+|[^()])(?!\\))","",as.matrix(dat),perl=T))
   concern notaware scenery
1 (2) (4)                  
2     (1)      (1)     (3) 
3     (3)              (1)
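One caveat may be worth keeping in mind: the character-class pattern "[^()0-9]" keeps every digit and parenthesis wherever it appears, so a stray number outside parentheses would survive. The sketch below uses a made-up input (the "room 101" text is not from the question) to show the difference, alongside a match-based alternative built on regmatches. The variable names are illustrative.

```r
# Made-up input with a digit run outside any parentheses
x <- c("(2) room 101 (4) more", "(1) didn't know", "")

# Character-class approach: deletes everything that is not a digit or parenthesis
stripped <- gsub("[^()0-9]", "", x)

# Match-based alternative: keep only complete "(digits)" groups
groups  <- regmatches(x, gregexpr("\\([0-9]+\\)", x))
# Join the groups found in each cell with a space; no matches yields ""
matched <- vapply(groups, paste, character(1), collapse = " ")
```

With the question's data the two approaches pick out the same groups, so the simpler gsub call may well be enough.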

Answer 3

Use gsub to remove everything except the digits and the parentheses:

     data <- cbind("concern" = c("(2) chat community (4) more ", "(1) didn't know ", "(3) often  "), notaware=c("", "(2) chat community", "" ) )  

      gsub("[^0-9\\(\\)]", "", data)
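The example above runs gsub on a character matrix built with cbind. For a data frame like the one in the question, one possible way to apply the same pattern while keeping the data frame structure is to assign into an empty-bracket copy. This is a sketch, and the name cleaned is illustrative.

```r
# Sample data from the question
reasons <- data.frame(
  concern  = c("(2) chat community (4) more", "(1) didn't know", "(3) often"),
  notaware = c("", "(1) beautiful", ""),
  scenery  = c("", "(3) stunning", "(1) always"),
  stringsAsFactors = FALSE
)

# Copy first so the original columns stay available
cleaned <- reasons
# Assigning into cleaned[] replaces the column contents but keeps names and class;
# each column is passed to gsub() as its x argument
cleaned[] <- lapply(reasons, gsub, pattern = "[^0-9()]", replacement = "")
```

Note that this removes the spaces as well, so a cell with two groups comes out as "(2)(4)" rather than "(2) (4)".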

Extract text between parentheses from columns of a data frame into new columns of the data frame
