Question

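I have a very messy text file (.json) that consists of a single column. I want to cut this file into n parts. The pattern to cut on is a string such as "2018-02-19 10:49:50" (the date and time vary, of course). Should I use grep?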

I have data like this:

      text
1    2018-02-19 10:49:50 fgdfhdsgfhdsgfh 2018-02-19 10:49:50 abd abd adjskfjs 
     2018-02-19 10:51:21 jfhdsjfdsf

The output I want is:

      textA                 textB             textC
1    fgdfhdsgfhdsgfh   abd abd adjskfjs     jfhdsjfdsf

Answer 1

We can split the string on a pattern that matches the date and time, and then trim the surrounding whitespace.

text <- "2018-02-19 10:49:50 fgdfhdsgfhdsgfh 2018-02-19 10:49:50 abd abd adjskfjs 2018-02-19 10:51:21 jfhdsjfdsf"

text2 <- trimws(strsplit(text, split = "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}")[[1]][-1])

text2
# [1] "fgdfhdsgfhdsgfh"  "abd abd adjskfjs" "jfhdsjfdsf"
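On the grep question: grep and grepl only report whether (or where) a pattern matches, they do not split anything. If the timestamps themselves are also needed, one possible sketch is to extract them with gregexpr and regmatches and pair them with the split pieces. The variable names below (pattern, stamps, pieces, paired) are made up for illustration, and the sketch assumes the string starts with a timestamp, as in the sample data.

```r
text <- "2018-02-19 10:49:50 fgdfhdsgfhdsgfh 2018-02-19 10:49:50 abd abd adjskfjs 2018-02-19 10:51:21 jfhdsjfdsf"
pattern <- "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}"

# grepl only tells us that the pattern occurs somewhere in the string
has_stamp <- grepl(pattern, text)

# Extract every timestamp that matches the pattern
stamps <- regmatches(text, gregexpr(pattern, text))[[1]]

# Split on the same pattern; [-1] drops the empty piece before the first
# timestamp (assumption: the string begins with a timestamp)
pieces <- trimws(strsplit(text, split = pattern)[[1]][-1])

# Pair each timestamp with the text that follows it
paired <- data.frame(timestamp = stamps, text = pieces, stringsAsFactors = FALSE)
paired
```

Each row of paired then holds one timestamp and the text that followed it in the original string.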

Update

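If we are working with a column in a data frame and want the output in separate columns, we can use the str_split function from the stringr package. Note that in the following example I replicated the original text to form a data frame with one column and two rows.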

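library(stringr)
text <- "2018-02-19 10:49:50 fgdfhdsgfhdsgfh 2018-02-19 10:49:50 abd abd adjskfjs 2018-02-19 10:51:21 jfhdsjfdsf"
# Build a one-column, two-row data frame by repeating the sample text
text_df <- data.frame(text = rep(text, 2), stringsAsFactors = FALSE)
# Split every row on the timestamp pattern; simplify = TRUE returns a character matrix
m1 <- str_split(text_df$text, pattern = "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}", simplify = TRUE)
# Drop the first column, which holds the empty string before the first timestamp
m2 <- m1[, 2:ncol(m1)]
# Trim surrounding whitespace column by column
m3 <- apply(m2, 2, trimws)
m3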
#      [,1]              [,2]               [,3]        
# [1,] "fgdfhdsgfhdsgfh" "abd abd adjskfjs" "jfhdsjfdsf"
# [2,] "fgdfhdsgfhdsgfh" "abd abd adjskfjs" "jfhdsjfdsf"
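To get from the split pieces to the named columns shown in the question (textA, textB, textC), a minimal base R sketch could look like the following. It is an illustration under assumptions, not the only way to do it: every row is assumed to start with a timestamp, rows with fewer pieces are padded with NA, and the letter-based column names only work for up to 26 pieces. The names pattern, parts, padded, and out are hypothetical.

```r
text <- "2018-02-19 10:49:50 fgdfhdsgfhdsgfh 2018-02-19 10:49:50 abd abd adjskfjs 2018-02-19 10:51:21 jfhdsjfdsf"
text_df <- data.frame(text = rep(text, 2), stringsAsFactors = FALSE)
pattern <- "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}"

# Split each row, drop the empty piece before the first timestamp
# (assumption: each row begins with a timestamp), and trim whitespace
parts <- lapply(strsplit(text_df$text, split = pattern),
                function(p) trimws(p[-1]))

# Pad shorter rows with NA so that all rows have the same number of pieces
n <- max(lengths(parts))
padded <- lapply(parts, function(p) c(p, rep(NA_character_, n - length(p))))

# Bind the rows into a data frame and name the columns textA, textB, ...
out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(out) <- paste0("text", LETTERS[seq_len(n)])
out
```

Padding with NA keeps the result rectangular even when some rows contain fewer timestamps than others.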

Cut text into parts using grep and a pattern
