Question

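Is there a more elegant solution than the code below? Basically, I want to run strsplit on a character vector, splitting at any of several strings. I wonder whether there is a better approach, for example using %in% or something else.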

library(stringr)  # provides str_detect()

data_d <- data.frame(id = c('A', 'B', 'C'),
                     sentence = c('1. this is A sentence',
                                  '2. this is B sentence',
                                  '3. this is C sentence'),
                     stringsAsFactors = FALSE)
listasd <- c('A', 'B', 'C')
data_d$first <- NA
for (i in listasd)
  data_d$first <-  ifelse(str_detect(data_d$sentence, i),
                          sapply(strsplit(data_d$sentence, i), "[", 1),
                          data_d$first)

Answer 1

Maybe consider using the stringi package?

Then a more elegant solution might be:

library(stringi)

listasd <- c('C', 'A', 'B')
# Split each sentence once (n = 2) at the first occurrence of any listed string
stri_split_regex(data_d$sentence, stri_paste(listasd, collapse="|"), n=2, simplify = TRUE)[,1]

It returns a vector with the part of each sentence you are interested in, without using sapply:

[1] "1. this is " "2. this is " "3. this is "

This lets you avoid the loop, which is typically slower in R than a vectorized call:

data_d$first <- stri_split_regex(data_d$sentence, stri_paste(listasd, collapse="|"), n=2, simplify = TRUE)[,1]
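If you would rather not add a dependency, the same idea can be sketched in base R: paste the candidates into one alternation pattern and split on it. This is only a minimal sketch on the question's toy data; it assumes the entries of listasd contain no regex metacharacters (they would need escaping otherwise).

```r
# Same toy data as in the question
data_d <- data.frame(id = c('A', 'B', 'C'),
                     sentence = c('1. this is A sentence',
                                  '2. this is B sentence',
                                  '3. this is C sentence'),
                     stringsAsFactors = FALSE)
listasd <- c('C', 'A', 'B')

# Join the candidates into a single alternation pattern: "C|A|B"
pattern <- paste(listasd, collapse = "|")

# Split each sentence at any candidate and keep the piece before the first match
data_d$first <- vapply(strsplit(data_d$sentence, pattern),
                       function(parts) parts[1],
                       character(1))
```

Because the pattern matches any of the listed strings, the order of listasd does not matter here.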

Answer 2

This gives the same output:

sapply(strsplit(data_d$sentence, c('A','B','C')),'[',1)
# [1] "1. this is " "2. this is " "3. this is "

According to ?strsplit, the split argument can be a character vector, which is recycled along x.

If you try:

it still works, because there is nothing to split on in the 1st and 3rd strings.
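One thing to keep in mind with this approach is that recycling pairs the i-th string with the i-th split value, so the order of the split vector has to line up with the rows. A small sketch of that behaviour follows; the shuffled vector is a hypothetical example of mine, not something from the question.

```r
x <- c('1. this is A sentence',
       '2. this is B sentence',
       '3. this is C sentence')

# split is recycled along x: element i of x is split with element i of split
aligned <- sapply(strsplit(x, c('A', 'B', 'C')), "[", 1)

# With a misaligned order, no string contains its paired letter,
# so every string comes back whole
shuffled <- sapply(strsplit(x, c('C', 'A', 'B')), "[", 1)
```

So this works when the letters are already aligned with the rows, as with the id column in the question, but it does not split at any element of the list.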

Answer 3

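You can use gsub. The regular expression matches from the first uppercase letter to the end of the line. If your sentences contain other uppercase letters, you will need to adjust it.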

data_d$first <- gsub('[A-Z].*$', '', data_d$sentence)

> data_d
  id              sentence       first
1  A 1. this is A sentence 1. this is 
2  B 2. this is B sentence 2. this is 
3  C 3. this is C sentence 3. this is
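As one possible adjustment when the sentences contain other capitals, you could restrict the pattern to the listed letters standing alone as words. This is a sketch under my own assumptions: the capitalized sentences are made up for illustration, and the word-boundary pattern only suits delimiters that appear as separate words.

```r
# Hypothetical sentences that contain an extra uppercase letter ("This")
x <- c('1. This is A sentence',
       '2. This is B sentence')

# The original pattern cuts at the first capital, which is now the "T" of "This"
naive <- gsub('[A-Z].*$', '', x)

# Restrict the match to the listed letters appearing as whole words
listasd <- c('A', 'B', 'C')
pattern <- paste0('\\b(', paste(listasd, collapse = '|'), ')\\b.*$')
adjusted <- sub(pattern, '', x, perl = TRUE)
```

Here naive is "1. " and "2. ", while adjusted keeps "1. This is " and "2. This is ".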

Split a string at any of the characters in a list