Question

My dataset looks like this:

   A              B                              C
   hello          Radiation therapy              NA
   Hello1         hello2 for neurology           hello3 radiation

There are many more rows.

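Now I want to delete everything from "for" onward (for example "for neurology"), as well as every occurrence of the word "radiation". So the output I expect is: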

   A              B                            C
   hello          therapy                      NA
   Hello1         hello2                       hello3

Answer 1

Example data frame:

df <- data.frame(B = c("Radiation therapy", "hello2 for neurology"))

Then this code strips the unwanted text from column B of the data frame, leaving only the strings you want:

df$B <- gsub("Radiation | for.*", "", df$B)
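Note that this pattern is case-sensitive and expects a space after "Radiation", so it would leave a value like "hello3 radiation" (from column C in the question) untouched. Here is a minimal sketch of a variant that ignores case and accepts the space on either side of the word. The vector `x` and the names `original` and `variant` are made up for illustration.

```r
# Illustrative input: the two B values, one C value, and a missing value.
x <- c("Radiation therapy", "hello2 for neurology", "hello3 radiation", NA)

# Pattern from the answer: case-sensitive, needs a trailing space after "Radiation".
original <- gsub("Radiation | for.*", "", x)

# Variant: ignore case, and allow one optional space before or after "radiation".
variant <- gsub("radiation ?| ?radiation| for.*", "", x, ignore.case = TRUE)

original
variant
```

Both calls leave `NA` as `NA`, because `gsub()` passes missing values through unchanged.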

Answer 2

Try the following.

dat <-
structure(list(A = c("hello", "Hello1"), B = c("Radiation therapy", 
"hello2 for neurology"), C = c(NA, "hello3 radiation")), .Names = c("A", 
"B", "C"), row.names = c(NA, -2L), class = "data.frame")

dat[] <- lapply(dat, function(x) gsub("radiation|for.*", "", x, ignore.case = TRUE))
dat
       A        B       C
1  hello  therapy    <NA>
2 Hello1  hello2  hello3
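Two details are easy to miss here. The printed table hides the spaces the replacement leaves behind (for example " therapy" and "hello2 "), and the bare `for.*` would also match "for" inside longer words such as "information". The sketch below is one possible way to handle both, assuming word boundaries and trimmed results are what you want. The helper name `clean_cell` is hypothetical, and `perl = TRUE` is a choice made here, not part of the answer above.

```r
dat <- data.frame(
  A = c("hello", "Hello1"),
  B = c("Radiation therapy", "hello2 for neurology"),
  C = c(NA, "hello3 radiation"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: remove the whole word "radiation" and everything from
# the whole word "for" onward, then trim leftover leading/trailing whitespace.
clean_cell <- function(x) {
  x <- gsub("\\bradiation\\b|\\bfor\\b.*", "", x, ignore.case = TRUE, perl = TRUE)
  trimws(x)
}

# dat[] keeps the data.frame structure while replacing every column.
dat[] <- lapply(dat, clean_cell)
dat
```

With the word boundaries, a value such as "information desk" is left alone rather than being cut down to "in".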

Removing specific cells from a dataset in R