I have a data frame of words and want to filter out the rows that have numbers in the word column in R

Asked: 2019-08-05 18:55:11

Tags: r filter dplyr

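So I have a df with a list of words and their frequencies. I want to filter out the rows with numbers. The column is mostly words, but R treats every entry, including the numeric ones, as character.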

I tried:

test <- test %>%
  filter(word == as.character(word))

But that didn't work.

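Also, is there a way to make all entries lowercase? I expect to end up with a df that has no rows containing a number and has all entries in lowercase (they will be grouped later).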

3 Answers:

Answer 0: (score: 2)

The simplest is a base R solution. Use grepl to search the word column for a digit. Negate the result (!) and extract those rows.

test[!grepl('[[:digit:]]', test$word), ]
## A tibble: 18 x 2
#   word            n
#   <chr>       <int>
# 1 data          213
# 2 summit        131
# 3 research      101
# 4 program        98
# 5 analysis       90
# 6 study          84
# 7 evaluation     82
# 8 minority       82
# 9 experience     76
#10 department     72
#11 statistical    65
#12 Experience     63
#13 business       60
#14 design         58
#15 education      58
#16 response       58
#17 sampling       55
#18 learning       50
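To see the idea on a small reproducible input, here is a minimal sketch. The data frame `sample_words` and its values are made up for illustration, since the asker's `test` data is not shown in the question.

```r
# Hypothetical sample data; the asker's real `test` tibble is not shown.
sample_words <- data.frame(
  word = c("data", "2019", "Experience", "covid19", "study"),
  n    = c(213L, 150L, 63L, 40L, 84L),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE for every word that contains at least one digit.
has_digit <- grepl("[[:digit:]]", sample_words$word)

# Negate the logical vector to keep only the digit-free rows.
no_digits <- sample_words[!has_digit, ]
print(no_digits)
```

Words that merely contain a digit, such as "covid19", are dropped along with purely numeric entries such as "2019".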

Edit.

The question also asks for the words to be output in lowercase.

test$word <- tolower(test$word)
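Since the question mentions grouping the entries later, note that lowercasing can produce duplicate words (rows 9 and 12 of the output above become the same word). A minimal base R sketch of merging such duplicates follows. The sample data is hypothetical, and summing the counts is an assumption about what the grouping should do.

```r
# Hypothetical sample data with a mixed-case duplicate.
sample_words <- data.frame(
  word = c("experience", "Experience", "data"),
  n    = c(76L, 63L, 213L),
  stringsAsFactors = FALSE
)

# Lowercase every entry, as in the answer.
sample_words$word <- tolower(sample_words$word)

# Sum the counts of words that became identical after lowercasing.
# Summing is an assumption; another aggregate may suit the real data better.
grouped <- aggregate(n ~ word, data = sample_words, FUN = sum)
print(grouped)
```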

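Answer 1: (score: 1)

One option is to filter the rows based on the occurrence of one or more digits (\\d+) in 'word', then negate (!) to keep only the rows without any digits.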

library(dplyr)
library(stringr)
test %>% 
  mutate(word = tolower(word)) %>%
  filter(!str_detect(word, "\\d+"))

Or with grep

test %>%
     mutate(word = tolower(word)) %>%
     slice(grep("\\d+", word, invert = TRUE))
# A tibble: 18 x 2
#   word            n
#   <chr>       <int>
# 1 data          213
# 2 summit        131
# 3 research      101
# 4 program        98
# 5 analysis       90
# 6 study          84
# 7 evaluation     82
# 8 minority       82
# 9 experience     76
#10 department     72
#11 statistical    65
#12 experience     63
#13 business       60
#14 design         58
#15 education      58
#16 response       58
#17 sampling       55
#18 learning       50
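For pure detection the `+` quantifier is optional: a word contains one or more digits exactly when it contains a single digit. The sketch below checks, in base R and on a made-up vector `words`, that negating `grepl` and using `grep(..., invert = TRUE)` select the same elements.

```r
# Hypothetical sample vector, not the asker's data.
words <- c("data", "web2", "2019", "summit", "a1b22")

# Indices of digit-free words, computed three ways.
keep_grepl  <- which(!grepl("\\d", words))          # single digit, negated
keep_plus   <- which(!grepl("\\d+", words))         # one or more digits, negated
keep_invert <- grep("\\d+", words, invert = TRUE)   # grep returning non-matches

print(words[keep_invert])
```

`invert = TRUE` returns the indices of the non-matching elements directly, which is why it can be passed to `slice()`.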

Answer 2: (score: 1)

You can do this:

test %>%
 mutate(word = tolower(word)) %>%
 filter(!grepl("[^A-Za-z]", word))

   word            n
   <chr>       <int>
 1 data          213
 2 summit        131
 3 research      101
 4 program        98
 5 analysis       90
 6 study          84
 7 evaluation     82
 8 minority       82
 9 experience     76
10 department     72
11 statistical    65
12 experience     63
13 business       60
14 design         58
15 education      58
16 response       58
17 sampling       55
18 learning       50
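Note that the pattern `[^A-Za-z]` is stricter than a digit test: it drops any word containing a character other than an ASCII letter, so hyphenated or punctuated words are removed too. A minimal sketch of the difference, using a made-up vector `words`:

```r
# Hypothetical sample vector, not the asker's data.
words <- c("data", "covid19", "e-mail", "r&d", "study")

# Drops only the words that contain a digit.
digit_free <- words[!grepl("[[:digit:]]", words)]

# Drops the words that contain anything other than an ASCII letter.
letters_only <- words[!grepl("[^A-Za-z]", words)]

print(digit_free)
print(letters_only)
```

On the data shown in the answers both approaches give the same 18 rows, so the choice only matters if the real word list contains punctuation or non-ASCII letters.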