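So I have a df with a list of words and their frequencies. I want to filter out the rows that contain numbers; the entries are mostly letters, but R recognizes every entry as character.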
I tried:
test <- test %>%
filter(word == as.character(word))
but that didn't work.
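A quick console check suggests why that attempt keeps everything: `word` is already a character column, so comparing it with `as.character(word)` is `TRUE` for every non-missing entry. The sketch below uses a small hypothetical vector (not the question's data). It also shows that converting to numeric only flags entries that are entirely numeric, which is why a regex is the more natural tool here.

```r
# Hypothetical sample words; the question's real data is not shown
words <- c("data", "2019", "covid19", "Experience")

# Always TRUE here: the vector is already character, so nothing is filtered out
same <- words == as.character(words)
print(same)

# as.numeric() only recognizes entries that are *entirely* numbers;
# "covid19" becomes NA just like "data", so it is not flagged
is_number <- !is.na(suppressWarnings(as.numeric(words)))
print(is_number)

# A regex catches a digit anywhere in the string
has_digit <- grepl("[[:digit:]]", words)
print(has_digit)
```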
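Also, is there a way to make all entries lowercase? I'd like to end up with a df that has no rows containing a number and with all entries in lowercase (they will be grouped later).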
Answer 0 (score: 2)
The simplest is a base R solution. Use grepl to search the word column for a digit, negate the result (!), and keep the rows that remain.
test[!grepl('[[:digit:]]', test$word), ]
## A tibble: 18 x 2
# word n
# <chr> <int>
# 1 data 213
# 2 summit 131
# 3 research 101
# 4 program 98
# 5 analysis 90
# 6 study 84
# 7 evaluation 82
# 8 minority 82
# 9 experience 76
#10 department 72
#11 statistical 65
#12 Experience 63
#13 business 60
#14 design 58
#15 education 58
#16 response 58
#17 sampling 55
#18 learning 50
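For a self-contained check, here is the same base R approach on a small hypothetical data frame standing in for `test` (the words and counts are made up for illustration). The `[[:digit:]]` class matches a single digit anywhere in the string, so one match is enough to drop the row.

```r
# Hypothetical stand-in for the question's `test` data frame
test <- data.frame(
  word = c("data", "2019", "summit", "covid19", "Experience"),
  n    = c(213L, 40L, 131L, 12L, 63L),
  stringsAsFactors = FALSE
)

# TRUE for rows whose word contains at least one digit
has_digit <- grepl("[[:digit:]]", test$word)

# Keep only the rows without digits; reset row names for tidy printing
clean <- test[!has_digit, ]
rownames(clean) <- NULL
print(clean)
```

Note that "Experience" is still capitalized at this point; lowercasing is a separate step.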
Edit:
The question also asks for the words to be lowercase.
test$word <- tolower(test$word)
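Once everything is lowercase, words that differed only in case (such as `experience` and `Experience` in the output above) become separate rows with the same word. If the goal is to group them later, as the question mentions, one base R option is `aggregate()` to sum the counts per word. A minimal sketch on hypothetical data, assuming the counts should simply be added together:

```r
# Hypothetical filtered data with a case-only duplicate
test <- data.frame(
  word = c("data", "experience", "Experience", "summit"),
  n    = c(213L, 76L, 63L, 131L),
  stringsAsFactors = FALSE
)

# Lowercase first so "Experience" and "experience" fall into one group
test$word <- tolower(test$word)

# Sum the counts per word; aggregate() returns the groups in sorted order
grouped <- aggregate(n ~ word, data = test, FUN = sum)
print(grouped)
```

The order matters: grouping before `tolower()` would keep the two spellings apart.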
Answer 1 (score: 1)
One option is to filter rows by whether 'word' contains one or more digits (\\d+), then negate that (!) so that only the rows without any digits are kept.
library(dplyr)
library(stringr)
test %>%
mutate(word = tolower(word)) %>%
filter(!str_detect(word, "\\d+"))
Or with grep:
test %>%
mutate(word = tolower(word)) %>%
slice(grep("\\d+", word, invert = TRUE))
# A tibble: 18 x 2
# word n
# <chr> <int>
# 1 data 213
# 2 summit 131
# 3 research 101
# 4 program 98
# 5 analysis 90
# 6 study 84
# 7 evaluation 82
# 8 minority 82
# 9 experience 76
#10 department 72
#11 statistical 65
#12 experience 63
#13 business 60
#14 design 58
#15 education 58
#16 response 58
#17 sampling 55
#18 learning 50
Answer 2 (score: 1)
You could do this (it keeps only words made up entirely of letters):
test %>%
mutate(word = tolower(word)) %>%
filter(!grepl("[^A-Za-z]", word))
word n
<chr> <int>
1 data 213
2 summit 131
3 research 101
4 program 98
5 analysis 90
6 study 84
7 evaluation 82
8 minority 82
9 experience 76
10 department 72
11 statistical 65
12 experience 63
13 business 60
14 design 58
15 education 58
16 response 58
17 sampling 55
18 learning 50
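Note that the pattern `[^A-Za-z]` removes more than numbers: any word containing a hyphen, an apostrophe, or another non-letter character is dropped as well. The sketch below compares it with a digit-only filter on a hypothetical vector, so you can pick the behavior you actually want.

```r
# Hypothetical words, including some with non-letter characters
words <- c("data", "covid19", "e-mail", "don't", "2019", "design")

# Drop only words that contain a digit
digit_free <- words[!grepl("[[:digit:]]", words)]

# Drop any word containing a character outside A-Z / a-z
letters_only <- words[!grepl("[^A-Za-z]", words)]

print(digit_free)
print(letters_only)
```

Here `digit_free` still contains "e-mail" and "don't", while `letters_only` keeps only "data" and "design".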