Question

Consider the following code, which counts the occurrences of the letter 'a' in each word:

data <- data.frame(number=1:4, string=c("this.is.a.great.word", "Education", "Earth.Is.Round", "Pinky"), stringsAsFactors = FALSE)

library(stringr)

data$Count_of_a <- str_count(data$string, "a")

data

This produces the following result:

  number               string Count_of_a
1      1 this.is.a.great.word          2
2      2            Education          1
3      3       Earth.Is.Round          1
4      4                Pinky          0

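I am trying to do a few more things:

Count the total number of vowels in each word
Count the total number of letters in each word
Flag whether the word starts with a vowel: 1 if it does, 0 otherwise
Flag whether the word ends with a vowel: 1 if it does, 0 otherwise

The problem is that if I use nchar(data$string), it also counts the dots ('.'). I also could not find any help on the four requirements above.

I would like the final data to look like this: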

number    string                 starts_with_vowel   ends_with_vowel   TotalLtrs
1         this.is.a.great.word          0                 0             16
2         Education                     1                 0             9
3         Earth.Is.Round                1                 0             12
4         Pinky                         0                 1             5

Answer 1

You want a set of regular expressions:

library(tidyverse)
data %>%
  mutate(
    nvowels = str_count(tolower(string), "[aeoiu]"),
    total_letters = str_count(tolower(string), "\\w"),
    starts_with_vowel = grepl("^[aeiou]", tolower(string)),
    ends_with_vowel = grepl("[aeiou]$", tolower(string))
  )


# number               string nvowels total_letters starts_with_vowel ends_with_vowel
# 1      1 this.is.a.great.word       6            16             FALSE           FALSE
# 2      2            Education       5             9              TRUE           FALSE
# 3      3       Earth.Is.Round       5            12              TRUE           FALSE
# 4      4                Pinky       1             5             FALSE           FALSE

If you count y as a vowel, add it to the character classes like this:

nvowels = str_count(tolower(string), "[aeoiuy]")
starts_with_vowel = grepl("^[aeiouy]", tolower(string))
ends_with_vowel = grepl("[aeiouy]$", tolower(string))
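The question asks for 1/0 flags rather than TRUE/FALSE. A minimal base R sketch of that conversion follows, assuming the same `data` frame as in the question and treating y as a vowel, as the desired output does. The `vowels` variable is a name introduced here for illustration only; `ignore.case = TRUE` is used in place of `tolower()`.

```r
# Sample data from the question
data <- data.frame(
  number = 1:4,
  string = c("this.is.a.great.word", "Education", "Earth.Is.Round", "Pinky"),
  stringsAsFactors = FALSE
)

# Illustrative character class; drop the "y" to treat it as a consonant
vowels <- "[aeiouy]"

# as.integer() turns the logical result of grepl() into 1/0
data$starts_with_vowel <- as.integer(
  grepl(paste0("^", vowels), data$string, ignore.case = TRUE)
)
data$ends_with_vowel <- as.integer(
  grepl(paste0(vowels, "$"), data$string, ignore.case = TRUE)
)

data
```

With y included, "Pinky" gets ends_with_vowel = 1, which matches the table in the question.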

Answer 2

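library(stringr)

# Total vowels per string (both cases listed explicitly)
str_count(data$string, "a|e|i|o|u|A|E|I|O|U")
# [1] 6 5 5 1

# Total letters per string; dots are not counted
str_count(data$string, paste0(c(letters, LETTERS), collapse = "|"))
# [1] 16  9 12  5

# 1 if the first character is a vowel, 0 otherwise
ifelse(substr(data$string, 1, 1) %in% c("a", "e", "i", "o", "u", "A", "E", "I", "O", "U"), 1, 0)
# [1] 0 1 1 0

# 1 if the last character is a vowel, 0 otherwise
ifelse(substr(data$string, nchar(data$string), nchar(data$string)) %in% c("a", "e", "i", "o", "u", "A", "E", "I", "O", "U"), 1, 0)
# [1] 0 0 0 0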
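If you would rather not depend on stringr, one possible base R approach is to delete every character you do not want to count and then call nchar() on what is left. This is only a sketch under the assumption that "letter" means an ASCII letter; the variable names `strings`, `total_letters` and `n_vowels` are introduced here for illustration.

```r
strings <- c("this.is.a.great.word", "Education", "Earth.Is.Round", "Pinky")

# Remove everything that is not an ASCII letter, then count what remains
total_letters <- nchar(gsub("[^A-Za-z]", "", strings))

# Same idea for vowels: remove every non-vowel, then count
n_vowels <- nchar(gsub("[^aeiouAEIOU]", "", strings))

total_letters
n_vowels
```

Unlike the "\\w" pattern, the explicit [A-Za-z] class does not count digits or underscores, which may matter if the strings contain them.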

Counting vowels and checking whether a word starts or ends with a vowel
