Counting vowels, and checking whether a word starts or ends with a vowel

Time: 2018-06-12 21:41:11

Tags: r regex dplyr

Consider the following code, which counts the occurrences of the letter 'a' in each word:

data <- data.frame(number=1:4, string=c("this.is.a.great.word", "Education", "Earth.Is.Round", "Pinky"), stringsAsFactors = FALSE)

library(stringr)

data$Count_of_a <- str_count(data$string, "a")

data

This produces the following result:

  number               string Count_of_a
1      1 this.is.a.great.word          2
2      2            Education          1
3      3       Earth.Is.Round          1
4      4                Pinky          0

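I am trying to do a few more things:

  1. Count the total number of vowels in each word
  2. Count the total number of letters in each word
  3. Flag whether the word starts with a vowel: 1 if it does, otherwise 0
  4. Flag whether the word ends with a vowel: 1 if it does, otherwise 0

The problem is that if I use nchar(data$string), it also counts the dots ('.'). I also could not find help for the four requirements above.

I would like the final data to look like this: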

    number    string                 starts_with_vowel   ends_with_vowel   TotalLtrs
    1         this.is.a.great.word          0                 0             16
    2         Education                     1                 0             9
    3         Earth.Is.Round                1                 0             12
    4         Pinky                         0                 1             5
    

2 answers:

Answer 0: (score: 2)

You want a set of regular expressions:

library(tidyverse)
data %>%
  mutate(
    nvowels = str_count(tolower(string), "[aeoiu]"),
    total_letters = str_count(tolower(string), "\\w"),
    starts_with_vowel = grepl("^[aeiou]", tolower(string)),
    ends_with_vowel = grepl("[aeiou]$", tolower(string))
  )


# number               string nvowels total_letters starts_with_vowel ends_with_vowel
# 1      1 this.is.a.great.word       6            16             FALSE           FALSE
# 2      2            Education       5             9              TRUE           FALSE
# 3      3       Earth.Is.Round       5            12              TRUE           FALSE
# 4      4                Pinky       1             5             FALSE           FALSE
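
The question asks for 1/0 flags rather than TRUE/FALSE, and for a count of letters only. As a minimal sketch of one way to get there (the column name TotalLtrs is taken from the question; the use of "[a-z]" instead of "\\w" is my own choice and assumes plain ASCII words):

```r
library(stringr)

# Same sample data as in the question
data <- data.frame(
  number = 1:4,
  string = c("this.is.a.great.word", "Education", "Earth.Is.Round", "Pinky"),
  stringsAsFactors = FALSE
)

# Lowercase once so every pattern can stay lowercase
s <- tolower(data$string)

# "[a-z]" counts letters only; "\\w" would also count digits and underscores
data$TotalLtrs <- str_count(s, "[a-z]")
data$nvowels   <- str_count(s, "[aeiou]")

# as.integer() turns the logical result of grepl() into 1/0
data$starts_with_vowel <- as.integer(grepl("^[aeiou]", s))
data$ends_with_vowel   <- as.integer(grepl("[aeiou]$", s))

data
```

For this sample data "\\w" and "[a-z]" give the same letter counts; they would differ only if a string contained digits or underscores.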

If you consider y a vowel, add it to the character classes like this:

nvowels = str_count(tolower(string), "[aeoiuy]")
starts_with_vowel = grepl("^[aeiouy]", tolower(string))
ends_with_vowel = grepl("[aeiouy]$", tolower(string))
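
The expected table in the question shows ends_with_vowel = 1 for "Pinky", which only holds when y is counted as a vowel. A small sketch of the difference, using a hypothetical helper `vowel_class()` (not part of any package) and `ignore.case = TRUE` in place of `tolower()`:

```r
words <- c("this.is.a.great.word", "Education", "Earth.Is.Round", "Pinky")

# Hypothetical helper: returns the vowel character class, optionally with "y"
vowel_class <- function(include_y = FALSE) {
  if (include_y) "[aeiouy]" else "[aeiou]"
}

# ignore.case = TRUE makes the match case-insensitive without tolower()
ends_plain  <- grepl(paste0(vowel_class(), "$"), words, ignore.case = TRUE)
ends_with_y <- grepl(paste0(vowel_class(include_y = TRUE), "$"), words,
                     ignore.case = TRUE)

ends_plain   # FALSE FALSE FALSE FALSE
ends_with_y  # FALSE FALSE FALSE  TRUE
```

Note that adding y also changes the vowel count: "Pinky" would then have 2 vowels instead of 1.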

Answer 1: (score: 1)

library(stringr)
str_count(data$string, "a|e|i|o|u|A|E|I|O|U")
[1] 6 5 5 1

str_count(data$string, paste0(c(letters,LETTERS), collapse = "|"))
[1] 16  9 12  5

ifelse(substr(data$string, 1, 1) %in% c("a", "e", "i", "o", "u", "A", "E", "I", "O", "U"), 1, 0)
[1] 0 1 1 0

ifelse(substr(data$string, nchar(data$string), nchar(data$string)) %in% c("a", "e", "i", "o", "u", "A", "E", "I", "O", "U"), 1, 0)
[1] 0 0 0 0
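
On the nchar() issue raised in the question: a base R sketch, assuming the words contain only ASCII letters and dots, is to strip everything that is not a letter before counting. The variable names here are illustrative.

```r
words <- c("this.is.a.great.word", "Education", "Earth.Is.Round", "Pinky")

# Remove every character that is not an ASCII letter (here: the dots)
letters_only <- gsub("[^A-Za-z]", "", words)

nchar(words)         # includes the dots: 20  9 14  5
nchar(letters_only)  # letters only:      16  9 12  5
```

This gives the same letter counts as the str_count() approach without needing stringr.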