I'm trying to use dplyr and str_match to extract the characters surrounding each vowel. When I run the code below, mutate throws an error:
library(tidyverse)
library(magrittr)
library(stringr)
iris %>%
  select(Species) %>%
  mutate(name_length = str_length(Species),
         near_vowel = str_match(Species, "(.)[aeiou](.)"))
Error in mutate_impl(.data, dots) :
  Column `near_vowel` must be length 150 (the number of rows) or one, not 450
Minimal example code:
library(dplyr)
df <- read.table(text = "
s_MC13_B2_Cd.Ni s_MC13_B3_Cd.Ni s_MC13_B4_Cd.Ni GENE_ID
9.854759 10.216916 9.722329 GENE:JGI_V11_100009
7.863938 8.075640 7.894878 GENE:JGI_V11_100009
9.448034 9.177245 9.053654 GENE:JGI_V11_100036
9.333245 9.208673 9.159947 GENE:JGI_V11_100036
9.360540 9.374757 9.273236 GENE:JGI_V11_100036
8.983222 9.023339 9.112987 GENE:JGI_V11_100044
", header = TRUE, stringsAsFactors = FALSE)
df %>%
  group_by(GENE_ID) %>%
  summarise_if(is.numeric, mean)
# # A tibble: 3 x 4
#   GENE_ID             s_MC13_B2_Cd.Ni s_MC13_B3_Cd.Ni s_MC13_B4_Cd.Ni
#   <chr>                         <dbl>           <dbl>           <dbl>
# 1 GENE:JGI_V11_100009            8.86            9.15            8.81
# 2 GENE:JGI_V11_100036            9.38            9.25            9.16
# 3 GENE:JGI_V11_100044            8.98            9.02            9.11
For example, for "virginica" I would expect it to extract "vir", "gin", "nic".
Answer 0 (score: 1)
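There are several things you need to address, but going by what you have provided in the question, here is a tidy approach.
The main problem is that you are returning multiple values per row for near_vowel: str_match returns a matrix with one column for the full match plus one per capture group, so 150 rows yield 450 values. We can solve this by nesting the results in a list column. Second, the mutate needs to be processed rowwise to behave sensibly. Third (as @Psidom said), your regular expression will not produce the output you want. Addressing the first two issues, which are the core of your question: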
library(dplyr)
library(stringr)
df <- iris %>%
  select(Species) %>%
  mutate(
    name_length = str_length(Species),
    near_vowel = str_extract_all(Species, "[^aeiou][aeiou][^aeiou]")
  )
head(df)
#   Species name_length near_vowel
# 1  setosa           6        set
# 2  setosa           6        set
# 3  setosa           6        set
# 4  setosa           6        set
# 5  setosa           6        set
# 6  setosa           6        set
head(df[df$Species == "virginica", ]$near_vowel)
# [[1]]
# [1] "vir" "gin"
#
# [[2]]
# [1] "vir" "gin"
#
# [[3]]
# [1] "vir" "gin"
#
# [[4]]
# [1] "vir" "gin"
#
# [[5]]
# [1] "vir" "gin"
#
# [[6]]
# [1] "vir" "gin"
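Because near_vowel is a list column, each row holds a character vector rather than a single value. If one row per extracted triple is easier to work with, the list column can be flattened. The following is only a sketch: it assumes tidyr 1.0 or later (for the unnest(cols = ...) interface), and the name long and the as.character() coercion are illustrative additions, not part of the original answer.

```r
library(dplyr)
library(stringr)
library(tidyr)

long <- iris %>%
  select(Species) %>%
  mutate(
    # as.character() is a defensive assumption: Species is a factor in iris
    near_vowel = str_extract_all(as.character(Species),
                                 "[^aeiou][aeiou][^aeiou]")
  ) %>%
  # expand the list column so that each extracted triple gets its own row
  unnest(cols = near_vowel)

# tabulate the triples found for each species
count(long, Species, near_vowel)
```

Keeping the list column is the more compact choice when the triples are only inspected per flower; the long form tends to be more convenient for counting or joining.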
Edit: updated to use the str_extract_all approach suggested by @neilfws, which makes the rowwise operation unnecessary.
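On the third point, the output above contains only "vir" and "gin" for "virginica", because regex matches do not overlap: the "n" consumed by "gin" cannot also start "nic". One possible way to get overlapping windows is to locate each vowel and cut out the characters on either side of it. This is a sketch, not the answerer's method; near_vowel_overlapping is a hypothetical helper name, and like the asker's (.)[aeiou](.) pattern it accepts any character (including another vowel) as a neighbour.

```r
library(stringr)

# Hypothetical helper: for each string, return every 3-character window
# centred on a vowel, allowing the windows to overlap.
near_vowel_overlapping <- function(x) {
  x <- as.character(x)  # factors such as iris$Species are coerced first
  lapply(x, function(s) {
    # start positions of all vowels in the string
    pos <- str_locate_all(s, "[aeiou]")[[1]][, "start"]
    # keep only vowels that have a character on both sides
    pos <- pos[pos > 1 & pos < str_length(s)]
    if (length(pos) == 0) return(character(0))
    str_sub(s, pos - 1, pos + 1)
  })
}

near_vowel_overlapping("virginica")[[1]]
# [1] "vir" "gin" "nic"
```

Since the helper returns a list with one element per input string, it can be used inside mutate in the same way as str_extract_all, for example near_vowel = near_vowel_overlapping(Species).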