R: Group strings based on numeric values and words

Time: 2020-05-16 22:21:02

Tags: r string sorting numeric

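I have a dataset that contains names (strings made up of a year, a pick order, and a group name). Each name has a value associated with it. Within each group name, I need to reorder "Name" in ascending order by pick/year and assign every row the average value of its group.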

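My problem is more complicated because some names spell the pick out in words instead of using numbers. For example, "2020 Mid 1st Rounder" is equivalent to "2020 1.05-1.08 Draft Pick".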
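A minimal sketch of that equivalence, assuming 12 picks per round split into blocks of four (x.01-x.04 Early, x.05-x.08 Mid, x.09-x.12 Last, as in the ranges listed in the edit). `pick_to_tier` is a hypothetical helper, not part of any package, and it expects the pick as a character string such as "1.05".

```r
# Hypothetical helper: map a numeric pick such as "1.05" to its spelled-out tier.
# Assumes 12 picks per round: slots 1-4 = Early, 5-8 = Mid, 9-12 = Last.
# Expects character input; a numeric 1.10 would become "1.1" and be misread.
pick_to_tier <- function(pick) {
  round <- as.integer(sub("\\..*$", "", pick))       # "1.05" -> 1
  slot  <- as.integer(sub("^[0-9]+\\.", "", pick))   # "1.05" -> 5
  tier  <- c("Early", "Mid", "Last")[ceiling(slot / 4)]
  # Ordinal suffix for the round number (rounds 1-3 special-cased)
  ordinal <- ifelse(round == 1, "1st",
             ifelse(round == 2, "2nd",
             ifelse(round == 3, "3rd", paste0(round, "th"))))
  paste(tier, ordinal, "Rounder")
}

pick_to_tier(c("1.01", "1.05", "1.08", "1.12", "2.03"))
# -> "Early 1st Rounder" "Mid 1st Rounder" "Mid 1st Rounder" "Last 1st Rounder" "Early 2nd Rounder"
```

Going from numbers to words (rather than the reverse) avoids guessing a single pick number for a spelled-out name, since "Mid 1st Rounder" covers a whole range of picks.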

Here is a sample of my data frame:

Data <- data.frame(Value = c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,5),
                   Name = c("2020 1.01 Draft Pick", "2020 1.04 Draft Pick", "2020 1.02 Draft Pick", "2020 1.03 Draft Pick",
                            "2020 1.06 Draft Pick", "2020 1.04 Draft Pick", "2020 1.05 Draft Pick", "2020 1.04 Draft Pick",
                            "2020 Mid 1st Rounder", "2020 1.04 Draft Pick", "2020 1.08 Draft Pick", "2020 1.03 Draft Pick",
                            "2020 Last Round", "2020 1.04 Draft Pick", "2020 1.07 Draft Pick", "2020 Early 1st Rounder"))

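The only way I can think of to do this takes a lot of manual changes (str_replace("1.05", "Mid 1st Rounder", Names)) followed by splitting the strings to reorder them, and I know there must be a better way. Thanks!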

Edit: Using @Akrun's method, this is the output I get: [image: output using @Akrun's method]. This is very close to what I need, but I want the average grouped by (1.01-1.04 & Early 1st Round), (1.05-1.08 & Mid 1st Round), (1.09-1.12 & Last 1st Round), (2.01-2.04 & Early 2nd Round), etc.

The exact code I used is:

library(dplyr)    # %>%, group_by(), mutate()
library(stringr)  # str_replace()
temp <- Output_table[!(Output_table$`Draft Pick.Name`==""), c('Player.Value', 'Draft Pick.Name')] %>%
  group_by('Player.Value', year = readr::parse_number(as.character(`Draft Pick.Name`))) %>% # quoted 'Player.Value' groups by a constant string, not by the column
  mutate(averagePergroup = mean(as.numeric(str_replace(`Player.Value`, "^\\d+\\s+([0-9.]+)\\s+.*", "\\1")), na.rm = TRUE))
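The grouping described in the edit could be sketched in base R by first normalizing both naming styles to one key (year, round, tier) and then averaging `Value` per key. `name_to_group` is a hypothetical helper; it assumes 12 picks per round split into blocks of four, and any name it does not recognize, such as "2020 Last Round" (which has no round number), gets `NA` and is left out of the averages.

```r
# Sample data from the question
Data <- data.frame(
  Value = c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,5),
  Name = c("2020 1.01 Draft Pick", "2020 1.04 Draft Pick", "2020 1.02 Draft Pick", "2020 1.03 Draft Pick",
           "2020 1.06 Draft Pick", "2020 1.04 Draft Pick", "2020 1.05 Draft Pick", "2020 1.04 Draft Pick",
           "2020 Mid 1st Rounder", "2020 1.04 Draft Pick", "2020 1.08 Draft Pick", "2020 1.03 Draft Pick",
           "2020 Last Round", "2020 1.04 Draft Pick", "2020 1.07 Draft Pick", "2020 Early 1st Rounder"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return a key like "2020 R1 Mid" for either naming style.
# Assumes 12 picks per round: slots 1-4 = Early, 5-8 = Mid, 9-12 = Last.
name_to_group <- function(name) {
  name  <- as.character(name)
  year  <- substr(name, 1, 4)
  tiers <- c("Early", "Mid", "Last")
  round <- rep(NA_integer_, length(name))
  tier  <- rep(NA_character_, length(name))

  # Numeric style, e.g. "2020 1.05 Draft Pick" -> round 1, slot 5 -> Mid
  is_num <- grepl("^[0-9]{4} [0-9]+\\.[0-9]+ ", name)
  round[is_num] <- as.integer(sub("^[0-9]{4} ([0-9]+)\\..*$", "\\1", name[is_num]))
  slot <- as.integer(sub("^[0-9]{4} [0-9]+\\.([0-9]+) .*$", "\\1", name[is_num]))
  tier[is_num] <- tiers[ceiling(slot / 4)]

  # Spelled-out style, e.g. "2020 Mid 1st Rounder" -> round 1, Mid
  is_word <- grepl("^[0-9]{4} (Early|Mid|Last) [0-9]+(st|nd|rd|th) Round", name)
  round[is_word] <- as.integer(sub("^[0-9]{4} [[:alpha:]]+ ([0-9]+).*$", "\\1", name[is_word]))
  tier[is_word]  <- sub("^[0-9]{4} ([[:alpha:]]+) .*$", "\\1", name[is_word])

  # Unrecognized names stay NA
  ifelse(is.na(tier), NA_character_, paste(year, paste0("R", round), tier))
}

Data$Group <- name_to_group(Data$Name)

# Mean Value per group; the formula interface drops rows whose Group is NA
averages <- aggregate(Value ~ Group, data = Data, FUN = mean)

# Attach each row's group average, then sort by group (alphabetical) and name
Data$averagePergroup <- averages$Value[match(Data$Group, averages$Group)]
Data <- Data[order(Data$Group, Data$Name), ]
```

With the sample data this yields two groups: "2020 R1 Early" with an average of 2.2 (ten rows) and "2020 R1 Mid" with an average of 2.4 (five rows). Groups are sorted alphabetically here, so "Last" would come before "Mid"; sorting by a numeric tier rank instead would give Early, Mid, Last.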
