I hope you can help me. I have been searching the web and cannot find an answer. Here is my data frame:
name city state stars main_category
A Pittsburgh PA 5.0 Soul Food
B Houston TX 3.0 Professional Services
C Lafayette IN 3.0 NA
D Los Angeles CA 4.0 Local Services
E Los Angeles CA 3.0 Local Services
F Lafayette IN 3.5 Mongolian
G Pittsburgh PA 5.0 Doctors
H Pittsburgh PA 4.0 Soul Food
I Houston TX 4.0 Professional Services
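For anyone who wants to run the code in this thread, here is a minimal sketch that rebuilds the table above as a data frame. The name `dat` follows the answers below (the question itself calls it `d`), and treating the `NA` in `main_category` as a real missing value rather than the string "NA" is an assumption.

```r
# Rebuild the example data; `dat` is the name the answers use.
# Assumption: the NA in main_category is a true missing value.
dat <- data.frame(
  name  = c("A", "B", "C", "D", "E", "F", "G", "H", "I"),
  city  = c("Pittsburgh", "Houston", "Lafayette", "Los Angeles", "Los Angeles",
            "Lafayette", "Pittsburgh", "Pittsburgh", "Houston"),
  state = c("PA", "TX", "IN", "CA", "CA", "IN", "PA", "PA", "TX"),
  stars = c(5, 3, 3, 4, 3, 3.5, 5, 4, 4),
  main_category = c("Soul Food", "Professional Services", NA, "Local Services",
                    "Local Services", "Mongolian", "Doctors", "Soul Food",
                    "Professional Services"),
  stringsAsFactors = FALSE
)

# Row C is the only row with a missing value
dat[!complete.cases(dat), ]
```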
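What I want is to output a rank by grouping on city (in alphabetical order) together with state, and then ranking by the number of stars received. This is the result I am hoping for: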
name city state stars main_category rank
I Houston TX 4.0 Professional Services 1
B Houston TX 3.0 Professional Services 2
F Lafayette IN 3.5 Mongolian 1
D Los Angeles CA 4.0 Local Services 1
E Los Angeles CA 3.0 Local Services 2
G Pittsburgh PA 5.0 Doctors 1
A Pittsburgh PA 5.0 Soul Food 1
H Pittsburgh PA 4.0 Soul Food 2
Here is my line of code:
l <- ddply(d, c("city", "state", "main_category"), na.rm=T, transform, rank=rank(-stars, ties.method="max"))
This does not remove the NA row that Lafayette has, and I don't know what to put there. I also tried na.omit, but when I did, the rank column did not show up.
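Answer 0 (score: 1)
Here is a base R solution. Not sure whether you are set on using dplyr, but this seems to work. I think the last row should be ranked 3, since two values are tied for first place at rank 1.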
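# Drop rows containing NA (row C, whose main_category is missing)
no <- na.omit(dat)
# Sort by city, then state, then stars in descending order
new <- no[do.call(order, with(no, list(city, state, -stars))),]
# Rank stars within each city; split() returns the groups in sorted
# city order, which matches the row order of `new`
within(new, {
  rank <- Reduce(c, Map(rank, split(-stars, city), ties.method = "min"))
})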
# name city state stars main_category rank
# 9 I Houston TX 4.0 Professional Services 1
# 2 B Houston TX 3.0 Professional Services 2
# 6 F Lafayette IN 3.5 Mongolian 1
# 4 D Los Angeles CA 4.0 Local Services 1
# 5 E Los Angeles CA 3.0 Local Services 2
# 1 A Pittsburgh PA 5.0 Soul Food 1
# 7 G Pittsburgh PA 5.0 Doctors 1
# 8 H Pittsburgh PA 4.0 Soul Food 3
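As an alternative, the same grouped ranking can be sketched with base R's `ave()`, which does not require sorting first. This is not the answer's method, only a variation on it. The data frame below repeats the example rows without the NA row, and the column names `rank` and `dense` are chosen here for illustration. The `dense` column gives consecutive ranks after a tie, which is what the question's expected output shows (H gets 2 rather than 3).

```r
# Example rows with the NA row (C) already removed
no <- data.frame(
  name  = c("A", "B", "D", "E", "F", "G", "H", "I"),
  city  = c("Pittsburgh", "Houston", "Los Angeles", "Los Angeles",
            "Lafayette", "Pittsburgh", "Pittsburgh", "Houston"),
  state = c("PA", "TX", "CA", "CA", "IN", "PA", "PA", "TX"),
  stars = c(5, 3, 4, 3, 3.5, 5, 4, 4),
  stringsAsFactors = FALSE
)

# Competition ranking within each city/state group (ties share the lowest rank)
no$rank <- ave(-no$stars, no$city, no$state,
               FUN = function(x) rank(x, ties.method = "min"))

# Dense ranking within each group (no gap after a tie)
no$dense <- ave(-no$stars, no$city, no$state,
                FUN = function(x) match(x, sort(unique(x))))

# Sort for display: city, state, then stars descending
no <- no[order(no$city, no$state, -no$stars), ]
no
```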
Answer 1 (score: 0)
Using dplyr:
library(dplyr)
filter(dat, complete.cases(dat)) %>%
group_by(city) %>%
arrange(city, state, desc(stars)) %>%
mutate(rank= min_rank(desc(stars)))
# name city state stars main_category rank
#1 I Houston TX 4.0 Professional Services 1
#2 B Houston TX 3.0 Professional Services 2
#3 F Lafayette IN 3.5 Mongolian 1
#4 D Los Angeles CA 4.0 Local Services 1
#5 E Los Angeles CA 3.0 Local Services 2
#6 A Pittsburgh PA 5.0 Soul Food 1
#7 G Pittsburgh PA 5.0 Doctors 1
#8 H Pittsburgh PA 4.0 Soul Food 3
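Answer 2 (score: 0)
With ddply, na.rm should go into .fun, which in your case means inside rank.
This is how you are currently handling NA:
ddply(d, c("city", "state", "main_category"), na.rm=T, transform, rank=rank(-stars, ties.method="max"))
Passing the argument inside .fun should fix it. At least it worked for me: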
ddply(d, c("city", "state", "main_category"), transform,
rank=rank(-stars, na.last = TRUE, ties.method="max"))
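A caveat, shown as a small base R sketch: `na.last = TRUE` is the default for `rank()` and keeps NA values, ranking them last. It does not drop them. In the question's data the NA is in `main_category`, not in `stars`, so `na.last` alone would presumably not remove the Lafayette row. The toy vector `x` is made up for illustration. The last two lines show that base `transform()` turns an unrecognised named argument into a new column, which is presumably what happens to the `na.rm=T` that ddply forwards to it.

```r
# Toy vector with one missing value (illustrative, not from the question)
x <- c(3, NA, 5)

r_last <- rank(-x, na.last = TRUE,   ties.method = "max")  # NA is ranked last
r_keep <- rank(-x, na.last = "keep", ties.method = "max")  # NA keeps rank NA
r_drop <- rank(-x, na.last = NA,     ties.method = "max")  # NA is removed

# transform() treats na.rm = TRUE as a new column, not as an option
df <- transform(data.frame(x = 1:2), na.rm = TRUE)
names(df)
```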