> str(b)
'data.frame': 2720 obs. of 3 variables:
$ State : chr "AL" "AL" "AL" "AL" ...
$ Hospital.Name: chr "SOUTHEAST ALABAMA MEDICAL CENTER" "MARSHALL MEDICAL CENTER SOUTH" "ELIZA COFFEE MEMORIAL HOSPITAL" "ST VINCENT'S EAST" ...
$ heart attack : num 14.3 18.5 18.1 17.7 18 15.9 19.6 17.3 17.8 17.5 ...
Above is my data frame. I want to group by State and rank `heart attack` within each group, so my code is:
c <- group_by(b,State) %>%
mutate(rank = order(order('heart attack')))
But every value in the rank column of the result is 1:
> c
# A tibble: 2,720 x 4
# Groups: State [54]
State Hospital.Name `heart attack` rank
<chr> <chr> <dbl> <int>
1 AL SOUTHEAST ALABAMA MEDICAL CENTER 14.3 1
2 AL MARSHALL MEDICAL CENTER SOUTH 18.5 1
3 AL ELIZA COFFEE MEMORIAL HOSPITAL 18.1 1
4 AL ST VINCENT'S EAST 17.7 1
5 AL DEKALB REGIONAL MEDICAL CENTER 18.0 1
6 AL SHELBY BAPTIST MEDICAL CENTER 15.9 1
7 AL HELEN KELLER MEMORIAL HOSPITAL 19.6 1
8 AL DALE MEDICAL CENTER 17.3 1
9 AL BAPTIST MEDICAL CENTER SOUTH 17.8 1
10 AL JACKSON HOSPITAL & CLINIC INC 17.5 1
# ... with 2,710 more rows
Can anyone help me figure out why it doesn't work?
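A likely cause, sketched below: in R, 'heart attack' in single quotes is a character string, not a reference to the column. order() of a single string returns 1, order(1) is again 1, and mutate() recycles that length-1 result to every row of each group. Backticks are what refer to a column whose name contains a space. The data frame here is a small stand-in for `b`: the AL rows are copied from the printout above, while the AK rows and their values are made up so that there are two groups. The result is stored in `ranked` rather than `c`, which would mask the base function name.

```r
library(dplyr)

# Stand-in for `b`: AL rows from the question's printout; AK rows are hypothetical.
b <- data.frame(
  State = c("AL", "AL", "AL", "AL", "AK", "AK"),
  Hospital.Name = c("SOUTHEAST ALABAMA MEDICAL CENTER",
                    "MARSHALL MEDICAL CENTER SOUTH",
                    "ELIZA COFFEE MEMORIAL HOSPITAL",
                    "ST VINCENT'S EAST",
                    "HYPOTHETICAL HOSPITAL A",
                    "HYPOTHETICAL HOSPITAL B"),
  `heart attack` = c(14.3, 18.5, 18.1, 17.7, 16.0, 15.0),
  check.names = FALSE,        # keep the space in the column name
  stringsAsFactors = FALSE
)

# Single quotes give a one-element character vector, so both calls return 1:
order("heart attack")
order(order("heart attack"))

# Backticks refer to the column; order(order(x)) then ranks within each State.
ranked <- b %>%
  group_by(State) %>%
  mutate(rank = order(order(`heart attack`)))

ranked$rank   # 1 4 3 2 for the AL rows, 2 1 for the AK rows
```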
Answer 0 (score: 3)
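The comment from alistaire is a good one. I often find that running a chained dplyr command like this one step at a time is a very useful way to debug. Here I use iris as a sample dataset: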
library(dplyr)
temp <- iris %>%
  group_by(Species) %>%
  arrange(Sepal.Length) %>%            # sort the rows by Sepal.Length
  mutate(rank = order(Sepal.Length))   # rows are already sorted, so order() gives 1..n within each group
This returns:
R> head(temp)
# A tibble: 6 x 6
# Groups: Species [1]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species rank
<dbl> <dbl> <dbl> <dbl> <fctr> <int>
1 4.3 3.0 1.1 0.1 setosa 1
2 4.4 2.9 1.4 0.2 setosa 2
3 4.4 3.0 1.3 0.2 setosa 3
4 4.4 3.2 1.3 0.2 setosa 4
5 4.5 2.3 1.3 0.3 setosa 5
6 4.6 3.1 1.5 0.2 setosa 6
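Note that order() returns the positions that would sort a vector, not the ranks of its elements; it yields 1..n above only because arrange() sorted the rows first. A minimal sketch on the first five setosa Sepal.Length values shows the difference, and that order(order(x)) (the idiom from the question) or dplyr's row_number() gives ranks without any sorting step:

```r
library(dplyr)

x <- head(iris$Sepal.Length, 5)   # 5.1 4.9 4.7 4.6 5.0

order(x)          # 4 3 2 5 1: indices of the sorted values, not ranks
order(order(x))   # 5 3 2 1 4: the rank of each element
row_number(x)     # 5 3 2 1 4: dplyr's equivalent (ties broken by position)

# Grouped version that needs no arrange():
temp3 <- iris %>%
  group_by(Species) %>%
  mutate(rank = row_number(Sepal.Length))
```

Each Species group of `temp3` then holds the ranks 1..50 in the original row order.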
You can also use R's rank() function:
temp2 <- iris %>%
  group_by(Species) %>%
  mutate(rank = rank(Sepal.Length))   # ties get the average of their ranks by default
R> head(temp2)
# A tibble: 6 x 6
# Groups: Species [1]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species rank
<dbl> <dbl> <dbl> <dbl> <fctr> <dbl>
1 5.1 3.5 1.4 0.2 setosa 32.5
2 4.9 3.0 1.4 0.2 setosa 18.5
3 4.7 3.2 1.3 0.2 setosa 10.5
4 4.6 3.1 1.5 0.2 setosa 7.5
5 5.0 3.6 1.4 0.2 setosa 24.5
6 5.4 3.9 1.7 0.4 setosa 43.0
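The fractional values such as 32.5 come from tied Sepal.Length values: rank() averages ties by default. A short sketch on a made-up vector shows how ties.method and dplyr's min_rank() and dense_rank() handle them differently:

```r
library(dplyr)

x <- c(4.4, 4.3, 4.4, 4.5)   # made-up values with one tie

rank(x)                        # 2.5 1.0 2.5 4.0 (ties averaged, the default)
rank(x, ties.method = "min")   # 2 1 2 4
min_rank(x)                    # 2 1 2 4 (dplyr equivalent of the line above)
dense_rank(x)                  # 2 1 2 3 (no gap after the tie)
```

In the question's pipeline this would be mutate(rank = min_rank(`heart attack`)); wrapping the column in desc() ranks from the highest value instead.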