Ranking within groups in R

Date: 2017-10-21 01:53:14

Tags: r dplyr

 > str(b)
'data.frame':   2720 obs. of  3 variables:
$ State        : chr  "AL" "AL" "AL" "AL" ...
$ Hospital.Name: chr  "SOUTHEAST ALABAMA MEDICAL CENTER" "MARSHALL MEDICAL CENTER SOUTH" "ELIZA COFFEE MEMORIAL HOSPITAL" "ST VINCENT'S EAST" ...
$ heart attack : num  14.3 18.5 18.1 17.7 18 15.9 19.6 17.3 17.8 17.5 ...

Above is my data frame. I want to group by State and rank the heart attack values within each group, so my code is as follows:

c <- group_by(b,State) %>%
    mutate(rank = order(order('heart attack')))

But I get a result in which every value in the rank column equals 1:

> c
# A tibble: 2,720 x 4
# Groups:   State [54]
   State                    Hospital.Name `heart attack`  rank
   <chr>                            <chr>          <dbl> <int>
 1    AL SOUTHEAST ALABAMA MEDICAL CENTER           14.3     1
 2    AL    MARSHALL MEDICAL CENTER SOUTH           18.5     1
 3    AL   ELIZA COFFEE MEMORIAL HOSPITAL           18.1     1
 4    AL                ST VINCENT'S EAST           17.7     1
 5    AL   DEKALB REGIONAL MEDICAL CENTER           18.0     1
 6    AL    SHELBY BAPTIST MEDICAL CENTER           15.9     1
 7    AL   HELEN KELLER MEMORIAL HOSPITAL           19.6     1
 8    AL              DALE MEDICAL CENTER           17.3     1
 9    AL     BAPTIST MEDICAL CENTER SOUTH           17.8     1
10    AL    JACKSON HOSPITAL & CLINIC INC           17.5     1
# ... with 2,710 more rows

Can anyone help me figure out why it doesn't work?

1 answer:

Answer 0: (score: 3)

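The comment from alistaire is a good one. I often find that running a chained dplyr command like this one line at a time is a very useful way to debug it. I'm using iris as a sample dataset: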

library(dplyr)

temp <- iris %>%
  group_by(Species) %>%
  arrange(Sepal.Length) %>%
  mutate(rank = order(Sepal.Length))

This returns:

R> head(temp)
# A tibble: 6 x 6
# Groups:   Species [1]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species  rank
         <dbl>       <dbl>        <dbl>       <dbl>  <fctr> <int>
1          4.3         3.0          1.1         0.1  setosa     1
2          4.4         2.9          1.4         0.2  setosa     2
3          4.4         3.0          1.3         0.2  setosa     3
4          4.4         3.2          1.3         0.2  setosa     4
5          4.5         2.3          1.3         0.3  setosa     5
6          4.6         3.1          1.5         0.2  setosa     6
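
Running the question's code one piece at a time points to where the 1s come from. In single quotes, 'heart attack' is a character string of length one rather than the column, so order() on it returns 1, and mutate() recycles that value across every row. Below is a minimal sketch of the difference, assuming a small made-up data frame (`b_small` and its values are hypothetical, not the asker's data) and referring to the column with backticks instead:

```r
library(dplyr)

# Hypothetical miniature of the data frame in the question
b_small <- data.frame(
  State = c("AL", "AL", "AL", "AK", "AK"),
  `heart attack` = c(14.3, 18.5, 18.1, 13.4, 17.7),
  check.names = FALSE,        # keep the space in the column name
  stringsAsFactors = FALSE
)

# A quoted string is a length-one character vector, so its order is just 1
quoted_order <- order("heart attack")

# Backticks refer to the column itself; order(order(x)) gives each
# row's rank within its group
ranked <- b_small %>%
  group_by(State) %>%
  mutate(rank = order(order(`heart attack`))) %>%
  ungroup()

print(quoted_order)
print(ranked)
```

Because order(order(x)) is applied to the actual column here, the ranks restart at 1 in each State and the rows stay in their original order.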

You can also use the rank() function in R:

temp2 <- iris %>%
  group_by(Species) %>%
  mutate(rank = rank(Sepal.Length))

R> head(temp2)
# A tibble: 6 x 6
# Groups:   Species [1]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species  rank
         <dbl>       <dbl>        <dbl>       <dbl>  <fctr> <dbl>
1          5.1         3.5          1.4         0.2  setosa  32.5
2          4.9         3.0          1.4         0.2  setosa  18.5
3          4.7         3.2          1.3         0.2  setosa  10.5
4          4.6         3.1          1.5         0.2  setosa   7.5
5          5.0         3.6          1.4         0.2  setosa  24.5
6          5.4         3.9          1.7         0.4  setosa  43.0
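
The fractional ranks such as 32.5 appear because rank() averages the ranks of tied values by default. If whole-number ranks are preferred, one option is the ties.method argument of base rank(), and another is dplyr's dense_rank(). The sketch below is only an illustration on iris, and the column names `rank_min`, `rank_first`, and `rank_dense` are made up for the example:

```r
library(dplyr)

temp3 <- iris %>%
  group_by(Species) %>%
  mutate(
    rank_min   = rank(Sepal.Length, ties.method = "min"),    # ties share the lowest rank
    rank_first = rank(Sepal.Length, ties.method = "first"),  # ties broken by row order
    rank_dense = dense_rank(Sepal.Length)                    # ties share a rank, no gaps
  ) %>%
  ungroup()

print(head(temp3))
```

Which variant fits depends on whether tied hospitals should share a rank ("min", dense_rank) or always receive distinct ranks ("first").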