Question

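I have a dataset of values with multiple columns (one per site) and rows (one per day), and I am trying to rank the values for each day using R. I want each site's rank out of the total number of sites within a single day, so the ranking is done row by row. This can be done in Excel, but it obviously takes a long time. Below is a [much smaller] example of what I am trying to achieve: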

date - site1 - site2 - site3 - site4
1/1/00 - 24 - 33 - 10 - 13
2/1/00 - 13 - 25 - 6 - 2
~~ leading to:
date - site1 - site2 - site3 - site4
1/1/00 - 2 - 1 - 4 - 3
2/1/00 - 2 - 1 - 3 - 4

Hopefully there is a simple command for this. Many thanks!

Answer 1

You can use rank to get the ranks of your data.

# your data
mydf <- read.table(text="date - site1 - site2 - site3 - site4
1/1/00 - 24 - 33 - 10 - 13
2/1/00 - 13 - 25 - 6 - 2", sep="-", header=TRUE)

# find ranks
t(apply(-mydf[-1], 1, rank))

# add to your dates
mydf.rank <- cbind(mydf[1], t(apply(-mydf[-1], 1, rank)))

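About the code

mydf[-1] # drops the first column (the dates)

-mydf[-1] # the unary `-` negates the values, so the ranks run in decreasing order

apply with MARGIN = 1 computes the ranks across each row.

t transposes the resulting matrix to give the output in the required layout.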
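
To make the role of t and the handling of ties more concrete, here is a minimal sketch in base R. The data frame is a hand-built stand-in for the question's example, and the names by_row, ranks and ranks_min are only illustrative. Choosing ties.method = "min" assumes that tied sites should share the best rank; the question does not say how ties should be treated.

```r
# Stand-in for the question's data (values copied from the example)
mydf <- data.frame(
  date  = c("1/1/00", "2/1/00"),
  site1 = c(24, 13),
  site2 = c(33, 25),
  site3 = c(10, 6),
  site4 = c(13, 2)
)

# apply() over rows returns one COLUMN per input row,
# so the result here is 4 sites x 2 days
by_row <- apply(-mydf[-1], 1, rank)
dim(by_row)

# t() flips it back to one row per day
ranks <- t(by_row)

# rank() averages tied values by default; "min" gives ties the best shared rank
# (assumption: this is the tie behaviour you want)
rank(-c(5, 5, 3))                       # 1.5 1.5 3.0
rank(-c(5, 5, 3), ties.method = "min")  # 1 1 3
ranks_min <- t(apply(-mydf[-1], 1, rank, ties.method = "min"))

# Re-attach the dates
mydf.rank <- cbind(mydf[1], ranks_min)
```

With no ties in the example data, ranks and ranks_min hold the same values; they differ only once two sites report the same value on the same day.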

Answer 2

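Here is a tidy way.

Reshape to long format, sort (arrange), group, and spread. The only tricky part is knowing that once the rows in each group are sorted, they are already ranked (ascending or descending). The function row_number takes advantage of this: it simply numbers the rows within each group in their current order.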

library(tidyverse)
library(lubridate)

# Data   
df <- tribble(
  ~date,    ~site1,   ~site2,    ~site3,    ~site4,
  mdy("1/1/2000"),   24,       33,        10,          13,
  mdy("2/1/2000"),   13,       25,         6,           2
) 

df %>% 
  gather(site, days, -date) %>%       #< Make Tidy
  arrange(date, desc(days)) %>%       #< Sort relevant columns
  group_by(date) %>% 
  mutate(ranking = row_number()) %>%  #< Ranking function
  select(-days) %>%                   #< Remove unneeded column. Worth keeping in tidy format!
  spread(site, ranking)

#> # A tibble: 2 x 5
#> # Groups:   date [2]
#>   date       site1 site2 site3 site4
#>   <date>     <int> <int> <int> <int>
#> 1 2000-01-01     2     1     4     3
#> 2 2000-02-01     2     1     3     4

Created on 2018-03-06 by the reprex package (v0.2.0).
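
As a possible variation, the same pipeline can be sketched with pivot_longer and pivot_wider, which supersede gather and spread in tidyr 1.0.0 and later. This is not part of the original answer: it assumes a recent dplyr and tidyr are installed, the column name value and the object name ranked are illustrative, and min_rank is swapped in for the sort-then-row_number step so that tied sites share a rank.

```r
library(dplyr)
library(tidyr)

# Same example data, with dates written as ISO strings
df <- tibble(
  date  = as.Date(c("2000-01-01", "2000-02-01")),
  site1 = c(24, 13),
  site2 = c(33, 25),
  site3 = c(10, 6),
  site4 = c(13, 2)
)

ranked <- df %>%
  pivot_longer(-date, names_to = "site", values_to = "value") %>%  # long format
  group_by(date) %>%
  mutate(ranking = min_rank(desc(value))) %>%  # rank within each day, largest = 1
  ungroup() %>%
  select(-value) %>%
  pivot_wider(names_from = site, values_from = ranking)            # back to wide

ranked
```

Because min_rank ranks directly, no arrange step is needed, and the result no longer depends on row order within each day.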

Ranking rows in R