How to mutate new columns by sequential combinations of increasingly more data?

Time: 2018-10-27 17:12:18

Tags: r dplyr apply

Example df:

df <- tibble(name = LETTERS[1:10],
              x = rnorm(10, mean = 10), 
              y = rnorm(10, 10), 
              z = rnorm(10, 10))

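I want to first mutate a rank column for x, then one for the sum of x and y, then one for the sum of x, y and z, where the largest number gets rank 1 and the smallest gets rank 10.

Starting with x, I can do something like:

df %<>% mutate(rank_01 = min_rank(-x))

which computes the rank column for x, but I'm not sure what the best way is to compute the subsequent columns. I'd like to take advantage of vectorization somehow, but my programming skills are limited here.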

In my real df, the total number of columns I want to use is more than 50, so an automated approach would be ideal!
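For the three example columns, the manual version would look roughly like the sketch below. The seed and the names rank_02 and rank_03 are assumptions that simply continue the rank_01 pattern, and df_manual is a hypothetical name. Repeating this by hand for 50+ columns is what needs automating.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- tibble(name = LETTERS[1:10],
             x = rnorm(10, mean = 10),
             y = rnorm(10, 10),
             z = rnorm(10, 10))

# min_rank(-v) gives rank 1 to the largest value of v
df_manual <- df %>%
  mutate(rank_01 = min_rank(-x),              # rank of x
         rank_02 = min_rank(-(x + y)),        # rank of x + y
         rank_03 = min_rank(-(x + y + z)))    # rank of x + y + z
```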

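Expected output: df with one additional rank column per step (rank_01 for x, then the rank of x + y, then the rank of x + y + z).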

3 Answers:

Answer 0: (score: 2)

cbind(df, apply(-apply(df[, -1], 1, cumsum), 1, rank))
#    name         x         y         z  x  y  z
# 1     A 10.049312 10.424365  9.286644  5  4  5
# 2     B 10.010068 10.996667  8.754025  6  1  4
# 3     C  9.813097  9.493180 10.651993  9  7  3
# 4     D 10.702742  9.657496  9.838946  3  5  2
# 5     E  9.936206  9.047051  8.938002  7 10 10
# 6     F  9.833105  9.205973 10.627177  8  9  6
# 7     G 11.310733  9.262942  8.931759  2  3  7
# 8     H 11.316306  8.576866 12.390953  1  6  1
# 9     I  9.044812 10.251189  9.606649 10  8  9
# 10    J 10.495743 10.174724  8.458670  4  2  8

You may also want to set the column names to rank_x, rank_xy, etc.; for details, see "Cumulatively paste (concatenate) values grouped by another variable".
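As a rough sketch of what the one-liner does and of how cumulative column names could be attached, the steps can be unpacked as follows. This is not the answerer's exact code: the seed, the use of data.frame instead of tibble, and the names cum_sums, ranks and result are assumptions made for illustration.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(name = LETTERS[1:10],
                 x = rnorm(10, mean = 10),
                 y = rnorm(10, 10),
                 z = rnorm(10, 10))

# Row-wise cumulative sums. apply() over rows returns one column per row of df,
# so transpose back to 10 x 3: x, x + y, x + y + z
cum_sums <- t(apply(df[, -1], 1, cumsum))

# Rank each column on the negated values so that the largest sum gets rank 1
ranks <- apply(-cum_sums, 2, rank)

# Cumulative names: "rank_x", "rank_xy", "rank_xyz"
colnames(ranks) <- paste0("rank_", Reduce(paste0, names(df)[-1], accumulate = TRUE))

result <- cbind(df, ranks)
```

Because the names are derived from names(df)[-1], the same lines should carry over to a wider data frame without changes, although with 50+ columns the pasted names get long and a numbered scheme may be more practical.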

Answer 1: (score: 1)

Another approach using tidyverse and reshape2:

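library(tidyverse)
library(reshape2)

df %>% 
  gather(var, val, -name) %>%                  # long format: one row per (name, var)
  arrange(name) %>% 
  group_by(name) %>% 
  mutate(temp = cumsum(val)) %>%               # running sum x, x + y, x + y + z within each name
  ungroup() %>%
  dcast(name ~ var, value.var = "temp") %>%    # back to wide format
  # funs() is deprecated since dplyr 0.8.0; list(rank = ~ dense_rank(desc(.))) is the current equivalent
  mutate_at(vars(-name), funs(rank = dense_rank(desc(.)))) %>%
  select(matches("(_rank)|(name)")) %>%
  left_join(df, by = c("name" = "name"))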

   name x_rank y_rank z_rank         x         y         z
1     A      1      3      9 11.668095  9.645292  6.977697
2     B      3      1      1 11.085743 12.395033  9.130904
3     C      4      4      3 10.557528 10.551010  9.586108
4     D     10      8      2  8.363167 11.248786 11.989218
5     E      6      7      6  9.728462 10.049470  9.921010
6     F      2      5      7 11.091799  9.544451  8.516171
7     G      7      6      4  9.686247 10.657889  9.713129
8     H      8     10     10  9.317976  8.514533  9.098976
9     I      5      2      5 10.052081 11.469185  8.425983
10    J      9      9      8  9.290704  9.778239  9.331685

Or, if you want column names that indicate the accumulation:

df %>% 
  gather(var, val, -name) %>% 
  arrange(name) %>% 
  group_by(name) %>% 
  mutate(temp = cumsum(val),
         # cumulative names within each name group: "x", "xy", "xyz"
         var = paste0(Reduce(paste0, var, accumulate = TRUE))) %>% 
  ungroup() %>%
  dcast(name~var, value.var = "temp") %>%
  mutate_at(vars(-name), funs(rank = dense_rank(desc(.)))) %>%
  select(matches("(_rank)|(name)")) %>%
  left_join(df, by = c("name" = "name"))

   name x_rank xy_rank xyz_rank         x         y         z
1     A      1       3        9 11.668095  9.645292  6.977697
2     B      3       1        1 11.085743 12.395033  9.130904
3     C      4       4        3 10.557528 10.551010  9.586108
4     D     10       8        2  8.363167 11.248786 11.989218
5     E      6       7        6  9.728462 10.049470  9.921010
6     F      2       5        7 11.091799  9.544451  8.516171
7     G      7       6        4  9.686247 10.657889  9.713129
8     H      8      10       10  9.317976  8.514533  9.098976
9     I      5       2        5 10.052081 11.469185  8.425983
10    J      9       9        8  9.290704  9.778239  9.331685
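For readers on a current tidyr, the same idea can be sketched with pivot_longer() and pivot_wider() in place of gather() and dcast(). This is a swapped-in variant rather than the answer's own code: the seed and the names ranks and result are assumptions, and the rank columns end up after x, y and z rather than before them.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- tibble(name = LETTERS[1:10],
             x = rnorm(10, mean = 10),
             y = rnorm(10, 10),
             z = rnorm(10, 10))

ranks <- df %>%
  pivot_longer(-name, names_to = "var", values_to = "val") %>%
  group_by(name) %>%
  mutate(val = cumsum(val),                              # x, x + y, x + y + z
         var = Reduce(paste0, var, accumulate = TRUE)) %>%  # "x", "xy", "xyz"
  group_by(var) %>%
  mutate(rank = dense_rank(desc(val))) %>%               # largest sum gets rank 1
  ungroup() %>%
  transmute(name, var = paste0(var, "_rank"), rank) %>%
  pivot_wider(names_from = var, values_from = rank)

result <- left_join(df, ranks, by = "name")
```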

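Answer 2: (score: 1)

Another approach using tidyverse:

library(tidyverse)

# row-wise cumulative sums: one single-column tibble per row of df, bound side by side
pmap(df[, -1], ~ cumsum(c(...)) %>%
       as_tibble) %>%
  bind_cols %>%
  # each row now holds one cumulative sum across all names; rank it in descending order
  pmap(., ~ -c(...) %>%
         rank %>%
         as_tibble) %>%
  bind_cols(df, .) %>%
  rename_at(vars(matches("value")), ~ paste0("rank", sprintf("_%02d", 1:3)))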
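Since the question mentions 50+ columns, a column-wise formulation may be easier to scale than the hard-coded 1:3 above. The sketch below is an alternative, not this answer's method: it assumes purrr::accumulate() over the columns, a fixed seed, and the hypothetical names cum_sums, ranks and result.

```r
library(dplyr)
library(purrr)
library(tibble)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- tibble(name = LETTERS[1:10],
             x = rnorm(10, mean = 10),
             y = rnorm(10, 10),
             z = rnorm(10, 10))

# Column-wise running sums: list(x, x + y, x + y + z), each a vector of length 10
cum_sums <- accumulate(df[-1], `+`)

# Descending rank of every running sum (largest value gets rank 1)
ranks <- map(cum_sums, ~ min_rank(-.x))

# Number the rank columns by however many columns were accumulated
names(ranks) <- sprintf("rank_%02d", seq_along(ranks))

result <- bind_cols(df, as_tibble(ranks))
```

Each step here works on whole columns, so the number of rank columns follows from the width of df[-1] instead of being written into the code.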