Example `df`:

    library(tidyverse)

    df <- tibble(name = LETTERS[1:10],
                 x = rnorm(10, mean = 10),
                 y = rnorm(10, 10),
                 z = rnorm(10, 10))
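I would like to mutate a rank column first for `x`, then for the sum of columns `x` and `y`, and then for the sum of `x`, `y`, and `z`, where the largest number is ranked 1 and the smallest is ranked 10.

Starting with `x`, I can do something like:

    library(magrittr)  # provides the %<>% assignment pipe

    df %<>% mutate(rank_01 = min_rank(-x))

which computes the rank column for `x`, but I am not sure what the best process is for computing the subsequent columns. I would like to take advantage of vectorization somehow, but my programming skills are limited here.

In my real `df`, the total number of columns I want to use is more than 50, so automating this would be ideal!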
Expected output: the original columns plus one rank column each for `x`, `x + y`, and `x + y + z`.
Answer 0 (score: 2)

    # Inner apply(): row-wise cumulative sums (x, x + y, x + y + z), one column per row of df.
    # Outer apply(): rank the negated sums within each cumulative variable, so the largest gets rank 1.
    cbind(df, apply(-apply(df[, -1], 1, cumsum), 1, rank))
    #    name         x         y         z  x  y  z
    # 1     A 10.049312 10.424365  9.286644  5  4  5
    # 2     B 10.010068 10.996667  8.754025  6  1  4
    # 3     C  9.813097  9.493180 10.651993  9  7  3
    # 4     D 10.702742  9.657496  9.838946  3  5  2
    # 5     E  9.936206  9.047051  8.938002  7 10 10
    # 6     F  9.833105  9.205973 10.627177  8  9  6
    # 7     G 11.310733  9.262942  8.931759  2  3  7
    # 8     H 11.316306  8.576866 12.390953  1  6  1
    # 9     I  9.044812 10.251189  9.606649 10  8  9
    # 10    J 10.495743 10.174724  8.458670  4  2  8

You may also want to set the column names to `rank_x`, `rank_xy`, and so on; for details, see "Cumulatively paste (concatenate) values grouped by another variable".
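
As a rough illustration of that naming idea, the sketch below builds the cumulative names with `Reduce(paste0, ..., accumulate = TRUE)` in base R. The data values are made up so the ranks can be checked by hand, and `ties.method = "min"` is an assumption meant to mirror `min_rank()` from the question; it is not part of the answer above.

```r
# Small deterministic data frame (hypothetical values, easy to verify by hand)
df <- data.frame(
  name = c("A", "B", "C", "D"),
  x = c(4, 3, 2, 1),
  y = c(1, 5, 2, 9),
  z = c(0, 0, 10, 1)
)

# Row-wise cumulative sums; apply() returns one column per row of df,
# so transpose to get rows = observations, columns = x, x + y, x + y + z
cums <- t(apply(df[, -1], 1, cumsum))

# Rank each cumulative column so that the largest value gets rank 1
ranks <- apply(-cums, 2, rank, ties.method = "min")

# Cumulative names: "x", "xy", "xyz" become "rank_x", "rank_xy", "rank_xyz"
colnames(ranks) <- paste0("rank_", Reduce(paste0, names(df)[-1], accumulate = TRUE))

res <- cbind(df, ranks)
print(res)
```

Because the names are derived from `names(df)[-1]`, the same lines should work unchanged for a data frame with many more value columns, although the pasted names get long quickly.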
Answer 1 (score: 1)
Another approach, using `tidyverse` and `reshape2`:
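
    library(tidyverse)
    library(reshape2)  # provides dcast()

    df %>%
      gather(var, val, -name) %>%                # long format: one row per name/variable
      arrange(name) %>%
      group_by(name) %>%
      mutate(temp = cumsum(val)) %>%             # running total x, x + y, x + y + z per name
      ungroup() %>%
      dcast(name ~ var, value.var = "temp") %>%  # back to wide format
      # a list() of lambdas replaces the deprecated funs()
      mutate_at(vars(-name), list(rank = ~ dense_rank(desc(.)))) %>%
      select(matches("(_rank)|(name)")) %>%
      left_join(df, by = c("name" = "name"))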

       name x_rank y_rank z_rank         x         y         z
    1     A      1      3      9 11.668095  9.645292  6.977697
    2     B      3      1      1 11.085743 12.395033  9.130904
    3     C      4      4      3 10.557528 10.551010  9.586108
    4     D     10      8      2  8.363167 11.248786 11.989218
    5     E      6      7      6  9.728462 10.049470  9.921010
    6     F      2      5      7 11.091799  9.544451  8.516171
    7     G      7      6      4  9.686247 10.657889  9.713129
    8     H      8     10     10  9.317976  8.514533  9.098976
    9     I      5      2      5 10.052081 11.469185  8.425983
    10    J      9      9      8  9.290704  9.778239  9.331685

Or, if you want column names that indicate the accumulation:

    df %>%
      gather(var, val, -name) %>%
      arrange(name) %>%
      group_by(name) %>%
      mutate(temp = cumsum(val),
             # accumulate the variable names: "x", "xy", "xyz"
             var = paste0(Reduce(paste0, var, accumulate = TRUE))) %>%
      ungroup() %>%
      dcast(name ~ var, value.var = "temp") %>%
      # a list() of lambdas replaces the deprecated funs()
      mutate_at(vars(-name), list(rank = ~ dense_rank(desc(.)))) %>%
      select(matches("(_rank)|(name)")) %>%
      left_join(df, by = c("name" = "name"))

       name x_rank xy_rank xyz_rank         x         y         z
    1     A      1       3        9 11.668095  9.645292  6.977697
    2     B      3       1        1 11.085743 12.395033  9.130904
    3     C      4       4        3 10.557528 10.551010  9.586108
    4     D     10       8        2  8.363167 11.248786 11.989218
    5     E      6       7        6  9.728462 10.049470  9.921010
    6     F      2       5        7 11.091799  9.544451  8.516171
    7     G      7       6        4  9.686247 10.657889  9.713129
    8     H      8      10       10  9.317976  8.514533  9.098976
    9     I      5       2        5 10.052081 11.469185  8.425983
    10    J      9       9        8  9.290704  9.778239  9.331685

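
For readers on newer package versions, the same reshape-accumulate-rank idea can probably be written without `reshape2`. The sketch below is not part of the answer: it assumes dplyr >= 1.0.0 (for `across()`) and tidyr >= 1.0.0 (for `pivot_longer()` and `pivot_wider()`), and it uses a small made-up data frame so the ranks can be checked by hand.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with easily verifiable ranks
df <- tibble(
  name = c("A", "B", "C", "D"),
  x = c(4, 3, 2, 1),
  y = c(1, 5, 2, 9),
  z = c(0, 0, 10, 1)
)

ranks <- df %>%
  # Long format; rows come out ordered by name, then by column (x, y, z)
  pivot_longer(-name, names_to = "var", values_to = "val") %>%
  group_by(name) %>%
  mutate(temp = cumsum(val),                               # x, x + y, x + y + z
         var = Reduce(paste0, var, accumulate = TRUE)) %>% # "x", "xy", "xyz"
  ungroup() %>%
  select(-val) %>%
  pivot_wider(names_from = var, values_from = temp) %>%
  # One rank column per cumulative sum; the largest value gets rank 1
  mutate(across(-name, ~ dense_rank(desc(.x)), .names = "{.col}_rank")) %>%
  select(name, ends_with("_rank"))

res <- left_join(ranks, df, by = "name")
print(res)
```

Note that `dense_rank()` leaves no gaps after ties, whereas `min_rank()` from the question does; swapping one for the other only matters when cumulative sums are tied.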
Answer 2 (score: 1)
Using `tidyverse`:
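
    library(tidyverse)

    # First pmap(): cumulative sums across the value columns, one tibble per row of df
    pmap(df[, -1], ~ cumsum(c(...)) %>%
           as_tibble) %>%
      bind_cols %>%
      # Second pmap(): rank each cumulative sum across observations (largest = 1)
      pmap(., ~ -c(...) %>%
             rank %>%
             as_tibble) %>%
      bind_cols(df, .) %>%
      rename_at(vars(matches("value")), ~ paste0("rank", sprintf("_%02d", 1:3)))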
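
The `1:3` in the last line is tied to the three-column example. With the 50+ columns mentioned in the question, the names would presumably be derived from the column count instead. A minimal base R sketch, where `n_value_cols` is a hypothetical stand-in for `ncol(df) - 1`:

```r
# Hypothetical column count; in practice this would be ncol(df) - 1
n_value_cols <- 12

# Zero-padded suffixes keep the names sortable: rank_01, rank_02, ..., rank_12
rank_names <- paste0("rank", sprintf("_%02d", seq_len(n_value_cols)))
print(rank_names)
```

The `%02d` format pads to two digits, which keeps names in sort order for up to 99 columns.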