Order a data frame column by a vector within each repeated group of the data frame

Asked: 2018-02-13 15:38:27

Tags: r

I have the following data frame:

col1 <- 1:10
col2 <- rep(c("COL","CIP","CHL","GEN","TMP"), 2)
col3 <- rep(c("spec1", "spec2"), each = 5)
df <- data.frame(col1, col2, col3, stringsAsFactors = FALSE)

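I want to sort col2 according to "order_vector" within each "spec" in col3. I tried the following, but it only works for one of the "specs", because the other one is dropped from the data frame: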

library(dplyr)
order_vector <- c("CHL","GEN","COL","CIP","TMP")

df <- df %>%
  slice(match(order_vector, col2))

This returns the following data frame:

col1   col2   col3
3      CHL    spec1
4      GEN    spec1
1      COL    spec1
2      CIP    spec1
5      TMP    spec1

However, I would like this to work for all values in col3, preferably using dplyr.

3 Answers:

Answer 0: (score: 1)

If you turn col2 into a factor whose levels are given by order_vector, you can sort by it.

library(dplyr)
df %>% mutate_at("col2",factor,levels=order_vector) %>%
  arrange(col3,col2) %>%
  mutate_at("col2",as.character) # if you want to go back to characters, but maybe you shouldn't

#    col1 col2  col3
# 1     3  CHL spec1
# 2     4  GEN spec1
# 3     1  COL spec1
# 4     2  CIP spec1
# 5     5  TMP spec1
# 6     8  CHL spec2
# 7     9  GEN spec2
# 8     6  COL spec2
# 9     7  CIP spec2
# 10   10  TMP spec2

Or, more simply, inspired by CPak's answer:

df %>% arrange(col3,factor(col2,levels=order_vector))
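A related sketch, not part of the original answer: the same ordering can be obtained without building a factor at all, by sorting on each value's position in order_vector as returned by base R's match(). Unlike slice(match(...)), this keeps every row, including col2 values that repeat within a group; values missing from order_vector get NA and should sort last. The data is copied from the question.

```r
library(dplyr)

# Data and target order copied from the question
df <- data.frame(
  col1 = 1:10,
  col2 = rep(c("COL", "CIP", "CHL", "GEN", "TMP"), 2),
  col3 = rep(c("spec1", "spec2"), each = 5),
  stringsAsFactors = FALSE
)
order_vector <- c("CHL", "GEN", "COL", "CIP", "TMP")

# match(col2, order_vector) gives each row's rank in the target order;
# arrange() sorts by col3 first, then by that rank
sorted <- df %>% arrange(col3, match(col2, order_vector))

# Base R equivalent without dplyr
sorted_base <- df[order(df$col3, match(df$col2, order_vector)), ]
```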

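You can also use the fact that dplyr joins preserve row order (which table's order is kept can depend on the dplyr version):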

df %>%
  right_join(data.frame(col2=order_vector)) %>%
  arrange(col3)

#    col1 col2  col3
# 1     3  CHL spec1
# 2     4  GEN spec1
# 3     1  COL spec1
# 4     2  CIP spec1
# 5     5  TMP spec1
# 6     8  CHL spec2
# 7     9  GEN spec2
# 8     6  COL spec2
# 9     7  CIP spec2
# 10   10  TMP spec2
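Since the row order produced by right_join() may not be the same on every dplyr version, here is a more conservative sketch of the same idea. It assumes only that left_join() keeps the row order of its first argument and that arrange() leaves tied rows in their existing order. The name lookup is introduced here for illustration.

```r
library(dplyr)

# Data and target order copied from the question
df <- data.frame(
  col1 = 1:10,
  col2 = rep(c("COL", "CIP", "CHL", "GEN", "TMP"), 2),
  col3 = rep(c("spec1", "spec2"), each = 5),
  stringsAsFactors = FALSE
)
order_vector <- c("CHL", "GEN", "COL", "CIP", "TMP")

# Lookup table (hypothetical name) whose row order is the desired order of col2
lookup <- data.frame(col2 = order_vector, stringsAsFactors = FALSE)

sorted <- lookup %>%
  left_join(df, by = "col2") %>%  # rows follow the lookup order: CHL, CHL, GEN, GEN, ...
  arrange(col3) %>%               # group by spec; ties are expected to keep their order
  select(col1, col2, col3)        # restore the original column order
```

If the tie-keeping assumption is a concern, sorting explicitly on col3 and the position in order_vector avoids relying on it.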

Answer 1: (score: 1)

You can use forcats::fct_relevel.
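The code that accompanied this answer did not survive on this page, so the following is only a guess at how fct_relevel might be applied here, not the answerer's original code. It assumes the forcats package is installed and uses the question's data.

```r
library(dplyr)
library(forcats)

# Data and target order copied from the question
df <- data.frame(
  col1 = 1:10,
  col2 = rep(c("COL", "CIP", "CHL", "GEN", "TMP"), 2),
  col3 = rep(c("spec1", "spec2"), each = 5),
  stringsAsFactors = FALSE
)
order_vector <- c("CHL", "GEN", "COL", "CIP", "TMP")

# fct_relevel() moves the listed levels to the front in the given order;
# the factor is used only as a sort key, so col2 itself stays character
sorted <- df %>% arrange(col3, fct_relevel(col2, order_vector))
```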

Answer 2: (score: 0)

An option that does not turn col2 into a factor is to add a group_by call before the match statement:

library(dplyr)
col1 <- 1:10
col2 <- rep(c("COL","CIP","CHL","GEN","TMP"), 2)
col3 <- rep(c("spec1", "spec2"), each = 5)
df <- data.frame(col1, col2, col3, stringsAsFactors = FALSE)
order_vector <- c("CHL","GEN","COL","CIP","TMP")
df <- df %>%
  group_by(col3) %>% 
  slice(match(order_vector, col2))
df
# A tibble: 10 x 3
# Groups:   col3 [2]
    col1 col2  col3 
   <int> <chr> <chr>
 1     3 CHL   spec1
 2     4 GEN   spec1
 3     1 COL   spec1
 4     2 CIP   spec1
 5     5 TMP   spec1
 6     8 CHL   spec2
 7     9 GEN   spec2
 8     6 COL   spec2
 9     7 CIP   spec2
10    10 TMP   spec2
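Why the grouping matters, as a short sketch: match() returns only the first position of each value, so on the ungrouped data it can pick at most one row per value, all of them in spec1. After group_by(), slice() evaluates the same expression once per group. Two things to keep in mind: this keeps at most one row per col2 value per group, and the result stays grouped unless ungroup() is called, as done below.

```r
library(dplyr)

# Data and target order copied from the question
df <- data.frame(
  col1 = 1:10,
  col2 = rep(c("COL", "CIP", "CHL", "GEN", "TMP"), 2),
  col3 = rep(c("spec1", "spec2"), each = 5),
  stringsAsFactors = FALSE
)
order_vector <- c("CHL", "GEN", "COL", "CIP", "TMP")

# Ungrouped: only first occurrences are found, and they all sit in rows 1 to 5
first_hits <- match(order_vector, df$col2)
first_hits
# [1] 3 4 1 2 5

# Grouped: match() is evaluated inside each col3 group, then the grouping is dropped
sorted <- df %>%
  group_by(col3) %>%
  slice(match(order_vector, col2)) %>%
  ungroup()
```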

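Depending on how many unique values col3 has, or how many rows of col2 would be converted to a factor and back to a character vector, one approach or the other might be computationally more efficient, I suppose.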
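One way to check this on your own data is to time both approaches. The sketch below builds a synthetic data frame (the size n_groups and the name big are made up for illustration) and uses base R's system.time(). No timings are claimed here, since they depend on the data and the dplyr version.

```r
library(dplyr)

set.seed(1)
order_vector <- c("CHL", "GEN", "COL", "CIP", "TMP")
n_groups <- 2000  # hypothetical size; adjust to resemble the real data

# Synthetic data: every group contains each col2 value exactly once
big <- data.frame(
  col1 = seq_len(n_groups * length(order_vector)),
  col2 = rep(sample(order_vector), times = n_groups),
  col3 = rep(sprintf("spec%05d", seq_len(n_groups)), each = length(order_vector)),
  stringsAsFactors = FALSE
)

# Approach 1: sort by group, then by a factor built from order_vector
time_factor <- system.time(
  by_factor <- big %>% arrange(col3, factor(col2, levels = order_vector))
)

# Approach 2: group, then pick rows in the target order within each group
time_slice <- system.time(
  by_slice <- big %>%
    group_by(col3) %>%
    slice(match(order_vector, col2)) %>%
    ungroup()
)

# Both approaches should produce the same row order on this data
stopifnot(identical(by_factor$col1, by_slice$col1))

time_factor
time_slice
```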