Question

I have a data frame with an ID column and many other columns. The values differ for each ID, and many of them are NA. For example:

     ID  w  x  y  z
1 User1  3  2 NA NA
2 User2  7  9 NA  4
3 User3 NA NA  1 NA
4 User4  3 NA NA  5

Is there a way to get, for each ID, a list of the column headers ordered from the smallest value to the largest, with the NAs dropped?

For example:

User1: x, w

User2: z, w, x

User3: y

So far I have gotten nowhere with this. I tried to get the order of each row with by_row, like this:

ordered <- by_row(moves.df, function(order) list(order[,2:ncol(moves.df)]), .collate = "list")$.out

But its output is just a list of single-observation data frames, one per row, not sorted in any way.

ordered2 <- moves.df %>% rowwise() %>% mutate(placelist = list(rank(moves.df[,2:ncol(moves.df)])))

This gives me a list column, but the lists contain numbers I don't recognize.

Any help would be greatly appreciated!

Answer 1

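We can use apply row-wise (MARGIN = 1), sort the values, and take the names of the result. sort() drops NA values by default, so only the columns with non-missing values remain.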

apply(df[-1], 1, function(x) names(sort(x)))

#$`1`
#[1] "x" "w"

#$`2`
#[1] "z" "w" "x"

#$`3`
#[1] "y"

#$`4`
#[1] "w" "z"
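
One caveat with apply(): it returns a list only when the rows produce results of different lengths. If every row happened to have the same number of non-NA values, the result would be simplified to a matrix. A minimal sketch that always returns a list, assuming the df from the Data section (rebuilt here so the block runs on its own; vals and ordered_names are names chosen for illustration):

```r
# Sample data from the question, rebuilt so this block is self-contained
df <- data.frame(
  ID = c("User1", "User2", "User3", "User4"),
  w = c(3L, 7L, NA, 3L),
  x = c(2L, 9L, NA, NA),
  y = c(NA, NA, 1L, NA),
  z = c(NA, 4L, NA, 5L),
  stringsAsFactors = FALSE
)

# Work on the value columns as a matrix so each row is a named vector
vals <- as.matrix(df[-1])

# lapply() always returns a list, even when all rows have equal length;
# sort() drops NA by default and keeps the column names
ordered_names <- lapply(seq_len(nrow(vals)), function(i) names(sort(vals[i, ])))

# Label each element with its ID instead of the row number
names(ordered_names) <- df$ID

ordered_names
```

Naming the elements by ID makes it possible to look up one user directly, for example ordered_names$User2.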

Answer 2

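Here is an option using the tidyverse. We gather the dataset into "long" format (dropping the NA values), arrange by "ID" and "val", group by "ID", and combine the elements of "key" into a single string with toString.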

library(tidyverse)
df %>% 
  gather(key, val, -ID, na.rm = TRUE) %>% 
  arrange(ID, val) %>%
  group_by(ID) %>% 
  summarise(key = toString(key))
# A tibble: 4 x 2
#  ID    key    
#  <chr> <chr>  
#1 User1 x, w   
#2 User2 z, w, x
#3 User3 y      
#4 User4 w, z

Or, if we need "key" as a list column instead of a string, wrap it in list() inside summarise:

df %>% 
   gather(key, val, -ID, na.rm = TRUE) %>%
   arrange(ID, val) %>% 
   group_by(ID) %>% 
   summarise(key = list(key))
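
In current tidyr releases gather() is superseded by pivot_longer(). The same pipeline can be sketched with it as follows, assuming dplyr and tidyr are installed; the df is rebuilt from the Data section so the block runs on its own, and result is a name chosen for illustration:

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt so this block is self-contained
df <- data.frame(
  ID = c("User1", "User2", "User3", "User4"),
  w = c(3L, 7L, NA, 3L),
  x = c(2L, 9L, NA, NA),
  y = c(NA, NA, 1L, NA),
  z = c(NA, 4L, NA, 5L),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Reshape to long format; values_drop_na = TRUE plays the role of na.rm = TRUE
  pivot_longer(cols = -ID, names_to = "key", values_to = "val",
               values_drop_na = TRUE) %>%
  # Order the rows by ID, then by value within each ID
  arrange(ID, val) %>%
  group_by(ID) %>%
  # Collapse the ordered column names into one comma-separated string per ID
  summarise(key = toString(key))

result
```

Replacing toString(key) with list(key) gives the list-column variant, as in the gather() version.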

Data

df <- structure(list(ID = c("User1", "User2", "User3", "User4"), w = c(3L, 
7L, NA, 3L), x = c(2L, 9L, NA, NA), y = c(NA, NA, 1L, NA), z = c(NA, 
4L, NA, 5L)), .Names = c("ID", "w", "x", "y", "z"),
  class = "data.frame", row.names = c("1", 
"2", "3", "4"))

Get a list of column names in ascending order for each row of a data frame
