Using R, how do I select values from different columns, based on a column that names the column to select?

Date: 2019-07-19 10:02:21

Tags: r

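I need to create a new column in a data frame whose value is taken from another column, where the name of that column differs from row to row.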

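For example, I have a data frame df like the one below. It describes what happened (events such as "a", "h", etc.) to each person (id 10, 12, 13, ..., 23) on each of the 7 days of a week: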

id  day mon tue wed thu fri sat sun
10  wed a   y   b   j   j   b   a
12  wed b   e   h   y   b   h   b
13  tue h   y   j   b   h   j   h
14  thu j   u   b   h   j   b   j
16  thu y   i   h   j   y   h   y
19  fri e   y   j   y   a   j   e
20  sun y   e   y   a   b   y   y
21  mon u   y   a   b   h   a   u
23  mon i   u   b   h   j   b   i

I need a new column "val" that shows the value for the day named in the "day" variable.

So, something like this:

id  day val mon tue wed thu fri sat sun
10  wed b   a   y   b   j   j   b   a
12  wed h   b   e   h   y   b   h   b
13  tue y   h   y   j   b   h   j   h
14  thu h   j   u   b   h   j   b   j
16  thu j   y   i   h   j   y   h   y
19  fri a   e   y   j   y   a   j   e
20  sun y   y   e   y   a   b   y   y
21  mon u   u   y   a   b   h   a   u
23  mon i   i   u   b   h   j   b   i

I tried writing a function that could be applied to a column to produce a new column:

lookupfunction <- function(x) {
  rownumberofx <- which(x=x)
  dayvalue <- df[rownumberofx,"day"]
  dayvalue
  rownumberofx <- NULL

}
df$val <- lookupfunction(df$day)
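A working per-row lookup in the spirit of the attempt above could loop over the row numbers and use each row's `day` value as a column name. This is a minimal sketch, assuming `df` is a plain data.frame with character columns; `lookup_day` is a hypothetical helper name, not part of the original code.

```r
# The question's data as a plain data.frame with character columns
df <- data.frame(
  id  = c(10, 12, 13, 14, 16, 19, 20, 21, 23),
  day = c("wed", "wed", "tue", "thu", "thu", "fri", "sun", "mon", "mon"),
  mon = c("a", "b", "h", "j", "y", "e", "y", "u", "i"),
  tue = c("y", "e", "y", "u", "i", "y", "e", "y", "u"),
  wed = c("b", "h", "j", "b", "h", "j", "y", "a", "b"),
  thu = c("j", "y", "b", "h", "j", "y", "a", "b", "h"),
  fri = c("j", "b", "h", "j", "y", "a", "b", "h", "j"),
  sat = c("b", "h", "j", "b", "h", "j", "y", "a", "b"),
  sun = c("a", "b", "h", "j", "y", "e", "y", "u", "i"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for row i, return the cell in the column named by data$day[i]
lookup_day <- function(data) {
  vapply(seq_len(nrow(data)),
         function(i) data[[data$day[i]]][i],
         character(1))
}

df$val <- lookup_day(df)
df$val
# [1] "b" "h" "y" "h" "j" "a" "y" "u" "i"
```

This visits one row at a time, so on large data the vectorised index-matrix approach is likely to be faster.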

I'd like to learn some code that produces the "val" column.

2 Answers:

Answer 0 (score: 1)

You can use subsetting with an index matrix (see help("[")):

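# DF must be a plain data.frame: a data.table does not accept matrix indexing
# make sure that factor levels are in the same order as the columns
DF$day <- factor(DF$day, levels = names(DF)[-(1:2)])

# index matrix of (row, column) pairs; cbind() uses the factor's integer codes,
# i.e. it does as.integer(DF$day) automatically
ind <- cbind(seq_len(nrow(DF)), DF$day)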
#     [,1] [,2]
#[1,]    1    3
#[2,]    2    3
#[3,]    3    2
#[4,]    4    4
#[5,]    5    4
#[6,]    6    5
#[7,]    7    7
#[8,]    8    1
#[9,]    9    1

#subset
DF[,-(1:2)][ind]
#[1] "b" "h" "y" "h" "j" "a" "y" "u" "i"
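A variant of the same index-matrix idea skips the factor conversion: `match()` looks up each `day` value directly among the column names, so the column order does not have to be encoded in factor levels. A sketch, assuming `DF` is a plain data.frame (not a data.table) and every `day` value is also a column name.

```r
# The question's data as a plain data.frame with character columns
DF <- data.frame(
  id  = c(10, 12, 13, 14, 16, 19, 20, 21, 23),
  day = c("wed", "wed", "tue", "thu", "thu", "fri", "sun", "mon", "mon"),
  mon = c("a", "b", "h", "j", "y", "e", "y", "u", "i"),
  tue = c("y", "e", "y", "u", "i", "y", "e", "y", "u"),
  wed = c("b", "h", "j", "b", "h", "j", "y", "a", "b"),
  thu = c("j", "y", "b", "h", "j", "y", "a", "b", "h"),
  fri = c("j", "b", "h", "j", "y", "a", "b", "h", "j"),
  sat = c("b", "h", "j", "b", "h", "j", "y", "a", "b"),
  sun = c("a", "b", "h", "j", "y", "e", "y", "u", "i"),
  stringsAsFactors = FALSE
)

# Column position of each row's day, looked up among all column names
col_idx <- match(DF$day, names(DF))

# Two-column index matrix: (row number, column number)
ind <- cbind(seq_len(nrow(DF)), col_idx)

# Indexing a data.frame with a matrix returns a plain vector of the selected cells
DF$val <- DF[ind]
DF$val
# [1] "b" "h" "y" "h" "j" "a" "y" "u" "i"
```

Because `match()` searches the full `names(DF)`, no columns need to be dropped first; a `day` value with no matching column would yield `NA` for that row.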

Answer 1 (score: 0)

Having weekdays, dates and the like as columns usually makes analysis harder; reshaping the data frame into "long" format often helps. Try:

Code

library(dplyr)
library(tidyr)

df %>% 
  gather(weekday, letter, -id, -day) %>% 
  group_by(id) %>% 
  mutate(val = letter[day == weekday]) %>% 
  spread(weekday, letter)

Result

     id day   val   fri   mon   sat   sun   thu   tue   wed  
  <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1    10 wed   b     j     a     b     a     j     y     b    
2    12 wed   h     b     b     h     b     y     e     h    
3    13 tue   y     h     h     j     h     b     y     j    
4    14 thu   h     j     j     b     j     h     u     b    
5    16 thu   j     y     y     h     y     j     i     h    
6    19 fri   a     a     e     j     e     y     y     j    
7    20 sun   y     b     y     y     y     a     e     y    
8    21 mon   u     h     u     a     u     b     y     a    
9    23 mon   i     j     i     b     i     h     u     b   
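`gather()` and `spread()` still work, but tidyr 1.0.0 superseded them with `pivot_longer()` and `pivot_wider()`. The same long-format approach could be sketched with the newer functions as follows, assuming tidyr >= 1.0.0 and dplyr are installed; the names `weekday` and `letter` are carried over from the answer above.

```r
library(dplyr)
library(tidyr)

# The question's data as a plain data.frame with character columns
df <- data.frame(
  id  = c(10, 12, 13, 14, 16, 19, 20, 21, 23),
  day = c("wed", "wed", "tue", "thu", "thu", "fri", "sun", "mon", "mon"),
  mon = c("a", "b", "h", "j", "y", "e", "y", "u", "i"),
  tue = c("y", "e", "y", "u", "i", "y", "e", "y", "u"),
  wed = c("b", "h", "j", "b", "h", "j", "y", "a", "b"),
  thu = c("j", "y", "b", "h", "j", "y", "a", "b", "h"),
  fri = c("j", "b", "h", "j", "y", "a", "b", "h", "j"),
  sat = c("b", "h", "j", "b", "h", "j", "y", "a", "b"),
  sun = c("a", "b", "h", "j", "y", "e", "y", "u", "i"),
  stringsAsFactors = FALSE
)

res <- df %>%
  # long format: one row per (id, weekday)
  pivot_longer(mon:sun, names_to = "weekday", values_to = "letter") %>%
  group_by(id) %>%
  # within each id, keep the letter whose weekday equals that id's day
  mutate(val = letter[weekday == day]) %>%
  ungroup() %>%
  # back to one row per id, with weekday columns in their original order
  pivot_wider(names_from = weekday, values_from = letter)

res$val
# [1] "b" "h" "y" "h" "j" "a" "y" "u" "i"
```

Unlike `spread()`, which sorts the new columns alphabetically, `pivot_wider()` keeps the weekday columns in order of first appearance (mon to sun).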

Data

df <- structure(list(
  id  = c(10L, 12L, 13L, 14L, 16L, 19L, 20L, 21L, 23L),
  day = c("wed", "wed", "tue", "thu", "thu", "fri", "sun", "mon", "mon"),
  mon = c("a", "b", "h", "j", "y", "e", "y", "u", "i"),
  tue = c("y", "e", "y", "u", "i", "y", "e", "y", "u"),
  wed = c("b", "h", "j", "b", "h", "j", "y", "a", "b"),
  thu = c("j", "y", "b", "h", "j", "y", "a", "b", "h"),
  fri = c("j", "b", "h", "j", "y", "a", "b", "h", "j"),
  sat = c("b", "h", "j", "b", "h", "j", "y", "a", "b"),
  sun = c("a", "b", "h", "j", "y", "e", "y", "u", "i")
), .Names = c("id", "day", "mon", "tue", "wed", "thu", "fri", "sat", "sun"),
row.names = c(NA, -9L), class = c("data.table", "data.frame"))