我需要在数据框中创建一个新列,并从另一列中选择值,每行的名称不同。
例如,我有一个类似的数据框df,它描述了在7天中某人(id 10、12、13..23)发生了什么事情(例如“ a”,“ h”等事件)。一周:
id day mon tue wed thu fri sat sun
10 wed a y b j j b a
12 wed b e h y b h b
13 tue h y j b h j h
14 thu j u b h j b j
16 thu y i h j y h y
19 fri e y j y a j e
20 sun y e y a b y y
21 mon u y a b h a u
23 mon i u b h j b i
我需要一个新列“ val”,以显示“ day”变量中提到的那一天的值。
因此,像这样:
id day val mon tue wed thu fri sat sun
10 wed b a y b j j b a
12 wed h b e h y b h b
13 tue y h y j b h j h
14 thu h j u b h j b j
16 thu j y i h j y h y
19 fri a e y j y a j e
20 sun y y e y a b y y
21 mon u u y a b h a u
23 mon i i u b h j b i
我尝试制作一个可以应用于一列的函数以产生一个新列
lookupfunction <- function(x) {
rownumberofx <- which(x=x)
dayvalue <- df[rownumberofx,"day"]
dayvalue
rownumberofx <- NULL
}
df$val <- lookupfunction(df$day)
我希望学习一段代码以产生“ val”列
答案 0 :(得分:1)
您可以将子集与索引矩阵一起使用(请参见help("[")
)
#make sure that factor levels are in the same order as the columns
DF$day <- factor(DF$day, levels = names(DF)[-(1:2)])
#index matrix (does as.integer(DF$day) automatically)
ind <- cbind(seq_len(nrow(DF)), DF$day)
# [,1] [,2]
#[1,] 1 3
#[2,] 2 3
#[3,] 3 2
#[4,] 4 4
#[5,] 5 4
#[6,] 6 5
#[7,] 7 7
#[8,] 8 1
#[9,] 9 1
#subset
DF[,-(1:2)][ind]
#[1] "b" "h" "y" "h" "j" "a" "y" "u" "i"
答案 1 :(得分:0)
通常以工作日或日期等作为列会使分析更加困难。通常,将数据帧转换为“长”字有帮助。试试:
代码
library(dplyr)
library(tidyr)
df %>%
gather(weekday, letter, -id, -day) %>%
group_by(id) %>%
mutate(val = letter[day == weekday]) %>%
spread(weekday, letter)
结果
id day val fri mon sat sun thu tue wed
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 10 wed b j a b a j y b
2 12 wed h b b h b y e h
3 13 tue y h h j h b y j
4 14 thu h j j b j h u b
5 16 thu j y y h y j i h
6 19 fri a a e j e y y j
7 20 sun y b y y y a e y
8 21 mon u h u a u b y a
9 23 mon i j i b i h u b
数据
df <- structure(list(id = c(10L, 12L, 13L, 14L, 16L, 19L, 20L, 21L,
23L), day = c("wed", "wed", "tue", "thu", "thu", "fri", "sun",
"mon", "mon"), mon = c("a", "b", "h", "j", "y", "e", "y", "u",
"i"), tue = c("y", "e", "y", "u", "i", "y", "e", "y", "u"), wed = c("b",
"h", "j", "b", "h", "j", "y", "a", "b"), thu = c("j", "y", "b",
"h", "j", "y", "a", "b", "h"), fri = c("j", "b", "h", "j", "y",
"a", "b", "h", "j"), sat = c("b", "h", "j", "b", "h", "j", "y",
"a", "b"), sun = c("a", "b", "h", "j", "y", "e", "y", "u", "i"
)), .Names = c("id", "day", "mon", "tue", "wed", "thu", "fri",
"sat", "sun"), row.names = c(NA, -9L), class = c("data.table",
"data.frame"))