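To group a variable in long-format data, I want to combine several values into one new value.
I already have a solution, but I suspect there is a better way to implement it.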
set.seed(1337)
df <- data.frame(coli = sample(rep(1:6,2)), newi = 0 )
replaceList <- list(oneAndTwo=1:2, threeAndFour=3:4, fiveAndSix=5:6)
> df
coli newi
1 1 0
2 6 0
3 1 0
4 5 0
5 3 0
6 2 0
7 6 0
8 2 0
9 4 0
10 4 0
11 3 0
12 5 0
> replaceList
$oneAndTwo
[1] 1 2
$threeAndFour
[1] 3 4
$fiveAndSix
[1] 5 6
Desired result:
coli newi
1 1 oneAndTwo
2 6 fiveAndSix
3 1 oneAndTwo
4 5 fiveAndSix
5 3 threeAndFour
6 2 oneAndTwo
7 6 fiveAndSix
8 2 oneAndTwo
9 4 threeAndFour
10 4 threeAndFour
11 3 threeAndFour
12 5 fiveAndSix
mapply(function(fnd, rplc) {
  IND <- df$coli %in% fnd
  df$newi[IND] <<- rplc
}, fnd = replaceList, rplc = names(replaceList))
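If there is a better approach, including a better way to set up replaceList, I would be happy to learn it.
How would you solve or handle a problem like this?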
Answer 0 (score: 5)
We can stack the list into a key/value dataset ('df2'), then match the 'coli' column of 'df' against the 'values' column of 'df2' to get the corresponding entries of 'ind', and assign those to 'newi'.
df2 <- stack(replaceList)
df$newi <- df2$ind[match(df$coli, df2$values)]
df
# coli newi
#1 4 threeAndFour
#2 3 threeAndFour
#3 6 fiveAndSix
#4 1 oneAndTwo
#5 2 oneAndTwo
#6 1 oneAndTwo
#7 5 fiveAndSix
#8 2 oneAndTwo
#9 4 threeAndFour
#10 6 fiveAndSix
#11 3 threeAndFour
#12 5 fiveAndSix
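To make the intermediate step visible, here is a minimal sketch of what stack() produces and how the match() lookup uses it. It assumes the coli values printed in the question are typed in by hand rather than drawn with sample(), so the result does not depend on the random number generator. The as.character() call is an optional addition, not part of the answer above: stack() returns 'ind' as a factor, and the conversion is only needed if a plain character column is preferred.

```r
replaceList <- list(oneAndTwo = 1:2, threeAndFour = 3:4, fiveAndSix = 5:6)
# coli values copied from the question's printed data frame (assumption: fixed, not sampled)
df <- data.frame(coli = c(1, 6, 1, 5, 3, 2, 6, 2, 4, 4, 3, 5))

# stack() turns the named list into a two-column data frame:
# 'values' holds the list contents, 'ind' holds the element names as a factor
df2 <- stack(replaceList)
str(df2)

# match() returns, for each coli, the row of df2 whose 'values' equals it;
# as.character() is optional and drops the factor levels
df$newi <- as.character(df2$ind[match(df$coli, df2$values)])
df
```

A value of coli that is missing from replaceList would come back as NA, which can be a convenient way to spot unmapped values.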
Answer 1 (score: 3)
Create a named vector instead of the replaceList list, then match by name:
set.seed(1337);df <- data.frame(coli = sample(rep(1:6,2)), newi = 0 )
# make a named vector
myLookup <- setNames(c("oneAndTwo","oneAndTwo","threeAndFour","threeAndFour","fiveAndSix","fiveAndSix"),
1:6)
# then match by name (character keys; a numeric index would select by position)
df$newi <- myLookup[as.character(df$coli)]
# check
head(df)
# coli newi
# 1 1 oneAndTwo
# 2 6 fiveAndSix
# 3 1 oneAndTwo
# 4 5 fiveAndSix
# 5 3 threeAndFour
# 6 2 oneAndTwo
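If replaceList already exists, the named vector does not have to be typed out by hand. The following is a sketch under that assumption; it reuses the name myLookup from above and, again, hard-codes the coli values from the question. Note that indexing a named vector with numbers selects by position, so the keys are converted with as.character() to make sure the lookup really goes through the names.

```r
replaceList <- list(oneAndTwo = 1:2, threeAndFour = 3:4, fiveAndSix = 5:6)
# coli values copied from the question's printed data frame (assumption: fixed, not sampled)
df <- data.frame(coli = c(1, 6, 1, 5, 3, 2, 6, 2, 4, 4, 3, 5))

# repeat each list name once per value it covers, then name the result by those values
myLookup <- setNames(rep(names(replaceList), lengths(replaceList)),
                     unlist(replaceList, use.names = FALSE))

# character keys force a lookup by name; unname() drops the keys from the result
df$newi <- unname(myLookup[as.character(df$coli)])
head(df)
```

Looking up by name also keeps working when the keys are not the consecutive integers 1 to n, for example codes such as 10, 20 and 30.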
Another (preferred) option is to use cut and get a factor column:
# using cut, no need for lookup
df$newiFactor <- cut(df$coli, c(0, 2, 4, 6))
# check
head(df[order(df$coli), ])
# coli newi newiFactor
# 1 1 oneAndTwo (0,2]
# 3 1 oneAndTwo (0,2]
# 6 2 oneAndTwo (0,2]
# 8 2 oneAndTwo (0,2]
# 5 3 threeAndFour (2,4]
# 11 3 threeAndFour (2,4]
Note: we can use the labels option of cut to get the names you want ("oneAndTwo", etc.). Again, in this case I prefer seeing the numeric interval names: "(0,2]", etc.
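For completeness, a minimal sketch of the labels variant, assuming the same break points as above and the group names from the question, with the coli values again typed in by hand:

```r
# coli values copied from the question's printed data frame (assumption: fixed, not sampled)
df <- data.frame(coli = c(1, 6, 1, 5, 3, 2, 6, 2, 4, 4, 3, 5))

# breaks define the intervals (0,2], (2,4], (4,6]; labels name them in the same order
df$newiFactor <- cut(df$coli,
                     breaks = c(0, 2, 4, 6),
                     labels = c("oneAndTwo", "threeAndFour", "fiveAndSix"))
head(df)
```

The result is still a factor, so the levels keep the order of the intervals, which helps when the groups are later tabulated or plotted.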