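Replace multiple values based on a template

Time: 2018-03-13 10:15:04

Tags: r

To create a grouping variable for long-format data, I want to combine several values into one new value.

I already have a working solution, but I feel there may be a better implementation.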

set.seed(1337)
df <- data.frame(coli = sample(rep(1:6,2)), newi = 0 )

replaceList <- list(oneAndTwo=1:2, threeAndFour=3:4, fiveAndSix=5:6)

The data looks like this:

> df
   coli newi
1     1    0
2     6    0
3     1    0
4     5    0
5     3    0
6     2    0
7     6    0
8     2    0
9     4    0
10    4    0
11    3    0
12    5    0

The lookup template looks like this:

> replaceList
$oneAndTwo
[1] 1 2

$threeAndFour
[1] 3 4

$fiveAndSix
[1] 5 6

Expected result:

   coli         newi
1     1    oneAndTwo
2     6   fiveAndSix
3     1    oneAndTwo
4     5   fiveAndSix
5     3 threeAndFour
6     2    oneAndTwo
7     6   fiveAndSix
8     2    oneAndTwo
9     4 threeAndFour
10    4 threeAndFour
11    3 threeAndFour
12    5   fiveAndSix 

My working attempt:

mapply(function(fnd,rplc){IND=df$coli %in% fnd;df$newi[IND]<<-rplc},fnd=replaceList,rplc=names(replaceList))

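If there is a better approach, including a better way to set up replaceList, I would be happy to learn it.

How would you solve/handle a problem like this?

2 answers:

Answer 0 (score: 5)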

We can stack the list into a key/value dataset ('df2'), then match the 'coli' column of 'df' against the 'values' column of 'df2' to get the corresponding index into 'ind', and assign the result to 'newi':

df2 <- stack(replaceList)
df$newi <- df2$ind[match(df$coli, df2$values)]
df
#   coli         newi
#1     4 threeAndFour
#2     3 threeAndFour
#3     6   fiveAndSix
#4     1    oneAndTwo
#5     2    oneAndTwo
#6     1    oneAndTwo
#7     5   fiveAndSix
#8     2    oneAndTwo
#9     4 threeAndFour
#10    6   fiveAndSix
#11    3 threeAndFour
#12    5   fiveAndSix
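
To make the intermediate step visible, here is a minimal sketch of the same stack/match idea on a small hand-written input (the vector `coli` below is an illustrative assumption, not the question's sampled data). Note that `stack()` returns the names in `ind` as a factor, so the sketch converts the result with `as.character()` in case a plain character column is wanted:

```r
# Same lookup list as in the question
replaceList <- list(oneAndTwo = 1:2, threeAndFour = 3:4, fiveAndSix = 5:6)

# stack() turns the named list into a two-column data frame:
# 'values' holds the list elements, 'ind' holds the element names (a factor)
df2 <- stack(replaceList)

# Small illustrative input (assumed for this sketch)
coli <- c(1, 6, 3, 2)

# match() gives, for each value of coli, its row position in df2$values;
# indexing df2$ind with those positions yields the group name
newi <- as.character(df2$ind[match(coli, df2$values)])
newi
```

A value of `coli` that does not appear in `df2$values` would come back as `NA`, which is a handy way to spot unmapped values.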

Answer 1 (score: 3)

Create a named vector instead of the replaceList list, then match by name:

set.seed(1337);df <- data.frame(coli = sample(rep(1:6,2)), newi = 0 )

# make a named vector
myLookup <- setNames(c("oneAndTwo","oneAndTwo","threeAndFour","threeAndFour","fiveAndSix","fiveAndSix"),
                   1:6)

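# then match by name: the index must be character, otherwise a numeric index
# selects by position (which only happens to agree here because the names are 1:6)
df$newi <- unname(myLookup[ as.character(df$coli) ])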

# check
head(df)
#   coli         newi
# 1    1    oneAndTwo
# 2    6   fiveAndSix
# 3    1    oneAndTwo
# 4    5   fiveAndSix
# 5    3 threeAndFour
# 6    2    oneAndTwo
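
If the replaceList from the question already exists, the named vector does not have to be typed out by hand. The following is a sketch of one possible way to derive it, using a small hand-written `coli` vector as an assumed input:

```r
# Same lookup list as in the question
replaceList <- list(oneAndTwo = 1:2, threeAndFour = 3:4, fiveAndSix = 5:6)

# Repeat each list name once per element it contains, and name the result
# by the values themselves (names are stored as character: "1", "2", ...)
myLookup <- setNames(rep(names(replaceList), lengths(replaceList)),
                     unlist(replaceList, use.names = FALSE))

# Small illustrative input (assumed for this sketch)
coli <- c(1, 6, 3, 2)

# Index with a character vector so the lookup really goes by name
newi <- unname(myLookup[as.character(coli)])
newi
```

Because the lookup goes by name, this also works when the values are not the consecutive integers 1 to n.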

Another (preferred) option is to use cut and get a factor column:

# using cut, no need for lookup
df$newiFactor <- cut(df$coli, c(0, 2, 4, 6))

# check
head(df[order(df$coli), ])
#    coli         newi newiFactor
# 1     1    oneAndTwo      (0,2]
# 3     1    oneAndTwo      (0,2]
# 6     2    oneAndTwo      (0,2]
# 8     2    oneAndTwo      (0,2]
# 5     3 threeAndFour      (2,4]
# 11    3 threeAndFour      (2,4]

Note: we can use the labels argument of cut to get the names you want, "oneAndTwo", etc. Still, in this case I prefer seeing the numeric interval names: "(0,2]", etc.
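
For completeness, a minimal sketch of the labels variant mentioned in the note, again on a small hand-written `coli` vector (an assumption for illustration):

```r
# Small illustrative input (assumed for this sketch)
coli <- c(1, 6, 3, 2, 4, 5)

# Breaks define the intervals (0,2], (2,4], (4,6];
# labels must supply exactly one name per interval
newiFactor <- cut(coli,
                  breaks = c(0, 2, 4, 6),
                  labels = c("oneAndTwo", "threeAndFour", "fiveAndSix"))
newiFactor
```

The result is a factor whose levels follow the order of the intervals. This only fits when each group is a contiguous range of values; for arbitrary groupings a lookup is still needed.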