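(The linked VLOOKUP thread does not answer this question.)
I am looking for a way to replace values in one data frame (DF2) with values from another (DF1), where DF2 contains duplicate entries that I want to keep.
As an example:
Say I have 2 data frames. One of them, DF1, contains the correct number of umbrellas for hotels on different dates.
It has line items for Hilton_A on 2013-05-27 and 2013-06-03 with the associated umbrella counts, and the same for Hilton_B and Hilton_C (Hilton_D appears on 2013-05-27 only).
Here is the dput of DF1, the reference data frame: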
DF1 <- structure(list(Date = structure(c(15852, 15859, 15852, 15859,
15852, 15859, 15852), class = "Date"), Hotel = structure(c(1L,
1L, 2L, 2L, 3L, 3L, 4L), .Label = c("Hilton_A", "Hilton_B", "Hilton_C",
"Hilton_D"), class = "factor"), Umbrellas = c(9340L, 6401L, 9089L,
7716L, 5542L, 5565L, 8158L), datename = c("2013-05-27_Hilton_A",
"2013-06-03_Hilton_A", "2013-05-27_Hilton_B", "2013-06-03_Hilton_B",
"2013-05-27_Hilton_C", "2013-06-03_Hilton_C", "2013-05-27_Hilton_D"
)), .Names = c("Date", "Hotel", "Umbrellas", "datename"), row.names = c(NA,
-7L), class = "data.frame")
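DF2 contains information on many other hotels on different dates, as well as on the Hiltons from DF1. The problem is that the umbrella #s in DF2 are wrong for the Hiltons, and I need to replace them with the #s from DF1.
Here is the dput of DF2, including the incorrect Hilton numbers and some other data I do not want to touch: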
DF2 <- structure(list(Date = structure(c(15845, 15852, 15859, 15852,
15859, 15845, 15859, 15845, 15845, 15852, 15845, 15845, 15882
), class = "Date"), Hotel = structure(c(1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 3L, 4L, 5L, 6L, 7L), .Label = c("Hilton_A", "Hilton_B",
"Hilton_C", "Hilton_D", "RedRoof_A", "RedRoof_D", "Sheraton_D"
), class = "factor"), Umbrellas = c(263L, 287L, 258L, 110L, 234L,
212L, 265L, 542L, 81L, 51L, 162L, 232L, 493L), datename = c("2013-05-20_Hilton_A",
"2013-05-27_Hilton_A", "2013-06-03_Hilton_A", "2013-05-27_Hilton_A",
"2013-06-03_Hilton_A", "2013-05-20_Hilton_B", "2013-06-03_Hilton_B",
"2013-05-20_Hilton_B", "2013-05-20_Hilton_C", "2013-05-27_Hilton_D",
"2013-05-20_RedRoof_A", "2013-05-20_RedRoof_D", "2013-06-26_Sheraton_D"
)), .Names = c("Date", "Hotel", "Umbrellas", "datename"), row.names = c(NA,
-13L), class = "data.frame")
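Normally this would work:
DF2$Umbrellas <- replace(DF2$Umbrellas, DF2$datename %in% DF1$datename, DF1$Umbrellas)
(where "datename" is just the concatenation of date and hotel, since the same hotel has information for multiple dates, so that we can "unique-ify" the list)
But DF2 actually has multiple observations per hotel and date that I want to keep (e.g. Hilton_A on 5/27 appears 2 times in DF2).
So when I try to replace the umbrella #s in DF2 with those from DF1, I get this warning message: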
Warning message:
In replace(DF2$Umbrellas, DF2$hoteldatename %in% DF1$hoteldatename , :
number of items to replace is not a multiple of replacement length
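And the numbers are all wrong.
Does anyone know what is going on here, and how I can get the numbers from DF1 to replace all of the applicable observations in DF2?
Answer 0 (score: 1)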
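On what is going on: `replace(x, list, values)` simply performs `x[list] <- values`, so the values are assigned by position rather than looked up by key. In the question, the logical index selects 6 rows of DF2 while DF1$Umbrellas has 7 values, hence the warning and the misplaced numbers. A minimal sketch with made-up toy vectors (the names `x`, `key`, `ref_key` and `ref_val` are hypothetical, not from the question):

```r
# Toy data: 'key' contains a duplicate ("a"), just as DF2$datename does.
key     <- c("b", "a", "a", "z")
x       <- c(1L, 2L, 3L, 4L)
ref_key <- c("a", "b")
ref_val <- c(100L, 200L)

hit <- key %in% ref_key   # TRUE TRUE TRUE FALSE: three positions to fill

# Two replacement values for three positions: R recycles them in order
# and warns that the lengths do not line up (warning silenced here).
bad <- suppressWarnings(replace(x, hit, ref_val))
bad   # 100 200 100 4: the "b" row received the value that belongs to "a"
```

The values land in the order they appear in `ref_val`, regardless of which key each row carries, which is why a key-based lookup is needed instead.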
df1 <- DF1; df2 <- DF2; df3 <- df2  # lowercase copies of the question's data frames
df3$Umbrellas <- df1$Umbrellas[match(df2$datename, df1$datename)]
> df3
         Date      Hotel Umbrellas              datename
1  2013-05-20   Hilton_A        NA   2013-05-20_Hilton_A
2  2013-05-27   Hilton_A      9340   2013-05-27_Hilton_A
3  2013-06-03   Hilton_A      6401   2013-06-03_Hilton_A
4  2013-05-27   Hilton_A      9340   2013-05-27_Hilton_A
5  2013-06-03   Hilton_A      6401   2013-06-03_Hilton_A
6  2013-05-20   Hilton_B        NA   2013-05-20_Hilton_B
7  2013-06-03   Hilton_B      7716   2013-06-03_Hilton_B
8  2013-05-20   Hilton_B        NA   2013-05-20_Hilton_B
9  2013-05-20   Hilton_C        NA   2013-05-20_Hilton_C
10 2013-05-27   Hilton_D      8158   2013-05-27_Hilton_D
11 2013-05-20  RedRoof_A        NA  2013-05-20_RedRoof_A
12 2013-05-20  RedRoof_D        NA  2013-05-20_RedRoof_D
13 2013-06-26 Sheraton_D        NA 2013-06-26_Sheraton_D
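The NA rows in this intermediate result are the rows whose datename has no counterpart in the reference data frame. A small sketch of that behaviour of `match()`, using two keys and values taken from the question (the variable names `lookup_keys`, `lookup_vals`, `wanted`, `pos` and `vals` are made up for illustration):

```r
lookup_keys <- c("2013-05-27_Hilton_A", "2013-06-03_Hilton_A")
lookup_vals <- c(9340L, 6401L)

# One key that is absent from the lookup, then the same present key twice.
wanted <- c("2013-05-20_Hilton_A", "2013-05-27_Hilton_A", "2013-05-27_Hilton_A")

pos  <- match(wanted, lookup_keys)  # NA 1 1: position in lookup_keys, NA if absent
vals <- lookup_vals[pos]            # NA 9340 9340: indexing with NA yields NA
```

Duplicated keys each get the same position, so every duplicate row receives the same reference value.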
df3$Umbrellas <- ifelse(is.na(df3$Umbrellas), df2$Umbrellas, df3$Umbrellas)
> df3
         Date      Hotel Umbrellas              datename
1  2013-05-20   Hilton_A       263   2013-05-20_Hilton_A
2  2013-05-27   Hilton_A      9340   2013-05-27_Hilton_A
3  2013-06-03   Hilton_A      6401   2013-06-03_Hilton_A
4  2013-05-27   Hilton_A      9340   2013-05-27_Hilton_A
5  2013-06-03   Hilton_A      6401   2013-06-03_Hilton_A
6  2013-05-20   Hilton_B       212   2013-05-20_Hilton_B
7  2013-06-03   Hilton_B      7716   2013-06-03_Hilton_B
8  2013-05-20   Hilton_B       542   2013-05-20_Hilton_B
9  2013-05-20   Hilton_C        81   2013-05-20_Hilton_C
10 2013-05-27   Hilton_D      8158   2013-05-27_Hilton_D
11 2013-05-20  RedRoof_A       162  2013-05-20_RedRoof_A
12 2013-05-20  RedRoof_D       232  2013-05-20_RedRoof_D
13 2013-06-26 Sheraton_D       493 2013-06-26_Sheraton_D
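The two steps can also be folded into one by assigning only to the rows that have a match, which avoids the intermediate NA column. The following is a sketch only: `update_from_lookup` and its arguments are hypothetical names, the small `ref` and `target` data frames are made up, and it assumes the key is unique in the reference data frame (`match()` returns the first match only).

```r
# Hypothetical helper: overwrite target[[value_col]] with the value from ref
# wherever the key is found in ref; all other rows are left untouched.
update_from_lookup <- function(target, ref, key_col, value_col) {
  idx   <- match(target[[key_col]], ref[[key_col]])  # row of ref per target row, NA if absent
  found <- !is.na(idx)
  target[[value_col]][found] <- ref[[value_col]][idx[found]]
  target
}

# Made-up stand-in data with a duplicated key ("d1_A") in the target.
ref <- data.frame(datename = c("d1_A", "d2_A"),
                  Umbrellas = c(9340L, 6401L),
                  stringsAsFactors = FALSE)
target <- data.frame(datename = c("d0_A", "d1_A", "d2_A", "d1_A", "d0_X"),
                     Umbrellas = c(263L, 287L, 258L, 110L, 162L),
                     stringsAsFactors = FALSE)

out <- update_from_lookup(target, ref, "datename", "Umbrellas")
out$Umbrellas   # 263 9340 6401 9340 162
```

With the question's data, the call would presumably be `update_from_lookup(DF2, DF1, "datename", "Umbrellas")`.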