Update one data frame with a second data frame

Time: 2019-03-19 00:47:05

Tags: r

I have an empty data frame that represents a span of years for selected cities, plus three test results:

cities <- c('Boston', 'Chicago', 'Denver', 'HOuston', 'LosAngeles', 'Miami', 'NewYork', 'WashingtonDC')
years <- 2014:2018
df <- expand.grid(Year=years, City=cities, TestA=0, TestB=0, TestC=0)
df <- df[with(df, order(City, Year)), ]  # assign the result; sort by city, then year
head(df,12)

#    Year    City TestA TestB TestC
#1  2014  Boston     0     0     0
#2  2015  Boston     0     0     0
#3  2016  Boston     0     0     0
#4  2017  Boston     0     0     0
#5  2018  Boston     0     0     0
#6  2014 Chicago     0     0     0
#7  2015 Chicago     0     0     0
#8  2016 Chicago     0     0     0
#9  2017 Chicago     0     0     0
#10 2018 Chicago     0     0     0
#11 2014  Denver     0     0     0
#12 2015  Denver     0     0     0

I would like to update it using a second data frame, like this:

dfUpdate <- data.frame(Year=c(2016, 2015), City=c('Boston', 'Chicago'), 
TestA=c(12.23, 16.01), TestB=c('Joe', 'Sally'), TestC=c(1000, 1500) )
dfUpdate

#  Year    City TestA TestB TestC
#1 2016  Boston 12.23   Joe  1000
#2 2015 Chicago 16.01 Sally  1500

After the update, the original data frame should look like this:

#    Year    City TestA TestB TestC
# 1  2014  Boston     0     0     0
# 2  2015  Boston     0     0     0
# 3  2016  Boston 12.23   Joe  1000
# 4  2017  Boston     0     0     0
# 5  2018  Boston     0     0     0
# 6  2014 Chicago     0     0     0
# 7  2015 Chicago 16.01 Sally  1500
# 8  2016 Chicago     0     0     0
# 9  2017 Chicago     0     0     0
# 10 2018 Chicago     0     0     0
# ...

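The update will always contain Year and City values that exist in "df". Because there will be many years and many cities, the real application will have well over ten thousand observations in "df". The update data frame "dfUpdate" may have several hundred observations.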

I have seen some other solutions on Stack Overflow, but they differ in that the data frames have a single index.

3 answers:

Answer 0: (score: 1)

How about something like this?

library(tidyverse)
df %>%
    mutate_if(is.factor, as.character) %>%
    gather(k, v, -Year, -City) %>%
    distinct(Year, City, k) %>%
    left_join(dfUpdate %>% mutate_if(is.factor, as.character) %>% gather(k, v, -Year, -City)) %>%
    spread(k, v, fill = 0) %>%
    arrange(City, Year)
#   Year         City TestA TestB TestC
#1  2014       Boston     0     0     0
#2  2015       Boston     0     0     0
#3  2016       Boston 12.23   Joe  1000
#4  2017       Boston     0     0     0
#5  2018       Boston     0     0     0
#6  2014      Chicago     0     0     0
#7  2015      Chicago 16.01 Sally  1500
#8  2016      Chicago     0     0     0
#9  2017      Chicago     0     0     0
#10 2018      Chicago     0     0     0
#11 2014       Denver     0     0     0
#12 2015       Denver     0     0     0
#13 2016       Denver     0     0     0
#14 2017       Denver     0     0     0
#15 2018       Denver     0     0     0
#16 2014      HOuston     0     0     0
#17 2015      HOuston     0     0     0
#18 2016      HOuston     0     0     0
#19 2017      HOuston     0     0     0
#20 2018      HOuston     0     0     0
#21 2014   LosAngeles     0     0     0
#22 2015   LosAngeles     0     0     0
#23 2016   LosAngeles     0     0     0
#24 2017   LosAngeles     0     0     0
#25 2018   LosAngeles     0     0     0
#26 2014        Miami     0     0     0
#27 2015        Miami     0     0     0
#28 2016        Miami     0     0     0
#29 2017        Miami     0     0     0
#30 2018        Miami     0     0     0
#31 2014      NewYork     0     0     0
#32 2015      NewYork     0     0     0
#33 2016      NewYork     0     0     0
#34 2017      NewYork     0     0     0
#35 2018      NewYork     0     0     0
#36 2014 WashingtonDC     0     0     0
#37 2015 WashingtonDC     0     0     0
#38 2016 WashingtonDC     0     0     0
#39 2017 WashingtonDC     0     0     0
#40 2018 WashingtonDC     0     0     0
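
The printed result above shows "0" and "12.23" in the same TestA column, which suggests that all three test columns come back as character after the reshape. If numeric columns are needed afterwards, one possible follow-up is to re-infer the types column by column. The sketch below uses a small hand-built stand-in (`res` is a hypothetical name, not the answer's actual result) and base R's `type.convert()`:

```r
# Hypothetical stand-in for a reshaped result whose test columns are all character.
res <- data.frame(Year  = c(2015L, 2016L),
                  City  = c("Boston", "Boston"),
                  TestA = c("0", "12.23"),
                  TestB = c("0", "Joe"),
                  TestC = c("0", "1000"),
                  stringsAsFactors = FALSE)

# Re-infer the type of each test column; as.is = TRUE keeps text as character
# instead of turning it into a factor.
test_cols <- c("TestA", "TestB", "TestC")
res[test_cols] <- lapply(res[test_cols], type.convert, as.is = TRUE)

sapply(res, class)
```

TestB stays character because it mixes "0" with names, while TestA and TestC become numeric types again.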

Answer 1: (score: 1)

When creating the data frames, make sure to use stringsAsFactors = F to avoid converting strings to factors. Then use a for loop in base R:

for(i in 1:nrow(dfUpdate)) {
  df[df$Year == dfUpdate$Year[i] & df$City == dfUpdate$City[i], -c(1:2)] = dfUpdate[i, -c(1:2)]
}

> df
   Year         City TestA TestB TestC
1  2014       Boston  0.00     0     0
2  2015       Boston  0.00     0     0
3  2016       Boston 12.23   Joe  1000
4  2017       Boston  0.00     0     0
5  2018       Boston  0.00     0     0
6  2014      Chicago  0.00     0     0
7  2015      Chicago 16.01 Sally  1500
8  2016      Chicago  0.00     0     0
9  2017      Chicago  0.00     0     0
10 2018      Chicago  0.00     0     0
11 2014       Denver  0.00     0     0
12 2015       Denver  0.00     0     0
13 2016       Denver  0.00     0     0
14 2017       Denver  0.00     0     0
15 2018       Denver  0.00     0     0
16 2014      HOuston  0.00     0     0
17 2015      HOuston  0.00     0     0
18 2016      HOuston  0.00     0     0
19 2017      HOuston  0.00     0     0
20 2018      HOuston  0.00     0     0
21 2014   LosAngeles  0.00     0     0
22 2015   LosAngeles  0.00     0     0
23 2016   LosAngeles  0.00     0     0
24 2017   LosAngeles  0.00     0     0
25 2018   LosAngeles  0.00     0     0
26 2014        Miami  0.00     0     0
27 2015        Miami  0.00     0     0
28 2016        Miami  0.00     0     0
29 2017        Miami  0.00     0     0
30 2018        Miami  0.00     0     0
31 2014      NewYork  0.00     0     0
32 2015      NewYork  0.00     0     0
33 2016      NewYork  0.00     0     0
34 2017      NewYork  0.00     0     0
35 2018      NewYork  0.00     0     0
36 2014 WashingtonDC  0.00     0     0
37 2015 WashingtonDC  0.00     0     0
38 2016 WashingtonDC  0.00     0     0
39 2017 WashingtonDC  0.00     0     0
40 2018 WashingtonDC  0.00     0     0
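
Since the question mentions more than ten thousand rows in df and several hundred updates, the same idea can probably be expressed without a loop. The following is only a sketch under a few assumptions: a reduced list of cities for brevity, a composite key built with `paste()` (the names `key_df`, `key_upd`, `idx` and `test_cols` are illustrative), and TestB converted to character up front so that it can hold names. Note that `expand.grid()` converts strings to factors by default, so `stringsAsFactors = FALSE` is passed there as well.

```r
# Setup mirroring the question (fewer cities), keeping strings as character.
cities <- c('Boston', 'Chicago', 'Denver')
years <- 2014:2018
df <- expand.grid(Year = years, City = cities, TestA = 0, TestB = 0, TestC = 0,
                  stringsAsFactors = FALSE)
# TestB must hold names such as 'Joe', so store it as character from the start.
df$TestB <- as.character(df$TestB)

dfUpdate <- data.frame(Year = c(2016, 2015), City = c('Boston', 'Chicago'),
                       TestA = c(12.23, 16.01), TestB = c('Joe', 'Sally'),
                       TestC = c(1000, 1500), stringsAsFactors = FALSE)

# Build a composite key from Year and City, then locate each update row in df.
key_df  <- paste(df$Year, df$City, sep = "|")
key_upd <- paste(dfUpdate$Year, dfUpdate$City, sep = "|")
idx <- match(key_upd, key_df)
stopifnot(!anyNA(idx))  # every update row is expected to exist in df

# Overwrite all test columns for the matched rows in a single assignment.
test_cols <- c("TestA", "TestB", "TestC")
df[idx, test_cols] <- dfUpdate[, test_cols]

df[idx, ]
```

`match()` does one lookup for all update rows instead of scanning df once per row, which should scale better than the loop as both data frames grow.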

Answer 2: (score: -1)

You can use the dplyr package to achieve this. Make the column classes compatible, and do not convert strings to factors.
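
One way a dplyr-based update could look is sketched below. It is not the answerer's own code. It assumes dplyr 1.0.0 or later for `rows_update()`, a reduced list of cities for brevity, and an illustrative result name `updated`. Following the advice above, Year is integer in both data frames and TestB is character in both.

```r
library(dplyr)

cities <- c('Boston', 'Chicago', 'Denver')
# KEEP.OUT.ATTRS = FALSE drops expand.grid's extra attribute, leaving a plain data frame.
df <- expand.grid(Year = 2014:2018, City = cities, TestA = 0, TestB = 0, TestC = 0,
                  KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
df$TestB <- as.character(df$TestB)  # TestB will hold names, so make it character

dfUpdate <- data.frame(Year = c(2016L, 2015L), City = c('Boston', 'Chicago'),
                       TestA = c(12.23, 16.01), TestB = c('Joe', 'Sally'),
                       TestC = c(1000, 1500), stringsAsFactors = FALSE)

# rows_update() overwrites the rows of df whose (Year, City) key matches a row
# in dfUpdate. By default it raises an error if an update key is missing from df.
updated <- rows_update(df, dfUpdate, by = c("Year", "City"))

updated %>% filter(TestA != 0)
```

`rows_update()` returns a new data frame and leaves df unchanged, so the result has to be assigned if df itself should carry the update.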