Is there a way to build a new data.frame by merging multiple data.frames in R?

Asked: 2018-04-04 17:39:12

Tags: r dataframe dplyr

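I have multiple data.frames, each with the same weather-station coordinates but containing temperature observations for a different year. I want to build a new data.frame in which the station coordinates stay the same and the annual temperature column for each year is added programmatically from the original data.frames. The dplyr package might help, but I am having trouble concatenating the Year and Annual_Temp columns and constructing the new columns programmatically. I have 35 data.frames, each with the same ID, long, and lat, but with different Annual_Temp values. I need to build a clean tabular dataset by merging them. How can I do this in R? Is there a way to do it with dplyr?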

For example, here is the head of the first three data.frames:
> multiple_DF

$air_temp.1980
      Year         ID long   lat Annual_Temp
34090 1980 6.25_51.75 6.25 51.75   10.709091
34091 1980 6.25_51.25 6.25 51.25   10.581818
34092 1980 6.25_50.75 6.25 50.75    9.500000
34224 1980 6.75_51.75 6.75 51.75   10.354545
34225 1980 6.75_51.25 6.75 51.25   10.636364
34226 1980 6.75_50.75 6.75 50.75    9.872727

$air_temp.1981
       Year         ID long   lat Annual_Temp
119884 1981 6.25_51.75 6.25 51.75   10.727273
119885 1981 6.25_51.25 6.25 51.25   10.563636
119886 1981 6.25_50.75 6.25 50.75    9.654545
120018 1981 6.75_51.75 6.75 51.75   10.409091
120019 1981 6.75_51.25 6.75 51.25   10.654545
120020 1981 6.75_50.75 6.75 50.75    9.954545

$air_temp.1982
       Year         ID long   lat Annual_Temp
205678 1982 6.25_51.75 6.25 51.75    11.80909
205679 1982 6.25_51.25 6.25 51.25    11.58182
205680 1982 6.25_50.75 6.25 50.75    10.61818
205812 1982 6.75_51.75 6.75 51.75    11.44545
205813 1982 6.75_51.25 6.75 51.25    11.73636
205814 1982 6.75_50.75 6.75 50.75    10.85455

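Desired output (updated)

I want to generate a new data.frame in which each year's Annual_Temp is added as a new column whose name combines Annual_Temp and Year. This is the data.frame I want: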

      ID long   lat Ann_temp_1980 Ann_temp_1981 Ann_temp_1982
1 6.25_51.75 6.25 51.75     10.709091     10.727273        11.80909
2 6.25_51.25 6.25 51.25     10.581818     10.563636        11.58182
3 6.25_50.75 6.25 50.75      9.500000      9.654545        10.61818
4 6.75_51.75 6.75 51.75     10.354545     10.409091        11.44545
5 6.75_51.25 6.75 51.25     10.636364     10.654545        11.73636
6 6.75_50.75 6.75 50.75      9.872727      9.954545        10.85455

How can I achieve this programmatically in R? Any ideas?

Reproducible example data:

multiple_DF = structure(list(air_temp.1980 = structure(list(Year = c(1980L, 
1980L, 1980L, 1980L, 1980L, 1980L), ID = c("6.25_51.75", "6.25_51.25", 
"6.25_50.75", "6.75_51.75", "6.75_51.25", "6.75_50.75"), long = c(6.25, 
6.25, 6.25, 6.75, 6.75, 6.75), lat = c(51.75, 51.25, 50.75, 51.75, 
51.25, 50.75), Annual_Temp = c(10.709091, 10.581818, 9.5, 10.354545, 
10.636364, 9.872727)), .Names = c("Year", "ID", "long", "lat", 
"Annual_Temp"), row.names = c(NA, -6L), class = "data.frame"), 
    air_temp.1981 = structure(list(Year = c(1981L, 1981L, 1981L, 
    1981L, 1981L, 1981L), ID = c("6.25_51.75", "6.25_51.25", 
    "6.25_50.75", "6.75_51.75", "6.75_51.25", "6.75_50.75"), 
        long = c(6.25, 6.25, 6.25, 6.75, 6.75, 6.75), lat = c(51.75, 
        51.25, 50.75, 51.75, 51.25, 50.75), Annual_Temp = c(10.727273, 
        10.563636, 9.654545, 10.409091, 10.654545, 9.954545)), .Names = c("Year", 
    "ID", "long", "lat", "Annual_Temp"), row.names = c(NA, -6L
    ), class = "data.frame"), air_temp.1982 = structure(list(
        Year = c(1982L, 1982L, 1982L, 1982L, 1982L, 1982L), ID = c("6.25_51.75", 
        "6.25_51.25", "6.25_50.75", "6.75_51.75", "6.75_51.25", 
        "6.75_50.75"), long = c(6.25, 6.25, 6.25, 6.75, 6.75, 
        6.75), lat = c(51.75, 51.25, 50.75, 51.75, 51.25, 50.75
        ), Annual_Temp = c(11.80909, 11.58182, 10.61818, 11.44545, 
        11.73636, 10.85455)), .Names = c("Year", "ID", "long", 
    "lat", "Annual_Temp"), row.names = c(NA, -6L), class = "data.frame")), .Names = c("air_temp.1980", 
"air_temp.1981", "air_temp.1982"))

2 answers:

Answer 0 (score: 4)

First, combine the tables in long format:

library(data.table)
L = lapply(multiple_DF, data.table)

bigDT = rbindlist(L, idcol = "src")

              src Year         ID long   lat Annual_Temp
 1: air_temp.1980 1980 6.25_51.75 6.25 51.75   10.709091
 2: air_temp.1980 1980 6.25_51.25 6.25 51.25   10.581818
 3: air_temp.1980 1980 6.25_50.75 6.25 50.75    9.500000
 4: air_temp.1980 1980 6.75_51.75 6.75 51.75   10.354545
 5: air_temp.1980 1980 6.75_51.25 6.75 51.25   10.636364
 6: air_temp.1980 1980 6.75_50.75 6.75 50.75    9.872727
 7: air_temp.1981 1981 6.25_51.75 6.25 51.75   10.727273
 8: air_temp.1981 1981 6.25_51.25 6.25 51.25   10.563636
 9: air_temp.1981 1981 6.25_50.75 6.25 50.75    9.654545
10: air_temp.1981 1981 6.75_51.75 6.75 51.75   10.409091
11: air_temp.1981 1981 6.75_51.25 6.75 51.25   10.654545
12: air_temp.1981 1981 6.75_50.75 6.75 50.75    9.954545
13: air_temp.1982 1982 6.25_51.75 6.25 51.75   11.809090
14: air_temp.1982 1982 6.25_51.25 6.25 51.25   11.581820
15: air_temp.1982 1982 6.25_50.75 6.25 50.75   10.618180
16: air_temp.1982 1982 6.75_51.75 6.75 51.75   11.445450
17: air_temp.1982 1982 6.75_51.25 6.75 51.25   11.736360
18: air_temp.1982 1982 6.75_50.75 6.75 50.75   10.854550

Then "normalize" a bit by splitting the data into multiple tables:

ID_attr = unique(bigDT[, c("ID", "lat", "long")])

           ID   lat long
1: 6.25_51.75 51.75 6.25
2: 6.25_51.25 51.25 6.25
3: 6.25_50.75 50.75 6.25
4: 6.75_51.75 51.75 6.75
5: 6.75_51.25 51.25 6.75
6: 6.75_50.75 50.75 6.75

meas_data = bigDT[, c("Year", "ID", "Annual_Temp")]

    Year         ID Annual_Temp
 1: 1980 6.25_51.75   10.709091
 2: 1980 6.25_51.25   10.581818
 3: 1980 6.25_50.75    9.500000
 4: 1980 6.75_51.75   10.354545
 5: 1980 6.75_51.25   10.636364
 6: 1980 6.75_50.75    9.872727
 7: 1981 6.25_51.75   10.727273
 8: 1981 6.25_51.25   10.563636
 9: 1981 6.25_50.75    9.654545
10: 1981 6.75_51.75   10.409091
11: 1981 6.75_51.25   10.654545
12: 1981 6.75_50.75    9.954545
13: 1982 6.25_51.75   11.809090
14: 1982 6.25_51.25   11.581820
15: 1982 6.25_50.75   10.618180
16: 1982 6.75_51.75   11.445450
17: 1982 6.75_51.25   11.736360
18: 1982 6.75_50.75   10.854550

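I think this format is easier to work with than the wide format the OP requested (where the year is embedded in character column names). Hadley Wickham's tidy data paper may be a useful reference.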
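If the wide layout from the question is still needed for a report or export, it can be derived from the normalized tables on demand. The following is only a sketch and not part of the original answer: it assumes data.table's dcast, setnames and merge, uses a two-station, two-year stand-in for multiple_DF, and the names year_cols, wide and wide_final are chosen here for illustration.

```r
library(data.table)

# Small stand-in for the question's list (two stations, two years).
multiple_DF <- list(
  air_temp.1980 = data.frame(Year = 1980L, ID = c("6.25_51.75", "6.25_51.25"),
                             long = 6.25, lat = c(51.75, 51.25),
                             Annual_Temp = c(10.709091, 10.581818),
                             stringsAsFactors = FALSE),
  air_temp.1981 = data.frame(Year = 1981L, ID = c("6.25_51.75", "6.25_51.25"),
                             long = 6.25, lat = c(51.75, 51.25),
                             Annual_Temp = c(10.727273, 10.563636),
                             stringsAsFactors = FALSE)
)

# Same long and normalized tables as in the answer.
bigDT <- rbindlist(multiple_DF, idcol = "src")
ID_attr <- unique(bigDT[, c("ID", "lat", "long")])
meas_data <- bigDT[, c("Year", "ID", "Annual_Temp")]

# One row per station, one column per year.
wide <- dcast(meas_data, ID ~ Year, value.var = "Annual_Temp")

# Prefix the year columns the way the question's desired output does.
year_cols <- setdiff(names(wide), "ID")
setnames(wide, year_cols, paste0("Ann_temp_", year_cols))

# Re-attach the station coordinates.
wide_final <- merge(ID_attr, wide, by = "ID")
print(wide_final)
```

Keeping the long tables as the working format and producing the wide table only at the end means a 36th year just adds rows, not a new column to every downstream step.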

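To combine the tables with dplyr instead, use bind_rows in place of rbindlist; or just do.call(rbind, L) in base R.

Answer 1 (score: 1)

As Frank pointed out, this would be easier with reproducible data, but I think the following should work: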
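The dplyr and base R alternatives could look roughly like the sketch below. It is an illustration under assumptions, not the answerer's code: the list is a two-station, two-year stand-in for multiple_DF, and big_df and big_base are names made up here. In bind_rows, the .id argument plays the role of rbindlist's idcol.

```r
library(dplyr)

# Small stand-in for the question's list (two stations, two years).
multiple_DF <- list(
  air_temp.1980 = data.frame(Year = 1980L, ID = c("6.25_51.75", "6.25_51.25"),
                             long = 6.25, lat = c(51.75, 51.25),
                             Annual_Temp = c(10.709091, 10.581818),
                             stringsAsFactors = FALSE),
  air_temp.1981 = data.frame(Year = 1981L, ID = c("6.25_51.75", "6.25_51.25"),
                             long = 6.25, lat = c(51.75, 51.25),
                             Annual_Temp = c(10.727273, 10.563636),
                             stringsAsFactors = FALSE)
)

# dplyr: stack the data.frames; the list names go into the "src" column.
big_df <- bind_rows(multiple_DF, .id = "src")

# Base R: stack the data.frames; the list names end up in the row names,
# and no "src" column is created.
big_base <- do.call(rbind, multiple_DF)

print(big_df)
print(big_base)
```

Since every table already carries a Year column, the src column is mostly a convenience for tracing rows back to their source table.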

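library(tidyverse)
# Stack the per-year data.frames into one long data.frame
DF <- do.call("rbind", multiple_DF)
# Turn each year into its future column name, e.g. "Ann_temp_1980"
DF$Year <- paste0("Ann_temp_", DF$Year)
# Spread the years into columns, leaving one row per station
DF_final <- spread(DF, Year, Annual_Temp)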
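In tidyr 1.0.0 and later, spread is superseded by pivot_wider, which can also add the column prefix in the same call. The sketch below is an alternative under that assumption (a recent tidyr is installed), not the answerer's original code, and it again uses a two-station, two-year stand-in for multiple_DF.

```r
library(dplyr)
library(tidyr)

# Small stand-in for the question's list (two stations, two years).
multiple_DF <- list(
  air_temp.1980 = data.frame(Year = 1980L, ID = c("6.25_51.75", "6.25_51.25"),
                             long = 6.25, lat = c(51.75, 51.25),
                             Annual_Temp = c(10.709091, 10.581818),
                             stringsAsFactors = FALSE),
  air_temp.1981 = data.frame(Year = 1981L, ID = c("6.25_51.75", "6.25_51.25"),
                             long = 6.25, lat = c(51.75, 51.25),
                             Annual_Temp = c(10.727273, 10.563636),
                             stringsAsFactors = FALSE)
)

# Stack the years, then widen: ID, long and lat identify a row,
# and each Year becomes a column named Ann_temp_<year>.
DF_final <- bind_rows(multiple_DF) %>%
  pivot_wider(names_from = Year,
              values_from = Annual_Temp,
              names_prefix = "Ann_temp_")

print(DF_final)
```

With all 35 data.frames in the list, the same two calls should give one Ann_temp_ column per year without any per-year code.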