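I have multiple data.frames, each of which has the same weather-station coordinates but contains temperature observations for a different year. I want to build a new data.frame in which the station coordinates stay the same and the annual temperature column from each original data.frame is added programmatically as its own column. Perhaps the dplyr package could help, but I am having trouble concatenating Year and Annual_Temp to construct the new column names programmatically. I have 35 data.frames; each has the same ID, long and lat, but the Annual_Temp values differ. I need to merge these data.frames into one clean table. How can I do this in R? Is there a way to do it with dplyr? Any ideas?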
For example, here is the head of the first three data.frames:
> multiple_DF
$air_temp.1980
Year ID long lat Annual_Temp
34090 1980 6.25_51.75 6.25 51.75 10.709091
34091 1980 6.25_51.25 6.25 51.25 10.581818
34092 1980 6.25_50.75 6.25 50.75 9.500000
34224 1980 6.75_51.75 6.75 51.75 10.354545
34225 1980 6.75_51.25 6.75 51.25 10.636364
34226 1980 6.75_50.75 6.75 50.75 9.872727
$air_temp.1981
Year ID long lat Annual_Temp
119884 1981 6.25_51.75 6.25 51.75 10.727273
119885 1981 6.25_51.25 6.25 51.25 10.563636
119886 1981 6.25_50.75 6.25 50.75 9.654545
120018 1981 6.75_51.75 6.75 51.75 10.409091
120019 1981 6.75_51.25 6.75 51.25 10.654545
120020 1981 6.75_50.75 6.75 50.75 9.954545
$air_temp.1982
Year ID long lat Annual_Temp
205678 1982 6.25_51.75 6.25 51.75 11.80909
205679 1982 6.25_51.25 6.25 51.25 11.58182
205680 1982 6.25_50.75 6.25 50.75 10.61818
205812 1982 6.75_51.75 6.75 51.75 11.44545
205813 1982 6.75_51.25 6.75 51.25 11.73636
205814 1982 6.75_50.75 6.75 50.75 10.85455
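Desired output (updated):
I want to produce a new data.frame in which each year's Annual_Temp is added as a new column whose name concatenates Annual_Temp and Year. This is the data.frame I want: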
ID long lat Ann_temp_1980 Ann_temp_1981 Ann_temp_1982
1 6.25_51.75 6.25 51.75 10.709091 10.727273 11.80909
2 6.25_51.25 6.25 51.25 10.581818 10.563636 11.58182
3 6.25_50.75 6.25 50.75 9.500000 9.654545 10.61818
4 6.75_51.75 6.75 51.75 10.354545 10.409091 11.44545
5 6.75_51.25 6.75 51.25 10.636364 10.654545 11.73636
6 6.75_50.75 6.75 50.75 9.872727 9.954545 10.85455
How can I achieve this programmatically in R? Any ideas?
Reproducible example data:
multiple_DF = structure(list(air_temp.1980 = structure(list(Year = c(1980L,
1980L, 1980L, 1980L, 1980L, 1980L), ID = c("6.25_51.75", "6.25_51.25",
"6.25_50.75", "6.75_51.75", "6.75_51.25", "6.75_50.75"), long = c(6.25,
6.25, 6.25, 6.75, 6.75, 6.75), lat = c(51.75, 51.25, 50.75, 51.75,
51.25, 50.75), Annual_Temp = c(10.709091, 10.581818, 9.5, 10.354545,
10.636364, 9.872727)), .Names = c("Year", "ID", "long", "lat",
"Annual_Temp"), row.names = c(NA, -6L), class = "data.frame"),
air_temp.1981 = structure(list(Year = c(1981L, 1981L, 1981L,
1981L, 1981L, 1981L), ID = c("6.25_51.75", "6.25_51.25",
"6.25_50.75", "6.75_51.75", "6.75_51.25", "6.75_50.75"),
long = c(6.25, 6.25, 6.25, 6.75, 6.75, 6.75), lat = c(51.75,
51.25, 50.75, 51.75, 51.25, 50.75), Annual_Temp = c(10.727273,
10.563636, 9.654545, 10.409091, 10.654545, 9.954545)), .Names = c("Year",
"ID", "long", "lat", "Annual_Temp"), row.names = c(NA, -6L
), class = "data.frame"), air_temp.1982 = structure(list(
Year = c(1982L, 1982L, 1982L, 1982L, 1982L, 1982L), ID = c("6.25_51.75",
"6.25_51.25", "6.25_50.75", "6.75_51.75", "6.75_51.25",
"6.75_50.75"), long = c(6.25, 6.25, 6.25, 6.75, 6.75,
6.75), lat = c(51.75, 51.25, 50.75, 51.75, 51.25, 50.75
), Annual_Temp = c(11.80909, 11.58182, 10.61818, 11.44545,
11.73636, 10.85455)), .Names = c("Year", "ID", "long",
"lat", "Annual_Temp"), row.names = c(NA, -6L), class = "data.frame")), .Names = c("air_temp.1980",
"air_temp.1981", "air_temp.1982"))
Answer 0 (score: 4)
First, combine the tables in long format:
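library(data.table)
# convert each data.frame in the list to a data.table
L = lapply(multiple_DF, data.table)
# stack them; idcol stores each list element's name in a "src" column
bigDT = rbindlist(L, idcol = "src")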
src Year ID long lat Annual_Temp
1: air_temp.1980 1980 6.25_51.75 6.25 51.75 10.709091
2: air_temp.1980 1980 6.25_51.25 6.25 51.25 10.581818
3: air_temp.1980 1980 6.25_50.75 6.25 50.75 9.500000
4: air_temp.1980 1980 6.75_51.75 6.75 51.75 10.354545
5: air_temp.1980 1980 6.75_51.25 6.75 51.25 10.636364
6: air_temp.1980 1980 6.75_50.75 6.75 50.75 9.872727
7: air_temp.1981 1981 6.25_51.75 6.25 51.75 10.727273
8: air_temp.1981 1981 6.25_51.25 6.25 51.25 10.563636
9: air_temp.1981 1981 6.25_50.75 6.25 50.75 9.654545
10: air_temp.1981 1981 6.75_51.75 6.75 51.75 10.409091
11: air_temp.1981 1981 6.75_51.25 6.75 51.25 10.654545
12: air_temp.1981 1981 6.75_50.75 6.75 50.75 9.954545
13: air_temp.1982 1982 6.25_51.75 6.25 51.75 11.809090
14: air_temp.1982 1982 6.25_51.25 6.25 51.25 11.581820
15: air_temp.1982 1982 6.25_50.75 6.25 50.75 10.618180
16: air_temp.1982 1982 6.75_51.75 6.75 51.75 11.445450
17: air_temp.1982 1982 6.75_51.25 6.75 51.25 11.736360
18: air_temp.1982 1982 6.75_50.75 6.75 50.75 10.854550
Then "normalize" a little by splitting the data into multiple tables:
ID_attr = unique(bigDT[, c("ID", "lat", "long")])
ID lat long
1: 6.25_51.75 51.75 6.25
2: 6.25_51.25 51.25 6.25
3: 6.25_50.75 50.75 6.25
4: 6.75_51.75 51.75 6.75
5: 6.75_51.25 51.25 6.75
6: 6.75_50.75 50.75 6.75
meas_data = bigDT[, c("Year", "ID", "Annual_Temp")]
Year ID Annual_Temp
1: 1980 6.25_51.75 10.709091
2: 1980 6.25_51.25 10.581818
3: 1980 6.25_50.75 9.500000
4: 1980 6.75_51.75 10.354545
5: 1980 6.75_51.25 10.636364
6: 1980 6.75_50.75 9.872727
7: 1981 6.25_51.75 10.727273
8: 1981 6.25_51.25 10.563636
9: 1981 6.25_50.75 9.654545
10: 1981 6.75_51.75 10.409091
11: 1981 6.75_51.25 10.654545
12: 1981 6.75_50.75 9.954545
13: 1982 6.25_51.75 11.809090
14: 1982 6.25_51.25 11.581820
15: 1982 6.25_50.75 10.618180
16: 1982 6.75_51.75 11.445450
17: 1982 6.75_51.25 11.736360
18: 1982 6.75_50.75 10.854550
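I think this format is easier to work with than the wide format the OP requested (where the year is embedded in the column names as a string). Hadley Wickham's tidy data paper may be a useful reference.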
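If the wide layout from the question is still needed, it can probably be derived from the two tables above. The following is only a sketch, not part of the original answer: it assumes data.table's dcast and merge, and it rebuilds small toy versions of meas_data and ID_attr (two stations, two years, values copied from the question) so that it runs on its own. The name year_cols is made up for illustration.

```r
library(data.table)

# Toy stand-ins for the meas_data and ID_attr tables built above
meas_data <- data.table(
  Year        = rep(c(1980L, 1981L), each = 2),
  ID          = rep(c("6.25_51.75", "6.25_51.25"), times = 2),
  Annual_Temp = c(10.709091, 10.581818, 10.727273, 10.563636)
)
ID_attr <- data.table(
  ID   = c("6.25_51.75", "6.25_51.25"),
  lat  = c(51.75, 51.25),
  long = c(6.25, 6.25)
)

# One row per ID, one column per Year (columns are named "1980", "1981")
wide <- dcast(meas_data, ID ~ Year, value.var = "Annual_Temp")

# Prefix the year columns to get names such as "Ann_temp_1980"
year_cols <- setdiff(names(wide), "ID")
setnames(wide, year_cols, paste0("Ann_temp_", year_cols))

# Re-attach the station coordinates
wide <- merge(ID_attr, wide, by = "ID")
print(wide)
```

Keeping the long tables as the primary data and reshaping only for display means the wide table can be regenerated whenever more years are added.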
To do this in dplyr, use bind_rows instead of rbindlist; or simply use do.call(rbind, L) in base R.
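As a rough sketch of those two alternatives (not taken from the original answer), the snippet below stacks a toy version of multiple_DF with dplyr and with base R. The helper make_year and the result names big_dplyr and big_base are hypothetical, and the toy list holds only two stations and two years from the question's data.

```r
library(dplyr)

# Hypothetical helper: builds one year's data.frame for two stations
make_year <- function(year, temps) {
  data.frame(
    Year        = year,
    ID          = c("6.25_51.75", "6.25_51.25"),
    long        = c(6.25, 6.25),
    lat         = c(51.75, 51.25),
    Annual_Temp = temps
  )
}

# Toy stand-in for multiple_DF
multiple_DF <- list(
  air_temp.1980 = make_year(1980L, c(10.709091, 10.581818)),
  air_temp.1981 = make_year(1981L, c(10.727273, 10.563636))
)

# dplyr: stack the list; .id stores each element's name in a "src" column
big_dplyr <- bind_rows(multiple_DF, .id = "src")

# base R: same stacking; the element names end up in the row names instead
big_base <- do.call(rbind, multiple_DF)

print(big_dplyr)
```

Both results carry the same five data columns; only bind_rows with .id keeps the source name as a regular column.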
Answer 1 (score: 1)
As Frank pointed out, this would be easier with reproducible data, but I think the following should work:
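library(tidyverse)
# stack all years into one long data.frame
DF <- do.call("rbind", multiple_DF)
# turn Year into the future column names, e.g. "Ann_temp_1980"
DF$Year <- paste0("Ann_temp_", DF$Year)
# one column per Year value, filled with Annual_Temp
DF_final <- spread(DF, Year, Annual_Temp)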
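In current tidyr releases spread is superseded by pivot_wider, so the same reshaping could also be written as below. This is a sketch under assumptions rather than the answer's own code: it uses a toy multiple_DF with two stations and two years from the question, and the helper make_year is hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: builds one year's data.frame for two stations
make_year <- function(year, temps) {
  data.frame(
    Year        = year,
    ID          = c("6.25_51.75", "6.25_51.25"),
    long        = c(6.25, 6.25),
    lat         = c(51.75, 51.25),
    Annual_Temp = temps
  )
}

# Toy stand-in for multiple_DF
multiple_DF <- list(
  air_temp.1980 = make_year(1980L, c(10.709091, 10.581818)),
  air_temp.1981 = make_year(1981L, c(10.727273, 10.563636))
)

# Stack the years, then spread Annual_Temp into one column per Year;
# names_prefix builds column names such as "Ann_temp_1980"
DF_final <- bind_rows(multiple_DF) %>%
  pivot_wider(
    id_cols      = c(ID, long, lat),
    names_from   = Year,
    values_from  = Annual_Temp,
    names_prefix = "Ann_temp_"
  )

print(DF_final)
```

Because names_prefix handles the naming, the Year column does not need to be overwritten with paste0 first.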