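Hello everyone, I have another question for you.
I have "n" separate CSV files, each with only two columns: temperature and a date in d/m/y H:M format. The files all come from a single digital thermometer that can only store 4 months of data. I want to read all of these files and put their temperature variable into one new df (Union).
To build that single dataset, I created a df called "date" whose datetime sequence is longer than that of any individual CSV, so that I can join the "n" files on the "Date" column and place each value on the row where the dates match.
My input looks like this: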
Date<- seq(as.POSIXlt("2017-01-01 00:00:00", tz="UTC"),
as.POSIXlt("2017-03-01 00:00:00", tz="UTC"),
by="60 min")
temp = runif(1417, min = 32, max = 100)
df1 <- data.frame(Date,temp)
Date<- seq(as.POSIXlt("2017-03-01 00:00:00", tz="UTC"),
as.POSIXlt("2017-06-01 00:00:00", tz="UTC"),
by="60 min")
temp = runif(2209, min = 32, max = 100)
df2 <- data.frame(Date,temp)
So, this is the master sequence that I want to join the dfs onto.
Date <- seq(as.POSIXlt("2017-01-01 00:00:00", tz="UTC"),
as.POSIXlt("2017-07-01 00:00:00", tz="UTC"),
by="60 min")
date <- data.frame(Date)
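I am trying to use the data.table library, as follows:
library(data.table)
setDT(date)
setDT(df1)
# keep every row of date and attach the matching temp from df1
Union <- df1[date, on = "Date"]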
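This only works for one df. How can I automate the merge for both of my dfs (and more) so that everything ends up in a single column in Union?
I hope you can help me. Thank you.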
Answer 0 (score: 0)
I will generate some sample data to show how to use Reduce and merge. I am assuming you want a "wide" format, with one column per location.
library(data.table)
set.seed(123)
# list of 10 data.tables with columns date and temp
ldat <- lapply(1:10, function(x) data.table(date = sample(seq(as.Date('2016/01/01'), as.Date('2016/01/31'), by="day"), 12),
temp = runif(12, min = 32, max = 100)))
# right now, each table in the list has the same column name
# change the 'temp' column name to the location it was collected
loc_vect <- c("Alabama", "Alaska", "Arizona", "Arkansas", "California",
"Colorado", "Connecticut", "Indiana", "Iowa", "Kansas")
ldat <- lapply(1:10, function(x) setnames(ldat[[x]], c("date", "temp"), c("date", loc_vect[x])))
That is the sample data. Now use merge to combine it.
# now merge all of them within Reduce
dat <- Reduce(function(x, y) merge(x,
y,
all = TRUE,
by = "date"),
ldat,
accumulate = FALSE)
The output is:
date Alabama Alaska Arizona Arkansas California Colorado Connecticut Indiana Iowa Kansas
1: 2016-01-01 NA 47.84632 NA 61.57271 NA NA NA NA NA NA
2: 2016-01-02 34.86005 NA 58.10994 NA NA NA NA NA NA 45.44664
3: 2016-01-03 NA NA 86.01528 76.41093 42.00244 61.88134 46.82337 NA NA NA
4: 2016-01-04 NA 60.18915 62.49911 NA NA 98.62789 NA NA 64.77890 NA
5: 2016-01-05 NA NA 87.24249 NA NA NA NA NA NA NA
6: 2016-01-06 NA NA NA 55.35912 NA NA 68.29078 50.02121 46.70533 NA
7: 2016-01-07 NA NA NA 92.72748 NA 76.86901 46.74871 NA NA NA
8: 2016-01-08 NA 41.71040 74.78704 NA NA NA 47.03500 53.24648 NA 68.86146
9: 2016-01-09 78.07480 NA 77.22783 40.88731 96.44543 67.43723 56.17029 94.09680 NA 88.57107
10: 2016-01-10 99.61034 63.68545 NA NA NA 82.12129 NA NA NA NA
11: 2016-01-11 79.11063 NA NA 92.27990 NA NA NA NA 43.60388 79.48179
12: 2016-01-12 38.99888 NA NA NA 69.35136 NA 57.48055 93.32746 NA 86.63246
13: 2016-01-13 92.48867 NA 50.65809 NA 81.00055 NA NA 89.10421 35.24113 94.26648
14: 2016-01-14 54.29861 NA NA 98.97707 95.60039 32.71176 NA NA 51.60027 NA
15: 2016-01-15 NA NA 87.08438 76.65955 52.48357 NA NA 70.39215 NA NA
16: 2016-01-16 NA 53.63631 NA 43.90358 NA 59.84430 NA NA NA NA
17: 2016-01-17 NA 47.75055 61.90855 NA 36.12900 NA NA 50.02223 NA 59.00633
18: 2016-01-18 NA 41.43881 NA NA NA 44.50177 NA NA 43.70768 NA
19: 2016-01-19 NA NA 83.30431 NA NA NA NA 80.16374 91.85676 49.73827
20: 2016-01-20 NA NA NA NA 96.87820 NA 56.06551 64.72771 NA NA
21: 2016-01-21 75.55446 83.57525 NA NA NA 77.76394 NA NA NA NA
22: 2016-01-22 96.90625 46.71574 87.39552 NA 71.81287 NA 76.19899 53.86083 77.85759 NA
23: 2016-01-23 NA NA NA 38.99480 41.67601 NA NA NA NA NA
24: 2016-01-24 70.93907 NA NA NA NA NA NA 72.41534 NA 74.04788
25: 2016-01-25 93.18810 60.13325 NA 53.78538 59.92690 53.19575 NA NA NA 35.97654
26: 2016-01-26 48.73397 NA 38.44916 44.76300 NA 85.46715 NA NA NA 61.13266
27: 2016-01-27 NA NA NA NA NA NA 70.89160 NA 79.65801 NA
28: 2016-01-28 NA NA NA NA NA NA 82.34274 NA 56.75825 NA
29: 2016-01-29 NA 42.36624 NA NA NA NA 66.15637 50.64333 49.20162 NA
30: 2016-01-30 NA 57.08149 NA NA NA 87.88277 62.24422 NA NA NA
31: 2016-01-31 NA NA NA NA 59.50670 NA NA NA 59.37499 42.39633
date Alabama Alaska Arizona Arkansas California Colorado Connecticut Indiana Iowa Kansas
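To make the folding step concrete, here is a minimal sketch on three tiny tables. The table names (a, b, c3), their values, and the helper full_merge are made up for illustration. It is meant to show that Reduce applies merge pairwise from left to right, and that all = TRUE keeps every date that appears in any table.

```r
library(data.table)

# Three small hypothetical tables that share a "date" key
a  <- data.table(date = as.Date("2016-01-01") + 0:1,     A = c(1, 2))
b  <- data.table(date = as.Date("2016-01-01") + 1:2,     B = c(3, 4))
c3 <- data.table(date = as.Date("2016-01-01") + c(0, 3), C = c(5, 6))

# Full outer join on date (illustrative helper, same arguments as above)
full_merge <- function(x, y) merge(x, y, by = "date", all = TRUE)

# Reduce folds the list pairwise, which is the same as nesting the calls
folded <- Reduce(full_merge, list(a, b, c3))
nested <- full_merge(full_merge(a, b), c3)

# One row per distinct date; NA where a table has no reading for that date
print(folded)
print(isTRUE(all.equal(folded, nested)))
```

Because each merge is a full outer join, the number of rows can only grow or stay the same as more tables are folded in.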
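If you would rather have a "tall" format (just two columns, date and temperature, or possibly three with location), it would be easier to add a "location" column to each table in the list and combine them with rbindlist instead of merging. However, starting from the current table, you can use melt.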
melt(dat,
id.vars = "date",
variable.name = "Location",
value.name = "Temp")[!is.na(Temp)]
This results in:
date Location Temp
1: 2016-01-02 Alabama 34.86005
2: 2016-01-09 Alabama 78.07480
3: 2016-01-10 Alabama 99.61034
4: 2016-01-11 Alabama 79.11063
5: 2016-01-12 Alabama 38.99888
---
116: 2016-01-19 Kansas 49.73827
117: 2016-01-24 Kansas 74.04788
118: 2016-01-25 Kansas 35.97654
119: 2016-01-26 Kansas 61.13266
120: 2016-01-31 Kansas 42.39633
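For the single-thermometer case in the question, where every file has the same temp column, the rbindlist route could look roughly like the sketch below. It rebuilds the question's two hourly tables with random values. The list names file1 and file2, the source column, and the rule of keeping the first reading when two files share a timestamp are assumptions, not something the question specifies.

```r
library(data.table)

# Recreate the two hourly files from the question (random temperatures)
set.seed(1)
hours1 <- seq(as.POSIXct("2017-01-01 00:00:00", tz = "UTC"),
              as.POSIXct("2017-03-01 00:00:00", tz = "UTC"), by = "60 min")
hours2 <- seq(as.POSIXct("2017-03-01 00:00:00", tz = "UTC"),
              as.POSIXct("2017-06-01 00:00:00", tz = "UTC"), by = "60 min")
df1 <- data.table(Date = hours1, temp = runif(length(hours1), 32, 100))
df2 <- data.table(Date = hours2, temp = runif(length(hours2), 32, 100))

# Stack every file; idcol records which list element each row came from
readings <- rbindlist(list(file1 = df1, file2 = df2), idcol = "source")

# Assumption: when two files share a timestamp, keep the first reading
readings <- unique(readings, by = "Date")

# Master hourly sequence, as in the question
date <- data.table(Date = seq(as.POSIXct("2017-01-01 00:00:00", tz = "UTC"),
                              as.POSIXct("2017-07-01 00:00:00", tz = "UTC"),
                              by = "60 min"))

# Keep every hour of the master sequence; hours with no reading get NA
Union <- readings[date, on = "Date"]
```

With real files, the list passed to rbindlist would come from reading each CSV (for example with lapply over the file names), which scales to any number of files without writing one join per file.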
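Answer 1 (score: 0)
Without a sample dataset and the desired output, I cannot be completely sure what you are asking. You can use stack() to put the columns on top of each other, as follows: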
> x <- data.frame(x = seq(1,3), y = seq(11,13), z = seq(21,23))
> x
x y z
1 1 11 21
2 2 12 22
3 3 13 23
> stack(x)
values ind
1 1 x
2 2 x
3 3 x
4 11 y
5 12 y
6 13 y
7 21 z
8 22 z
9 23 z
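I am assuming that x, y, z would be the different weather stations in your example, and that this step comes after you have built a single large table and want a two-column dataset from that master table.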
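One thing to watch with stack() is that it only returns the values and the column they came from, so a date column has to be carried over by hand. A small sketch, assuming a hypothetical wide table with columns date, stationA and stationB:

```r
# Hypothetical wide table: one row per date, one column per station
wide <- data.frame(date     = as.Date("2016-01-01") + 0:2,
                   stationA = c(50.1, NA, 61.3),
                   stationB = c(NA, 47.2, 58.9))

# Stack only the station columns; the result has columns values and ind
station_cols <- c("stationA", "stationB")
tall <- stack(wide[, station_cols])

# stack() appends the columns in order, so the dates repeat once per station
tall$date <- rep(wide$date, times = length(station_cols))

# Drop the dates on which a station has no reading
tall <- tall[!is.na(tall$values), ]
print(tall)
```

This is base R only; with data.table already loaded, melt with id.vars = "date" does the same reshaping while keeping the date column automatically.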