Merging multiple data frames into a single column

Posted: 2017-07-06 15:37:47

Tags: r data.table

Hi everyone, I have another question for you.

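I have n separate CSV files, each with only two columns: temperature and date, with the date in dmy HM format. The files come from a single digital thermometer that can only store 4 months of readings. I want to read all of these files and put their temperature variable into a new data frame (Union).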
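A minimal sketch of reading all n files into a list, assuming they sit in one folder, are comma-separated with columns named `Date` and `temp` (matching the sample data), and store dates as day/month/year hour:minute with `/` separators (the separator is an assumption). The folder and file names are hypothetical; the block writes two tiny files itself so that it runs standalone.

```r
library(data.table)

# Hypothetical folder and file names; two tiny files are written here only so
# that the example runs on its own (real files come from the thermometer)
data_dir <- file.path(tempdir(), "thermo")
dir.create(data_dir, showWarnings = FALSE)
fwrite(data.table(Date = c("01/01/2017 00:00", "01/01/2017 01:00"),
                  temp = c(50.1, 51.3)),
       file.path(data_dir, "logger_1.csv"))
fwrite(data.table(Date = c("01/03/2017 00:00", "01/03/2017 01:00"),
                  temp = c(60.2, 61.4)),
       file.path(data_dir, "logger_2.csv"))

# Read every CSV in the folder into a list of data.tables
files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
ldf <- lapply(files, function(f) {
  dt <- fread(f)
  # Parse "day/month/year hour:minute" as UTC (the "/" separator is assumed)
  dt[, Date := as.POSIXct(Date, format = "%d/%m/%Y %H:%M", tz = "UTC")]
  dt
})
```

With the files in a list, any number of them can be combined in one step instead of one join per file.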

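To build that single data frame, I created a data frame called "date" holding a date-time sequence that spans a longer period than any of the CSVs. The idea is to join the n files on the "Date" column so that each temperature value lands on the row with the same date.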

My input looks like this:

Date<- seq(as.POSIXlt("2017-01-01 00:00:00", tz="UTC"),
     as.POSIXlt("2017-03-01 00:00:00", tz="UTC"), 
     by="60 min")
temp = runif(1417, min = 32, max = 100)
df1 <- data.frame(Date,temp)

Date<- seq(as.POSIXlt("2017-03-01 00:00:00", tz="UTC"),
     as.POSIXlt("2017-06-01 00:00:00", tz="UTC"), 
     by="60 min")
temp = runif(2209, min = 32, max = 100)
df2 <- data.frame(Date,temp)

And this is the master date sequence I built to join the data frames onto:

Date <- seq(as.POSIXlt("2017-01-01 00:00:00", tz="UTC"),
       as.POSIXlt("2017-07-01 00:00:00", tz="UTC"), 
       by="60 min")
date <- data.frame(Date)

I'm trying to use the data.table package, like this:

library(data.table)
setDT(date)
setDT(df1)
Union <- df1[date, on = "Date"]

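This works for one data frame, but how can I automate merging several data frames (here, my two) so that all of their temperatures end up in a single column of Union?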
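A minimal sketch of one way to generalize the single join to any number of files: stack them with `rbindlist()` into one long table first, then do a single right join onto the full date grid. The helper `mk()` is hypothetical and only rebuilds data shaped like `df1` and `df2`. Because both sample ranges include 2017-03-01 00:00, the sketch keeps the first reading per timestamp with `unique(..., by = "Date")`; whether that is the right rule for overlapping loggers is an assumption.

```r
library(data.table)

set.seed(1)
# Hypothetical helper that builds an hourly logger extract like df1/df2
mk <- function(from, to) {
  Date <- seq(as.POSIXct(from, tz = "UTC"), as.POSIXct(to, tz = "UTC"),
              by = "60 min")
  data.table(Date = Date, temp = runif(length(Date), min = 32, max = 100))
}
ldf <- list(mk("2017-01-01 00:00:00", "2017-03-01 00:00:00"),
            mk("2017-03-01 00:00:00", "2017-06-01 00:00:00"))

# Full hourly grid, as in the question
date <- data.table(Date = seq(as.POSIXct("2017-01-01 00:00:00", tz = "UTC"),
                              as.POSIXct("2017-07-01 00:00:00", tz = "UTC"),
                              by = "60 min"))

# Stack all files into one long table, keep one reading per timestamp,
# then right-join onto the grid so every hour appears exactly once
all_temps <- unique(rbindlist(ldf), by = "Date")
Union <- all_temps[date, on = "Date"]
```

Hours not covered by any file (here, most of June 2017) come out as `NA` in the single `temp` column.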

I hope you can help me. Thanks!

2 answers:

Answer 0 (score: 0)

I'll generate some sample data to show how to use Reduce with merge. I'm assuming you want a "wide" format, with one column per location.
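On a small scale, `Reduce` with a two-argument `merge` folds the list from left to right, so three tables become `merge(merge(a, b), d)`. The three tables here are hypothetical and tiny so the result is easy to check by eye.

```r
library(data.table)

# Three tiny hypothetical station tables with partially overlapping dates
a <- data.table(date = as.Date(c("2016-01-01", "2016-01-02")), Alabama = c(40, 41))
b <- data.table(date = as.Date(c("2016-01-02", "2016-01-03")), Alaska  = c(50, 51))
d <- data.table(date = as.Date("2016-01-03"),                   Arizona = 60)

# Reduce applies the two-argument merge left to right,
# keeping all dates at each step (full outer join via all = TRUE)
wide <- Reduce(function(x, y) merge(x, y, all = TRUE, by = "date"), list(a, b, d))
wide
```

Dates missing from a station become `NA` in that station's column, which is the same pattern seen in the larger output.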

library(data.table)
set.seed(123)

# list of 10 data.tables with columns date and temp

ldat <- lapply(1:10, function(x) data.table(date = sample(seq(as.Date('2016/01/01'), as.Date('2016/01/31'), by="day"), 12),
                                            temp = runif(12, min = 32, max = 100)))

# right now, each table in the list has the same column name
# change the 'temp' column name to the location it was collected

loc_vect <- c("Alabama", "Alaska", "Arizona", "Arkansas", "California", 
              "Colorado", "Connecticut", "Indiana", "Iowa", "Kansas")

ldat <- lapply(1:10, function(x) setnames(ldat[[x]], c("date", "temp"), c("date", loc_vect[x])))

That's the sample data. Now combine all the tables with merge inside Reduce:

# now merge all of them within Reduce

dat <- Reduce(function(x, y) merge(x, 
                                   y, 
                                   all = TRUE, 
                                   by = "date"), 
              ldat, 
              accumulate = FALSE)

The output is:

          date  Alabama   Alaska  Arizona Arkansas California Colorado Connecticut  Indiana     Iowa   Kansas
 1: 2016-01-01       NA 47.84632       NA 61.57271         NA       NA          NA       NA       NA       NA
 2: 2016-01-02 34.86005       NA 58.10994       NA         NA       NA          NA       NA       NA 45.44664
 3: 2016-01-03       NA       NA 86.01528 76.41093   42.00244 61.88134    46.82337       NA       NA       NA
 4: 2016-01-04       NA 60.18915 62.49911       NA         NA 98.62789          NA       NA 64.77890       NA
 5: 2016-01-05       NA       NA 87.24249       NA         NA       NA          NA       NA       NA       NA
 6: 2016-01-06       NA       NA       NA 55.35912         NA       NA    68.29078 50.02121 46.70533       NA
 7: 2016-01-07       NA       NA       NA 92.72748         NA 76.86901    46.74871       NA       NA       NA
 8: 2016-01-08       NA 41.71040 74.78704       NA         NA       NA    47.03500 53.24648       NA 68.86146
 9: 2016-01-09 78.07480       NA 77.22783 40.88731   96.44543 67.43723    56.17029 94.09680       NA 88.57107
10: 2016-01-10 99.61034 63.68545       NA       NA         NA 82.12129          NA       NA       NA       NA
11: 2016-01-11 79.11063       NA       NA 92.27990         NA       NA          NA       NA 43.60388 79.48179
12: 2016-01-12 38.99888       NA       NA       NA   69.35136       NA    57.48055 93.32746       NA 86.63246
13: 2016-01-13 92.48867       NA 50.65809       NA   81.00055       NA          NA 89.10421 35.24113 94.26648
14: 2016-01-14 54.29861       NA       NA 98.97707   95.60039 32.71176          NA       NA 51.60027       NA
15: 2016-01-15       NA       NA 87.08438 76.65955   52.48357       NA          NA 70.39215       NA       NA
16: 2016-01-16       NA 53.63631       NA 43.90358         NA 59.84430          NA       NA       NA       NA
17: 2016-01-17       NA 47.75055 61.90855       NA   36.12900       NA          NA 50.02223       NA 59.00633
18: 2016-01-18       NA 41.43881       NA       NA         NA 44.50177          NA       NA 43.70768       NA
19: 2016-01-19       NA       NA 83.30431       NA         NA       NA          NA 80.16374 91.85676 49.73827
20: 2016-01-20       NA       NA       NA       NA   96.87820       NA    56.06551 64.72771       NA       NA
21: 2016-01-21 75.55446 83.57525       NA       NA         NA 77.76394          NA       NA       NA       NA
22: 2016-01-22 96.90625 46.71574 87.39552       NA   71.81287       NA    76.19899 53.86083 77.85759       NA
23: 2016-01-23       NA       NA       NA 38.99480   41.67601       NA          NA       NA       NA       NA
24: 2016-01-24 70.93907       NA       NA       NA         NA       NA          NA 72.41534       NA 74.04788
25: 2016-01-25 93.18810 60.13325       NA 53.78538   59.92690 53.19575          NA       NA       NA 35.97654
26: 2016-01-26 48.73397       NA 38.44916 44.76300         NA 85.46715          NA       NA       NA 61.13266
27: 2016-01-27       NA       NA       NA       NA         NA       NA    70.89160       NA 79.65801       NA
28: 2016-01-28       NA       NA       NA       NA         NA       NA    82.34274       NA 56.75825       NA
29: 2016-01-29       NA 42.36624       NA       NA         NA       NA    66.15637 50.64333 49.20162       NA
30: 2016-01-30       NA 57.08149       NA       NA         NA 87.88277    62.24422       NA       NA       NA
31: 2016-01-31       NA       NA       NA       NA   59.50670       NA          NA       NA 59.37499 42.39633
          date  Alabama   Alaska  Arizona Arkansas California Colorado Connecticut  Indiana     Iowa   Kansas

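If you want a "tall" format instead (just two columns, date and temperature, or three with a location column), it is easier to add a "Location" column to each table in the list and then stack the list of tables with rbindlist(). Starting from the current wide table, though, you can use melt: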
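A minimal sketch of the `rbindlist()` route, assuming the per-location tables still share the column names `date`/`temp` (that is, before any renaming) and that the list is named by location. Rather than adding the column to each table by hand, it uses `rbindlist()`'s `idcol` argument, which turns the list names into a `Location` column while stacking. The three locations and the random data are hypothetical.

```r
library(data.table)

set.seed(123)
# Hypothetical per-location tables, all with columns date and temp
locs <- c("Alabama", "Alaska", "Arizona")
lraw <- lapply(locs, function(l)
  data.table(date = as.Date("2016-01-01") + 0:2,
             temp = runif(3, min = 32, max = 100)))
names(lraw) <- locs

# idcol = "Location" adds a column holding each row's list name
tall <- rbindlist(lraw, idcol = "Location")
```

This avoids building the wide table at all when the tall shape is what you actually need.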

melt(dat, 
     id.vars = "date", 
     variable.name = "Location", 
     value.name = "Temp")[!is.na(Temp)]

which gives:

           date Location     Temp
  1: 2016-01-02  Alabama 34.86005
  2: 2016-01-09  Alabama 78.07480
  3: 2016-01-10  Alabama 99.61034
  4: 2016-01-11  Alabama 79.11063
  5: 2016-01-12  Alabama 38.99888
 ---                             
116: 2016-01-19   Kansas 49.73827
117: 2016-01-24   Kansas 74.04788
118: 2016-01-25   Kansas 35.97654
119: 2016-01-26   Kansas 61.13266
120: 2016-01-31   Kansas 42.39633

Answer 1 (score: 0)

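Without a sample dataset and the desired output, I can't be completely sure what you're asking. You can use stack() to put columns on top of one another, like this: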

> x <- data.frame(x = seq(1,3), y = seq(11,13), z = seq(21,23))
> x
  x  y  z
1 1 11 21
2 2 12 22
3 3 13 23
> stack(x)
  values ind
1      1   x
2      2   x
3      3   x
4     11   y
5     12   y
6     13   y
7     21   z
8     22   z
9     23   z

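I'm assuming x, y and z in your example would be different weather stations, and that this step comes after you have created a single large table and want to turn that master table into a two-column dataset.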
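One caveat with `stack()` on a master table is that a date column would either be stacked with the values or lost. A minimal base R sketch, assuming a hypothetical wide table with a `Date` column followed only by station columns: stack just the station columns, then re-attach the dates by repeating them once per stacked column (`stack()` appends whole columns in order, so the repetition lines up).

```r
# Hypothetical wide master table: a Date column plus one column per station
wide <- data.frame(Date = as.Date("2017-01-01") + 0:2,
                   x = 1:3, y = 11:13, z = 21:23)

# stack() only the station columns, then repeat the dates once per column
long <- cbind(Date = rep(wide$Date, times = ncol(wide) - 1),
              stack(wide[-1]))
long
```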