Question

I'm new to R and am working on a project that I need some help with.

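I have a CSV file containing one year of data. However, there are gaps in the time series, and I need it evenly spaced at half-hour intervals (48 rows per day over 365 days gives 17520 rows for the full year). The gaps range from a single half hour to several days. No rows exist for the missing timestamps. So, with the help of some other forum posts, I put together a script that imports the CSV into R, creates rows so that the timestamp column has the correct length, and then matches the data to the new timestamp column.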
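For reference, the evenly spaced timestamp column described above could be built along these lines. This is only a sketch: the year (2013) and the UTC time zone are assumptions picked for illustration, and `full_grid` is a made-up name rather than anything from the script.

```r
# Hypothetical year and time zone, chosen only for illustration.
start <- as.POSIXct("2013-01-01 00:00:00", tz = "UTC")
end   <- as.POSIXct("2013-12-31 23:30:00", tz = "UTC")

# One timestamp every 30 minutes, both endpoints included.
full_grid <- seq(from = start, to = end, by = "30 min")

# Number of half-hour slots in a non-leap year: 365 days * 48 slots per day.
length(full_grid) == 365 * 48
```

A leap year would give 366 * 48 rows instead, so it is worth checking the length against the year actually being processed.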

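However, I have about 3 columns of data to match to the new timestamps, and the way I am doing it now is very inefficient. As of now, the data.frame (newdata4) holds the correct timestamps. I then add a new column to that frame using the original data in the missing4 data.frame: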

newdata4 <- as.data.frame(timestamp_corr)
newdata4$PAR_in_Avg <- missing4$PAR_in_Avg[pmatch(newdata4$timestamp_corr, missing4$timestamp)] # add data where there was an original timestamp
newdata4$PAR_in_Avg[is.na(newdata4$PAR_in_Avg)] <- -9999 # replace NAs with -9999

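In this example, PAR_in_Avg is one of the columns in the original CSV file. This works very well. However, to get all of the columns into newdata4, I repeat these lines over and over: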

newdata4$PAR_in_Avg <- missing4$PAR_in_Avg[pmatch(newdata4$timestamp_corr, missing4$timestamp)] # add data where there was an original timestamp
newdata4$PAR_in_Avg[is.na(newdata4$PAR_in_Avg)] <- -9999 # replace NAs with -9999
newdata4$PAR_out_Avg <- missing4$PAR_out_Avg[pmatch(newdata4$timestamp_corr, missing4$timestamp)] # add data where there was an original timestamp
newdata4$PAR_out_Avg[is.na(newdata4$PAR_out_Avg)] <- -9999 # replace NAs with -9999
newdata4$Rn_meas_Avg <- missing4$Rn_meas_Avg[pmatch(newdata4$timestamp_corr, missing4$timestamp)] # add data where there was an original timestamp
newdata4$Rn_meas_Avg[is.na(newdata4$Rn_meas_Avg)] <- -9999 # replace NAs with -9999
newdata4$PYRA_CMP3_Avg <- missing4$PYRA_CMP3_Avg[pmatch(newdata4$timestamp_corr, missing4$timestamp)] # add data where there was an original timestamp
newdata4$PYRA_CMP3_Avg[is.na(newdata4$PYRA_CMP3_Avg)] <- -9999 # replace NAs with -9999
newdata4$G_1_Avg <- missing4$G_1_Avg[pmatch(newdata4$timestamp_corr, missing4$timestamp)] # add data where there was an original timestamp
newdata4$G_1_Avg[is.na(newdata4$G_1_Avg)] <- -9999 # replace NAs with -9999
newdata4$G_2_Avg <- missing4$G_2_Avg[pmatch(newdata4$timestamp_corr, missing4$timestamp)] # add data where there was an original timestamp
newdata4$G_2_Avg[is.na(newdata4$G_2_Avg)] <- -9999 # replace NAs with -9999
newdata4$G_3_Avg <- missing4$G_3_Avg[pmatch(newdata4$timestamp_corr, missing4$timestamp)] # add data where there was an original timestamp
newdata4$G_3_Avg[is.na(newdata4$G_3_Avg)] <- -9999 # replace NAs with -9999
newdata4$G_4_Avg <- missing4$G_4_Avg[pmatch(newdata4$timestamp_corr, missing4$timestamp)] # add data where there was an original timestamp
newdata4$G_4_Avg[is.na(newdata4$G_4_Avg)] <- -9999 # replace NAs with -9999

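This is not sustainable, because I have to do the same for other sites and other years (each with different column headers). Ideally, I would like R to read the first row of the CSV file to determine how many columns there are and then, once the new time series is built, use pmatch to add each column back in.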

I was able to merge the newdata4 data.frame with the original missing4 data.frame, but doing so removed all of the rows that had just been created for the gaps.
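One possible explanation, assuming merge() was called with its defaults: `all = FALSE` keeps only the keys present in both frames, which would drop exactly the newly created gap rows. The sketch below uses toy data (the names `grid` and `orig` are hypothetical) to contrast the default with `all.x = TRUE`.

```r
# Toy stand-ins for the full grid and the gappy original (hypothetical names).
grid <- data.frame(timestamp = seq(as.POSIXct("2013-01-01 00:00", tz = "UTC"),
                                   by = "30 min", length.out = 6))
orig <- data.frame(timestamp  = grid$timestamp[c(1, 2, 5)],
                   PAR_in_Avg = c(10, 20, 50))

inner <- merge(grid, orig, by = "timestamp")                # default: all = FALSE
left  <- merge(grid, orig, by = "timestamp", all.x = TRUE)  # keep every grid row

nrow(inner)  # only timestamps present in both frames
nrow(left)   # one row per grid timestamp; the gaps hold NA
```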

Is there a simple way to put the data back together without all the repetition?

Answer 1

Try

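# Full half-hourly sequence from the first to the last observed timestamp
newdat <- data.frame(timestamp=with(dat, seq(min(timestamp),
                     max(timestamp), by='30 min')))

# all=TRUE keeps timestamps found in either frame, so the gap rows survive as NA
dat1 <- merge(dat, newdat, by='timestamp', all=TRUE)
# Every column except the timestamp, then swap NA for -9999
indx <- setdiff(colnames(dat1), 'timestamp')
dat1[indx][is.na(dat1[indx])] <- -9999
head(dat1)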

Data

set.seed(42)
dat <- data.frame(timestamp= sort(sample(seq(as.POSIXct('1996-01-01'),
    length.out=50, by='30 min'),30, replace=FALSE)), value1=rnorm(30),
    value2=runif(30))
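If you would rather stay close to the indexing approach from the question, a rough sketch of the same idea applied to every column at once might look like this. The data are made up, the frame names simply echo the question, and `match()` is swapped in for `pmatch()` because exact matching seems to be what is wanted here (`pmatch()` is designed for partial string matching).

```r
# Toy data; column names echo the question but the values are invented.
grid <- seq(as.POSIXct("2013-01-01 00:00", tz = "UTC"),
            by = "30 min", length.out = 6)
missing4 <- data.frame(timestamp   = grid[c(1, 2, 5)],
                       PAR_in_Avg  = c(10, 20, 50),
                       PAR_out_Avg = c(1, 2, 5))

# Row of missing4 for each grid timestamp, NA where the timestamp is absent.
idx <- match(grid, missing4$timestamp)

# Every column except the timestamp, whatever the file happens to contain.
cols <- setdiff(names(missing4), "timestamp")

# Indexing rows with NA yields all-NA rows, which become the gap rows.
newdata4 <- data.frame(timestamp_corr = grid,
                       missing4[idx, cols, drop = FALSE])
newdata4[cols][is.na(newdata4[cols])] <- -9999
rownames(newdata4) <- NULL

newdata4
```

Because `cols` is derived from the column names, the same lines should carry over to files with different headers, as long as the timestamp column is named consistently.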
