I have the following dataset.
name - old - new - datetime
1051 38656 38400 2016-01-24 03:22:37
1051 5888 5632 2016-01-24 04:03:28
1051 5632 38144 2016-01-24 04:34:22
1051 5120 4864 2016-01-24 03:56:33
1051 37376 37632 2016-01-25 08:08:16
.. .. .. ..
I want to interpolate it onto a regular 10-minute grid, like this:
name - old - new - datetime
1051 ? ? 2016-01-24 03:20:00
1051 ? ? 2016-01-24 03:30:00
1051 ? ? 2016-01-24 03:40:00
1051 ? ? 2016-01-24 03:50:00
1051 ? ? 2016-01-24 04:00:00
1051 ? ? 2016-01-25 04:10:00
.. .. .. ..
My dataset is irregular and messy, so I want to interpolate it to get a cleaner one. I tried this:
data.frame(datetime = seq(roomsdatetime$datetime[1], roomsdatetime$datetime[nrow(roomsdatetime)], by = "10 min")) %>%
mutate(roomsdatetime, approx = na.approx(roomsdatetime$old_value))
I get this error:
Error: wrong result size (3562), expected 3565 or 1
Is there another way to do this?
Answer 0 (score: 0)
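In Excel, enter the following, starting at the top-left cell.
[The "difference" column is the number of seconds between each (sorted) datetime and the base datetime 24.01.2016 03:20:00, which is assigned the value 0. It is obtained with the formula "=(B3-$B$2)*86400".]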
name datetime difference old new
1051 24.01.2016 03:20:00 0 NA NA
1051 24.01.2016 03:22:37 157 38656 38400
1051 24.01.2016 03:30:00 600 NA NA
1051 24.01.2016 03:40:00 1200 NA NA
1051 24.01.2016 03:50:00 1800 NA NA
1051 24.01.2016 03:56:33 2193 5120 4864
1051 24.01.2016 04:00:00 2400 NA NA
1051 24.01.2016 04:03:28 2608 5888 5632
1051 24.01.2016 04:34:22 4462 5632 38144
1051 25.01.2016 04:10:00 89400 NA NA
1051 25.01.2016 08:08:16 103696 37376 37632
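The "difference" column can also be computed in R rather than in Excel. The following is a minimal sketch using only the first three datetimes from the table; the variable names and the UTC time zone are assumptions made for illustration, not part of the original answer.

```r
# Datetimes as typed into the spreadsheet (dd.mm.yyyy HH:MM:SS), first three rows only
datetime_text <- c("24.01.2016 03:20:00",
                   "24.01.2016 03:22:37",
                   "24.01.2016 03:30:00")

# Parse with an explicit format; the UTC time zone is an assumption
datetime <- as.POSIXct(datetime_text, format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# Seconds elapsed since the first (base) timestamp
difference <- as.numeric(difftime(datetime, datetime[1], units = "secs"))
print(difference)  # 0 157 600
```

These values match the first three entries of the "difference" column in the table.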
Then choose File - Save As - [File name: seymaalaca.csv; Type: "CSV (Comma delimited) (*.csv)"]
mydataframe <- read.csv("C:/Users/User/Documents/Revolution/seymaalaca.csv", header=TRUE, sep=",", stringsAsFactors = FALSE)
mydataframe # results in:
name datetime difference old new
1 1051 24.01.2016 03:20:00 0 NA NA
2 1051 24.01.2016 03:22:37 157 38656 38400
3 1051 24.01.2016 03:30:00 600 NA NA
4 1051 24.01.2016 03:40:00 1200 NA NA
5 1051 24.01.2016 03:50:00 1800 NA NA
6 1051 24.01.2016 03:56:33 2193 5120 4864
7 1051 24.01.2016 04:00:00 2400 NA NA
8 1051 24.01.2016 04:03:28 2608 5888 5632
9 1051 24.01.2016 04:34:22 4462 5632 38144
10 1051 25.01.2016 04:10:00 89400 NA NA
11 1051 25.01.2016 08:08:16 103696 37376 37632
oldcolumn <- lm(mydataframe$old ~ mydataframe$difference)
oldcolumn # old = 1.348e+04 + 2.233e-01*difference
oldfunction <- function (difference) {1.348e+04 + 2.233e-01*difference} # produces the row values for the "old" column
newcolumn <- lm(mydataframe$new ~ mydataframe$difference)
newcolumn # new = 2.14e+04 + 1.56e-01*difference
newfunction <- function (difference) {2.14e+04 + 1.56e-01*difference} # produces the row values for the "new" column
myinterpolizer <- function (difference) {c(oldfunction(difference),newfunction(difference))} # produces the row values for the "old&new" column
myinterpolizer(0) # 13480 21400
myinterpolizer(600) # 13613.98 21493.60
myinterpolizer(1200) # 13747.96 21587.20
myinterpolizer(1800) # 13881.94 21680.80
myinterpolizer(2400) # 14015.92 21774.40
myinterpolizer(89400) # 33443.02 35346.40
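The coefficients hard-coded above are rounded copies of the printed lm() output. As a hedged alternative, the fitted models can be evaluated directly with predict(), which avoids the rounding. In this sketch the data frame `observed` and the helper `predict_both` are hypothetical names introduced for illustration; the five rows are the non-NA rows of the table.

```r
# The five observed rows from the table (rows containing NA are omitted)
observed <- data.frame(
  difference = c(157, 2193, 2608, 4462, 103696),
  old        = c(38656, 5120, 5888, 5632, 37376),
  new        = c(38400, 4864, 5632, 38144, 37632)
)

# Fit one straight line per column, as in the answer
old_fit <- lm(old ~ difference, data = observed)
new_fit <- lm(new ~ difference, data = observed)

# Hypothetical helper: evaluate both fitted lines at the given offsets (seconds)
predict_both <- function(difference) {
  grid <- data.frame(difference = difference)
  data.frame(
    difference = difference,
    old = unname(predict(old_fit, newdata = grid)),
    new = unname(predict(new_fit, newdata = grid))
  )
}

fitted_grid <- predict_both(c(0, 600, 1200, 1800, 2400, 89400))
print(fitted_grid)
```

The results should be close to the myinterpolizer() values, differing only because those use coefficients rounded to four significant digits.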
A simpler one-liner that produces the 12 numbers above:
# mydataframe[is.na(mydataframe$old),] # filters the rows where old=NA
# mydataframe[is.na(mydataframe$old),3] # After (filtering the rows where old=NA) select (the "difference" column)
lapply(mydataframe[is.na(mydataframe$old),3], myinterpolizer)
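Note that the lm() approach fits a single regression line through all observations, so its values do not pass through the measured points. If piecewise-linear interpolation between neighbouring observations is what is wanted, base R's approx() is one option. The following is a minimal sketch on the same five observed rows; the vector names and the choice of rule = 2 are assumptions.

```r
# Observed rows from the table: seconds since the base time, and the measured values
difference <- c(157, 2193, 2608, 4462, 103696)
old        <- c(38656, 5120, 5888, 5632, 37376)
new        <- c(38400, 4864, 5632, 38144, 37632)

# Offsets (seconds) of the regular grid rows that are NA in the table
grid <- c(0, 600, 1200, 1800, 2400, 89400)

# approx() draws straight segments between neighbouring observations.
# rule = 2 reuses the nearest observed value outside the observed range
# (needed for offset 0, which precedes the first observation); this is an assumption.
interpolated <- data.frame(
  difference = grid,
  old = approx(difference, old, xout = grid, rule = 2)$y,
  new = approx(difference, new, xout = grid, rule = 2)$y
)
print(interpolated)
```

With the default rule = 1, the row at offset 0 would come back as NA instead, because it lies before the first observation.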