How do I reshape loaded time series data?

Asked: 2013-02-02 09:39:20

Tags: r data-structures load reshape

I have data in a txt file (mydata.txt) that looks like this:

Variable, DateTime, Value, Quality
A, 01-01-1970 00:00:00, 0, 0
A, 01-01-1970 00:02:00, 2, 2
A, 01-01-1970 00:04:00, 4, 1
A, 01-01-1970 00:06:00, 6, 0
B, 01-01-1970 00:02:00, 0.2, 0
B, 01-01-1970 00:04:00, 0.4, 1
B, 01-01-1970 00:06:00, 0.6, 1
B, 01-01-1970 00:10:00, 1.0, 0
C, 01-01-1970 00:00:00, 20.0, 0
C, 01-01-1970 00:04:00, 16.0, 0
C, 01-01-1970 00:08:00, 12.0, 3

I can load it into R without problems, with either of the following:

read.csv("mydata.txt", header = TRUE, sep = ",")

read.table("mydata.txt", header = TRUE, sep = ",")

What I would like to work with in R looks like this:

DateTime, A_Value, A_Quality, B_Value, B_Quality, C_Value, C_Quality
01-01-1970 00:00:00, 0, 0, NA, NA, 20.0, 0
01-01-1970 00:02:00, 2, 2, 0.2, 0, NA, NA
01-01-1970 00:04:00, 4, 1, 0.4, 1, 16.0, 0
01-01-1970 00:06:00, 6, 0, 0.6, 1, NA, NA
01-01-1970 00:08:00, NA, NA, NA, NA, 12.0, 3
01-01-1970 00:10:00, NA, NA, 1.0, 0, NA, NA

(The first column should be of a date/time type.)

I don't know which, or how many, different variables (i.e. A, B, ... Z) are in my file, and I don't know their names. All I know is which column they are in.

How do I get from the dataset in the text file to the dataset I want to use in R?

Thanks in advance!

2 Answers:

Answer 0 (score: 4)

You can do this with the reshape2 package:

Step 1: melt the data.frame

library(reshape2)
# df is the data.frame read from mydata.txt; columns 1:2 (Variable, DateTime) are the id columns
df.m <- melt(df, id.vars = 1:2) # changed names(df)[1:2] to 1:2 (following @Anandamahto's comment)

Step 2: cast the result:

dcast(df.m, DateTime ~ Variable + variable, fill=NA)

#               DateTime A_Value A_Quality B_Value B_Quality C_Value C_Quality
# 1  01-01-1970 00:00:00       0         0      NA        NA      20         0
# 2  01-01-1970 00:02:00       2         2     0.2         0      NA        NA
# 3  01-01-1970 00:04:00       4         1     0.4         1      16         0
# 4  01-01-1970 00:06:00       6         0     0.6         1      NA        NA
# 5  01-01-1970 00:08:00      NA        NA      NA        NA      12         3
# 6  01-01-1970 00:10:00      NA        NA     1.0         0      NA        NA

Answer 1 (score: 4)

Read your data in as usual:

mydata <- read.table("mydata.txt", header = TRUE, sep = ",")
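One detail worth checking when reading this file: the sample in the question has a space after every comma. According to the read.table documentation, numeric fields are always stripped of surrounding whitespace, but character fields are only stripped when strip.white = TRUE. The sketch below is not part of the original answer; it assumes the file looks like the sample and uses a few rows of it inline (the `txt` string is a stand-in for mydata.txt).

```r
# A few rows copied from the question; `txt` is a hypothetical stand-in for mydata.txt.
txt <- "Variable, DateTime, Value, Quality
A, 01-01-1970 00:00:00, 0, 0
B, 01-01-1970 00:02:00, 0.2, 0
C, 01-01-1970 00:08:00, 12.0, 3"

# strip.white = TRUE removes the blank that follows each comma in character fields,
# so DateTime holds clean strings. stringsAsFactors = FALSE keeps them as character.
mydata <- read.table(text = txt, header = TRUE, sep = ",",
                     strip.white = TRUE, stringsAsFactors = FALSE)

str(mydata)
```

With a real file, the same arguments can be passed along with the file name instead of `text = txt`.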

Then use one of several approaches to reshape it from the so-called "long" format to the "wide" format.

In base R this is a one-liner using reshape:

reshape(mydata, direction = "wide", idvar = "DateTime", timevar = "Variable")
#                DateTime Value.A Quality.A Value.B Quality.B Value.C Quality.C
# 1   01-01-1970 00:00:00       0         0      NA        NA      20         0
# 2   01-01-1970 00:02:00       2         2     0.2         0      NA        NA
# 3   01-01-1970 00:04:00       4         1     0.4         1      16         0
# 4   01-01-1970 00:06:00       6         0     0.6         1      NA        NA
# 8   01-01-1970 00:10:00      NA        NA     1.0         0      NA        NA
# 11  01-01-1970 00:08:00      NA        NA      NA        NA      12         3
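The output above still leaves two things the question asked for: DateTime is not yet a date/time type, and the rows follow order of first appearance rather than time order. A minimal sketch of one way to finish the job in base R follows. It is not part of the original answer, and it rests on two assumptions that the source does not confirm: that the timestamps are day-month-year, and that they can be treated as UTC. The inline `txt` string stands in for mydata.txt.

```r
# Data copied from the question; `txt` is a hypothetical stand-in for mydata.txt.
txt <- "Variable, DateTime, Value, Quality
A, 01-01-1970 00:00:00, 0, 0
A, 01-01-1970 00:02:00, 2, 2
A, 01-01-1970 00:04:00, 4, 1
A, 01-01-1970 00:06:00, 6, 0
B, 01-01-1970 00:02:00, 0.2, 0
B, 01-01-1970 00:04:00, 0.4, 1
B, 01-01-1970 00:06:00, 0.6, 1
B, 01-01-1970 00:10:00, 1.0, 0
C, 01-01-1970 00:00:00, 20.0, 0
C, 01-01-1970 00:04:00, 16.0, 0
C, 01-01-1970 00:08:00, 12.0, 3"

mydata <- read.table(text = txt, header = TRUE, sep = ",",
                     strip.white = TRUE, stringsAsFactors = FALSE)

# Long to wide, exactly as in the answer.
wide <- reshape(mydata, direction = "wide", idvar = "DateTime", timevar = "Variable")

# Assumption: day-month-year timestamps, interpreted as UTC.
wide$DateTime <- as.POSIXct(wide$DateTime, format = "%d-%m-%Y %H:%M:%S", tz = "UTC")

# Sort chronologically and drop the row names inherited from the long data.
wide <- wide[order(wide$DateTime), ]
rownames(wide) <- NULL

# Optional: turn reshape's "Value.A" style names into the "A_Value" style from the question.
names(wide) <- sub("^(.+)\\.(.+)$", "\\2_\\1", names(wide))

print(wide)
```

Nothing here depends on knowing the variable names in advance: reshape builds one pair of columns for each distinct value it finds in Variable.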