Aggregate into new columns in R

Time: 2016-12-16 12:56:42

Tags: r awk aggregate

Input:

Time,id1,id2
22:30,1,0
22:32,2,1
22:33,1,0
22:34,2,1

Desired output:

Time,Time2,id1,id2
22:30,22:33,1,0
22:32,22:34,2,1

Output from my code:

Time,id1,id2
22:30,22:33,1,0
22:32,22:34,2,1

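What changes should I make to my code, aggregate(Time~,df,FUN=toString)? id1 and id2 together form the key, and the times for each key are the in time and the out time. I need the in time and the out time as separate columns. Currently both end up in the Time column.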

I also tried using awk.

2 answers:

Answer 0: (score: 0)

If you don't want to use any packages, you can use:

df <- aggregate(Time~.,df,FUN=toString)
df
#output
id1 id2         Time
 1   0  22:30, 22:33
 2   1  22:32, 22:34

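# toString() joins values with ", ", so split on exactly that to avoid a leading space;
# sapply() returns plain character vectors, where lapply() would create list columns
parts <- strsplit(as.character(df$Time), ", ", fixed = TRUE)
df$Time2 <- sapply(parts, "[", 2)
df$Time <- sapply(parts, "[", 1)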
df
#output
id1 id2  Time  Time2
1   0   22:30  22:33
2   1   22:32  22:34

Answer 1: (score: 0)

Using awk:

$ cat time.awk
BEGIN {
    FS = OFS = ","
}

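# First sighting of a key: record the row as that key's in time
function in_time() {
    n++
    store[id1, id2] = n
    itime[n] = time; iid1[n] = id1; iid2[n] = id2
}

# Key already seen: record this row's time as the out time
# (i is a local variable, declared as an extra parameter)
function out_time(   i) {
    i = store[id1, id2]
    otime[i] = time
}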


NR > 1 {
    time = $1; id1 = $2; id2 = $3
    if   ((id1, id2) in store) out_time()
    else                        in_time()
}

END {
    # four columns are printed per row, so the header needs Time2 as well
    print "Time,Time2,id1,id2"
    for (i = 1; i <= n; i++)
        print itime[i], otime[i], iid1[i], iid2[i]
}

Usage:

awk -f time.awk file.dat
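
For a quick end-to-end check, the same idea can be sketched as a single inline awk program run against the question's sample input. This is a compact variant, not the script above: the array names tin, tout and order and the output file result.csv are made up for illustration, and it assumes each key appears at most twice (first the in time, then the out time).

```shell
# Recreate the sample input from the question (file.dat is the name used in the usage line)
cat > file.dat <<'EOF'
Time,id1,id2
22:30,1,0
22:32,2,1
22:33,1,0
22:34,2,1
EOF

# Assumption: the first row for a key holds the in time, a later row the out time.
awk -F, -v OFS=, '
NR == 1 { print "Time,Time2,id1,id2"; next }   # replace the header with the 4-column one
{
    key = $2 OFS $3                            # composite key built from id1 and id2
    if (!(key in tin)) {                       # first time this key is seen: in time
        order[++n] = key                       # remember first-seen order for output
        tin[key] = $1
    } else {
        tout[key] = $1                         # key seen again: out time
    }
}
END {
    for (i = 1; i <= n; i++)
        print tin[order[i]], tout[order[i]], order[i]
}' file.dat > result.csv

cat result.csv
```

Keeping the keys in a separate order array preserves the order in which they first appear, which awk's for (key in array) loop does not guarantee. A key that only ever appears once is printed with an empty Time2 field.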