I have a data.table, and I want to reduce it by merging rows based on the "Time" column.
library(data.table)
DT <- data.table(ID=c("A","A","A","B","B","C","C","C","C","D"),
Time=c("2019-01-16 15:52:03","2019-01-16 16:01:04","2019-01-26 01:22:54",
"2019-02-18 17:00:08","2019-02-18 17:05:44",
"2019-03-16 13:23:42","2019-03-16 15:52:03","2019-06-04 12:01:04","2019-06-04 16:20:54",
"2019-03-16 13:23:42"),
place=c("Vienna","France","Berlin","Rome","Washington",
"Bangkok","Ottawa","Tokyo","SouthKorea","Singapore"))
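Rows that share the same ID and fall on the same day should be merged.
Rows on different days should not be merged.
Expected output: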
ID Time place
1 A 2019-01-16 Vienna-France
2 A 2019-01-26 Berlin
3 B 2019-02-18 Rome-Washington
4 C 2019-03-16 Bangkok-Ottawa
5 C 2019-06-04 Tokyo-SouthKorea
6 D 2019-03-16 Singapore
How can I do this? Thanks.
Answer 0 (score: 4)
I see that you prefer data.table (see @January's post for that), but here is a dplyr solution:
library(dplyr)
# Group by ID and calendar day, then join the places in each group with "-"
DT %>%
  group_by(ID, Time = as.Date(Time, format = "%Y-%m-%d")) %>%
  summarise(place = paste(place, collapse = "-"))
ID Time place
<chr> <date> <chr>
1 A 2019-01-16 Vienna-France
2 A 2019-01-26 Berlin
3 B 2019-02-18 Rome-Washington
4 C 2019-03-16 Bangkok-Ottawa
5 C 2019-06-04 Tokyo-SouthKorea
6 D 2019-03-16 Singapore
Answer 1 (score: 3)
Edit: I just noticed that it should also be grouped by ID.
DT[ , by = .(ID, as.Date(Time, "%Y-%m-%d")), .(place=paste(place, collapse="-")) ]
ID as.Date place
1: A 2019-01-16 Vienna-France
2: A 2019-01-26 Berlin
3: B 2019-02-18 Rome-Washington
4: C 2019-03-16 Bangkok-Ottawa
5: C 2019-06-04 Tokyo-SouthKorea
6: D 2019-03-16 Singapore
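The result above carries a grouping column called as.Date, while the question asks for a column called Time. A minimal sketch of one way to get that name, assuming the date is parsed from the character Time column as in the answer; the data here is a small subset of the question's table, rebuilt so the block runs on its own:

```r
library(data.table)

# Subset of the question's data, rebuilt here so the example is self-contained
DT <- data.table(
  ID    = c("A", "A", "A", "B", "B"),
  Time  = c("2019-01-16 15:52:03", "2019-01-16 16:01:04", "2019-01-26 01:22:54",
            "2019-02-18 17:00:08", "2019-02-18 17:05:44"),
  place = c("Vienna", "France", "Berlin", "Rome", "Washington")
)

# Naming the grouping expression (Time = ...) sets the name of the output column
res <- DT[, .(place = paste(place, collapse = "-")),
          by = .(ID, Time = as.Date(Time))]
print(res)
```

Putting j before by is only a matter of argument order; the grouping itself is the same as in the answer.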
Answer 2 (score: 3)
You can also use base R:
aggregate(place ~ ID + as.Date(Time) , DT, paste0,collapse = '-')
ID as.Date(Time) place
1 A 2019-01-16 Vienna-France
2 A 2019-01-26 Berlin
3 B 2019-02-18 Rome-Washington
4 C 2019-03-16 Bangkok-Ottawa
5 D 2019-03-16 Singapore
6 C 2019-06-04 Tokyo-SouthKorea
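In the aggregate output above, rows come back ordered by date rather than by ID (D's March row precedes C's June row), and the date column is named as.Date(Time). A rough sketch of one way to match the requested layout, assuming a plain data.frame and a helper column Day that is not part of the original answer:

```r
# Subset of the question's data as a plain data.frame (self-contained example)
df <- data.frame(
  ID    = c("A", "A", "C", "C", "D"),
  Time  = c("2019-01-16 15:52:03", "2019-01-16 16:01:04", "2019-03-16 13:23:42",
            "2019-06-04 12:01:04", "2019-03-16 13:23:42"),
  place = c("Vienna", "France", "Bangkok", "Tokyo", "Singapore"),
  stringsAsFactors = FALSE
)

# Day is a hypothetical helper column holding the calendar date
df$Day <- as.Date(df$Time)

# Same aggregation as in the answer, grouped on the precomputed Day column
res <- aggregate(place ~ ID + Day, data = df, FUN = paste, collapse = "-")

# Reorder by ID first, then by day, and reset the row names
res <- res[order(res$ID, res$Day), ]
rownames(res) <- NULL
print(res)
```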