My data looks like this: "19970325" "19970325" "19970422" "19970516"
I want to convert it to: 1997-03-25, 1997-03-25, 1997-04-22, ...
I tried
df[,2] <- format(df[,2], format="%Y-%m-%d")
but it did not work.
    id     date Amount
1    4 19970101  29.33
2    4 19970118  29.73
3    4 19970802  14.96
4    4 19971212  26.48
5   21 19970101  63.34
6   21 19970113  11.77
7   50 19970101   6.79
8   71 19970101  13.97
9   86 19970101  23.94
10 111 19970101  35.99
11 111 19970111  32.99
12 111 19970315  77.96
13 111 19970416  59.30
14 111 19970424 134.98
Answer 0 (score: 1)
I would use lubridate (install.packages("lubridate")).
http://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html
library(lubridate)
ymd("20110604")
## [1] "2011-06-04 UTC"
Answer 1 (score: 0)
This works for me:
as.Date(as.character(df1$date), format='%Y%m%d')
#[1] "1997-01-01" "1997-01-18" "1997-08-02" "1997-12-12" "1997-01-01"
#[6] "1997-01-13" "1997-01-01" "1997-01-01" "1997-01-01" "1997-01-01"
#[11] "1997-01-11" "1997-03-15" "1997-04-16" "1997-04-24"
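To keep the converted values, the result has to be assigned back to the column. The following is a minimal sketch of that step; the data frame `df` here is a small hypothetical stand-in for the question's data, not the full data set.

```r
# Hypothetical three-row stand-in for the question's data frame
df <- data.frame(id     = c(4L, 4L, 21L),
                 date   = c(19970101L, 19970118L, 19970113L),
                 Amount = c(29.33, 29.73, 11.77))

# Convert via character first: as.Date() on a bare number either fails
# (older R versions require an origin) or treats it as a day count,
# not as a YYYYMMDD value.
df$date <- as.Date(as.character(df$date), format = "%Y%m%d")

class(df$date)                         # the column is now of class "Date"
date_strings <- format(df$date, "%Y-%m-%d")  # plain character strings, if needed
```

This also suggests why the attempt in the question had no effect: format() with "%Y-%m-%d" only applies to values that are already dates, so the column has to be parsed with as.Date() first.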
If you need to remove duplicated ids, one option is duplicated. The code below returns the first row for each id.
df1[!duplicated(df1$id),]
# id date Amount
#1 4 19970101 29.33
#5 21 19970101 63.34
#7 50 19970101 6.79
#8 71 19970101 13.97
#9 86 19970101 23.94
#10 111 19970101 35.99
If instead you need the sum of 'Amount' for each id, use one of the group-by functions, for example aggregate.
aggregate(Amount~id, df1, sum)
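As a rough illustration of the per-id sum, here is a small self-contained sketch. The data frame `df` is a hypothetical five-row subset, and tapply is shown only as one possible base R alternative, not as something the answer itself proposes.

```r
# Hypothetical subset of the data: two ids with two rows each, one with a single row
df <- data.frame(id     = c(4L, 4L, 21L, 21L, 50L),
                 Amount = c(29.33, 29.73, 63.34, 11.77, 6.79))

# One row per id, with Amount summed within each id
totals <- aggregate(Amount ~ id, data = df, FUN = sum)

# Base R alternative: a vector of sums named by id
totals_vec <- tapply(df$Amount, df$id, sum)
```

aggregate returns a data frame, which is convenient for merging back onto other tables, while tapply returns a named vector that is handy for quick lookups.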
df1 <- structure(list(id = c(4L, 4L, 4L, 4L, 21L, 21L, 50L, 71L, 86L,
111L, 111L, 111L, 111L, 111L), date = c(19970101L, 19970118L,
19970802L, 19971212L, 19970101L, 19970113L, 19970101L, 19970101L,
19970101L, 19970101L, 19970111L, 19970315L, 19970416L, 19970424L
), Amount = c(29.33, 29.73, 14.96, 26.48, 63.34, 11.77, 6.79,
13.97, 23.94, 35.99, 32.99, 77.96, 59.3, 134.98)), .Names = c("id",
"date", "Amount"), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14"))