Question

My data looks like this: "19970325" "19970325" "19970422" "19970516"

I want to convert it to: 1997-03-25, 1997-03-25, 1997-04-22, ...

I found df[,2] <- format(df[,2], format="%Y-%m-%d"), but it does not work.

    id  date        Amount
1   4   19970101    29.33
2   4   19970118    29.73
3   4   19970802    14.96
4   4   19971212    26.48
5   21  19970101    63.34
6   21  19970113    11.77
7   50  19970101    6.79
8   71  19970101    13.97
9   86  19970101    23.94
10  111 19970101    35.99
11  111 19970111    32.99
12  111 19970315    77.96
13  111 19970416    59.30
14  111 19970424    134.98

Answer 1

I would use lubridate (install.packages("lubridate")):

http://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html

library(lubridate)
ymd("20110604")

## [1] "2011-06-04 UTC"
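
To apply this to a whole column rather than a single string, something like the following should work. This is a minimal sketch, assuming the dates sit as integers in a column named date of a data frame called df1 (both names borrowed from the other answer; the three rows are just a stand-in for the real data). As far as I know, recent lubridate versions return a Date from ymd(), so the result prints without the UTC suffix.

```r
library(lubridate)

# Small stand-in for the question's data (first three rows only)
df1 <- data.frame(id = c(4L, 4L, 4L),
                  date = c(19970101L, 19970118L, 19970802L),
                  Amount = c(29.33, 29.73, 14.96))

# Convert to character first, then let ymd() parse year-month-day
df1$date <- ymd(as.character(df1$date))
df1$date
```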

Answer 2

This works for me:

 as.Date(as.character(df1$date), format='%Y%m%d')
 #[1] "1997-01-01" "1997-01-18" "1997-08-02" "1997-12-12" "1997-01-01"
 #[6] "1997-01-13" "1997-01-01" "1997-01-01" "1997-01-01" "1997-01-01"
 #[11] "1997-01-11" "1997-03-15" "1997-04-16" "1997-04-24"
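
The format() call in the question most likely has no effect because the column is still an integer (or character) vector; a format string such as "%Y-%m-%d" only applies once the values are of class Date. A minimal sketch of storing the converted dates back into the data frame, assuming a df1 shaped like the one in the Data section (only three rows are used here):

```r
# Three rows standing in for the full data
df1 <- data.frame(id = c(4L, 4L, 21L),
                  date = c(19970101L, 19970118L, 19970113L),
                  Amount = c(29.33, 29.73, 11.77))

# Parse the integer yyyymmdd values into Date objects and store them back
df1$date <- as.Date(as.character(df1$date), format = "%Y%m%d")
class(df1$date)

# Dates already print as yyyy-mm-dd; format() returns plain character strings
date_chr <- format(df1$date, "%Y-%m-%d")
date_chr
```

Keeping the column as Date (rather than character) means sorting and date arithmetic still work afterwards.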

Update

If you need to remove duplicated IDs, one option is duplicated. The code below returns the first row for each id.

 df1[!duplicated(df1$id),]
 #    id     date Amount
 #1    4 19970101  29.33
 #5   21 19970101  63.34
 #7   50 19970101   6.79
 #8   71 19970101  13.97
 #9   86 19970101  23.94
 #10 111 19970101  35.99
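
Note that duplicated keeps whichever row comes first in the current row order. If the rows are not already sorted by date and you want the earliest (or latest) row per id, sorting first is one way to do it. A sketch under that assumption; the four rows below are deliberately out of date order and the names sorted, first_rows and last_rows are only illustrative:

```r
# Rows deliberately not in date order within each id
df1 <- data.frame(id = c(4L, 4L, 21L, 21L),
                  date = c(19970118L, 19970101L, 19970113L, 19970101L),
                  Amount = c(29.73, 29.33, 11.77, 63.34))

# Sort by id, then by date, so the earliest row of each id comes first
sorted <- df1[order(df1$id, df1$date), ]
first_rows <- sorted[!duplicated(sorted$id), ]

# fromLast = TRUE keeps the last row of each id instead
last_rows <- sorted[!duplicated(sorted$id, fromLast = TRUE), ]
```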

If instead you need the sum of 'Amount' for each id, use one of the group-by functions, for example aggregate.

  aggregate(Amount~id, df1, sum)
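
For completeness, here is a self-contained sketch of that aggregation using the id and Amount values from this answer's data; tapply is shown as one base R alternative, and the name totals is only illustrative:

```r
df1 <- data.frame(
  id = c(4L, 4L, 4L, 4L, 21L, 21L, 50L, 71L, 86L,
         111L, 111L, 111L, 111L, 111L),
  Amount = c(29.33, 29.73, 14.96, 26.48, 63.34, 11.77, 6.79,
             13.97, 23.94, 35.99, 32.99, 77.96, 59.3, 134.98))

# One row per id, with Amount summed within each id
totals <- aggregate(Amount ~ id, df1, sum)
totals

# tapply() computes the same sums but returns a named vector
tapply(df1$Amount, df1$id, sum)
```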

Data

 df1 <- structure(list(id = c(4L, 4L, 4L, 4L, 21L, 21L, 50L, 71L, 86L, 
 111L, 111L, 111L, 111L, 111L), date = c(19970101L, 19970118L, 
 19970802L, 19971212L, 19970101L, 19970113L, 19970101L, 19970101L, 
 19970101L, 19970101L, 19970111L, 19970315L, 19970416L, 19970424L
 ), Amount = c(29.33, 29.73, 14.96, 26.48, 63.34, 11.77, 6.79, 
 13.97, 23.94, 35.99, 32.99, 77.96, 59.3, 134.98)), .Names = c("id", 
 "date", "Amount"), class = "data.frame", row.names = c("1", "2", 
 "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14"))
