One column of my data frame contains timestamps (example entries are listed below).
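I want to replace each entire timestamp with just the month name. How can I do this in R? Let's say the column is named "timestamp" and the data frame is named "df".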
Here are some example entries; I would like each of them to show only its month name:
"Tue May 14 21:57:04 +0000 2013"
"Wed Jul 10 01:30:36 +0000 2013"
"Fri Apr 20 01:46:59 +0000 2012"
"Sat Jul 07 17:56:34 +0000 2012"
"Sat Mar 16 02:12:30 +0000 2013"
"Sat Feb 16 02:29:11 +0000 2013"
Any help would be greatly appreciated.
Answer 0 (score: 5)
R> dates <- c("Tue May 14 21:57:04 +0000 2013", "Wed Jul 10 01:30:36 +0000 2013",
"Fri Apr 20 01:46:59 +0000 2012", "Sat Jul 07 17:56:34 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013", "Sat Feb 16 02:29:11 +0000 2013")
R> dates
[1] "Tue May 14 21:57:04 +0000 2013"
[2] "Wed Jul 10 01:30:36 +0000 2013"
[3] "Fri Apr 20 01:46:59 +0000 2012"
[4] "Sat Jul 07 17:56:34 +0000 2012"
[5] "Sat Mar 16 02:12:30 +0000 2013"
[6] "Sat Feb 16 02:29:11 +0000 2013"
R>
Parse with the appropriate strptime format:
R> pt <- strptime(dates, "%a %b %d %H:%M:%S +0000 %Y")
R> pt
[1] "2013-05-14 21:57:04 CDT" "2013-07-10 01:30:36 CDT"
[3] "2012-04-20 01:46:59 CDT" "2012-07-07 17:56:34 CDT"
[5] "2013-03-16 02:12:30 CDT" "2013-02-16 02:29:11 CST"
R>
Then reformat to get the month in whichever form you need (number, abbreviated name, or full name):
R> strftime(pt, "%m")
[1] "05" "07" "04" "07" "03" "02"
R> strftime(pt, "%b")
[1] "May" "Jul" "Apr" "Jul" "Mar" "Feb"
R> strftime(pt, "%B")
[1] "May" "July" "April" "July" "March"
[6] "February"
R>
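A note on the zone labels in the parsed output above: with no tz argument, strptime interprets the clock time in the session's local time zone, which is why CDT/CST appear even though the strings say +0000. The month is unaffected here, because the format treats "+0000" as literal text and no conversion takes place. The following is a minimal sketch, assuming you would rather have R read the offset with %z and keep the result in UTC. The names dates_utc, pt_utc and months_utc are made up for illustration, and parsing %a and %b assumes an English LC_TIME locale.

```r
# Two of the sample timestamps from the question
dates_utc <- c("Tue May 14 21:57:04 +0000 2013",
               "Sat Feb 16 02:29:11 +0000 2013")

# %z reads the "+0000" offset; tz = "UTC" keeps the parsed times in UTC
pt_utc <- strptime(dates_utc, "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# POSIXlt stores the month as 0-11, so add 1 to index the built-in
# month.abb constant (this avoids locale-dependent output)
months_utc <- month.abb[pt_utc$mon + 1]
print(months_utc)
```

This matters mostly when the offsets are not all +0000, or when a timestamp falls close to a month boundary, where a zone shift could change the month.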
Answer 1 (score: 3)
You can use strptime together with format.
Assuming x is a character vector of the timestamps, we can first convert it to class "POSIXlt" "POSIXt" and then extract the month (%b) part:
format(strptime(x, "%a %b %d %H:%M:%S +0000 %Y"), "%b")
#[1] "Jul" "Apr" "Jul" "Mar" "Feb"
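To tie this back to the question, which asks about a column in a data frame, here is a minimal sketch. It assumes a data frame df with a character column timestamp (both names come from the question; the two rows are just sample data) and overwrites that column with the abbreviated month name. The output of %b depends on the locale, and "%B" would give full month names instead.

```r
df <- data.frame(
  timestamp = c("Wed Jul 10 01:30:36 +0000 2013",
                "Fri Apr 20 01:46:59 +0000 2012"),
  stringsAsFactors = FALSE
)

# Parse the strings, then format back to the abbreviated month name (%b).
# as.character() guards against factor columns in R versions before 4.0.
parsed <- strptime(as.character(df$timestamp), "%a %b %d %H:%M:%S +0000 %Y")
df$timestamp <- format(parsed, "%b")
print(df)
```

Any entry that does not match the format comes back as NA, which makes malformed rows easy to spot.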
Answer 2 (score: 2)
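We can use sub. Match one or more non-whitespace characters (\\S+), followed by one or more whitespace characters (\\s+), then capture the next run of non-whitespace characters as a group ((\\S+)), followed by all remaining characters up to the end of the string (.*), and replace the whole match with the backreference to the capture group (\\1).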
sub("\\S+\\s+(\\S+).*", "\\1", v1)
#[1] "May" "Jul" "Apr" "Jul" "Mar" "Feb"
If we know how to get the format right, it is probably better to use a date-time conversion (as @DirkEddelbuettel mentioned in the comments).
Data:
v1 <- c("Tue May 14 21:57:04 +0000 2013", "Wed Jul 10 01:30:36 +0000 2013",
"Fri Apr 20 01:46:59 +0000 2012", "Sat Jul 07 17:56:34 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013", "Sat Feb 16 02:29:11 +0000 2013")
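One caveat with the regex approach is that sub does no validation: a string the pattern does not match is returned unchanged, and a string that matches but is not a timestamp yields whatever its second token happens to be. The following is a minimal sketch of a check against R's built-in month.abb constant; the vector v_check and the other variable names are invented for illustration.

```r
# One real timestamp and two strings that are not timestamps
v_check <- c("Tue May 14 21:57:04 +0000 2013", "garbage", "not a timestamp")

# Same pattern as in the answer: keep only the second whitespace-delimited token
tok <- sub("\\S+\\s+(\\S+).*", "\\1", v_check)
print(tok)

# Keep the token only if it is a real month abbreviation; otherwise NA
month_or_na <- ifelse(tok %in% month.abb, tok, NA_character_)
print(month_or_na)
```

Here "garbage" passes through sub untouched and "not a timestamp" yields "a", so both end up as NA after the check.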
Answer 3 (score: 2)
Assuming your timestamp column is character:
df <- data.frame(timestamp = c("Tue May 14 21:57:04 +0000 2013",
                               "Fri Apr 20 01:46:59 +0000 2012",
                               "Sat Mar 16 02:12:30 +0000 2013"),
                 stringsAsFactors = FALSE)
# Split each timestamp on spaces and keep the second piece (the month)
df$month <- sapply(df$timestamp, function(sx) strsplit(sx, split = " ")[[1]][2])
> df
                       timestamp month
1 Tue May 14 21:57:04 +0000 2013   May
2 Fri Apr 20 01:46:59 +0000 2012   Apr
3 Sat Mar 16 02:12:30 +0000 2013   Mar
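Since strsplit is already vectorized, the split can also be done once for the whole column and the second piece picked out with vapply, which additionally checks that every result is a single string. This is a sketch of that variant rather than part of the original answer; df2 and parts are illustrative names.

```r
df2 <- data.frame(
  timestamp = c("Tue May 14 21:57:04 +0000 2013",
                "Fri Apr 20 01:46:59 +0000 2012",
                "Sat Mar 16 02:12:30 +0000 2013"),
  stringsAsFactors = FALSE
)

# Split every timestamp on a literal space in one call (returns a list)
parts <- strsplit(df2$timestamp, " ", fixed = TRUE)

# Take element 2 of each piece; character(1) declares the expected result type
df2$month <- vapply(parts, `[`, character(1), 2)
print(df2)
```

Unlike sapply on a character vector, this returns an unnamed vector, so the new column carries no leftover names.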
Answer 4 (score: 0)
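1) The month name always occupies character positions 5 through 7 (inclusive) of the timestamp column, so this replaces the timestamp column with the month as a character string: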
transform(DF, timestamp = format(substr(timestamp, 5, 7)))
Given the input shown in the Note at the end, the output is:
  timestamp
1       Apr
2       Feb
3       Jul
4       Mar
5       Jul
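2) If you want a factor column instead, use this variant, which ensures the factor levels are ordered Jan = 1, Feb = 2, and so on, rather than being assigned alphabetically: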
transform(DF, timestamp = factor(substr(timestamp, 5, 7), levels = month.abb))
Note: we assume the input has the following form:
DF <- data.frame(timestamp = c("Fri Apr 20 01:46:59 +0000 2012",
"Sat Feb 16 02:29:11 +0000 2013", "Sat Jul 07 17:56:34 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013", "Wed Jul 10 01:30:36 +0000 2013"))
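To illustrate why the explicit levels in variant 2) matter, here is a small sketch with a made-up three-element vector (mon and the other names are hypothetical). With levels = month.abb, the integer codes are the month numbers and sorting follows the calendar; with default levels, sorting would be alphabetical.

```r
# Factor whose levels follow the calendar (month.abb is built into R)
mon <- factor(c("Jul", "Apr", "Feb"), levels = month.abb)

# Integer codes are the positions in month.abb, i.e. the month numbers
month_number <- as.integer(mon)
print(month_number)        # 7 4 2

# Sorting follows the level order: Feb, Apr, Jul
calendar_order <- as.character(sort(mon))
print(calendar_order)

# For comparison, default levels sort alphabetically: Apr, Feb, Jul
alpha_order <- as.character(sort(factor(c("Jul", "Apr", "Feb"))))
print(alpha_order)
```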