Extract date from a data file in R

Time: 2014-12-14 15:56:17

Tags: r date extract

This is the text file:

1   Toy Story (1995)    01-Jan-95   http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)   0   0   0   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0
2   GoldenEye (1995)    01-Jan-95   http://us.imdb.com/M/title-exact?GoldenEye%20(1995) 0   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0
3   Four Rooms (1995)   01-Jan-95   http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0
4   Get Shorty (1995)   01-Jan-95   http://us.imdb.com/M/title-exact?Get%20Shorty%20(1995)  0   1   0   0   0   1   0   0   1   0   0   0   0   0   0   0   0   0   0
5   Copycat (1995)  01-Jan-95   http://us.imdb.com/M/title-exact?Copycat%20(1995)   0   0   0   0   0   0   1   0   1   0   0   0   0   0   0   0   1   0   0
6   Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)    01-Jan-95   http://us.imdb.com/Title?Yao+a+yao+yao+dao+waipo+qiao+(1995)    0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0
7   Twelve Monkeys (1995)   01-Jan-95   http://us.imdb.com/M/title-exact?Twelve%20Monkeys%20(1995)  0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   1   0   0   0
8   Babe (1995) 01-Jan-95   http://us.imdb.com/M/title-exact?Babe%20(1995)  0   0   0   0   1   1   0   0   1   0   0   0   0   0   0   0   0   0   0
9   Dead Man Walking (1995) 01-Jan-95   http://us.imdb.com/M/title-exact?Dead%20Man%20Walking%20(1995)  0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0
10  Richard III (1995)  22-Jan-96   http://us.imdb.com/M/title-exact?Richard%20III%20(1995) 0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   1   0
11  Seven (Se7en) (1995)    01-Jan-95   http://us.imdb.com/M/title-exact?Se7en%20(1995) 0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   1   0   0
12  "Usual Suspects, The (1995)"    14-Aug-95   "http://us.imdb.com/M/title-exact?Usual%20Suspects,%20The%20(1995)" 0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   1   0   0
13  Mighty Aphrodite (1995) 30-Oct-95   http://us.imdb.com/M/title-exact?Mighty%20Aphrodite%20(1995)    0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
14  "Postino, Il (1994)"    01-Jan-94   "http://us.imdb.com/M/title-exact?Postino,%20Il%20(1994)"   0   0   0   0   0   0   0   0   1   0   0   0   0   0   1   0   0   0   0
15  Mr. Holland's Opus (1995)   29-Jan-96   http://us.imdb.com/M/title-exact?Mr.%20Holland's%20Opus%20(1995)    0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0
16  French Twist (Gazon maudit) (1995)  01-Jan-95   http://us.imdb.com/M/title-exact?Gazon%20maudit%20(1995)    0   0   0   0   0   1   0   0   0   0   0   0   0   0   1   0   0   0   0
17  From Dusk Till Dawn (1996)  05-Feb-96   http://us.imdb.com/M/title-exact?From%20Dusk%20Till%20Dawn%20(1996) 0   1   0   0   0   1   1   0   0   0   0   1   0   0   0   0   1   0   0
18  "White Balloon, The (1995)" 01-Jan-95   http://us.imdb.com/M/title-exact?Badkonake%20Sefid%20(1995) 0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0
19  Antonia's Line (1995)   01-Jan-95   http://us.imdb.com/M/title-exact?Antonia%20(1995)   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0
20  Angels and Insects (1995)   01-Jan-95   http://us.imdb.com/M/title-exact?Angels%20and%20Insects%20(1995)    0   0   0   0   0   0   0   0   1   0   0   0   0   0   1   0   0   0   0

I imported the data from the file with this code:

# Tab-separated file: id, title, date, link, then 19 0/1 flag columns (c1..c19)
Movies <- read.table("Movies.txt",
                     sep = "\t",
                     col.names = c("MId", "title", "date", "link", paste0("c", 1:19)),
                     fill = FALSE,
                     strip.white = TRUE,
                     quote = "")  # quoting disabled, so quoted titles keep their literal " characters

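How can I add a new column to "Movies" that contains the year?

The year can be taken from the title (the number in parentheses) or extracted from the date. Thanks in advance for any help.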

1 Answer:

Answer 0: (score: 1)

You could try

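lines1 <- readLines('Movies.txt')
library(stringr)
# Lookbehind: take the digits that directly follow a "(".
# Older stringr versions needed the pattern wrapped in perl(); current versions
# use ICU regular expressions, which handle this lookbehind directly.
as.numeric(str_extract(lines1, '(?<=[(])\\d+'))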
#[1] 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 1994 1995
#[16] 1995 1996 1995 1995 1995

Or using base R

# regexpr() returns only the first match per line, here the year in the title
as.numeric(regmatches(lines1, regexpr('(?<=[(])\\d+', lines1, perl=TRUE)))
#[1] 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 1994 1995
#[16] 1995 1996 1995 1995 1995
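
Both snippets above return a plain vector; neither adds the column to Movies. As a rough sketch of that last step, and not part of the original answer, the year can be written straight into the data frame, either from the title or from the date column. The three-row Movies data frame below is a hypothetical stand-in for the one built by read.table(), and the column names year and release_year are illustrative. The date parsing assumes an English locale, because %b matches abbreviated month names in the current locale.

```r
# Hypothetical stand-in for the Movies data frame from read.table()
Movies <- data.frame(
  MId   = c(1, 10, 14),
  title = c("Toy Story (1995)", "Richard III (1995)", "\"Postino, Il (1994)\""),
  date  = c("01-Jan-95", "22-Jan-96", "01-Jan-94"),
  stringsAsFactors = FALSE
)

# Option 1: year from the title.
# regexec() keeps the capture group, and regmatches() returns one list element
# per row, so a title without "(dddd)" yields NA instead of silently
# shortening the result.
m <- regmatches(Movies$title, regexec("\\((\\d{4})\\)", Movies$title))
Movies$year <- as.numeric(
  vapply(m, function(x) if (length(x) > 1) x[2] else NA_character_, character(1))
)

# Option 2: year from the date column.
# %y reads a two-digit year; R maps 69-99 to 19xx and 00-68 to 20xx.
release <- as.Date(Movies$date, format = "%d-%b-%y")
Movies$release_year <- as.numeric(format(release, "%Y"))

print(Movies[, c("title", "year", "release_year")])
```

The two columns are not interchangeable: in the sample data, Richard III is titled (1995) but dated 22-Jan-96, so the year in the title and the year in the date can differ.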