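I have some code that parses a character vector in R. It seems to handle certain observations differently from the rest, and I can't work out how to fix it. Here is the code as it currently stands: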
superbowl$Receiver <- as.factor(
  ifelse(superbowl$Is.Pass == TRUE,
         ifelse(superbowl$Is.Complete == TRUE,
                gsub("(\\w+\\s)*pass\\s(\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+).*",
                     "\\6\\7", superbowl$Detail),
                gsub("(\\w+\\s)*pass\\s(\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+).*",
                     "\\7\\8", superbowl$Detail)),
         NA))
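A quick way to see which words `\\6\\7` ends up selecting is to list the capture groups for a single row. The sketch below is only an illustration, not part of the original code: it runs the completed-pass pattern on the Drew Brees row from the sample data using base R's `regexec()` and `regmatches()`. The names `pat_complete`, `x` and `groups` are made up for the example.

```r
# Completed-pass pattern, copied unchanged from the code above
pat_complete <- "(\\w+\\s)*pass\\s(\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+).*"
x <- "Drew Brees pass complete short left to Coby Fleener for 8 yards (tackle by Sylvester Williams)"

# regexec() records where each capture group matched; regmatches() pulls out the text.
# Element 1 is the whole match, elements 2..8 are capture groups 1..7.
groups <- regmatches(x, regexec(pat_complete, x))[[1]]
print(groups)

# Groups 6 and 7 (elements 7 and 8) are the 5th and 6th words after "pass",
# i.e. the receiver's first and last name, each with a leading space.
print(paste0(groups[7], groups[8]))
```

Because every group after `pass` is a single `\\s\\w+` token, the receiver is located purely by counting words, and the captured name keeps a leading space.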
And here is a sample (the first 25 elements) of each of the three vectors it references:
> dput(superbowl$Is.Pass[1:25])
c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE,
TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE,
FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)
> dput(superbowl$Is.Complete[1:25])
c(NA, TRUE, NA, NA, FALSE, NA, FALSE, NA, NA, TRUE, FALSE, NA,
TRUE, NA, TRUE, NA, NA, TRUE, NA, NA, NA, NA, FALSE, NA, NA)
> dput(superbowl$Detail[1:25])
c("Brandon McManus kicks off 57 yards returned by Tim Hightower for 31 yards (tackle by Brandon McManus)",
"Drew Brees pass complete short left to Coby Fleener for 8 yards (tackle by Sylvester Williams)",
"Mark Ingram right guard for 5 yards (tackle by Jared Crick)",
"Mark Ingram right guard for 4 yards (tackle by Todd Davis)",
"Drew Brees pass incomplete short right intended for Willie Snead (defended by Bradley Roby)",
"Penalty on Andrus Peat: False Start 5 yards (no play)", "Drew Brees pass incomplete short left intended for Willie Snead (defended by Chris Harris)",
"Thomas Morstead punts 34 yards fair catch by Jordan Norwood",
"Devontae Booker left tackle for 6 yards (tackle by Craig Robertson and Dannell Ellerbe)",
"Trevor Siemian pass complete short left to Demaryius Thomas for 14 yards (tackle by Vonn Bell)",
"Trevor Siemian pass incomplete short right intended for A.J. Derby (defended by Kenny Vaccaro)",
"Devontae Booker left guard for 3 yards (tackle by Tyeler Davison)",
"Trevor Siemian pass complete short right to A.J. Derby for 10 yards (tackle by Vonn Bell)",
"Kapri Bibbs right end for 2 yards (tackle by Cameron Jordan and Craig Robertson)",
"Trevor Siemian pass complete deep right to Jordan Taylor for 18 yards (tackle by Vonn Bell)",
"Devontae Booker right tackle for 2 yards (tackle by Paul Kruger)",
"Timeout #1 by Denver Broncos", "Trevor Siemian pass complete short right to Demaryius Thomas for 8 yards (tackle by Delvin Breaux)",
"NOR challenged the first down ruling and the play was overturned. Trevor Siemian pass complete short right to Demaryius Thomas for 7 yards (tackle by Delvin Breaux)",
"Trevor Siemian left guard for 3 yards (tackle by Tyeler Davison)",
"Trevor Siemian sacked by Nick Fairley for -5 yards", "Devontae Booker right end for 11 yards (tackle by Sterling Moore)",
"Trevor Siemian pass incomplete short right intended for Jordan Taylor (defended by Jairus Byrd)",
"DEN challenged the incomplete pass ruling and the play was overturned. Trevor Siemian pass complete short right to Jordan Taylor for 14 yards touchdown",
"Brandon McManus kicks extra point good")
Here is what I get:
> superbowl$Receiver <- as.factor(ifelse(superbowl$Is.Pass == TRUE, ifelse(superbowl$Is.Complete == TRUE, gsub("(\\w+\\s)*pass\\s(\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+).*", "\\6\\7", superbowl$Detail), gsub("(\\w+\\s)*pass\\s(\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+).*", "\\7\\8", superbowl$Detail)), NA))
> superbowl$Receiver[1:25]
[1] <NA>
[2] Coby Fleener
[3] <NA>
[4] <NA>
[5] Willie Snead
[6] <NA>
[7] Willie Snead
[8] <NA>
[9] <NA>
[10] Demaryius Thomas
[11] Trevor Siemian pass incomplete short right intended for A.J. Derby (defended by Kenny Vaccaro)
[12] <NA>
[13] Trevor Siemian pass complete short right to A.J. Derby for 10 yards (tackle by Vonn Bell)
[14] <NA>
[15] Jordan Taylor
[16] <NA>
[17] <NA>
[18] Demaryius Thomas
[19] <NA>
[20] <NA>
[21] <NA>
[22] <NA>
[23] Jordan Taylor
[24] <NA>
[25] <NA>
21 Levels: Andy Janovich ... Trevor Siemian pass incomplete short right intended for A.J. Derby (defended by Kenny Vaccaro)
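Rows 11 and 13 are the two A.J. Derby plays in this sample. A minimal check of what the completed-pass pattern does with row 13, assuming base R's default (TRE) regex engine, which the code above uses. `pat_complete` and `row13` are names chosen for this example.

```r
pat_complete <- "(\\w+\\s)*pass\\s(\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+)(\\s\\w+).*"
row13 <- "Trevor Siemian pass complete short right to A.J. Derby for 10 yards (tackle by Vonn Bell)"

# "\\w" covers letters, digits and underscore only, so "A.J." is not one word
print(grepl("^\\w+$", "A.J."))         # FALSE

# After "to", "\\s\\w+" matches " A" and the next group then hits ".",
# so the pattern as a whole never matches this row
print(grepl(pat_complete, row13))      # FALSE

# gsub() leaves a non-matching string untouched, so the full Detail comes back
print(identical(gsub(pat_complete, "\\6\\7", row13), row13))  # TRUE
```

Since `gsub()` only rewrites the parts of a string that match, a row on which the pattern fails comes back exactly as it went in, which is why the whole `Detail` text turns up as a factor level.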
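Looking through the whole dataset, it seems that every time A.J. Derby is the intended target, R returns the entire superbowl$Detail string instead of parsing it. Is this because his first name is written as initials? How can I get R to ignore the periods and identify words by spaces only? Thanks for your help!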
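One possible fix, sketched here and checked only against the sample rows rather than the full dataset: keep the same word-counting patterns but replace `\\w+` with `\\S+`, which matches any run of non-whitespace characters. Words are then delimited by spaces alone, so "A.J." counts as one word. The three `detail` strings are copied from the `dput()` output above; `pat_complete`, `pat_incomplete`, `is_pass`, `is_complete` and `receiver` are stand-in names for this example.

```r
# Same positional patterns as the question, with \\S+ in place of \\w+
pat_complete   <- "(\\S+\\s)*pass\\s(\\S+)(\\s\\S+)(\\s\\S+)(\\s\\S+)(\\s\\S+)(\\s\\S+).*"
pat_incomplete <- "(\\S+\\s)*pass\\s(\\S+)(\\s\\S+)(\\s\\S+)(\\s\\S+)(\\s\\S+)(\\s\\S+)(\\s\\S+).*"

# Rows 2, 11 and 13 of the sample superbowl$Detail
detail <- c(
  "Drew Brees pass complete short left to Coby Fleener for 8 yards (tackle by Sylvester Williams)",
  "Trevor Siemian pass incomplete short right intended for A.J. Derby (defended by Kenny Vaccaro)",
  "Trevor Siemian pass complete short right to A.J. Derby for 10 yards (tackle by Vonn Bell)"
)
is_pass     <- c(TRUE, TRUE, TRUE)
is_complete <- c(TRUE, FALSE, TRUE)

# Same nested ifelse() as the question; trimws() removes the leading space
# that each "\\s\\S+" group carries into the replacement
receiver <- ifelse(is_pass,
                   ifelse(is_complete,
                          gsub(pat_complete, "\\6\\7", detail),
                          gsub(pat_incomplete, "\\7\\8", detail)),
                   NA)
receiver <- as.factor(trimws(receiver))
print(receiver)
```

This still relies on the receiver being exactly the 5th and 6th words after `pass` (6th and 7th for incompletions), so a play worded differently, or a receiver with a three-word name, would need a different approach, such as anchoring the match on `to` / `intended for` instead of counting words.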