Sorry that the title is a bit wordy; hopefully this example helps. I have the following dataset:
my_df
Description thisYVal thisPts
1 (12:00) Start Period 0 0
2 (12:00) Jump Ball Thomas vs Grant 0 0
3 (11:48) [MIA 3-] Wade Layup Shot: Missed 0 2
4 (11:46) [PHL] Thomas Rebound (Off: Def:1) 0 0
6 (11:02) [MIA] Haslem Jump Shot: Missed -19 2
7 (11:00) [MIA] Haslem Rebound (Off:1 Def:) 0 0
8 (10:57) [MIA] Haslem Layup Shot: Missed 0 2
9 (10:56) [PHL] Coleman Rebound (Off: Def:1) 0 0
dput(my_df)
structure(list(Description = c("(12:00) Start Period", "(12:00) Jump Ball Thomas vs Grant",
"(11:48) [MIA 3-] Wade Layup Shot: Missed", "(11:46) [PHL] Thomas Rebound (Off: Def:1)",
"(11:02) [MIA] Haslem Jump Shot: Missed", "(11:00) [MIA] Haslem Rebound (Off:1 Def:)",
"(10:57) [MIA] Haslem Layup Shot: Missed", "(10:56) [PHL] Coleman Rebound (Off: Def:1)"
), thisYVal = c(0L, 0L, 0L, 0L, -19L, 0L, 0L, 0L), thisPts = c(0L,
0L, 2L, 0L, 2L, 0L, 2L, 0L)), row.names = c(1L, 2L, 3L, 4L, 6L,
7L, 8L, 9L), class = "data.frame")
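...and I want to extract the three-letter team abbreviation that appears in the Description column of the data frame.
The three-letter abbreviation always comes immediately after the first opening bracket [, although it is not always immediately followed by ] (as row 3 of the data frame shows).
I have been trying to do this with the substr() function, but so far without any luck. Any help is appreciated!
Edit: some additional context: some rows (1 and 2 in this case) have no [] and no team abbreviation. For those rows it is fine to return an empty string, NA, or something similar.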
EDIT-2: just in case, since it was not stated explicitly: I am trying to get a fourth column containing c("", "", "MIA", "PHL", "MIA", "MIA", "MIA", "PHL").
Edit-3: the following gets me close, but not quite:
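The substr() idea works once the position of the first [ is known. Below is a minimal base-R sketch. It assumes, as stated above, that the abbreviation is always the three characters right after the first [. regexpr() with fixed = TRUE finds that bracket and returns -1 when there is none. substr() then cuts out the next three characters. Rows without a bracket get "", so the result is the vector asked for in EDIT-2. The names desc, pos and team are illustrative only.

```r
# Descriptions copied from my_df$Description
desc <- c("(12:00) Start Period",
          "(12:00) Jump Ball Thomas vs Grant",
          "(11:48) [MIA 3-] Wade Layup Shot: Missed",
          "(11:46) [PHL] Thomas Rebound (Off: Def:1)",
          "(11:02) [MIA] Haslem Jump Shot: Missed",
          "(11:00) [MIA] Haslem Rebound (Off:1 Def:)",
          "(10:57) [MIA] Haslem Layup Shot: Missed",
          "(10:56) [PHL] Coleman Rebound (Off: Def:1)")

# Position of the first "[" in each string, or -1 when there is none;
# as.integer() drops the extra attributes that regexpr() attaches
pos <- as.integer(regexpr("[", desc, fixed = TRUE))

# The three characters after "[" (assumes a 3-letter abbreviation),
# or "" for rows that have no bracket at all
team <- ifelse(pos > 0, substr(desc, pos + 1, pos + 3), "")
```

On the real data frame, the same two lines with my_df$Description in place of desc would produce the fourth column.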
my_df %>%
dplyr::mutate(teamAbb = unlist(stringr::str_extract(Description, "\\[(.*)\\]")))
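The attempt above is close. However, \\[(.*)\\] matches the whole bracketed text, so str_extract() returns strings such as "[MIA 3-]" and "[PHL]" rather than just the letters. One way to keep only the letters is to use a capture group and substitute the whole string with it.

Below is a minimal base-R sketch using sub(). It assumes the abbreviation is three capital letters directly after the first [. Rows with no match get "".

```r
desc <- c("(12:00) Start Period",
          "(12:00) Jump Ball Thomas vs Grant",
          "(11:48) [MIA 3-] Wade Layup Shot: Missed",
          "(11:46) [PHL] Thomas Rebound (Off: Def:1)",
          "(11:02) [MIA] Haslem Jump Shot: Missed",
          "(11:00) [MIA] Haslem Rebound (Off:1 Def:)",
          "(10:57) [MIA] Haslem Layup Shot: Missed",
          "(10:56) [PHL] Coleman Rebound (Off: Def:1)")

# Everything up to the first "[", then capture three capital letters,
# then the rest of the string
pattern <- "^[^\\[]*\\[([A-Z]{3}).*$"

# sub() would return non-matching strings unchanged, so check first
has_team <- grepl(pattern, desc, perl = TRUE)
team <- ifelse(has_team, sub(pattern, "\\1", desc, perl = TRUE), "")
```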
Answer 0 (score: 2)
R recently added strcapture to its standard utils package, and it can be used for this.
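Below is a minimal sketch of how utils::strcapture() could be applied here. It assumes the abbreviation is three capital letters right after the first [. strcapture() runs the regular expression and converts each capture group to the type of the matching column in proto. Strings that do not match come back as NA rows. The column name team is illustrative.

```r
desc <- c("(12:00) Start Period",
          "(12:00) Jump Ball Thomas vs Grant",
          "(11:48) [MIA 3-] Wade Layup Shot: Missed",
          "(11:46) [PHL] Thomas Rebound (Off: Def:1)",
          "(11:02) [MIA] Haslem Jump Shot: Missed",
          "(11:00) [MIA] Haslem Rebound (Off:1 Def:)",
          "(10:57) [MIA] Haslem Layup Shot: Missed",
          "(10:56) [PHL] Coleman Rebound (Off: Def:1)")

# One capture group -> one column in 'proto' (a character column here)
teams <- utils::strcapture("\\[([A-Z]{3})", desc,
                           proto = data.frame(team = character()))
```

The result is a data frame with one row per description, so teams$team could be assigned directly as a new column of my_df.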
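Answer 1 (score: 1)
You can use str_match from the stringr package. Specifically, you want to look for three capital letters after the opening square bracket (assuming that all team abbreviations are three letters long).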
> str_match(df$Description, '\\[([A-Z]{3})')
[,1] [,2]
[1,] NA NA
[2,] NA NA
[3,] "[MIA" "MIA"
[4,] "[PHL" "PHL"
[5,] "[MIA" "MIA"
[6,] "[MIA" "MIA"
[7,] "[MIA" "MIA"
[8,] "[PHL" "PHL"
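You will notice that the team-abbreviation pattern is wrapped in parentheses. That is because it is the subgroup of the pattern that we actually want to extract. With a subgroup, str_match returns (1) the match for the whole pattern and (2) the match for each parenthesized subgroup. So in this case we want the second column, which holds the matches for the first subgroup.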
df$Team <- str_match(df$Description, '\\[([A-Z]{3})')[,2]
This gives us the desired result:
Description Team
1 (12:00) Start Period <NA>
2 (12:00) Jump Ball Thomas vs Grant <NA>
3 (11:48) [MIA 3-] Wade Layup Shot: Missed MIA
4 (11:46) [PHL] Thomas Rebound (Off: Def:1) PHL
5 (11:02) [MIA] Haslem Jump Shot: Missed MIA
6 (11:00) [MIA] Haslem Rebound (Off:1 Def:) MIA
7 (10:57) [MIA] Haslem Layup Shot: Missed MIA
8 (10:56) [PHL] Coleman Rebound (Off: Def:1) PHL
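If stringr is not available, base R's regexec() gives the same "whole match plus subgroups" structure as str_match(). In each element returned by regmatches(), position 1 is the full match and position 2 is the first subgroup. Below is a minimal sketch using the same pattern as the answer. It assumes that NA is acceptable for rows without an abbreviation.

```r
desc <- c("(12:00) Start Period",
          "(12:00) Jump Ball Thomas vs Grant",
          "(11:48) [MIA 3-] Wade Layup Shot: Missed",
          "(11:46) [PHL] Thomas Rebound (Off: Def:1)",
          "(11:02) [MIA] Haslem Jump Shot: Missed",
          "(11:00) [MIA] Haslem Rebound (Off:1 Def:)",
          "(10:57) [MIA] Haslem Layup Shot: Missed",
          "(10:56) [PHL] Coleman Rebound (Off: Def:1)")

# One character vector per string: c(full match, subgroup 1),
# or character(0) when the pattern does not match
m <- regmatches(desc, regexec("\\[([A-Z]{3})", desc))

# Keep subgroup 1 (the analogue of column 2 of str_match), NA otherwise
team <- vapply(m, function(g) if (length(g) > 1) g[2] else NA_character_,
               character(1))
```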
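Answer 2 (score: 1)
Here is another option. It uses a lookbehind to find three non-digit characters immediately after the opening square bracket and puts them in a new column called Team.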
library(tidyverse)
df %>% mutate(Team = str_extract(Description, "(?<=\\[)\\D{3}"))
#> Description thisYVal thisPts Team
#> 1 (12:00) Start Period 0 0 <NA>
#> 2 (12:00) Jump Ball Thomas vs Grant 0 0 <NA>
#> 3 (11:48) [MIA 3-] Wade Layup Shot: Missed 0 2 MIA
#> 4 (11:46) [PHL] Thomas Rebound (Off: Def:1) 0 0 PHL
#> 5 (11:02) [MIA] Haslem Jump Shot: Missed -19 2 MIA
#> 6 (11:00) [MIA] Haslem Rebound (Off:1 Def:) 0 0 MIA
#> 7 (10:57) [MIA] Haslem Layup Shot: Missed 0 2 MIA
#> 8 (10:56) [PHL] Coleman Rebound (Off: Def:1) 0 0 PHL
Created on 2018-09-09 by the reprex package (v0.2.0).
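The same lookbehind can be used without the tidyverse, because base R's regexpr() accepts it with perl = TRUE. regmatches() returns only the successful matches, so they are written back into an NA-filled vector at the matching positions. Below is a minimal sketch. Note that \\D{3} accepts any three non-digits (including spaces); [A-Z]{3} would be a stricter choice if abbreviations are always capital letters.

```r
desc <- c("(12:00) Start Period",
          "(12:00) Jump Ball Thomas vs Grant",
          "(11:48) [MIA 3-] Wade Layup Shot: Missed",
          "(11:46) [PHL] Thomas Rebound (Off: Def:1)",
          "(11:02) [MIA] Haslem Jump Shot: Missed",
          "(11:00) [MIA] Haslem Rebound (Off:1 Def:)",
          "(10:57) [MIA] Haslem Layup Shot: Missed",
          "(10:56) [PHL] Coleman Rebound (Off: Def:1)")

# Three non-digits preceded by "[" (the lookbehind is not part of the match)
m <- regexpr("(?<=\\[)\\D{3}", desc, perl = TRUE)

# Start with NA everywhere, then fill in the rows that matched (m != -1)
team <- rep(NA_character_, length(desc))
team[m != -1] <- regmatches(desc, m)
```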