For the example data frame:
  Name          Value
1 Katie (5676W) <NA>
2 John (2345G)  <NA>
3 Hex (4563W)   <NA>
4 Mike (4564R)  <NA>
df <- structure(list(
Name = c("Katie (5676W)", "John (2345G)", "Hex (4563W)",
"Mike (4564R)"),
Value = c(NA_character_, NA_character_, NA_character_, NA_character_)),
.Names = c("Name", "Value"),
class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA, -4L),
spec = structure(list(
cols = structure(list(Name = structure(list(), class = c("collector_character",
"collector")), Value = structure(list(), class = c("collector_character",
"collector"))), .Names = c("Name", "Value")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
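I want to extract the digits and letters contained in the parentheses and add them, without the parentheses, to the 'Value' column of the df data frame.
I have seen on Stack Overflow how to extract this from a vector, but I have not managed to get it to work on a data frame. Any ideas?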
Answer 0 (score: 1)
You can try the following:
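library(qdapRegex)
# rm_between(..., extract = TRUE) returns a list with one element per row;
# unlist() flattens it, assuming each name contains exactly one "(...)" group.
df$Value <- unlist(rm_between(df$Name, '(', ')', extract = TRUE))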
Output:
  Name          Value
1 Katie (5676W) 5676W
2 John (2345G)  2345G
3 Hex (4563W)   4563W
4 Mike (4564R)  4564R
Hope this helps!
Answer 1 (score: 0)
You can do it like this (I use the stringr package, but it can also be done in base R):
library(stringr)
df$Value <- str_extract(df$Name, "\\(.*\\)")
df$Value <- str_remove_all(df$Value, "[\\(\\)]")
df
# A tibble: 4 x 2
# Name Value
# <chr> <chr>
# 1 Katie (5676W) 5676W
# 2 John (2345G) 2345G
# 3 Hex (4563W) 4563W
# 4 Mike (4564R) 4564R
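The two stringr steps above could also be collapsed into one with a capture group. This is only a sketch, assuming each name holds at most one parenthesised code; the data frame is rebuilt here as a plain data.frame so the snippet runs on its own.

```r
library(stringr)

# Stand-in for the question's df (plain data.frame instead of a tibble).
df <- data.frame(
  Name = c("Katie (5676W)", "John (2345G)", "Hex (4563W)", "Mike (4564R)"),
  stringsAsFactors = FALSE
)

# str_match() returns a character matrix: column 1 is the full match,
# column 2 is the first capture group, i.e. the text between the parentheses.
df$Value <- str_match(df$Name, "\\(([^)]*)\\)")[, 2]
df
```

Rows without any parentheses would get NA in Value, because str_match() returns NA for non-matching input.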
With base R, you can:
df$Value <- sub("(.*\\()(.*)(\\))", "\\2", df$Name)
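One caveat with sub(): when the pattern does not match, it returns the input string unchanged, so a name without parentheses would be copied into Value as-is. If NA is preferred in that case, a base R sketch along these lines might help. The extra "Nobody" row is hypothetical and added only to show the no-match behaviour.

```r
# Stand-in for the question's df, plus one hypothetical row without parentheses.
df <- data.frame(
  Name = c("Katie (5676W)", "John (2345G)", "Hex (4563W)", "Mike (4564R)", "Nobody"),
  stringsAsFactors = FALSE
)

# Locate the first "(...)" in each string; regexpr() gives -1 where nothing matches.
m <- regexpr("\\(([^)]*)\\)", df$Name)

# Start with NA everywhere, then fill in only the rows that matched.
# regmatches() returns just the matched substrings, in row order.
df$Value <- NA_character_
df$Value[m != -1] <- gsub("[()]", "", regmatches(df$Name, m))
df
```

The trade-off is a couple of extra lines in exchange for an explicit NA instead of a silently copied name.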