R: extracting parts of a string

Date: 2018-01-29 21:27:03

Tags: r regex

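I have a vector of strings in the variable resHeaders.
Each element has the format "Player A name, Player A scores, Player B name, Player B scores", with the fields separated by newlines (\n):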

> resHeaders
[1] "Mackenzie McDonald\n0\n5\n0\nTatsuma Ito\n0\n5\n0"     
[2] "Uladzimir Ignatik\n0\n5\n15\nGleb Sakharov\n0\n3\n30"  
[3] "Evgeny Karlovskiy\n0\n0\n30\nGuillermo Olaso\n1\n0\n15"
[4] "Katherine Sebov\n0\n3\n40\nAmandine Hesse\n0\n2\n40"   
[5] "Karolina Muchova\n1\n1\n15\nElena Bovina\n0\n1\n0"

How can I extract the "Player A name" and "Player B name" parts?
For the first element, the result would be:

  • Player A name: "Mackenzie McDonald"
  • Player B name: "Tatsuma Ito"

Data (from @h3rm4n)

vec <- c("Mackenzie McDonald\n0\n5\n0\nTatsuma Ito\n0\n5\n0","Uladzimir Ignatik\n0\n5\n15\nGleb Sakharov\n0\n3\n30",
     "Evgeny Karlovskiy\n0\n0\n30\nGuillermo Olaso\n1\n0\n15","Katherine Sebov\n0\n3\n40\nAmandine Hesse\n0\n2\n40")

4 answers:

Answer 0 (score: 1)

Taking a tidyverse approach, we can use purrr to iterate over the strings and dplyr verbs to reshape the data into a key-value format.

library(dplyr)
library(purrr)

chr <- c(
  "Mackenzie McDonald\n0\n5\n0\nTatsuma Ito\n0\n5\n0",
  "Uladzimir Ignatik\n0\n5\n15\nGleb Sakharov\n0\n3\n30",
  "Evgeny Karlovskiy\n0\n0\n30\nGuillermo Olaso\n1\n0\n15",
  "Katherine Sebov\n0\n3\n40\nAmandine Hesse\n0\n2\n40",
  "Karolina Muchova\n1\n1\n15\nElena Bovina\n0\n1\n0"
)

# .x is one record, .y is its position in chr (used as the match number)
map2_dfr(chr, seq_along(chr), ~{

  df <- as.data.frame(
    matrix(unlist(strsplit(.x, "\n")), ncol = 4, byrow = TRUE),
    stringsAsFactors = FALSE
  )

  df %>%
    transmute(
      match = .y,
      player = c("A", "B"),
      name = V1,
      score = paste(V2, V3, V4, sep = ", ")
    ) %>%
    as_tibble

})

# # A tibble: 10 x 4
#    match player name               score   
#    <int> <chr>  <chr>              <chr>   
#  1     1 A      Mackenzie McDonald 0, 5, 0 
#  2     1 B      Tatsuma Ito        0, 5, 0 
#  3     2 A      Uladzimir Ignatik  0, 5, 15
#  4     2 B      Gleb Sakharov      0, 3, 30
#  5     3 A      Evgeny Karlovskiy  0, 0, 30
#  6     3 B      Guillermo Olaso    1, 0, 15
#  7     4 A      Katherine Sebov    0, 3, 40
#  8     4 B      Amandine Hesse     0, 2, 40
#  9     5 A      Karolina Muchova   1, 1, 15
# 10     5 B      Elena Bovina       0, 1, 0 
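
To see what the reshaping step inside the loop does, here is a minimal base R sketch for a single record. It assumes, as in the question's data, that every record has exactly eight newline-separated fields (a name followed by three score fields for each player); the variable names rec, fields, m and player_names are only illustrative.

```r
# One record from the question: a name plus three score fields per player
rec <- "Mackenzie McDonald\n0\n5\n0\nTatsuma Ito\n0\n5\n0"

# Split on the literal newline character: eight fields in total
fields <- strsplit(rec, "\n", fixed = TRUE)[[1]]

# Fill a matrix row by row, four fields per player, giving one row per player
m <- matrix(fields, ncol = 4, byrow = TRUE)

# The first column now holds the two player names
player_names <- m[, 1]
print(player_names)
```

If a record had a different number of fields, matrix() would recycle or warn, so this layout only holds while the eight-field assumption does.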

Answer 1 (score: 0)

This could be a solution:

lapply(sapply(vec, strsplit, split = '\n'), '[', c(1,5))

Result:

$`Mackenzie McDonald\n0\n5\n0\nTatsuma Ito\n0\n5\n0`
[1] "Mackenzie McDonald" "Tatsuma Ito"       

$`Uladzimir Ignatik\n0\n5\n15\nGleb Sakharov\n0\n3\n30`
[1] "Uladzimir Ignatik" "Gleb Sakharov"    

$`Evgeny Karlovskiy\n0\n0\n30\nGuillermo Olaso\n1\n0\n15`
[1] "Evgeny Karlovskiy" "Guillermo Olaso"  

$`Katherine Sebov\n0\n3\n40\nAmandine Hesse\n0\n2\n40`
[1] "Katherine Sebov" "Amandine Hesse" 
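
Because sapply() names each list element after the full input string, the printed result is hard to read. A possible variation, sketched here under the assumption that the names always sit in fields 1 and 5, calls strsplit() directly and collects the names into a matrix; the names parts, name_mat, player_a and player_b are illustrative.

```r
vec <- c("Mackenzie McDonald\n0\n5\n0\nTatsuma Ito\n0\n5\n0",
         "Uladzimir Ignatik\n0\n5\n15\nGleb Sakharov\n0\n3\n30",
         "Evgeny Karlovskiy\n0\n0\n30\nGuillermo Olaso\n1\n0\n15",
         "Katherine Sebov\n0\n3\n40\nAmandine Hesse\n0\n2\n40")

# strsplit() is vectorised, so no sapply() wrapper is needed;
# fixed = TRUE treats "\n" as a literal newline rather than a regex
parts <- strsplit(vec, "\n", fixed = TRUE)

# vapply() checks that every record yields exactly two strings;
# t() turns the 2 x n result into one row per match
name_mat <- t(vapply(parts, function(p) p[c(1, 5)], character(2)))
colnames(name_mat) <- c("player_a", "player_b")
print(name_mat)
```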

Data:

vec <- c("Mackenzie McDonald\n0\n5\n0\nTatsuma Ito\n0\n5\n0","Uladzimir Ignatik\n0\n5\n15\nGleb Sakharov\n0\n3\n30",
         "Evgeny Karlovskiy\n0\n0\n30\nGuillermo Olaso\n1\n0\n15","Katherine Sebov\n0\n3\n40\nAmandine Hesse\n0\n2\n40")

Answer 2 (score: 0)

You can use regmatches to extract the names:

# x is the character vector of match strings (resHeaders in the question)
do.call(rbind, regmatches(x, gregexpr("[A-Z]\\w+\\s[A-Z]\\w+", x, perl = TRUE)))
     [,1]                 [,2]             
[1,] "Mackenzie McDonald" "Tatsuma Ito"    
[2,] "Uladzimir Ignatik"  "Gleb Sakharov"  
[3,] "Evgeny Karlovskiy"  "Guillermo Olaso"
[4,] "Katherine Sebov"    "Amandine Hesse" 
[5,] "Karolina Muchova"   "Elena Bovina"   

You can also use str_extract_all(x, "[A-Z]\\w+\\s[A-Z]\\w+") from the stringr package.
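
Note that the pattern above only matches names made of exactly two capitalised words. A less restrictive sketch, assuming only that names never start with a digit while every score field is numeric, keeps each line whose first character is not a digit. The names pattern and name_mat are illustrative, and this is a variation rather than the answer's own method.

```r
x <- c("Mackenzie McDonald\n0\n5\n0\nTatsuma Ito\n0\n5\n0",
       "Uladzimir Ignatik\n0\n5\n15\nGleb Sakharov\n0\n3\n30")

# (?m) makes ^ and $ match at line boundaries;
# [^\\d\\n] requires the line to start with a non-digit character,
# and .* then consumes the rest of that line (it does not cross newlines)
pattern <- "(?m)^[^\\d\\n].*$"

# One row per input string, one column per extracted name
name_mat <- do.call(rbind, regmatches(x, gregexpr(pattern, x, perl = TRUE)))
print(name_mat)
```

As with the original call, do.call(rbind, ...) only gives a clean matrix when every string yields the same number of matches.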

Answer 3 (score: 0)

You can also try:

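library(stringr)  # provides str_split()

df <- data.frame(strings = c("Mackenzie McDonald\n0\n5\n0\nTatsuma Ito\n0\n5\n0", 
                   "Uladzimir Ignatik\n0\n5\n15\nGleb Sakharov\n0\n3\n30" ), stringsAsFactors = FALSE)

# Keep only the pieces longer than 3 characters, i.e. the names rather than the scores
f <- function(x) {
  parts <- str_split(x, "\n")[[1]]
  parts[nchar(parts) > 3]
}

# apply() returns one column per input row, so row 1 holds player A and row 2 player B
df2 <- apply(df, 1, f)
df2[1, ] <- paste0("Player A name: ", df2[1, ])
df2[2, ] <- paste0("Player B name: ", df2[2, ])
df2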