Extracting names from a character variable using stringr

Date: 2018-04-02 18:17:03

Tags: r stringr

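I have a character variable (Min3$Name) built from file names that contain surnames. I also have a vector called "Name" that holds all of those surnames plus others that do not appear in the files. Can I use stringr to create a new column containing only the surname found in each file name? I tried: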

Min3$Name2 <- as.character(str_match_all(Min3$Name , Name))

The problem is that the vector has 63 names while the df contains only 25 of them, so I get this error:

Error in `$<-.data.frame`(`*tmp*`, Names, value = c("character(0)", 
"character(0)",  : 
 replacement has 63 rows, data has 25

Thanks

Edit: this is the df I am working with

> dput(head(Min3, 1))
structure(list(Min_1 = 136.075840266223, Min_2 = 114.131164725458, 
 Min_3 = 109.639994444444, Min_4 = 103.885620833333, 
 Min_5 = 97.1868380634391, Min_6 = 92.3339222222222, 
 Min_7 = 91.5180047619048, Min_8 = 90.1389770833333, 
 Min_9 = 84.5778222222222, Min_10 = 83.6758497495826, 
 Name = "Sale_A Export for Alafoti Fa'osiliva 37599.csv", 
 Game = structure(c("Sale_A", "Export", "for", "Alafoti", 
 "Fa'osiliva 37599.csv"), .Dim = c(1L, 5L)), 
 Date = structure(17623, class = "Date")), 
 .Names = c("Min_1", "Min_2", "Min_3", "Min_4", "Min_5", "Min_6", 
 "Min_7", "Min_8", "Min_9", "Min_10", "Name", "Game", "Date"), 
 row.names = "Sale_A Export for Alafoti Fa'osiliva 37599.csv", 
 class = "data.frame")
> 

The Name variable is taken from the name of the csv file, which is run through a loop as one of a set of 25 files.

I also have a vector of surnames, 63 names in total:

Name
 [1] "Alo"          "Bower"        "Kerrod"       "Milasinovich" "Morris"       "Rigby"        "Schonert"     "Waller"
 [9] "Annett"       "Cutting"      "Singleton"    "Taufete'e"    "Williams"     "Barry"        "Clegg"        "Kitchener"
[17] "O'Callaghan"  "Phillips"     "Hill"         "Kirwan"       "Lewis"        "Fa'osiliva"   "Hill"

I am trying to create a new variable, Min3$Name2, that extracts the person's surname from the Min3$Name variable.

Hope that is reasonably clear! Thanks

2 Answers:

Answer 0: (score: 0)

This works for me, but let me know if it gives you problems.

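I couldn't reproduce your problem with a single row, so I expanded your data. Just a heads-up: in the future you may want to provide a few rows when the issue involves an interaction between a vector and a list, which is what this appears to be.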

library(stringr)

# Add another example, sub in a new name
test <- rbind(Min3, Min3)
test$Name[2] <- "Sale_A Export for Alafoti O'Callaghan 37599.csv"

# Running down test$Name, make a new column...
test$newName <- sapply(test$Name, function(x)
  # str_match_all returns a list with one element per pattern in Name.
  # Everything except the matches is empty and gets removed if you unlist it
  unlist(str_match_all(x, Name)))

# Check in the console.  Looks ok to me!
test$newName
[1] "Fa'osiliva"  "O'Callaghan"
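
One caveat worth sketching: the Name vector in the question lists "Hill" twice, so a file name containing "Hill" would produce two matches, and sapply would then hand back a list rather than a character vector. The variant below is only an illustration under assumptions: file_names, lookup and first_match are hypothetical names, the data is a small stand-in for the real 25 files and 63 surnames, and each surname is treated as a plain regular expression.

```r
library(stringr)

# Hypothetical stand-in data (not the real Min3 or the full Name vector)
file_names <- c("Sale_A Export for Alafoti Fa'osiliva 37599.csv",
                "Sale_A Export for Alafoti O'Callaghan 37599.csv",
                "Sale_A Export for Jon Hill 11111.csv")
Name <- c("Alo", "Hill", "O'Callaghan", "Fa'osiliva", "Hill")

# Drop repeated surnames so one surname cannot match twice
lookup <- unique(Name)

# vapply enforces exactly one string per file name
first_match <- vapply(file_names, function(x) {
  hits <- lookup[str_detect(x, lookup)]   # surnames found in this file name
  if (length(hits) == 0) NA_character_ else hits[1]
}, character(1), USE.NAMES = FALSE)

first_match
```

Because vapply insists on character(1), the result can be assigned straight to a data frame column, and a file name with no known surname becomes NA instead of breaking the assignment.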

Answer 1: (score: 0)

You can collapse the vector of names into an "or" regular expression. My example uses only two names, just to illustrate.

names <- c('Alo', "Fa'osiliva")
names.pattern <- paste0(names, collapse = "|")
names.pattern
#[1] "Alo|Fa'osiliva"

str_extract_all(Min3$Name, pattern = names.pattern)
#[[1]]
#[1] "Fa'osiliva"
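
To get from the list above to the Min3$Name2 column the question asks for, something like the following might work. It is a sketch, not a tested solution for the real data: the two-row Min3 and the three-entry Name are hypothetical stand-ins, and the \\b word boundaries are an added assumption meant to stop a short surname from matching inside a longer word. str_extract (rather than str_extract_all) returns one value per row, so the length always matches the data frame.

```r
library(stringr)

# Hypothetical stand-in data; the real Min3 has 25 rows and Name has 63 entries
Min3 <- data.frame(
  Name = c("Sale_A Export for Alafoti Fa'osiliva 37599.csv",
           "Sale_A Export for Some Player 12345.csv"),
  stringsAsFactors = FALSE
)
Name <- c("Alo", "Fa'osiliva", "O'Callaghan")

# One alternation pattern wrapped in word boundaries (assumption, see above)
names.pattern <- paste0("\\b(", paste(Name, collapse = "|"), ")\\b")

# str_extract gives the first match per string, or NA when nothing matches
Min3$Name2 <- str_extract(Min3$Name, names.pattern)
Min3$Name2
```

Rows whose file name contains none of the surnames end up as NA, which makes unmatched files easy to spot afterwards.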