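I have a data.frame, df, with two columns: one is the song title and the other is the title and artist combined. I would like to create a separate artist field. The first three rows are shown here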
title titleArtist
I'll Never Smile Again I'll Never Smile Again TOMMY DORSEY & HIS ORCHESTRA / FRANK SINATRA & PIED PIPERS
Imagination Imagination GLENN MILLER & HIS ORCHESTRA / RAY EBERLE
The Breeze And I The Breeze And I JIMMY DORSEY & HIS ORCHESTRA / BOB EBERLY
This code works without any problems on this dataset
library(stringr)
library(dplyr)
df %>%
head(3) %>%
mutate(artist=str_to_title(str_trim(str_replace(titleArtist,title,"")))) %>%
select(artist,title)
artist title
1 Tommy Dorsey & His Orchestra / Frank Sinatra & Pied Pipers I'll Never Smile Again
2 Jimmy Dorsey & His Orchestra / Bob Eberly The Breeze And I
3 Glenn Miller & His Orchestra / Ray Eberle Imagination
But when I apply it to the several thousand rows, I get the error
Error: Incorrectly nested parentheses in regexp pattern. (U_REGEX_MISMATCHED_PAREN)
#or for part of the mutation
df$artist <-str_replace(df$titleArtist,df$title,"")
Error in stri_replace_first_regex(string, pattern, replacement, opts_regex = attr(pattern, :
Incorrectly nested parentheses in regexp pattern. (U_REGEX_MISMATCHED_PAREN)
I have removed all parentheses from the columns, and the code seemed to work for a while before I got the error
Error: Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX)
Is there another special character that could be causing the problem, or might it be something else?
TIA
Answer 0 (score: 2)
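Your general problem is that str_replace treats the values passed as its pattern argument (here, your title values) as regular expressions, so special characters other than parentheses can also cause errors. The stringi library, which stringr wraps and simplifies, allows finer-grained control, including treating arguments as fixed strings rather than regular expressions. I don't have your original data, but the following works when I throw in some characters that would otherwise cause errors.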
library(dplyr)
library(stringi)
# data_frame() is deprecated in current dplyr/tibble; tibble() is its replacement
df = tibble(title = c("I'll Never Smile Again (", "Imagination.*", "The Breeze And I(?>="),
titleArtist = c("I'll Never Smile Again ( TOMMY DORSEY & HIS ORCHESTRA / FRANK SINATRA & PIED PIPERS",
"Imagination.* GLENN MILLER & HIS ORCHESTRA / RAY EBERLE",
"The Breeze And I(?>= JIMMY DORSEY & HIS ORCHESTRA / BOB EBERLY"))
df %>%
mutate(artist=stri_trans_totitle(stri_trim(stri_replace_first_fixed(titleArtist,title,"")))) %>%
select(artist,title)
Result:
Source: local data frame [3 x 2]
artist title
(chr) (chr)
1 Tommy Dorsey & His Orchestra / Frank Sinatra & Pied Pipers I'll Never Smile Again (
2 Glenn Miller & His Orchestra / Ray Eberle Imagination.*
3 Jimmy Dorsey & His Orchestra / Bob Eberly The Breeze And I(?>=
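If you would rather stay within stringr, the same idea can probably be expressed by wrapping the pattern in fixed(), which marks it as a literal string. The sketch below is only an illustration: the songs data frame is made-up sample data, and artist_base is a hypothetical column name. It also shows a base R variant that avoids pattern matching entirely, on the assumption that titleArtist always begins with title.

```r
library(stringr)

# Hypothetical sample data; two titles deliberately contain regex metacharacters.
songs <- data.frame(
  title = c("I'll Never Smile Again (", "Imagination.*", "The Breeze And I"),
  titleArtist = c(
    "I'll Never Smile Again ( TOMMY DORSEY & HIS ORCHESTRA",
    "Imagination.* GLENN MILLER & HIS ORCHESTRA",
    "The Breeze And I JIMMY DORSEY & HIS ORCHESTRA"
  ),
  stringsAsFactors = FALSE
)

# fixed() makes each title a literal pattern, so "(" and ".*" are not parsed as regex.
songs$artist <- str_to_title(str_trim(
  str_replace(songs$titleArtist, fixed(songs$title), "")
))

# Base R variant (assumes titleArtist starts with title):
# drop the first nchar(title) characters, otherwise leave the value untouched.
has_prefix <- startsWith(songs$titleArtist, songs$title)
songs$artist_base <- ifelse(
  has_prefix,
  trimws(substring(songs$titleArtist, nchar(songs$title) + 1)),
  songs$titleArtist
)

print(songs[, c("artist", "artist_base", "title")])
```

The base R variant leaves the original capitalisation in place; wrap it in str_to_title() if you want title case there as well. Rows where has_prefix is FALSE are worth inspecting, since they indicate titles that do not match the start of titleArtist exactly.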
Answer 1 (score: 0)
df <- data.frame(ID=11:13, T_A=c('a/b','b/c','x/y')) # T_A Title/Artist
ID T_A
1 11 a/b
2 12 b/c
3 13 x/y
# Title Artist are separated by /
> within(df, T_A<-data.frame(do.call('rbind', strsplit(as.character(T_A), '/', fixed=TRUE))))
ID T_A.X1 T_A.X2
1 11 a b
2 12 b c
3 13 x y