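I have a dataset in which the title column contains movie names. In some rows the name is misplaced: the leading article has been moved to the end, after a comma.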
[1] "Killer Shrews, The (1959)" [2] "Kronos (1957)"
[3] "Kronos (1973)" [4] "Phantom of the Opera, The (1943)"
[5] "Runaway (1984)" [6] "Slumber Party Massacre, The (1982)"
For example, the first one should be The Killer Shrews (1959).
I don't know how to fix this. Any ideas?
Answer 0: (score: 2)
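We can use sub. Capture the characters as groups in the pattern argument and rearrange the backreferences in the replacement (assuming the expected output follows the pattern shown for the first element). Here the first group is everything before the comma, the second is the word that follows it (the article), and the third is the rest of the string (the year). Strings without a comma do not match and are returned unchanged.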
sub("([^,]+),\\s+([^( ]+)\\s+(.*)", "\\2 \\1 \\3", v1)
#[1] "The Killer Shrews (1959)" "Kronos (1957)"
#[3] "Kronos (1973)" "The Phantom of the Opera (1943)"
#[5] "Runaway (1984)" "The Slumber Party Massacre (1982)"
# data
v1 <- c("Killer Shrews, The (1959)", "Kronos (1957)", "Kronos (1973)",
        "Phantom of the Opera, The (1943)", "Runaway (1984)",
        "Slumber Party Massacre, The (1982)")
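
The pattern above splits at the first comma, so a title that contains another comma earlier on would be rearranged incorrectly. A stricter variant, offered only as a sketch, anchors the match to the end of the string and accepts only a known article followed by a four-digit year. The helper name fix_title, the article list (The, An, A), and the third sample title are assumptions added for illustration; they are not part of the original data.

```r
# Two titles from the question plus one hypothetical title with an extra comma.
v2 <- c("Killer Shrews, The (1959)", "Kronos (1957)",
        "Good, the Bad and the Ugly, The (1966)")

# Hypothetical helper: move a trailing ", The/An/A" in front of the title.
# Group 1: the title, group 2: the trailing article, group 3: "(year)".
# Strings that do not match the whole pattern are returned unchanged.
fix_title <- function(x) {
  sub("^(.*), (The|An|A) (\\([0-9]{4}\\))$", "\\2 \\1 \\3", x)
}

fixed <- fix_title(v2)
print(fixed)
#[1] "The Killer Shrews (1959)"              "Kronos (1957)"
#[3] "The Good, the Bad and the Ugly (1966)"
```

Because the first group is greedy and the article must sit immediately before the year, only the last comma is treated as the separator. If the real data contains other articles or year formats, the pattern would need to be extended accordingly.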