How to fix misplaced words in text in R

Date: 2016-07-13 19:30:33

Tags: r

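I have a dataset whose title column contains movie names. In some rows the words of the name are out of order: the leading article has been moved to the end, after a comma.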

[1] "Killer Shrews, The (1959)"          "Kronos (1957)"
[3] "Kronos (1973)"                      "Phantom of the Opera, The (1943)"
[5] "Runaway (1984)"                     "Slumber Party Massacre, The (1982)"
For example, the first one should be "The Killer Shrews (1959)". I don't know how to fix this. Any ideas?

1 answer:

Answer 0 (score: 2)

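We can use sub. In the pattern argument, capture three groups (the text before the first comma, the first word after it, and the rest of the string), then rearrange the backreferences in the replacement (assuming the expected output follows the pattern shown for the first element). Strings that contain no comma do not match, so sub returns them unchanged.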

sub("([^,]+),\\s+([^( ]+)\\s+(.*)", "\\2 \\1 \\3", v1)
#[1] "The Killer Shrews (1959)"          "Kronos (1957)"
#[3] "Kronos (1973)"                     "The Phantom of the Opera (1943)"  
#[5] "Runaway (1984)"                    "The Slumber Party Massacre (1982)"
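To see what each group in the answer's pattern captures, a small sketch using base R's regexec() and regmatches() on the first title can help. The variable names pat, x and parts are illustrative only. The first element of the result is the whole match; the remaining three are groups \\1, \\2 and \\3.

```r
# Same pattern as in the sub() call above
pat <- "([^,]+),\\s+([^( ]+)\\s+(.*)"
x <- "Killer Shrews, The (1959)"

# regexec() records the positions of the full match and each capture group;
# regmatches() turns those positions into the matched substrings
parts <- regmatches(x, regexec(pat, x))[[1]]
print(parts)  # full match, then \\1, \\2, \\3

# Reassembling the groups in the "\\2 \\1 \\3" order reproduces sub()'s result
fixed <- paste(parts[3], parts[2], parts[4])
print(fixed)
```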

Data

v1 <- c("Killer Shrews, The (1959)", "Kronos (1957)",  "Kronos (1973)", 
 "Phantom of the Opera, The (1943)", "Runaway (1984)", 
 "Slumber Party Massacre, The (1982)" )
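A possible variation, sketched under the assumption that the misplaced word is always an English article ("The", "A" or "An") placed right before a trailing "(year)". Because the answer's pattern splits at the first comma, a title that has a comma inside the name itself (the extra example below is not from the question) would be rearranged incorrectly. Anchoring on the trailing ", Article (year)" avoids that. fix_article() is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: move a trailing ", The" / ", A" / ", An" that sits
# immediately before the " (yyyy)" suffix back to the front of the title.
fix_article <- function(x) {
  # \\1 = everything before the last ", <article>" (greedy .*)
  # \\2 = the article itself (case-sensitive, so ", the" inside a name is kept)
  # \\3 = the " (yyyy)" suffix, required to end the string
  sub("^(.*), (The|A|An)( \\([0-9]{4}\\))$", "\\2 \\1\\3", x)
}

v1 <- c("Killer Shrews, The (1959)", "Kronos (1957)", "Kronos (1973)",
        "Phantom of the Opera, The (1943)", "Runaway (1984)",
        "Slumber Party Massacre, The (1982)")
print(fix_article(v1))

# Extra title with a comma inside the name (not in the question's data)
print(fix_article("Good, the Bad and the Ugly, The (1966)"))
```

For the question's vector this gives the same result as the answer. For a data frame, the same call can be applied to the column, e.g. df$title <- fix_article(df$title), where df is a placeholder for your data frame.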