Question

I want to merge two datasets, but I'm stuck on the following problem:

In one dataset, the counties are named using this pattern:

[351] "Lindau (Bodensee), Landkreis"                  "Ostallgäu, Landkreis"                         
[353] "Unterallgäu, Landkreis"                        "Donau-Ries, Landkreis"

and in the other:

 [641] "Landkreis Nienburg/Weser"                      "Landkreis Nordhausen"                         
 [643] "Landkreis Nordsachsen"                         "Landkreis Nordwestmecklenburg"                
 [645] "Landkreis Northeim"                            "Landkreis Nürnberger Land"                    
 [647] "Landkreis Oberallgäu"                          "Landkreis Oberhavel"                          
 [649] "Landkreis Oberspreewald-Lausitz"               "Landkreis Oder-Spree"

Could someone help me write some code that brings all the names into the following shape:

"Nordsachsen, Landkreis"

Answer 1

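It would probably be easier to put them all in the other format, since the comma delimits the parts nicely. But to answer the question as asked, and assuming each name contains exactly one space, this should do the trick: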

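# Swap the two space-separated parts: "Landkreis X" -> "X, Landkreis"
myfunc <- function(s) {
    el <- strsplit(s, ' ')[[1]]
    return(paste0(el[2], ', ', el[1]))
}

# Apply to every name; USE.NAMES = FALSE returns a plain character vector
myvec <- sapply(vector_of_strings, myfunc, USE.NAMES = FALSE)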

If you go the other way, you can split on the comma, which still works when the names themselves contain spaces:

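# Split on the comma instead: "X, Landkreis" -> "Landkreis X"
myfunc <- function(s) {
    el <- strsplit(s, ',')[[1]]
    el <- trimws(el)  # drop the space left after the comma
    return(paste0(el[2], ' ', el[1]))
}

myvec <- sapply(vector_of_strings, myfunc, USE.NAMES = FALSE)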
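To make the comma-splitting direction concrete, here is a minimal sketch run on two names taken from the first dataset in the question. The helper name to_prefix_form and the variable names are only illustrative, and the sketch assumes every entry contains exactly one comma.

```r
# Two sample names from the first dataset in the question
comma_names <- c("Lindau (Bodensee), Landkreis", "Donau-Ries, Landkreis")

# to_prefix_form is a hypothetical helper name used only for this sketch.
# It assumes exactly one comma per entry.
to_prefix_form <- function(s) {
    el <- trimws(strsplit(s, ",", fixed = TRUE)[[1]])
    paste(el[2], el[1])  # "Landkreis" first, then the county name
}

# vapply enforces that each result is a single string
prefix_names <- vapply(comma_names, to_prefix_form, character(1), USE.NAMES = FALSE)
print(prefix_names)
# "Landkreis Lindau (Bodensee)" "Landkreis Donau-Ries"
```

Note that the space inside "Lindau (Bodensee)" survives, which is the advantage of splitting on the comma rather than on the space.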

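Edit: if all entries start with Landkreis, you can use a regular expression to implement something more specific to your context, although it won't generalize: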

s <- "Landkreis Nordhausen"
trimws(gsub('(Landkreis)(.*?$)', '\\2, \\1', s))
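Since sub and gsub are vectorized, the same idea can be applied to a whole column at once. The following is a sketch under a couple of assumptions: the sample values are illustrative, "Berlin" is a made-up entry without the prefix, and the pattern is anchored with ^ so that such entries are left untouched.

```r
# Illustrative values in the "Landkreis X" style; "Berlin" is a made-up
# entry without the prefix, included to show it passes through unchanged
counties <- c("Landkreis Nordhausen", "Landkreis Lindau (Bodensee)", "Berlin")

# ^ anchors the match at the start of the string;
# (.*) captures everything after the prefix and its trailing space(s)
suffix_names <- sub("^Landkreis +(.*)$", "\\1, Landkreis", counties)
print(suffix_names)
# "Nordhausen, Landkreis" "Lindau (Bodensee), Landkreis" "Berlin"
```

Because the space after the prefix is consumed by the pattern, no trimws call is needed afterwards, and names that contain spaces are handled as well.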

Answer 2

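Since you have a common prefix of fixed length, you can use separate() with a character position to split it off, and then paste0() to append it as a suffix.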

A tidy solution for turning a common prefix into a common suffix:

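    library(dplyr)  # %>%, mutate, select
    library(tidyr)  # separate

    a <- data.frame(x = c('long words', 'long day', 'long time'))

    a %>%
      separate(x, c('A','B'), sep = 5) %>%  # split after the 5th character: "long " | rest
      mutate(
        B = paste0(B,', long')              # append the prefix as a suffix
      ) %>%
      select(-A) # drop the column that held the prefix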
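For readers without the tidyverse installed, the same fixed-length-prefix idea can be sketched in base R. This is only an illustration, not part of the answer above: the sample names come from the question, and the variable names are assumptions.

```r
# Sample names from the second dataset in the question
counties <- c("Landkreis Nordhausen", "Landkreis Nordsachsen", "Landkreis Oder-Spree")

prefix <- "Landkreis "  # 10 characters, including the trailing space

# Only rewrite entries that really start with the prefix
has_prefix <- startsWith(counties, prefix)

# Everything after the fixed-length prefix
stem <- substring(counties, nchar(prefix) + 1)

renamed <- ifelse(has_prefix, paste0(stem, ", ", trimws(prefix)), counties)
print(renamed)
# "Nordhausen, Landkreis" "Nordsachsen, Landkreis" "Oder-Spree, Landkreis"
```

With tidyr, the equivalent split position for this data would be sep = 10 rather than the sep = 5 used for the 'long ' example.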

Change the position of words in a column