How to combine strings within a row based on a positional condition

Time: 2019-04-17 17:35:14

Tags: r conditional-statements combinations

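It turned out to be hard to find search terms for this kind of problem. I need to write a script that produces all combinations of the strings in each row of a data frame. Each combination should use every string exactly once, and a string may only be paired with one that is at least two steps away from it. The first and last columns count as neighbors, so they cannot be paired either (the strings effectively form a ring). The same script needs to work for data frames with different even numbers of columns; 8 is used here as an example.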

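So far I have only managed to do this by hand for a data frame with a given number of columns; I have no single expression that works for a data frame with any number of columns.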

This is what the data looks like:

  Crop_1    Crop_2      Crop_3      Crop_4  Crop_5 Crop_6 Crop_7 Crop_8
1 Potato     Onion   Sugarbeet Grassclover Cabbage Potato  Wheat Carrot
2 Potato Sugarbeet Grassclover      Potato Cabbage  Onion Carrot  Wheat

In this case, the desired result would be the following 6 options:

                  Pair_1            Pair_2              Pair_3             Pair_4 Crop_1    Crop_2      Crop_3      Crop_4  Crop_5 Crop_6 Crop_7 Crop_8
1   Potato-Sugarbeet Onion-Grassclover       Cabbage-Wheat      Potato-Carrot Potato     Onion   Sugarbeet Grassclover Cabbage Potato  Wheat Carrot
2 Potato-Grassclover  Sugarbeet-Potato      Cabbage-Carrot        Onion-Wheat Potato Sugarbeet Grassclover      Potato Cabbage  Onion Carrot  Wheat
3       Potato-Wheat      Onion-Carrot   Sugarbeet-Cabbage Grassclover-Potato Potato     Onion   Sugarbeet Grassclover Cabbage Potato  Wheat Carrot
4      Potato-Carrot   Sugarbeet-Wheat Grassclover-Cabbage       Potato-Onion Potato Sugarbeet Grassclover      Potato Cabbage  Onion Carrot  Wheat
5     Potato-Cabbage      Onion-Potato     Sugarbeet-Wheat Grassclover-Carrot Potato     Onion   Sugarbeet Grassclover Cabbage Potato  Wheat Carrot
6     Potato-Cabbage   Sugarbeet-Onion  Grassclover-Carrot       Potato-Wheat Potato Sugarbeet Grassclover      Potato Cabbage  Onion Carrot  Wheat

The data frame can be reproduced with:

structure(list(Crop_1 = structure(c(1L, 1L), .Label = "Potato", class = "factor"), 
    Crop_2 = structure(1:2, .Label = c("Onion", "Sugarbeet"), class = "factor"), 
    Crop_3 = structure(2:1, .Label = c("Grassclover", "Sugarbeet"
    ), class = "factor"), Crop_4 = structure(1:2, .Label = c("Grassclover", 
    "Potato"), class = "factor"), Crop_5 = structure(c(1L, 1L
    ), .Label = "Cabbage", class = "factor"), Crop_6 = structure(2:1, .Label = c("Onion", 
    "Potato"), class = "factor"), Crop_7 = structure(2:1, .Label = c("Carrot", 
    "Wheat"), class = "factor"), Crop_8 = structure(1:2, .Label = c("Carrot", 
    "Wheat"), class = "factor")), class = "data.frame", row.names = c(NA, 
-2L))

1 answer:

Answer 0 (score: 0)

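Here is a function that does the job. The distinction you need to handle is between even column counts that are divisible by four and those that are not. For counts divisible by four, you can split the columns into groups of four and form two pairs within each group (1-3 and 2-4), as you have done. We use seq.int to get the starting position of each pair and then setdiff to get the ending positions. For counts that are not divisible by four, treat the first 6 columns specially (matching 1-4, 2-5, 3-6), then handle the rest in groups of four as before.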
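
The index arithmetic described above can be looked at in isolation. The following is only a sketch: pair_indices is a hypothetical helper that is not part of the answer's code, and it assumes an even number of columns n of at least 4.

```r
# Hypothetical helper: returns the column positions that are paired
# together for an even number of columns n >= 4.
pair_indices <- function(n) {
  stopifnot(n >= 4, n %% 2 == 0)
  if (n %% 4 == 0) {
    # Groups of four: the first two positions of each group start a pair.
    starts <- sort(c(seq.int(1, n, by = 4), seq.int(2, n, by = 4)))
  } else {
    # First six columns pair as 1-4, 2-5, 3-6; the rest form groups of
    # four starting at column 7 (there is no rest when n == 6).
    rest <- if (n > 6) {
      sort(c(seq.int(7, n, by = 4), seq.int(8, n, by = 4)))
    } else {
      integer(0)
    }
    starts <- c(1:3, rest)
  }
  # Every position that does not start a pair ends one.
  ends <- setdiff(seq_len(n), starts)
  data.frame(start = starts, end = ends)
}

pair_indices(8)   # pairs 1-3, 2-4, 5-7, 6-8
pair_indices(10)  # pairs 1-4, 2-5, 3-6, 7-9, 8-10
```

Because the starts are sorted and setdiff keeps the remaining positions in increasing order, the i-th start lines up with the i-th end.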

The rest of the complexity is just making sure the function accepts a tibble and returns a tibble, since that is what nest and unnest expect.
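
For comparison, when the pairing is a fixed set of column positions, the round trip through nest and unnest can be avoided by pasting whole columns together. This is a minimal base-R sketch, not the approach used in this answer; it assumes character columns and the eight-column pairing 1-3, 2-4, 5-7, 6-8, and the names crops, starts, ends and result are illustrative.

```r
# Data with the same layout as the question, stored as character columns.
crops <- data.frame(
  Crop_1 = c("Potato", "Potato"),
  Crop_2 = c("Onion", "Sugarbeet"),
  Crop_3 = c("Sugarbeet", "Grassclover"),
  Crop_4 = c("Grassclover", "Potato"),
  Crop_5 = c("Cabbage", "Cabbage"),
  Crop_6 = c("Potato", "Onion"),
  Crop_7 = c("Wheat", "Carrot"),
  Crop_8 = c("Carrot", "Wheat"),
  stringsAsFactors = FALSE
)

# Assumed pairing for eight columns: 1-3, 2-4, 5-7, 6-8.
starts <- c(1, 2, 5, 6)
ends   <- c(3, 4, 7, 8)

# paste() is vectorised over rows, so each pair of columns is combined in
# one call and no per-row nesting is needed.
pairs <- Map(
  function(s, e) paste(crops[[s]], crops[[e]], sep = "-"),
  starts, ends
)
names(pairs) <- paste0("Pair_", seq_along(starts))

result <- cbind(as.data.frame(pairs, stringsAsFactors = FALSE), crops)
result
```

The trade-off is that this version hard-codes the positions, whereas a per-row function can derive them from the number of columns.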

library(tidyverse)
tbl <- structure(list(Crop_1 = c("Potato", "Potato"), Crop_2 = c("Onion", "Sugarbeet"), Crop_3 = c("Sugarbeet", "Grassclover"), Crop_4 = c("Grassclover", "Potato"), Crop_5 = c("Cabbage", "Cabbage"), Crop_6 = c("Potato", "Onion"), Crop_7 = c("Wheat", "Carrot"), Crop_8 = c("Carrot", "Wheat")), class = "data.frame", row.names = c(NA, -2L))

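pair_crops <- function(crop_row) {
  crop_set <- as.character(crop_row)
  n_crops <- length(crop_set)
  if (n_crops %% 2 == 1) {
    stop("Odd number of crops!")
  } else if (n_crops < 4) {
    stop("Need at least four crops!")
  } else if (n_crops %% 4 == 0) {
    # Groups of four: positions 1 and 2 of each group start a pair.
    starts <- sort(c(seq.int(1, n_crops, 4), seq.int(2, n_crops, 4)))
  } else {
    # First six columns pair as 1-4, 2-5, 3-6; the rest form groups of four.
    # seq.int(7, 6, 4) would fail, so skip the remainder when n_crops == 6.
    rest <- if (n_crops > 6) {
      c(seq.int(7, n_crops, 4), seq.int(8, n_crops, 4))
    } else {
      integer(0)
    }
    starts <- sort(c(1:3, rest))
  }
  # Every position that does not start a pair ends one, in the same order.
  ends <- setdiff(1:n_crops, starts)
  tibble(
    pair = str_c(crop_set[starts], "-", crop_set[ends]),
    name = str_c("Pair_", 1:length(starts))
  ) %>%
    spread(name, pair)
}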

tbl %>%
  rowid_to_column %>%
  nest(-rowid, .key = "crop") %>%
  mutate(pairs = map(crop, pair_crops)) %>%
  unnest()
#>   rowid Crop_1    Crop_2      Crop_3      Crop_4  Crop_5 Crop_6 Crop_7
#> 1     1 Potato     Onion   Sugarbeet Grassclover Cabbage Potato  Wheat
#> 2     2 Potato Sugarbeet Grassclover      Potato Cabbage  Onion Carrot
#>   Crop_8             Pair_1            Pair_2         Pair_3        Pair_4
#> 1 Carrot   Potato-Sugarbeet Onion-Grassclover  Cabbage-Wheat Potato-Carrot
#> 2  Wheat Potato-Grassclover  Sugarbeet-Potato Cabbage-Carrot   Onion-Wheat
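
As a sanity check on the positional condition from the question, the distance between paired columns can be measured on a ring in which the first and last columns are adjacent. This is a sketch only: ring_distance is a hypothetical helper, and the positions below are the eight-column pairing 1-3, 2-4, 5-7, 6-8 that produces the output above.

```r
# Hypothetical check: distance between positions i and j on a ring of n
# columns, where column 1 and column n are neighbours.
ring_distance <- function(i, j, n) {
  d <- abs(i - j) %% n
  pmin(d, n - d)
}

# Pairing for eight columns: 1-3, 2-4, 5-7, 6-8.
starts <- c(1, 2, 5, 6)
ends   <- c(3, 4, 7, 8)

ring_distance(starts, ends, 8)            # 2 2 2 2
all(ring_distance(starts, ends, 8) >= 2)  # TRUE
ring_distance(1, 8, 8)                    # 1, so columns 1 and 8 may not pair
```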

Created on 2019-04-19 by the reprex package (v0.2.1)