Question

这与之前的问题here密切相关。但是我需要稍微不同的东西......

我有一长串的对象需要分成更小的列表，每个列表都有一定数量的条目。我需要能够为不同的任务更改列表的长度。问题是每个对象只能在一个列表中出现一次。

# Create some example data... 
# Make a list of objects.
LIST <- c('Oranges', 'Toast', 'Truck', 'Dog', 'Hippo', 'Bottle', 'Hope', 'Mint', 'Red', 'Trees', 'Watch', 'Cup', 'Pencil', 'Lunch', 'Paper', 'Peanuts', 'Cloud', 'Forever', 'Ocean', 'Train', 'Fork', 'Moon', 'Horse', 'Parrot', 'Leaves', 'Book', 'Cheese', 'Tin', 'Bag', 'Socks', 'Lemons', 'Blue', 'Plane', 'Hammock', 'Roof', 'Wind', 'Green', 'Chocolate', 'Car', 'Distance')

# Generate a longer list, with a random sequence and number of repetitions for each entry.
set.seed(123)

LONG.LIST <- data.frame(Name = (sample(LIST, size = 200, replace = TRUE)))

print(LONG.LIST)

Name
1         Cup
2    Distance
3        Roof
4      Pencil
5       Lunch
6       Toast
7       Watch
8      Bottle
9         Car
10       Roof
11      Lunch
12    Forever
13     Cheese
14    Oranges
15      Ocean
16  Chocolate
17      Socks
18     Leaves
19    Oranges
20   Distance
21      Green
22      Paper
23        Red
24      Paper
25      Trees
26  Chocolate
27     Bottle
28        Dog
29       Wind
30     Parrot
etc....

对于论证，假设我想创建一系列20项列表。使用上面生成的示例，'Distance'在“5”和“11”的位置“2”和位置“20”，'Lunch'以及“14”和“19”的'Oranges'处出现'，因此第一个没有重复的列表需要扩展到包括'Green'，'Paper'和'Red'。然后第二个列表将从第24位的'Paper'开始。但是我不希望被限制在20的长度，有时我可能希望将其设置为10或25。

将@LAP中的评论合并到一起，这有助于描述我的问题; “浏览你的矢量，直到找到20个独特的物品，将它们放在一起，丢弃重复物品，然后继续向上移动，直到找到接下来的20个独特物品，依此类推，直到你的矢量结束，填充最后一部分NA。

“单独的列表本身只需要是唯一的。两个或多个列表之间可能存在重复。”

最后一个列表可能不完整，因此最好用'NA'填充它。理想情况下，条目在每个列表中都是按字母顺序排列的。

最有用的输出是数据帧中每列的一个列表。

Answer 1

这是一个潜在的答案，它并不漂亮，但我认为这就是你所追求的：

首先是数据：

LIST <- c('Oranges', 'Toast', 'Truck', 'Dog', 'Hippo', 'Bottle', 'Hope', 
         'Mint', 'Red', 'Trees', 'Watch', 'Cup', 'Pencil', 'Lunch', 'Paper', 
         'Peanuts', 'Cloud', 'Forever', 'Ocean', 'Train', 'Fork', 'Moon', 
         'Horse', 'Parrot', 'Leaves', 'Book', 'Cheese', 'Tin', 'Bag', 
         'Socks', 'Lemons', 'Blue', 'Plane', 'Hammock', 'Roof', 'Wind', 
         'Green', 'Chocolate', 'Car', 'Distance')

set.seed(123)
LONG.LIST <- data.frame(Name = (sample(LIST, size = 200, replace = TRUE)))

创建一个函数，它将从数据框顶部找到20个唯一元素，并根据该元素将数据框拆分为两个列表元素：

library(tidyverse)

spliter <- function(df){
  df  %>%
  as.tibble()%>%
    mutate(Name = as.character(Name),
           dup = !duplicated(Name),
           cum = cumsum(dup),
           splt = ifelse(cum <= 20, 0, 1)) %>%
    {split(df, .$splt)} 
}

现在将此函数应用于结果列表的第二个元素，直到没有任何内容可以拆分删除每个列表元素中的重复项：

b <- spliter(LONG.LIST)
c1 <- list(b[[1]] %>%
            filter(!duplicated(Name)))

i <- 1
while(length(b) != 1){
  i <- i+1
  b <- spliter(b[[2]])

  c1[[i]] <- b[[1]] %>%
    filter(!duplicated(Name))
}

如果需要，用NA填充最后一个元素：

c1 <- lapply(c1, function(x){
  if(nrow(x) <  20){
    data.frame(Name = c(as.character(x$Name), rep(NA_character_, (20-length(x$Name)))))
  } else( x)
})

合并到数据框：

do.call(cbind, c1)

        Name    Name     Name      Name     Name    Name      Name
1        Cup   Green     Wind      Mint     Book Hammock    Parrot
2       Blue     Tin    Paper    Bottle   Pencil   Trees   Hammock
3      Cloud    Blue   Cheese    Cheese      Red     Dog    Pencil
4       Wind Oranges      Dog     Lunch    Paper   Socks       Bag
5  Chocolate   Train  Peanuts    Pencil Distance   Train     Watch
6      Toast  Lemons    Watch      Blue     Hope Peanuts     Train
7       Moon     Red    Plane       Dog      Dog   Hippo     Horse
8      Horse  Pencil  Forever     Ocean   Bottle   Horse     Green
9      Ocean   Trees     Blue      Fork      Tin     Red  Distance
10       Car  Bottle   Lemons    Parrot   Leaves Forever    Leaves
11       Tin   Cloud     Book     Train     Wind    Fork Chocolate
12     Hippo   Paper      Bag       Car   Cheese   Paper     Ocean
13     Trees    Hope  Oranges      Wind    Socks    Book     Cloud
14     Lunch   Ocean    Train     Green     Fork    Moon    Cheese
15      Book   Watch      Red    Leaves    Plane   Cloud      Hope
16  Distance    Roof   Leaves     Cloud     Blue   Watch      <NA>
17    Cheese   Toast    Hippo Chocolate  Forever    Mint      <NA>
18       Bag Forever    Trees     Truck    Cloud    Roof      <NA>
19    Parrot   Hippo    Cloud       Bag  Oranges  Cheese      <NA>
20    Bottle   Horse Distance      Moon     Mint  Leaves      <NA>

这是一个类似于ngm答案的函数：

miss <- function(y, split){
  require(tidyverse)
  spliter <- function(df){
    df  %>%
      as.tibble()%>%
      mutate(Name = as.character(Name),
             dup = !duplicated(Name),
             cum = cumsum(dup),
             splt = ifelse(cum <= split, 0, 1)) %>%
             {split(df, .$splt)} 
    }
  b <- spliter(y)
  c1 <- list(b[[1]] %>%
            filter(!duplicated(Name)))
  i <- 1
  while(length(b) != 1){
    i <- i+1
    b <- spliter(b[[2]])

      c1[[i]] <- b[[1]] %>%
        filter(!duplicated(Name))
      }
  c1 <- lapply(c1, function(x){
    if(nrow(x) <  20){
      data.frame(Name = c(as.character(x$Name), rep(NA_character_, (20-length(x$Name)))))
      } else( x)
    })
  return(do.call(cbind, c1))
}

用法：

miss(LONG.LIST, 20 )

Answer 2

好吧，这是一个部分答案，因为我认为我已经得到了你需要的大部分内容。

请注意，对于庞大的数据，这可能会很慢。

首先，您可以使用之后需要组的空矢量初始化列表。在这个例子中，我们想要从200个项目的向量中创建10个20组。

首先，我们创建可重现的数据：

LIST <- c('Oranges', 'Toast', 'Truck', 'Dog', 'Hippo', 'Bottle', 'Hope', 'Mint', 'Red', 
          'Trees', 'Watch', 'Cup', 'Pencil', 'Lunch', 'Paper', 'Peanuts', 'Cloud', 'Forever', 
          'Ocean', 'Train', 'Fork', 'Moon', 'Horse', 'Parrot', 'Leaves', 'Book', 'Cheese', 
          'Tin', 'Bag', 'Socks', 'Lemons', 'Blue', 'Plane', 'Hammock', 'Roof', 'Wind', 'Green', 
          'Chocolate', 'Car', 'Distance')

set.seed(123)

LONG.LIST <- data.frame(Name = (sample(LIST, size = 200, replace = TRUE)), stringsAsFactors = F)

test <- vector("list", 10)

然后初始化两个计数器：

i <- 1
j <- 1

现在我们使用while循环运行，直到i大于要分割的向量中的项目数（因此它会在i > 200时停止）。在这个循环中，我们检查列表中的当前子向量j是否短于20.如果是，我们添加一个项目并进行重复数据删除，如果没有，我们将{1}添加1以跳转到下一个子向量

这是我们的结果：

while(i <= nrow(LONG.LIST)){
  if(length(test[[j]]) < 20){
      test[[j]] <- c(test[[j]], LONG.LIST$Name[i])
      test[[j]] <- unique(test[[j]])
      i <- i+1
  }else{
      j <- j+1
    }
}

现在我们只需要用> test [[1]] [1] "Lunch" "Cheese" "Truck" "Roof" "Hope" "Mint" "Lemons" "Pencil" "Hippo" "Moon" [11] "Car" "Chocolate" "Trees" "Distance" "Dog" "Bag" "Paper" "Peanuts" "Ocean" "Wind" [[2]] [1] "Hippo" "Wind" "Mint" "Plane" "Trees" "Truck" "Lemons" "Watch" "Chocolate" "Train" [11] "Dog" "Lunch" "Green" "Horse" "Toast" "Distance" "Cloud" "Hammock" "Fork" "Paper" [[3]] [1] "Watch" "Hope" "Paper" "Socks" "Bag" "Plane" "Bottle" "Green" "Lunch" "Fork" [11] "Mint" "Hippo" "Chocolate" "Car" "Trees" "Toast" "Forever" "Red" "Wind" "Ocean" [[4]] [1] "Car" "Lunch" "Toast" "Lemons" "Moon" "Socks" "Hippo" "Pencil" "Blue" "Fork" "Paper" [12] "Distance" "Cloud" "Train" "Wind" "Watch" "Bottle" "Forever" "Green" "Bag" [[5]] [1] "Train" "Cheese" "Bottle" "Fork" "Paper" "Green" "Leaves" "Blue" "Toast" "Parrot" "Lemons" "Dog" [13] "Hammock" "Ocean" "Red" "Peanuts" "Pencil" "Bag" "Horse" "Hope" [[6]] [1] "Oranges" "Truck" "Hippo" "Trees" "Parrot" "Red" "Hope" "Cloud" "Tin" "Bag" [11] "Pencil" "Cup" "Dog" "Leaves" "Chocolate" "Mint" "Plane" "Moon" "Fork" "Green" [[7]] [1] "Tin" "Mint" "Book" "Bag" "Roof" "Hope" "Socks" "Watch" "Paper" "Peanuts" [11] "Cup" "Distance" "Leaves" "Bottle" "Cloud" "Horse" "Trees" "Oranges" "Chocolate" "Toast" [[8]] [1] "Horse" "Watch" "Chocolate" "Tin" "Red" "Train" [[9]] NULL [[10]] NULL填充最后一个向量。这可能会有所不同，但它完成了工作：

NA

Answer 3

此函数svu（＆＃34;拆分向量唯一＆＃34;）采用向量并根据您的规范生成数据帧。

我不明白为什么输入会是列表或数据框。将输入变为矢量似乎更自然。

words <- c('Oranges', 'Toast', 'Truck', 'Dog', 'Hippo', 'Bottle', 'Hope', 'Mint', 'Red', 'Trees', 'Watch', 'Cup', 'Pencil', 'Lunch', 'Paper', 'Peanuts', 'Cloud', 'Forever', 'Ocean', 'Train', 'Fork', 'Moon', 'Horse', 'Parrot', 'Leaves', 'Book', 'Cheese', 'Tin', 'Bag', 'Socks', 'Lemons', 'Blue', 'Plane', 'Hammock', 'Roof', 'Wind', 'Green', 'Chocolate', 'Car', 'Distance')
set.seed(123)
more_words <- sample(words, size = 200, replace = TRUE)

# x is the original vector and n is the desired number of 
# words in each column of the resulting data frame.
svu <- function(x, n) {
  # How many eventual columns?
  n_cols <- trunc(length(x)/n)
  # That many eventual columns all filled with NA for now.
  vec_list <- lapply(1:n_cols, function(x) rep(NA, n))

  # For each word...
  for(string in x) {
    for(i in 1:n_cols) {
      if(!(string %in% vec_list[[i]]) && sum(is.na(vec_list[[i]])) > 0) {
        # ...add it to a non-full column not containing that word.
        vec_list[[i]][min(which(is.na(vec_list[[i]])))] <- string
        break
      }
    }
  }
  # Make it a data frame
  data.frame(do.call(cbind, vec_list), stringsAsFactors = FALSE)
}

尝试一下：

svu(more_words, 20)                                                                                                                                                                                                                                                                                                                                                                
#>           X1      X2      X3       X4        X5        X6       X7
#> 1        Cup    Wind    Wind     Wind      Wind     Plane     Wind
#> 2       Blue   Ocean     Car   Bottle     Plane   Forever   Bottle
#> 3      Cloud   Horse     Tin    Watch   Forever      Wind   Pencil
#> 4       Wind   Toast   Cloud    Plane      Blue    Cheese Distance
#> 5  Chocolate     Car  Bottle  Forever     Hippo      Mint     Hope
#> 6      Toast     Tin   Trees     Blue      Mint      Blue      Dog
#> 7       Moon    Moon   Ocean   Lemons    Bottle     Lunch      Tin
#> 8      Horse     Cup   Watch     Book    Cheese     Train   Leaves
#> 9      Ocean   Green    Roof      Bag     Lunch    Bottle   Cheese
#> 10       Car    Blue   Toast  Oranges    Pencil    Pencil    Socks
#> 11       Tin Oranges Forever    Train       Dog     Truck     Fork
#> 12     Hippo   Train    Blue      Red     Ocean Chocolate    Plane
#> 13     Trees  Lemons   Hippo  Peanuts      Fork       Bag     Blue
#> 14     Lunch     Red   Horse   Leaves    Parrot      Moon  Forever
#> 15      Book  Pencil     Red    Paper     Train       Car    Cloud
#> 16  Distance   Trees  Lemons    Hippo       Car    Parrot  Oranges
#> 17    Cheese  Bottle   Paper    Trees     Green     Cloud     Mint
#> 18       Bag   Cloud  Cheese   Cheese    Leaves      Book  Hammock
#> 19    Parrot   Paper     Dog    Cloud     Cloud       Red    Trees
#> 20    Bottle    Hope Peanuts Distance Chocolate     Paper    Train
#>          X8      X9       X10
#> 1      Wind    Wind     Trees
#> 2    Pencil  Pencil     Paper
#> 3    Bottle   Trees       Red
#> 4    Cheese Peanuts     Socks
#> 5  Distance     Red      Roof
#> 6     Trees   Paper    Pencil
#> 7       Dog   Socks    Parrot
#> 8     Socks    Book     Watch
#> 9   Hammock    Mint     Green
#> 10  Peanuts    Roof  Distance
#> 11    Hippo  Cheese    Leaves
#> 12    Horse  Leaves Chocolate
#> 13      Red    Moon     Ocean
#> 14  Forever  Parrot     Cloud
#> 15     Fork Hammock    Cheese
#> 16    Paper     Bag      Hope
#> 17     Book   Watch     Horse
#> 18     Moon   Train      <NA>
#> 19    Cloud   Horse      <NA>
#> 20    Watch   Green      <NA>

将长列表划分为R中指定长度的短列表

3 个答案: