这与之前的问题here密切相关。但是我需要稍微不同的东西......
我有一长串的对象需要分成更小的列表,每个列表都有一定数量的条目。我需要能够为不同的任务更改列表的长度。问题是每个对象只能在一个列表中出现一次。
# Create some example data...
# Make a list of objects.
LIST <- c('Oranges', 'Toast', 'Truck', 'Dog', 'Hippo', 'Bottle', 'Hope', 'Mint', 'Red', 'Trees', 'Watch', 'Cup', 'Pencil', 'Lunch', 'Paper', 'Peanuts', 'Cloud', 'Forever', 'Ocean', 'Train', 'Fork', 'Moon', 'Horse', 'Parrot', 'Leaves', 'Book', 'Cheese', 'Tin', 'Bag', 'Socks', 'Lemons', 'Blue', 'Plane', 'Hammock', 'Roof', 'Wind', 'Green', 'Chocolate', 'Car', 'Distance')
# Generate a longer list, with a random sequence and number of repetitions for each entry.
set.seed(123)
LONG.LIST <- data.frame(Name = (sample(LIST, size = 200, replace = TRUE)))
print(LONG.LIST)
Name
1 Cup
2 Distance
3 Roof
4 Pencil
5 Lunch
6 Toast
7 Watch
8 Bottle
9 Car
10 Roof
11 Lunch
12 Forever
13 Cheese
14 Oranges
15 Ocean
16 Chocolate
17 Socks
18 Leaves
19 Oranges
20 Distance
21 Green
22 Paper
23 Red
24 Paper
25 Trees
26 Chocolate
27 Bottle
28 Dog
29 Wind
30 Parrot
etc....
对于论证,假设我想创建一系列20项列表。使用上面生成的示例,'Distance'
在“5”和“11”的位置“2”和位置“20”,'Lunch'
以及“14”和“19”的'Oranges'
处出现',因此第一个没有重复的列表需要扩展到包括'Green'
,'Paper'
和'Red'
。然后第二个列表将从第24位的'Paper'
开始。但是我不希望被限制在20的长度,有时我可能希望将其设置为10或25。
将@LAP中的评论合并到一起,这有助于描述我的问题; “浏览你的矢量,直到找到20个独特的物品,将它们放在一起,丢弃重复物品,然后继续向上移动,直到找到接下来的20个独特物品,依此类推,直到你的矢量结束,填充最后一部分NA
。
“单独的列表本身只需要是唯一的。两个或多个列表之间可能存在重复。”
最后一个列表可能不完整,因此最好用'NA'
填充它。理想情况下,条目在每个列表中都是按字母顺序排列的。
最有用的输出是数据帧中每列的一个列表。
答案 0 :(得分:1)
这是一个潜在的答案,它并不漂亮,但我认为这就是你所追求的:
首先是数据:
LIST <- c('Oranges', 'Toast', 'Truck', 'Dog', 'Hippo', 'Bottle', 'Hope',
'Mint', 'Red', 'Trees', 'Watch', 'Cup', 'Pencil', 'Lunch', 'Paper',
'Peanuts', 'Cloud', 'Forever', 'Ocean', 'Train', 'Fork', 'Moon',
'Horse', 'Parrot', 'Leaves', 'Book', 'Cheese', 'Tin', 'Bag',
'Socks', 'Lemons', 'Blue', 'Plane', 'Hammock', 'Roof', 'Wind',
'Green', 'Chocolate', 'Car', 'Distance')
set.seed(123)
LONG.LIST <- data.frame(Name = (sample(LIST, size = 200, replace = TRUE)))
创建一个函数,它将从数据框顶部找到20个唯一元素,并根据该元素将数据框拆分为两个列表元素:
library(tidyverse)
spliter <- function(df){
df %>%
as.tibble()%>%
mutate(Name = as.character(Name),
dup = !duplicated(Name),
cum = cumsum(dup),
splt = ifelse(cum <= 20, 0, 1)) %>%
{split(df, .$splt)}
}
现在将此函数应用于结果列表的第二个元素,直到没有任何内容可以拆分删除每个列表元素中的重复项:
b <- spliter(LONG.LIST)
c1 <- list(b[[1]] %>%
filter(!duplicated(Name)))
i <- 1
while(length(b) != 1){
i <- i+1
b <- spliter(b[[2]])
c1[[i]] <- b[[1]] %>%
filter(!duplicated(Name))
}
如果需要,用NA填充最后一个元素:
c1 <- lapply(c1, function(x){
if(nrow(x) < 20){
data.frame(Name = c(as.character(x$Name), rep(NA_character_, (20-length(x$Name)))))
} else( x)
})
合并到数据框:
do.call(cbind, c1)
Name Name Name Name Name Name Name
1 Cup Green Wind Mint Book Hammock Parrot
2 Blue Tin Paper Bottle Pencil Trees Hammock
3 Cloud Blue Cheese Cheese Red Dog Pencil
4 Wind Oranges Dog Lunch Paper Socks Bag
5 Chocolate Train Peanuts Pencil Distance Train Watch
6 Toast Lemons Watch Blue Hope Peanuts Train
7 Moon Red Plane Dog Dog Hippo Horse
8 Horse Pencil Forever Ocean Bottle Horse Green
9 Ocean Trees Blue Fork Tin Red Distance
10 Car Bottle Lemons Parrot Leaves Forever Leaves
11 Tin Cloud Book Train Wind Fork Chocolate
12 Hippo Paper Bag Car Cheese Paper Ocean
13 Trees Hope Oranges Wind Socks Book Cloud
14 Lunch Ocean Train Green Fork Moon Cheese
15 Book Watch Red Leaves Plane Cloud Hope
16 Distance Roof Leaves Cloud Blue Watch <NA>
17 Cheese Toast Hippo Chocolate Forever Mint <NA>
18 Bag Forever Trees Truck Cloud Roof <NA>
19 Parrot Hippo Cloud Bag Oranges Cheese <NA>
20 Bottle Horse Distance Moon Mint Leaves <NA>
这是一个类似于ngm答案的函数:
miss <- function(y, split){
require(tidyverse)
spliter <- function(df){
df %>%
as.tibble()%>%
mutate(Name = as.character(Name),
dup = !duplicated(Name),
cum = cumsum(dup),
splt = ifelse(cum <= split, 0, 1)) %>%
{split(df, .$splt)}
}
b <- spliter(y)
c1 <- list(b[[1]] %>%
filter(!duplicated(Name)))
i <- 1
while(length(b) != 1){
i <- i+1
b <- spliter(b[[2]])
c1[[i]] <- b[[1]] %>%
filter(!duplicated(Name))
}
c1 <- lapply(c1, function(x){
if(nrow(x) < 20){
data.frame(Name = c(as.character(x$Name), rep(NA_character_, (20-length(x$Name)))))
} else( x)
})
return(do.call(cbind, c1))
}
用法:
miss(LONG.LIST, 20 )
答案 1 :(得分:1)
好吧,这是一个部分答案,因为我认为我已经得到了你需要的大部分内容。
请注意,对于庞大的数据,这可能会很慢。
首先,您可以使用之后需要组的空矢量初始化列表。在这个例子中,我们想要从200个项目的向量中创建10个20组。
首先,我们创建可重现的数据:
LIST <- c('Oranges', 'Toast', 'Truck', 'Dog', 'Hippo', 'Bottle', 'Hope', 'Mint', 'Red',
'Trees', 'Watch', 'Cup', 'Pencil', 'Lunch', 'Paper', 'Peanuts', 'Cloud', 'Forever',
'Ocean', 'Train', 'Fork', 'Moon', 'Horse', 'Parrot', 'Leaves', 'Book', 'Cheese',
'Tin', 'Bag', 'Socks', 'Lemons', 'Blue', 'Plane', 'Hammock', 'Roof', 'Wind', 'Green',
'Chocolate', 'Car', 'Distance')
set.seed(123)
LONG.LIST <- data.frame(Name = (sample(LIST, size = 200, replace = TRUE)), stringsAsFactors = F)
test <- vector("list", 10)
然后初始化两个计数器:
i <- 1
j <- 1
现在我们使用while
循环运行,直到i
大于要分割的向量中的项目数(因此它会在i > 200
时停止)。在这个循环中,我们检查列表中的当前子向量j
是否短于20.如果是,我们添加一个项目并进行重复数据删除,如果没有,我们将{1}添加1以跳转到下一个子向量
j
这是我们的结果:
while(i <= nrow(LONG.LIST)){
if(length(test[[j]]) < 20){
test[[j]] <- c(test[[j]], LONG.LIST$Name[i])
test[[j]] <- unique(test[[j]])
i <- i+1
}else{
j <- j+1
}
}
现在我们只需要用> test
[[1]]
[1] "Lunch" "Cheese" "Truck" "Roof" "Hope" "Mint" "Lemons" "Pencil" "Hippo" "Moon"
[11] "Car" "Chocolate" "Trees" "Distance" "Dog" "Bag" "Paper" "Peanuts" "Ocean" "Wind"
[[2]]
[1] "Hippo" "Wind" "Mint" "Plane" "Trees" "Truck" "Lemons" "Watch" "Chocolate" "Train"
[11] "Dog" "Lunch" "Green" "Horse" "Toast" "Distance" "Cloud" "Hammock" "Fork" "Paper"
[[3]]
[1] "Watch" "Hope" "Paper" "Socks" "Bag" "Plane" "Bottle" "Green" "Lunch" "Fork"
[11] "Mint" "Hippo" "Chocolate" "Car" "Trees" "Toast" "Forever" "Red" "Wind" "Ocean"
[[4]]
[1] "Car" "Lunch" "Toast" "Lemons" "Moon" "Socks" "Hippo" "Pencil" "Blue" "Fork" "Paper"
[12] "Distance" "Cloud" "Train" "Wind" "Watch" "Bottle" "Forever" "Green" "Bag"
[[5]]
[1] "Train" "Cheese" "Bottle" "Fork" "Paper" "Green" "Leaves" "Blue" "Toast" "Parrot" "Lemons" "Dog"
[13] "Hammock" "Ocean" "Red" "Peanuts" "Pencil" "Bag" "Horse" "Hope"
[[6]]
[1] "Oranges" "Truck" "Hippo" "Trees" "Parrot" "Red" "Hope" "Cloud" "Tin" "Bag"
[11] "Pencil" "Cup" "Dog" "Leaves" "Chocolate" "Mint" "Plane" "Moon" "Fork" "Green"
[[7]]
[1] "Tin" "Mint" "Book" "Bag" "Roof" "Hope" "Socks" "Watch" "Paper" "Peanuts"
[11] "Cup" "Distance" "Leaves" "Bottle" "Cloud" "Horse" "Trees" "Oranges" "Chocolate" "Toast"
[[8]]
[1] "Horse" "Watch" "Chocolate" "Tin" "Red" "Train"
[[9]]
NULL
[[10]]
NULL
填充最后一个向量。这可能会有所不同,但它完成了工作:
NA
答案 2 :(得分:1)
此函数svu
(&#34;拆分向量唯一&#34;)采用向量并根据您的规范生成数据帧。
我不明白为什么输入会是列表或数据框。将输入变为矢量似乎更自然。
words <- c('Oranges', 'Toast', 'Truck', 'Dog', 'Hippo', 'Bottle', 'Hope', 'Mint', 'Red', 'Trees', 'Watch', 'Cup', 'Pencil', 'Lunch', 'Paper', 'Peanuts', 'Cloud', 'Forever', 'Ocean', 'Train', 'Fork', 'Moon', 'Horse', 'Parrot', 'Leaves', 'Book', 'Cheese', 'Tin', 'Bag', 'Socks', 'Lemons', 'Blue', 'Plane', 'Hammock', 'Roof', 'Wind', 'Green', 'Chocolate', 'Car', 'Distance')
set.seed(123)
more_words <- sample(words, size = 200, replace = TRUE)
# x is the original vector and n is the desired number of
# words in each column of the resulting data frame.
svu <- function(x, n) {
# How many eventual columns?
n_cols <- trunc(length(x)/n)
# That many eventual columns all filled with NA for now.
vec_list <- lapply(1:n_cols, function(x) rep(NA, n))
# For each word...
for(string in x) {
for(i in 1:n_cols) {
if(!(string %in% vec_list[[i]]) && sum(is.na(vec_list[[i]])) > 0) {
# ...add it to a non-full column not containing that word.
vec_list[[i]][min(which(is.na(vec_list[[i]])))] <- string
break
}
}
}
# Make it a data frame
data.frame(do.call(cbind, vec_list), stringsAsFactors = FALSE)
}
尝试一下:
svu(more_words, 20)
#> X1 X2 X3 X4 X5 X6 X7
#> 1 Cup Wind Wind Wind Wind Plane Wind
#> 2 Blue Ocean Car Bottle Plane Forever Bottle
#> 3 Cloud Horse Tin Watch Forever Wind Pencil
#> 4 Wind Toast Cloud Plane Blue Cheese Distance
#> 5 Chocolate Car Bottle Forever Hippo Mint Hope
#> 6 Toast Tin Trees Blue Mint Blue Dog
#> 7 Moon Moon Ocean Lemons Bottle Lunch Tin
#> 8 Horse Cup Watch Book Cheese Train Leaves
#> 9 Ocean Green Roof Bag Lunch Bottle Cheese
#> 10 Car Blue Toast Oranges Pencil Pencil Socks
#> 11 Tin Oranges Forever Train Dog Truck Fork
#> 12 Hippo Train Blue Red Ocean Chocolate Plane
#> 13 Trees Lemons Hippo Peanuts Fork Bag Blue
#> 14 Lunch Red Horse Leaves Parrot Moon Forever
#> 15 Book Pencil Red Paper Train Car Cloud
#> 16 Distance Trees Lemons Hippo Car Parrot Oranges
#> 17 Cheese Bottle Paper Trees Green Cloud Mint
#> 18 Bag Cloud Cheese Cheese Leaves Book Hammock
#> 19 Parrot Paper Dog Cloud Cloud Red Trees
#> 20 Bottle Hope Peanuts Distance Chocolate Paper Train
#> X8 X9 X10
#> 1 Wind Wind Trees
#> 2 Pencil Pencil Paper
#> 3 Bottle Trees Red
#> 4 Cheese Peanuts Socks
#> 5 Distance Red Roof
#> 6 Trees Paper Pencil
#> 7 Dog Socks Parrot
#> 8 Socks Book Watch
#> 9 Hammock Mint Green
#> 10 Peanuts Roof Distance
#> 11 Hippo Cheese Leaves
#> 12 Horse Leaves Chocolate
#> 13 Red Moon Ocean
#> 14 Forever Parrot Cloud
#> 15 Fork Hammock Cheese
#> 16 Paper Bag Hope
#> 17 Book Watch Horse
#> 18 Moon Train <NA>
#> 19 Cloud Horse <NA>
#> 20 Watch Green <NA>