How do I group vectors into a list of vectors?

Time: 2014-02-01 15:05:25

Tags: r list vector grouping

I have some data that looks like this (made-up example data):

dressId        color 
6              yellow 
9              red
10             green 
10             purple 
10             yellow 
12             purple 
12             red 

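where color is a factor vector. There is no guarantee that every possible level of the factor actually occurs in the data (for example, the color "blue" can also be one of the levels).

I need a list of vectors that groups the available colors for each dress: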

[[1]]
yellow  

[[2]] 
red    

[[3]] 
green purple yellow 

[[4]] 
purple red 

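Keeping the dress IDs would be nice (for example, a data frame with the IDs in the first column and this list in the second), but it is not required.

I wrote a loop that walks through the data frame row by row and, while the next ID is the same, adds the color to a vector. (I am sure the data is sorted by ID.) When the ID in the first column changes, it adds the vector to the list: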

result <- NULL 
while(blah blah) 
{
    some code which creates the vector called "colors" 
    result[[dressCounter]] <- colors 
    dressCounter <- dressCounter + 1
}

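After struggling to get all the necessary counter variables right, I was unhappy to find that it does not work. On the first iteration, colors is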

[1] yellow
Levels: green yellow purple red blue

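and it gets coerced to an integer, so result becomes 2.

On the second iteration, colors contains only red, and result becomes a plain integer vector [1] 2 4.

On the third iteration, colors is now a vector,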

[1] green  purple yellow
Levels: green yellow purple red blue 

and I get

result[[3]] <- colors
  

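Error in result[[3]] <- colors : 
        more elements supplied than there are to replace

What am I doing wrong? Is there a way to initialize result so that it is not converted to a numeric vector, but becomes a list of vectors?

Also, is there another way to do the whole thing instead of "rolling my own"?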
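On the initialization question, the behavior described above is consistent with result starting out as NULL: the first `[[<-` with a length-one value seems to turn it into an atomic vector, so the factor's class is lost and only its integer code is stored. A minimal sketch, assuming the loop can start from an empty list instead (the names `lvls`, `first`, and `third` are made up for illustration):

```r
# Hypothetical values mirroring the first and third iterations in the question.
lvls  <- c("green", "yellow", "purple", "red", "blue")
first <- factor("yellow", levels = lvls)
third <- factor(c("green", "purple", "yellow"), levels = lvls)

# Start from an empty list rather than NULL, so that [[<- stores each
# element whole and leaves its class and levels untouched.
result <- list()
result[[1]] <- first
result[[2]] <- third

str(result)
```

If the number of dresses is known in advance, `vector("list", n)` pre-allocates a list of that length and can be filled the same way.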

4 answers:

Answer 0: (score: 7)

split.data.frame is a good way to organize this; then extract the color component.

d <- data.frame(dressId=c(6,9,10,10,10,12,12),
               color=factor(c("yellow","red","green",
                              "purple","yellow",
                              "purple","red"),
                 levels=c("red","orange","yellow",
                          "green","blue","purple")))

I think the version you actually want is this:

ss <- split.data.frame(d,d$dressId)

By extracting the color component, you get a list that looks more like the one you asked for:

lapply(ss,"[[","color")
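If only the colors are needed, a shorter route may be to call split on the column directly. The sketch below rebuilds the same example data so that it runs on its own, and then attaches the result to a data frame as a list column to keep the IDs alongside; the names `by_dress` and `out` are illustrative and not part of the answer above.

```r
# Same example data as above, repeated so this block is self-contained.
d <- data.frame(dressId = c(6, 9, 10, 10, 10, 12, 12),
                color = factor(c("yellow", "red", "green",
                                 "purple", "yellow",
                                 "purple", "red"),
                               levels = c("red", "orange", "yellow",
                                          "green", "blue", "purple")))

# Split the color column by dress ID; the list names are the IDs.
by_dress <- split(d$color, d$dressId)

# Optional: keep the IDs in a data frame with the list as a second column.
out <- data.frame(dressId = as.numeric(names(by_dress)))
out$color <- unname(by_dress)

str(by_dress)
```

Each element keeps the full set of factor levels; wrapping an element in `droplevels()` would discard the unused ones if that is preferred.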

Answer 1: (score: 6)

Besides split, you should also consider aggregate. Use c or I as the aggregation function to get a list column:

out <- aggregate(color ~ dressId, mydf, c)
out
#   dressId                 color
# 1       6                yellow
# 2       9                   red
# 3      10 green, purple, yellow
# 4      12           purple, red
str(out)
# 'data.frame': 4 obs. of  2 variables:
#  $ dressId: int  6 9 10 12
#  $ color  :List of 4
#   ..$ 0: chr "yellow"
#   ..$ 1: chr "red"
#   ..$ 2: chr  "green" "purple" "yellow"
#   ..$ 3: chr  "purple" "red"
out$color
# $`0`
# [1] "yellow"
# 
# $`1`
# [1] "red"
# 
# $`2`
# [1] "green"  "purple" "yellow"
# 
# $`3`
# [1] "purple" "red" 

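Note: this also works with Ben's example data, where the "color" variable is a factor (something I missed when posting the answer above), but you need to use I as the aggregation function instead of c: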

out <- aggregate(color ~ dressId, d, I)
str(out)
# 'data.frame': 4 obs. of  2 variables:
#  $ dressId: num  6 9 10 12
#  $ color  :List of 4
#   ..$ 0: Factor w/ 6 levels "red","orange",..: 3
#   ..$ 1: Factor w/ 6 levels "red","orange",..: 1
#   ..$ 2: Factor w/ 6 levels "red","orange",..: 4 6 3
#   ..$ 3: Factor w/ 6 levels "red","orange",..: 6 1
out$color
# $`0`
# [1] yellow
# Levels: red orange yellow green blue purple
# 
# $`1`
# [1] red
# Levels: red orange yellow green blue purple
# 
# $`2`
# [1] green  purple yellow
# Levels: red orange yellow green blue purple
# 
# $`3`
# [1] purple red   
# Levels: red orange yellow green blue purple

Oddly, however, the default print method shows the integer codes:

out
#   dressId   color
# 1       6       3
# 2       9       1
# 3      10 4, 6, 3
# 4      12    6, 1

Answer 2: (score: 1)

Assuming your data frame is stored in a variable named df, you can simply use group_by and summarize from the dplyr package together with the list function.

Applied to your example:

library('dplyr')

df %>%
  group_by(dressId) %>%
  summarize(colors = list(color))

Answer 3: (score: 0)

I'm afraid the answer should be a little different; you should use the following code to get the requested behavior:

df %>%
  group_by(dressId) %>%
  summarize(colors = toString(unique(color)))
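The two summaries in the last two answers produce different kinds of values, which matters when choosing between them. A small base R sketch for a single group (the variable names are made up for illustration):

```r
# Colors of one dress, with the same levels as the example data.
colors <- factor(c("green", "purple", "yellow"),
                 levels = c("red", "orange", "yellow",
                            "green", "blue", "purple"))

# toString() collapses the values into one comma-separated string.
as_string <- toString(unique(colors))

# list() keeps the factor vector intact as a single list element.
as_list <- list(colors)

print(as_string)
str(as_list)
```

The string form is convenient for display, while the list form keeps each color as a separate element for further processing, which is closer to the list of vectors the question asks for.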