Combining vectors and tables in R

Asked: 2015-10-02 20:07:31

Tags: r merge data.table dplyr

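I'm running into trouble with a simple merge task and am looking for a better solution. I'm creating tables from a series of surveys (which I can't merge at the source). The tables contain the same categories but have different dimensions.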

The data are below.

Table x

x <- structure(c(44L, 167L), .Dim = 2L, .Dimnames = structure(list(
    c("similar", "compete")), .Names = ""), class = "table")

Table y

y <- structure(c(69L, 213L, 154L, 4L, 29L, 32L), .Dim = c(3L, 2L), .Dimnames = structure(list(
    c("other", "compete", "similar"), c("college", "no college"
    )), .Names = c("", "")), class = "table")

Table z

z <- structure(c(13L, 38L, 43L, 46L, 131L, 172L, 37L, 177L, 122L, 
8L, 34L, 12L, 16L, 114L, 70L, 20L, 17L, 27L), .Dim = c(3L, 6L
), .Dimnames = structure(list(c("other", "compete", "similar"
), c("skipped", "Democrat", "Independent", "Libertarian", "Republican", 
"other")), .Names = c("", "")), class = "table")

My solution was to use cbind and drop the rows and columns that differ, like so:

cbind(y[-1,], x,  z[-1,-1])

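I then learned that row names can't be relied on here: cbind lines rows up by position, not by name, so if the row order differs between the objects, the resulting table is wrong. That makes building the table very error-prone. I'd like to be able to merge three or more tables without worrying that the order of the merge will scramble the data.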
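To make the problem concrete, here is a minimal sketch of the mismatch. It rebuilds x and y compactly from the counts above so that it runs on its own; the name by_position is only illustrative.

```r
# Same counts as x and y above, rebuilt compactly so the sketch is self-contained.
x <- as.table(c(similar = 44L, compete = 167L))
y <- as.table(matrix(c(69L, 213L, 154L, 4L, 29L, 32L), nrow = 3,
                     dimnames = list(c("other", "compete", "similar"),
                                     c("college", "no college"))))

# cbind() pairs rows by position: row 1 of y[-1, ] is "compete",
# but element 1 of x is the "similar" count.
by_position <- cbind(y[-1, ], x)
by_position

# The value that ended up in the first ("compete") row ...
by_position[1, 3]
# ... versus the count x actually holds for "compete".
as.vector(x["compete"])
```

The two values printed at the end differ, which is exactly the silent misalignment I want to avoid.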

What is a better way to combine tables of different dimensions?

I suspect data.table or dplyr might have a good way to do this, but I haven't figured it out yet.

Thanks, and please let me know if I can make the question clearer.

2 answers:

Answer 0 (score: 1)

I'm not sure whether I'm missing the point here, or whether you need to "automate" this process, but this might help:

x <- structure(c(44L, 167L), .Dim = 2L, .Dimnames = structure(list(
  c("similar", "compete")), .Names = ""), class = "table")

y <- structure(c(69L, 213L, 154L, 4L, 29L, 32L), .Dim = c(3L, 2L), .Dimnames = structure(list(
  c("other", "compete", "similar"), c("college", "no college"
  )), .Names = c("", "")), class = "table")

z <- structure(c(13L, 38L, 43L, 46L, 131L, 172L, 37L, 177L, 122L, 
                 8L, 34L, 12L, 16L, 114L, 70L, 20L, 17L, 27L), .Dim = c(3L, 6L
                 ), .Dimnames = structure(list(c("other", "compete", "similar"
                 ), c("skipped", "Democrat", "Independent", "Libertarian", "Republican", 
                      "other")), .Names = c("", "")), class = "table")

library(dplyr)
library(tidyr)

# create data frames from tables
x = data.frame(x)
names(x) = c("group","x")

y = data.frame(y) %>% spread(Var2,Freq)
names(y)[1] = "group"

z = data.frame(z) %>% spread(Var2, Freq)
names(z)[1] = "group"

# join data frames
x %>% inner_join(y, by="group") %>% inner_join(z, by="group")

#     group   x college no college skipped Democrat Independent Libertarian Republican other
# 1 similar  44     154         32      43      172         122          12         70    27
# 2 compete 167     213         29      38      131         177          34        114    17
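If the process does need to be automated for three or more tables, the same key-based join could be sketched in base R with Reduce and merge. This is only a sketch under stated assumptions: table_to_wide is a hypothetical helper written for this example (it is not part of any package), and the tables are rebuilt compactly from the question's counts so the block runs on its own.

```r
# Counts copied from the question, rebuilt compactly so this runs on its own.
x <- as.table(c(similar = 44L, compete = 167L))
y <- as.table(matrix(c(69L, 213L, 154L, 4L, 29L, 32L), nrow = 3,
                     dimnames = list(c("other", "compete", "similar"),
                                     c("college", "no college"))))
z <- as.table(matrix(c(13L, 38L, 43L, 46L, 131L, 172L, 37L, 177L, 122L,
                       8L, 34L, 12L, 16L, 114L, 70L, 20L, 17L, 27L), nrow = 3,
                     dimnames = list(c("other", "compete", "similar"),
                                     c("skipped", "Democrat", "Independent",
                                       "Libertarian", "Republican", "other"))))

# Hypothetical helper (not from any package): one row per group, with the
# group label stored in an explicit "group" column instead of the row names.
table_to_wide <- function(tab, value_name = "n") {
  if (length(dim(tab)) == 1L) {
    out <- data.frame(group = names(tab), n = as.vector(tab),
                      stringsAsFactors = FALSE)
    names(out)[2] <- value_name
    out
  } else {
    data.frame(group = rownames(tab), unclass(tab),
               check.names = FALSE, stringsAsFactors = FALSE)
  }
}

wide <- list(table_to_wide(x, "x"), table_to_wide(y), table_to_wide(z))

# merge() matches on the "group" values, so the row order of the inputs does
# not matter; the default all = FALSE keeps only groups found in every table.
combined <- Reduce(function(a, b) merge(a, b, by = "group"), wide)
combined
```

Because the match is made on the values in group rather than on position, adding a fourth table only means adding another element to the list.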

Answer 1 (score: 1)

The code below binds the data by row and fills the missing columns with NA. From there, you should be able to continue with your analysis.

library(plyr)

my_list <- list(as.data.frame(x),
                as.data.frame(y),
                as.data.frame(z))


Reduce(x = my_list, f = rbind.fill)

# resulting data.frame

      Var1 Freq        Var2
1  similar   44        <NA>
2  compete  167        <NA>
3    other   69     college
4  compete  213     college
5  similar  154     college
6    other    4  no college
7  compete   29  no college
8  similar   32  no college
9    other   13     skipped
10 compete   38     skipped
11 similar   43     skipped
12   other   46    Democrat
13 compete  131    Democrat
14 similar  172    Democrat
15   other   37 Independent
16 compete  177 Independent
17 similar  122 Independent
18   other    8 Libertarian
19 compete   34 Libertarian
20 similar   12 Libertarian
21   other   16  Republican
22 compete  114  Republican
23 similar   70  Republican
24   other   20       other
25 compete   17       other
26 similar   27       other
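One possible way to continue from a long data frame like this is to cross-tabulate it back to one row per group. The sketch below is an assumption-laden illustration rather than part of the answer's method: it builds the long frame with base rbind instead of rbind.fill so that it runs without plyr, it labels the counts from x with Var2 = "x" instead of NA, and the names long, wide and shared are made up for the example.

```r
# Counts copied from the question, rebuilt compactly so this runs on its own.
x <- as.table(c(similar = 44L, compete = 167L))
y <- as.table(matrix(c(69L, 213L, 154L, 4L, 29L, 32L), nrow = 3,
                     dimnames = list(c("other", "compete", "similar"),
                                     c("college", "no college"))))
z <- as.table(matrix(c(13L, 38L, 43L, 46L, 131L, 172L, 37L, 177L, 122L,
                       8L, 34L, 12L, 16L, 114L, 70L, 20L, 17L, 27L), nrow = 3,
                     dimnames = list(c("other", "compete", "similar"),
                                     c("skipped", "Democrat", "Independent",
                                       "Libertarian", "Republican", "other"))))

# stringsAsFactors = FALSE keeps the labels as plain character vectors.
long_x <- as.data.frame(x, stringsAsFactors = FALSE)
long_x$Var2 <- "x"  # assumed label for the rows that would otherwise have NA
long <- rbind(long_x[, c("Var1", "Var2", "Freq")],
              as.data.frame(y, stringsAsFactors = FALSE),
              as.data.frame(z, stringsAsFactors = FALSE))

# Cross-tabulate back to one row per group; combinations that never occur
# (for example "other" in x) are left as NA rather than 0.
wide <- tapply(long$Freq, list(long$Var1, long$Var2), sum)

# Keep only the groups that appear in every table.
shared <- wide[complete.cases(wide), ]
shared
```

Using tapply rather than xtabs is a deliberate choice here: xtabs would report 0 for a group that is absent from a table, which is hard to tell apart from a real count of zero.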