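I'm running into trouble with a simple merge task and am looking for a better solution. I'm creating tables from a series of survey questions (which I can't merge at the data level). The tables share the same row labels (similar, compete, ...) but have different dimensions.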
The data are below.
Table x
x <- structure(c(44L, 167L), .Dim = 2L, .Dimnames = structure(list(
c("similar", "compete")), .Names = ""), class = "table")
Table y
y <- structure(c(69L, 213L, 154L, 4L, 29L, 32L), .Dim = c(3L, 2L), .Dimnames = structure(list(
c("other", "compete", "similar"), c("college", "no college"
)), .Names = c("", "")), class = "table")
Table z
z <- structure(c(13L, 38L, 43L, 46L, 131L, 172L, 37L, 177L, 122L,
8L, 34L, 12L, 16L, 114L, 70L, 20L, 17L, 27L), .Dim = c(3L, 6L
), .Dimnames = structure(list(c("other", "compete", "similar"
), c("skipped", "Democrat", "Independent", "Libertarian", "Republican",
"other")), .Names = c("", "")), class = "table")
My solution was to use cbind and drop the rows and columns that don't match, like this:
cbind(y[-1,], x, z[-1,-1])
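Then I learned that R doesn't use row names to line things up here: cbind() combines rows by position, so if the rows come in a different order, the resulting table is silently wrong. That makes building tables this way very fragile. I'd like to be able to combine 3 or more tables without worrying that the order of the inputs will scramble the data.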
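To make the ordering problem concrete: in the cbind() call above, y[-1, ] has its rows in the order compete, similar, while x is ordered similar, compete, so cbind() attaches x's counts to the wrong rows without any warning. A minimal base R sketch, assuming the tables are exactly the ones posted above, is to index every table with the same character vector of row labels (`rows` is just an illustrative name), so the order is fixed explicitly instead of being inherited from each table:

```r
# Tables from the question
x <- structure(c(44L, 167L), .Dim = 2L,
               .Dimnames = structure(list(c("similar", "compete")), .Names = ""),
               class = "table")
y <- structure(c(69L, 213L, 154L, 4L, 29L, 32L), .Dim = c(3L, 2L),
               .Dimnames = structure(list(c("other", "compete", "similar"),
                                          c("college", "no college")),
                                     .Names = c("", "")),
               class = "table")
z <- structure(c(13L, 38L, 43L, 46L, 131L, 172L, 37L, 177L, 122L,
                 8L, 34L, 12L, 16L, 114L, 70L, 20L, 17L, 27L),
               .Dim = c(3L, 6L),
               .Dimnames = structure(list(c("other", "compete", "similar"),
                                          c("skipped", "Democrat", "Independent",
                                            "Libertarian", "Republican", "other")),
                                     .Names = c("", "")),
               class = "table")

# Positional cbind: rows come from y[-1, ] (compete, similar), but x is
# ordered (similar, compete), so x's counts land on the wrong rows.
bad <- cbind(y[-1, ], x, z[-1, -1])

# Name-based alternative: subset every table with the same label vector,
# so each row is selected by its name rather than by its position.
rows <- c("similar", "compete")
good <- cbind(y[rows, ], x = x[rows], z[rows, -1])
good
```

This still requires listing the shared labels by hand, so it is a workaround for the ordering problem rather than a general merge.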
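What is a better way to combine tables of different dimensions?
I suspect data.table or dplyr has a good way to do this, but I haven't figured it out yet.
Thanks, and let me know if I can make the question clearer.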
Answer 0 (score: 1)
Not sure whether I'm missing the point here, or whether you need to "automate" this process, but this might help:
x <- structure(c(44L, 167L), .Dim = 2L, .Dimnames = structure(list(
c("similar", "compete")), .Names = ""), class = "table")
y <- structure(c(69L, 213L, 154L, 4L, 29L, 32L), .Dim = c(3L, 2L), .Dimnames = structure(list(
c("other", "compete", "similar"), c("college", "no college"
)), .Names = c("", "")), class = "table")
z <- structure(c(13L, 38L, 43L, 46L, 131L, 172L, 37L, 177L, 122L,
8L, 34L, 12L, 16L, 114L, 70L, 20L, 17L, 27L), .Dim = c(3L, 6L
), .Dimnames = structure(list(c("other", "compete", "similar"
), c("skipped", "Democrat", "Independent", "Libertarian", "Republican",
"other")), .Names = c("", "")), class = "table")
library(dplyr)
library(tidyr)
# create data frames from tables
x = data.frame(x)
names(x) = c("group","x")
# spread() still works, but tidyr >= 1.0.0 marks it superseded by pivot_wider()
y = data.frame(y) %>% spread(Var2,Freq)
names(y)[1] = "group"
z = data.frame(z) %>% spread(Var2, Freq)
names(z)[1] = "group"
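# join data frames by the "group" key; inner_join keeps only groups present
# in every table, so "other" (absent from x) is dropped
x %>% inner_join(y, by="group") %>% inner_join(z, by="group")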
# group x college no college skipped Democrat Independent Libertarian Republican other
# 1 similar 44 154 32 43 172 122 12 70 27
# 2 compete 167 213 29 38 131 177 34 114 17
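If you need the same join for three or more tables without writing out each inner_join(), one option is to fold a list of tables with Reduce(). The sketch below swaps dplyr for base R's merge() so it has no package dependencies; table_to_wide() is a hypothetical helper, and it assumes every table is 1-d or 2-d with the group labels in the first dimension and no column labels shared between tables. Because rows are matched on the `group` column, the order of the tables and of their rows does not affect which counts end up together.

```r
# Tables from the question
x <- structure(c(44L, 167L), .Dim = 2L,
               .Dimnames = structure(list(c("similar", "compete")), .Names = ""),
               class = "table")
y <- structure(c(69L, 213L, 154L, 4L, 29L, 32L), .Dim = c(3L, 2L),
               .Dimnames = structure(list(c("other", "compete", "similar"),
                                          c("college", "no college")),
                                     .Names = c("", "")),
               class = "table")
z <- structure(c(13L, 38L, 43L, 46L, 131L, 172L, 37L, 177L, 122L,
                 8L, 34L, 12L, 16L, 114L, 70L, 20L, 17L, 27L),
               .Dim = c(3L, 6L),
               .Dimnames = structure(list(c("other", "compete", "similar"),
                                          c("skipped", "Democrat", "Independent",
                                            "Libertarian", "Republican", "other")),
                                     .Names = c("", "")),
               class = "table")

# Hypothetical helper: one row per group with a "group" key column.
# A 1-d table becomes a single count column named after the table.
table_to_wide <- function(tab, name) {
  m <- if (length(dim(tab)) == 1L) {
    matrix(tab, ncol = 1L, dimnames = list(names(tab), name))
  } else {
    matrix(tab, nrow = nrow(tab), dimnames = dimnames(tab))
  }
  out <- data.frame(group = rownames(m), m,
                    check.names = FALSE, stringsAsFactors = FALSE)
  rownames(out) <- NULL
  out
}

tables <- list(x = x, y = y, z = z)   # works for any number of tables
wide <- Map(table_to_wide, tables, names(tables))

# Inner join on "group", like the inner_join() chain: only groups in every table
joined <- Reduce(function(a, b) merge(a, b, by = "group"), wide)

# Full join: keep every group, with NA where a table has no row for it
joined_all <- Reduce(function(a, b) merge(a, b, by = "group", all = TRUE), wide)
joined
```

`all = TRUE` behaves roughly like dplyr's full_join() and keeps groups such as "other" with NA in the x column; the default behaves like the inner_join() chain above.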
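Answer 1 (score: 1)
The code below stacks the tables row-wise in long format and fills any column a table doesn't have with NA. From there you should be able to continue with your analysis.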
library(plyr)
my_list <- list(as.data.frame(x),
as.data.frame(y),
as.data.frame(z))
Reduce(x = my_list, f = rbind.fill)
# resulting data.frame
Var1 Freq Var2
1 similar 44 <NA>
2 compete 167 <NA>
3 other 69 college
4 compete 213 college
5 similar 154 college
6 other 4 no college
7 compete 29 no college
8 similar 32 no college
9 other 13 skipped
10 compete 38 skipped
11 similar 43 skipped
12 other 46 Democrat
13 compete 131 Democrat
14 similar 172 Democrat
15 other 37 Independent
16 compete 177 Independent
17 similar 122 Independent
18 other 8 Libertarian
19 compete 34 Libertarian
20 similar 12 Libertarian
21 other 16 Republican
22 compete 114 Republican
23 similar 70 Republican
24 other 20 other
25 compete 17 other
26 similar 27 other
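One caveat with the stacked result: the rows from x have Var2 = NA, and nothing records which table each row came from. A minimal base R sketch, assuming column labels are unique across tables, adds a `source` column while stacking (to_long() is a hypothetical helper, and it labels the 1-d table's single column with the table name) and then spreads the long data back to one row per group with tapply(), which leaves missing combinations as NA rather than 0:

```r
# Tables from the question
x <- structure(c(44L, 167L), .Dim = 2L,
               .Dimnames = structure(list(c("similar", "compete")), .Names = ""),
               class = "table")
y <- structure(c(69L, 213L, 154L, 4L, 29L, 32L), .Dim = c(3L, 2L),
               .Dimnames = structure(list(c("other", "compete", "similar"),
                                          c("college", "no college")),
                                     .Names = c("", "")),
               class = "table")
z <- structure(c(13L, 38L, 43L, 46L, 131L, 172L, 37L, 177L, 122L,
                 8L, 34L, 12L, 16L, 114L, 70L, 20L, 17L, 27L),
               .Dim = c(3L, 6L),
               .Dimnames = structure(list(c("other", "compete", "similar"),
                                          c("skipped", "Democrat", "Independent",
                                            "Libertarian", "Republican", "other")),
                                     .Names = c("", "")),
               class = "table")

# Hypothetical helper: long data frame tagged with the name of its source table
to_long <- function(tab, name) {
  df <- as.data.frame(tab, stringsAsFactors = FALSE)   # columns Var1, (Var2), Freq
  if (!"Var2" %in% names(df)) df$Var2 <- name           # 1-d table: label its column
  data.frame(source = name, Var1 = df$Var1, Var2 = df$Var2, Freq = df$Freq,
             stringsAsFactors = FALSE)
}

tables <- list(x = x, y = y, z = z)
long <- do.call(rbind, Map(to_long, tables, names(tables)))
rownames(long) <- NULL

# Back to one row per group; sum() would add counts together if two tables
# shared a column label, hence the uniqueness assumption above
wide <- tapply(long$Freq, list(long$Var1, long$Var2), sum)
wide
```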