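Counting true/false in a complex data frame

Time: 2015-09-30 19:19:36

Tags: r dataframe

I want to do something fairly complicated in R, and I'm not sure where to start.

I have a data frame that looks like this: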

main_val sub_val bit_one bit_two
 one      a        1       1
 one      a        1       0
 one      a        1       1
 one      b        1       0
 two      a        1       1
 two      b        1       1
 two      a        1       1

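Now I want to count, for each sub value of each main value, the number of 0s, 1s, 2s and 3s represented by the two bits. So this should return: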

main_val sub_val  0s  1s  2s  3s
 one       a      0   0   1   2
 one       b      0   0   1   0
 two       a      0   0   0   2
 two       b      0   0   0   1

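Any ideas on how to do this? I can only think of loops that would take forever (this will run on a lot of data).

2 answers:

Answer 0: (score: 5)

Forgive my earlier comment. I think you actually just need table() and reshape() in base R to do this. If you have truly huge data this may get slow, in which case I'd suggest looking into data.table.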

# Start by turning off stringsAsFactors
options(stringsAsFactors = FALSE)

# Create fake data
fake.data <- data.frame(main_val = c("one","one","one","one","two","two","two"),
                        sub_val = c("a","a","a","b","a","b","a"),
                        bit_one = c(1,1,1,1,1,1,1),
                        bit_two = c(1,0,1,0,1,1,1))

# Generate a decimal representation of your two bits
fake.data$decimal <- fake.data$bit_one*1 +fake.data$bit_two*2

# Create a table of the results, then reshape it
fake.data.summary <- as.data.frame(table(Main=fake.data$main_val,
                                         Sub=fake.data$sub_val,
                                         Value=fake.data$decimal))

fake.data.summary <- reshape(data = fake.data.summary,
                             v.names = "Freq",
                             idvar = c("Main","Sub"),
                             timevar = "Value",
                             direction = "wide")

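Note that in this example the output contains only ones and threes, because the input contains only ones and threes. If you need a uniform output regardless of which values are actually present, you may need to sanitize the output a bit; but I suspect you won't need that, since you probably have enough volume to ensure that 0 through 3 are all represented.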
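If a uniform set of columns is needed, one possible way to get it is to turn the encoded value into a factor with all four levels before tabulating, since table() keeps a cell for every factor level even when its count is zero. The following is only a sketch on the same fake data; the use of factor() and ftable() is a suggestion here, not part of the original answer.

```r
# Same fake data as in the answer above
fake.data <- data.frame(
  main_val = c("one", "one", "one", "one", "two", "two", "two"),
  sub_val  = c("a", "a", "a", "b", "a", "b", "a"),
  bit_one  = c(1, 1, 1, 1, 1, 1, 1),
  bit_two  = c(1, 0, 1, 0, 1, 1, 1)
)

# Encode the two bits as before, but as a factor with all four levels,
# so table() keeps a slot for values that never occur in the data
fake.data$decimal <- factor(fake.data$bit_one * 1 + fake.data$bit_two * 2,
                            levels = 0:3)

counts <- table(Main  = fake.data$main_val,
                Sub   = fake.data$sub_val,
                Value = fake.data$decimal)

# Flatten the 3-way table: one row per Main/Sub pair, one column per value
wide <- ftable(counts, row.vars = c("Main", "Sub"))
print(wide)
```

The same factor could also be fed through as.data.frame() and reshape() as in the answer, which would then produce one Freq column for each of the four levels.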

Answer 1: (score: 3)

As @TARehman already mentioned in his answer, you may want to use data.table for large datasets. So, a data.table alternative to @TARehman's answer:

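library(data.table)
# df is assumed to be the data frame from the question
# Encode the two bits as a label (d1, d3, ...), count rows per group, then cast to wide format
df2 <- dcast(setDT(df)[, .("dec" = paste0("d",(bit_one*1 + bit_two*2))), by = .(main_val,sub_val)
                       ][, .N, by = .(main_val,sub_val,dec)],
             main_val + sub_val ~ dec, value.var = "N", fill = 0)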

This gives:

> df2
   main_val sub_val d1 d3
1:      one       a  1  2
2:      one       b  1  0
3:      two       a  0  2
4:      two       b  0  1
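
Both answers weight the bits as bit_one*1 + bit_two*2, which is why their results show 1s and 3s, whereas the expected table in the question shows 2s and 3s. That expected table is consistent with treating bit_one as the high bit instead. A minimal base R sketch under that assumption follows; the name `df`, the column `value`, and the swapped weights are illustrative choices here, not taken from either answer.

```r
# The data frame from the question
df <- data.frame(
  main_val = c("one", "one", "one", "one", "two", "two", "two"),
  sub_val  = c("a", "a", "a", "b", "a", "b", "a"),
  bit_one  = c(1, 1, 1, 1, 1, 1, 1),
  bit_two  = c(1, 0, 1, 0, 1, 1, 1)
)

# Assumption: bit_one is the high bit, so the pair (1, 0) encodes 2 rather than 1.
# All four levels are declared so that values with a zero count still get a column.
df$value <- factor(df$bit_one * 2 + df$bit_two, levels = 0:3)

# Cross-tabulate every main_val/sub_val pair against the encoded value
counts <- xtabs(~ main_val + sub_val + value, data = df)

# One row per main_val/sub_val pair, one column per value 0..3
result <- ftable(counts, row.vars = c("main_val", "sub_val"))
print(result)
```

With this weighting, the counts for the sample data line up with the table the question asks for. Which bit is the high one is a decision for whoever owns the data.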