I want to do something fairly complicated in R, and I'm not sure where to start.
I have a data frame that looks like this:
main_val  sub_val  bit_one  bit_two
one       a        1        1
one       a        1        0
one       a        1        1
one       b        1        0
two       a        1        1
two       b        1        1
two       a        1        1
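Now I want to count, for each sub value within each main value, the number of 0s, 1s, 2s, and 3s that the bits represent. So this should return: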
main_val  sub_val  0s  1s  2s  3s
one       a        0   0   1   2
one       b        0   0   1   0
two       a        0   0   0   2
two       b        0   0   0   1
Any ideas on how to do this? All I can think of are loops that would take forever (this will be run on a lot of data).
Answer 0: (score: 5)
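Forgive my earlier comment. I think you actually just need table() and reshape() to do this in base R. If your data is truly huge, this may get slow, in which case I'd suggest looking into data.table.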
# Start by turning off stringsAsFactors
options(stringsAsFactors = FALSE)
# Create fake data
fake.data <- data.frame(main_val = c("one","one","one","one","two","two","two"),
                        sub_val = c("a","a","a","b","a","b","a"),
                        bit_one = c(1,1,1,1,1,1,1),
                        bit_two = c(1,0,1,0,1,1,1))
# Generate a decimal representation of your two bits
fake.data$decimal <- fake.data$bit_one*1 + fake.data$bit_two*2
# Create a table of the results, then reshape it
fake.data.summary <- as.data.frame(table(Main = fake.data$main_val,
                                         Sub = fake.data$sub_val,
                                         Value = fake.data$decimal))
fake.data.summary <- reshape(data = fake.data.summary,
                             v.names = "Freq",
                             idvar = c("Main","Sub"),
                             timevar = "Value",
                             direction = "wide")
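Note that in this example the output contains only ones and threes, because the input contains only ones and threes. If you need a uniform output regardless of which values are actually present, you may have to do some sanitizing of the output, but I suspect you won't need that, since you probably have enough volume to ensure that 0 through 3 are all represented.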
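If the uniform output is needed after all, one possible approach is to convert the decimal value to a factor with fixed levels before calling table(). The following is only a sketch: it reuses the fake data from above, assumes the value range is always 0 through 3, and the names `counts` and `wide` are chosen purely for illustration.

```r
# Same fake data as in the answer above
fake.data <- data.frame(main_val = c("one","one","one","one","two","two","two"),
                        sub_val  = c("a","a","a","b","a","b","a"),
                        bit_one  = c(1,1,1,1,1,1,1),
                        bit_two  = c(1,0,1,0,1,1,1),
                        stringsAsFactors = FALSE)

# Assumption: values always lie in 0..3, so fix the factor levels up front.
# table() then counts every level, including ones that never occur.
fake.data$decimal <- factor(fake.data$bit_one*1 + fake.data$bit_two*2,
                            levels = 0:3)

# Long table of counts: one row per (Main, Sub, Value) combination
counts <- as.data.frame(table(Main  = fake.data$main_val,
                              Sub   = fake.data$sub_val,
                              Value = fake.data$decimal))

# Wide layout: one row per (Main, Sub), columns Freq.0 through Freq.3
wide <- reshape(data = counts,
                v.names = "Freq",
                idvar = c("Main","Sub"),
                timevar = "Value",
                direction = "wide")
print(wide)
```

Because table() counts every factor level, values that never occur show up as 0 instead of being dropped, so the result has the same four count columns no matter what the input contains.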
Answer 1: (score: 3)
As @TARehman already mentioned in his answer, for large datasets you may want to use data.table. So, here is a data.table alternative to @TARehman's answer:
library(data.table)
# df is the data frame from the question; setDT() converts it to a data.table by reference
df2 <- dcast(setDT(df)[, .("dec" = paste0("d", (bit_one*1 + bit_two*2))), by = .(main_val, sub_val)
                       ][, .N, by = .(main_val, sub_val, dec)],
             main_val + sub_val ~ dec, value.var = "N", fill = 0)
This gives:
> df2
   main_val sub_val d1 d3
1:      one       a  1  2
2:      one       b  1  0
3:      two       a  0  2
4:      two       b  0  1
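The d1 and d3 columns come from treating bit_one as the low bit (bit_one*1 + bit_two*2). The expected output in the question puts its counts under 2s and 3s instead, which suggests that bit_one was meant as the high bit. The base R sketch below makes that assumption; it also assumes the values always lie in 0 through 3. The names `df`, `value`, and `counts` are chosen only for this illustration.

```r
# The data frame from the question
df <- data.frame(main_val = c("one","one","one","one","two","two","two"),
                 sub_val  = c("a","a","a","b","a","b","a"),
                 bit_one  = c(1,1,1,1,1,1,1),
                 bit_two  = c(1,0,1,0,1,1,1),
                 stringsAsFactors = FALSE)

# Assumption: bit_one is the high bit, so (1, 0) reads as binary 10 = 2.
# Fixed factor levels keep a column for every value from 0 to 3.
df$value <- factor(df$bit_one*2 + df$bit_two*1, levels = 0:3)

# Three-way contingency table: main_val x sub_val x value
counts <- with(df, table(main_val, sub_val, value))

# ftable() flattens it to one row per (main_val, sub_val) pair
print(ftable(counts))
```

With this bit order, the counts for the sample data line up with the table requested in the question (for example, one 2 and two 3s for main_val "one", sub_val "a").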