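I want to read one table and create another that counts how many times each unique ID appears in several specific columns.
For example, I have a table in which each row is a transaction, and the userIds identify who played each role.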
buyer <- c("A", "A", "B", "A", "B", "C")
seller <- c("C", "B", "C", "B", "C", "A")
negotiator <- c("B", "C", "D", "D", "A", "B")
df <- data.frame(buyer, seller, negotiator)
df
# buyer seller negotiator
# 1 A C B
# 2 A B C
# 3 B C D
# 4 A B D
# 5 B C A
# 6 C A B
Then I want to create a table that counts how many times each userId filled each role across the transactions:
# id asBuyer asSeller asNegotiator
# A 3 1 1
# B 2 2 2
# C 1 3 1
# D 0 0 2
Do I need to create separate data frames and then merge them?
Answer 0 (score: 5)
You can melt the data first and then tabulate it. For example:
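# id.vars = 0 selects no id columns, so all three columns are melted into
# a long table with columns 'variable' (the role) and 'value' (the id)
dd <- reshape2::melt(df, 0)
# cross-tabulate id by role
xtabs(~ value + variable, dd)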
# variable
# value buyer seller negotiator
# A 3 1 1
# B 2 2 2
# C 1 3 1
# D 0 0 2
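For comparison, base R's stack() performs the same wide-to-long step without reshape2. A sketch, assuming the role columns are character vectors (which is why the example data is rebuilt with stringsAsFactors = FALSE):

```r
# Same example data as the question, kept as character columns
df <- data.frame(buyer = c("A", "A", "B", "A", "B", "C"),
                 seller = c("C", "B", "C", "B", "C", "A"),
                 negotiator = c("B", "C", "D", "D", "A", "B"),
                 stringsAsFactors = FALSE)

# stack() returns a long data frame: 'values' (the id) and 'ind' (the column/role)
long <- stack(df)

# Cross-tabulate id by role; combinations that never occur appear as 0
tab <- xtabs(~ values + ind, data = long)
tab
```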
Answer 1 (score: 3)
I would use data.table:
library(data.table)
setDT(df)
dcast(melt(df, measure.vars = names(df)), value ~ variable)
# value buyer seller negotiator
# 1: A 3 1 1
# 2: B 2 2 2
# 3: C 1 3 1
# 4: D 0 0 2
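You can add fun.aggregate = length as an argument to dcast to suppress the warning message. If you want that column to be named id, you can add value.name = "id" as an argument to melt (and then use id ~ variable as the dcast formula).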
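Putting both suggestions together gives the following sketch, assuming data.table is installed (dt, long and wide are illustrative names, not from the answer):

```r
library(data.table)

dt <- data.table(buyer = c("A", "A", "B", "A", "B", "C"),
                 seller = c("C", "B", "C", "B", "C", "A"),
                 negotiator = c("B", "C", "D", "D", "A", "B"))

# Wide -> long: every column is a measure column; call the value column "id"
long <- melt(dt, measure.vars = names(dt), value.name = "id")

# Long -> wide: one row per id, one count column per role.
# fun.aggregate = length makes the counting explicit (no warning), and
# empty cells are filled with length() of an empty vector, i.e. 0.
wide <- dcast(long, id ~ variable, fun.aggregate = length)
wide
```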
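Alternatively, to build the count columns directly (named asbuyer, asseller, asnegotiator), join each role's counts onto a table of the unique ids:
setDT(df)
# one row per unique id found in any column
outDT <- data.table(id = unique(unlist(df)))
invisible(
  sapply(names(df), function(jj)
    outDT[df[ , .N, by = jj],
          # set the name you want by pasting;
          # a regex or substr could capitalize
          # the first letter if needed
          (jj2 <- paste0("as", jj)) := i.N,
          # match outDT$id to column jj of the counts
          on = c(id = jj)
          # clean-up: ids missing from column jj are NA, set them to 0
          ][is.na(get(jj2)), (jj2) := 0])
)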
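The comment inside the loop mentions capitalizing the first letter. A minimal base R sketch of both options it names, substr() and a regex (as_name and as_name_regex are hypothetical helper names):

```r
# Hypothetical helper: "buyer" -> "asBuyer" using substr()/substring()
as_name <- function(role) {
  paste0("as", toupper(substr(role, 1, 1)), substring(role, 2))
}

# Regex alternative: \U upper-cases the captured first letter (requires perl = TRUE)
as_name_regex <- function(role) {
  paste0("as", sub("^(.)", "\\U\\1", role, perl = TRUE))
}

roles <- c("buyer", "seller", "negotiator")
as_name(roles)
as_name_regex(roles)
```

Either result could then be passed to data.table's setnames() to rename the columns of outDT.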
Answer 2 (score: 3)
Here is a solution that uses only base R (it may be slower than the other approaches):
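# one data frame per column: Var1 = id, <col> = count
lst <- lapply(names(df), function(col) as.data.frame(table(df[[col]]), responseName = col))
# outer join on the shared Var1 column
mergeAll <- function(x, y) merge(x, y, all = TRUE)
res <- Reduce(f = mergeAll, lst)
names(res)[1] <- 'id'
# ids missing from a column come back as NA; count them as 0
res[is.na(res)] <- 0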
> res
id buyer seller negotiator
1 A 3 1 1
2 B 2 2 2
3 C 1 3 1
4 D 0 0 2
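In base R the merge step can also be skipped entirely if every column is tabulated against the same set of levels, because each table then has one entry per id, zeros included. A sketch, assuming character columns (ids, counts and res are illustrative names):

```r
df <- data.frame(buyer = c("A", "A", "B", "A", "B", "C"),
                 seller = c("C", "B", "C", "B", "C", "A"),
                 negotiator = c("B", "C", "D", "D", "A", "B"),
                 stringsAsFactors = FALSE)

# Common set of ids across all role columns
ids <- sort(unique(unlist(df)))

# Tabulate each column against the same levels, so an id that never
# appears in a column gets an explicit 0 instead of being dropped
counts <- sapply(df, function(col) table(factor(col, levels = ids)))

res <- data.frame(id = ids, counts, row.names = NULL, stringsAsFactors = FALSE)
res
```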
Answer 3 (score: 0)
There are a lot of R wizards around here.
Here is my simple solution in basic R, using only ddply (from the plyr package, to create the "count by group" tables) and merge (to perform the outer joins).
library(plyr)  # provides ddply() and summarise()
# Create data frame for buyer count
dfBuyer <- ddply(df, c("buyer"), summarise, count=length(seller))
colnames(dfBuyer) <- c("id", "asBuyer")
dfBuyer
# id asBuyer
# 1 A 3
# 2 B 2
# 3 C 1
# Create data frame for seller count
dfSeller <- ddply(df, c("seller"), summarise, count=length(buyer))
colnames(dfSeller) <- c("id", "asSeller")
dfSeller
# id asSeller
# 1 A 1
# 2 B 2
# 3 C 3
# Create data frame for negotiator count
dfNegotiator <- ddply(df, c("negotiator"), summarise, count=length(seller))
colnames(dfNegotiator) <- c("id", "asNegotiator")
dfNegotiator
# id asNegotiator
# 1 A 1
# 2 B 2
# 3 C 1
# 4 D 2
# merge() can only combine two data frames at a time,
# so to merge three data frames, merge the first two and then
# the third. Use "all=TRUE" to perform an outer join.
# Merge buyer and seller
dfBuyerSellerMerged <- merge(x=dfBuyer, y=dfSeller, by="id", all=TRUE)
# Merge buyer and seller and negotiator
dfBuyerSellerNegotiatorMerged <- merge(x=dfBuyerSellerMerged, y=dfNegotiator, by="id", all=TRUE)
dfBuyerSellerNegotiatorMerged
# id asBuyer asSeller asNegotiator
# 1 A 3 1 1
# 2 B 2 2 2
# 3 C 1 3 1
# 4 D NA NA 2
# Remove NAs.
dfBuyerSellerNegotiatorMerged[is.na(dfBuyerSellerNegotiatorMerged)] <- 0
dfBuyerSellerNegotiatorMerged
# id asBuyer asSeller asNegotiator
# 1 A 3 1 1
# 2 B 2 2 2
# 3 C 1 3 1
# 4 D 0 0 2
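The three ddply/colnames blocks above differ only in the column they group by, so one helper plus Reduce() can replace them. A sketch, assuming plyr is installed; count_role is a hypothetical helper, and it passes nrow to ddply() instead of summarise() (a swap for brevity that yields the same counts):

```r
library(plyr)

df <- data.frame(buyer = c("A", "A", "B", "A", "B", "C"),
                 seller = c("C", "B", "C", "B", "C", "A"),
                 negotiator = c("B", "C", "D", "D", "A", "B"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: rows per id for one role column, renamed to id / asRole
count_role <- function(data, role) {
  out <- ddply(data, role, nrow)  # columns: <role>, V1
  names(out) <- c("id", paste0("as", toupper(substr(role, 1, 1)), substring(role, 2)))
  out
}

roles <- c("buyer", "seller", "negotiator")
per_role <- lapply(roles, count_role, data = df)

# Outer-join all per-role tables on id, then turn missing counts into 0
merged <- Reduce(function(x, y) merge(x, y, by = "id", all = TRUE), per_role)
merged[is.na(merged)] <- 0
merged
```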