计算项目在多列中的每一列中出现的次数

时间:2016-05-11 20:29:50

标签: r dataframe

我想阅读一个表并创建另一个表,该表计算在多个特定列中出现唯一ID的次数。

例如,我有一个表,其中每一行显示一个事务,userId标识每个人的角色。

buyer      <- c("A", "A", "B", "A", "B", "C")
seller     <- c("C", "B", "C", "B", "C", "A")
negotiator <- c("B", "C", "D", "D", "A", "B")

df <- data.frame(buyer, seller, negotiator)
df
#    buyer seller negotiator
#  1     A      C          B
#  2     A      B          C
#  3     B      C          D
#  4     A      B          D
#  5     B      C          A
#  6     C      A          B

然后我想创建一个表来计算userId在事务中履行角色的次数。

#   id  asBuyer  asSeller  asNegotiator
#    A        3         1             1
#    B        2         2             2
#    C        1         3             1
#    D        0         0             2

我是否需要创建不同的数据框然后合并?

4 个答案:

答案 0 :(得分:5)

您可以先将数据融化,然后将其制成表格。例如

dd<-reshape2::melt(df,0)
xtabs(~value+variable,dd)
#      variable
# value buyer seller negotiator
#     A     3      1          1
#     B     2      2          2
#     C     1      3          1
#     D     0      0          2

答案 1 :(得分:3)

我会使用data.table

更新灵感来自MrFlick

library(data.table)
setDT(df)
dcast(melt(df, measure.vars = names(df)), value ~ variable)
#    value buyer seller negotiator
# 1:     A     3      1          1
# 2:     B     2      2          2
# 3:     C     1      3          1
# 4:     D     0      0          2

您可以将fun.aggregate = length作为参数添加到dcast以取消警告消息。如果您希望将该列命名为value.name = "id",则可以将melt作为参数添加到id

原创,更长的答案

setDT(df)

outDT <- data.table(id = unique(unlist(df)))

invisible(
  sapply(names(df), function(jj)
    outDT[df[ , .N, by = jj], 
          #set the name you desire by pasting;
          #  could use a regex or substr to 
          #  for the first letter capital if need be
          (jj2 <- paste0("as", jj)) := i.N,
          #merge id to the count column
          on = c(id = jj)
          clean-up: missed observations were NA, set to 0
          ][is.na(get(jj2)), (jj2) := 0])
)

答案 2 :(得分:3)

这里的解决方案仅使用基础R(可能比其他方法慢):

lst <- lapply(names(df), function(col) as.data.frame(table(df[[col]]),responseName=col))

mergeAll <- function(x,y) merge(x,y,all=TRUE)

res <- Reduce(f=mergeAll, lst)
names(res)[1] <- 'id'
res[is.na(res)] <- 0

> res
  id buyer seller negotiator
1  A     3      1          1
2  B     2      2          2
3  C     1      3          1
4  D     0      0          2

答案 3 :(得分:0)

这里的R巫师太多了。

这是我的简单解决方案,使用基本R只有ddply(用于创建“count by by”表)和merge(用于执行外连接)。

# Create data frame for buyer count
dfBuyer <- ddply(df, c("buyer"), summarise, count=length(seller))
colnames(dfBuyer) <- c("id", "asBuyer")

dfBuyer
#    id asBuyer
#  1  A       3
#  2  B       2
#  3  C       1


# Create data frame for seller count
dfSeller <- ddply(df, c("seller"), summarise, count=length(buyer))
colnames(dfSeller) <- c("id", "asSeller")

dfSeller
#    id asSeller
#  1  A        1
#  2  B        2
#  3  C        3


# Create data frame for negotiator count
dfNegotiator <- ddply(df, c("negotiator"), summarise, count=length(seller))
colnames(dfNegotiator) <- c("id", "asNegotiator")

dfNegotiator
#    id asNegotiator
#  1  A            1
#  2  B            2
#  3  C            1
#  4  D            2


# merge() apparently can merge only two dataframes at a time,
# so to merge three dataframes, merge the first two and then
# the third. Use "all=TRUE" to perform outer join.

# Merge buyer and seller
dfBuyerSellerMerged <- merge(x=dfBuyer, y=dfSeller, by="id", all=TRUE)

# Merge buyer and seller and negotiator
dfBuyerSellerNegotiatorMerged <- merge(x=dfBuyerSellerMerged, y=dfNegotiator, by="id", all=TRUE) 

dfBuyerSellerNegotiatorMerged
#   id asBuyer asSeller asNegotiator
# 1  A       3        1            1
# 2  B       2        2            2
# 3  C       1        3            1
# 4  D      NA       NA            2


# Remove NAs.
dfBuyerSellerNegotiatorMerged[is.na(dfBuyerSellerNegotiatorMerged)] <- 0

dfBuyerSellerNegotiatorMerged
#   id asBuyer asSeller asNegotiator
# 1  A       3        1            1
# 2  B       2        2            2
# 3  C       1        3            1
# 4  D       0        0            2