Question

I want to use factors, but I've run into some problems. My questions may show that I don't understand factors at all :)

Consider the following data, "my_data", as an example (the real data is much larger, but has the same form):

0,stack
0,exchange
0,overflow
1,list
1,stack

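The first column is the user_id, and the second column is a tag that the user used. I want to compute the tag intersections across all users. To do this efficiently, I want to compute them on integers rather than on strings.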
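A minimal base-R sketch of that idea, under two assumptions: "intersection" is read as the set of tags shared by every user, and "my_data" is rebuilt from the sample above with the column names V1/V2 (what read.table or read.csv with header = FALSE would assign). Each tag string is encoded once as an integer code, the codes are split per user, and the integer sets are intersected with Reduce(intersect, ...).

```r
# Sample data re-created from the question (column names V1/V2 are an assumption)
my_data <- data.frame(
  V1 = c(0, 0, 0, 1, 1),
  V2 = c("stack", "exchange", "overflow", "list", "stack"),
  stringsAsFactors = FALSE
)

# Encode every tag string once as an integer code
tag_factor <- factor(my_data$V2)
tag_codes  <- as.integer(tag_factor)

# One integer vector of tag codes per user_id
codes_by_user <- split(tag_codes, my_data$V1)

# Intersect the integer sets of all users
common_codes <- Reduce(intersect, codes_by_user)

# Translate the surviving codes back to tag strings
common_tags <- levels(tag_factor)[common_codes]
print(common_tags)  # [1] "stack"
```

For the intersection of just two users, intersect(codes_by_user[["0"]], codes_by_user[["1"]]) works the same way on the integer codes.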

So I thought factors were the way to go. Making a factor out of column 2 gives a factor that looks like this:

Factor w/ levels "stack", "exchange", "overflow", "list": 1 2 3 4 1
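The summary above lists the levels in order of first appearance, which looks paraphrased: by default factor() sorts the levels, so the integer codes differ. A small sketch of both variants on the sample tags; passing levels = unique(tags) is one way to get the codes 1 2 3 4 1 shown above.

```r
tags <- c("stack", "exchange", "overflow", "list", "stack")

# Default: levels are sorted alphabetically
f_sorted <- factor(tags)
levels(f_sorted)      # "exchange" "list" "overflow" "stack"
as.integer(f_sorted)  # 4 1 3 2 4

# Levels in order of first appearance
f_seen <- factor(tags, levels = unique(tags))
levels(f_seen)        # "stack" "exchange" "overflow" "list"
as.integer(f_seen)    # 1 2 3 4 1
```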

What I want to do next is:

unique(my_data[my_data$V1 == 0, 2])

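This gets all tags of the user with id 0, which of course returns a vector of strings. How can I now get the corresponding indices in the factor for this vector?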
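Two base-R ways to get those indices, sketched on the sample data (the V1/V2 column names and the first-appearance level order are assumptions): look up each string's position among the factor levels with match(), or subset the factor itself and convert to integer, which skips the strings entirely.

```r
my_data <- data.frame(
  V1 = c(0, 0, 0, 1, 1),
  V2 = c("stack", "exchange", "overflow", "list", "stack"),
  stringsAsFactors = FALSE
)
tag_factor <- factor(my_data$V2, levels = unique(my_data$V2))

# Tags of user 0 as character strings (as in the question)
user0_tags <- unique(my_data[my_data$V1 == 0, 2])

# Option 1: position of each string within the factor levels
user0_codes <- match(user0_tags, levels(tag_factor))
print(user0_codes)   # [1] 1 2 3

# Option 2: subset the factor and take its integer codes directly
user0_codes2 <- unique(as.integer(tag_factor[my_data$V1 == 0]))
print(user0_codes2)  # [1] 1 2 3
```

Option 2 is the more direct one if the factor is stored in the data frame from the start, since the integer codes are already there.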

Answer 1

Can I suggest the data.table package?

library(data.table)

test <- data.table(id = c(0, 0, 0, 1, 1), tag = c('a', 'b', 'c', 'd', 'a'))

## Group IDs by tag
test[, id, by = tag]

## Return TRUE for tags used by ID == 0, FALSE otherwise
test[, id == 0, by = tag]

## Return tags used by ID == 0
test[id == 0, id, by = tag]
test[id == 0, tag]

## Return tags used by all IDs: count distinct IDs per tag, keep tags where that count equals the total
n.ids <- uniqueN(test$id)
test[, uniqueN(id), by = tag][V1 == n.ids, tag]

Hope that helps!

How to use factors efficiently in R?
