I have the following data.frame:
user_id 1 2 3 4 5 6 7 8 9
1 54449024717783 0 0 1 0 0 0 0 0 0
2 117592134783793 0 0 0 0 0 1 0 0 0
3 187145545782493 0 0 1 0 0 0 0 0 0
4 245003020993334 0 0 0 0 0 1 0 0 0
5 332625230637592 0 1 0 0 0 0 0 0 0
6 336336752713947 0 1 0 0 0 0 0 0 0
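What I want to do is create a single column (dropping columns 1:9) that holds the name of the column whose value is 1. Each user has exactly one column with the value 1.
If I run the following function: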
rowSums(users_cluster(users_cluster), dims = 1)
it sums all the values in each row, but what I need is the column name.
Answer 0 (score: 8)
Base R solution:
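# Subtract 1 before the modulo and add it back afterwards,
# so that a 1 in the last column yields 9 instead of 0
data.frame(user_id = df[, 1],
           name = (which(t(df[, -1] == 1)) - 1) %% (ncol(df) - 1) + 1)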
# user_id name
# 1 54449024717783 3
# 2 117592134783793 6
# 3 187145545782493 3
# 4 245003020993334 6
# 5 332625230637592 2
# 6 336336752713947 2
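The modulo step maps linear indices of the transposed matrix back to column positions. As a rough sketch on a made-up 2 x 3 matrix (the names `m`, `lin`, `naive` and `fixed` are illustrative only), the mapping looks like this; note that a plain modulo returns 0 for a hit in the last column, which the shifted form avoids:

```r
# Toy matrix: row 1 has its 1 in column 2, row 2 in the last column (3)
m <- rbind(c(0, 1, 0),
           c(0, 0, 1))

# Transposing makes which() walk the data row by row
lin <- which(t(m == 1))

# Plain modulo: the hit in the last column collapses to 0
naive <- lin %% ncol(m)

# Shifting to 0-based before the modulo and back afterwards keeps it at 3
fixed <- (lin - 1) %% ncol(m) + 1

print(lin)
print(naive)
print(fixed)
```

With the sample data in the question no user has a 1 in column 9, so both forms give the same result there.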
Answer 1 (score: 5)
Here is another base R option:
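A minimal sketch of what such a base R option could look like, not this answer's own code: it assumes a data frame `df` shaped like the one in the question (rebuilt here from hypothetical helper objects `pos` and `m`) and uses `which()` with `arr.ind = TRUE` to locate the cells equal to 1.

```r
# Sample data modeled on the question: exactly one 1 per row
pos <- c(3, 6, 3, 6, 2, 2)
m <- matrix(0L, nrow = 6, ncol = 9, dimnames = list(NULL, as.character(1:9)))
m[cbind(seq_along(pos), pos)] <- 1L
df <- data.frame(
  user_id = c(54449024717783, 117592134783793, 187145545782493,
              245003020993334, 332625230637592, 336336752713947),
  m, check.names = FALSE
)

# Row and column positions of every cell equal to 1
hits <- which(as.matrix(df[, -1]) == 1, arr.ind = TRUE)
# which() reports hits column by column, so restore the original row order
hits <- hits[order(hits[, "row"]), , drop = FALSE]

res <- data.frame(
  user_id = df$user_id[hits[, "row"]],
  name    = names(df)[-1][hits[, "col"]]
)
print(res)
```

Unlike index arithmetic, this returns the column names themselves, and a user with several 1s would simply appear on several rows.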
Answer 2 (score: 5)
Another approach is max.col from base R, which fits here because the asker specified that each user has only one column with the value 1.
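As a hedged sketch of how `max.col` could be applied here (the data frame `df` is rebuilt from the question's values via hypothetical helpers `pos` and `m`; this is not necessarily the answer's original snippet):

```r
# Sample data modeled on the question: exactly one 1 per row
pos <- c(3, 6, 3, 6, 2, 2)
m <- matrix(0L, nrow = 6, ncol = 9, dimnames = list(NULL, as.character(1:9)))
m[cbind(seq_along(pos), pos)] <- 1L
df <- data.frame(
  user_id = c(54449024717783, 117592134783793, 187145545782493,
              245003020993334, 332625230637592, 336336752713947),
  m, check.names = FALSE
)

# Position of the row maximum; with a single 1 per row this is the wanted column
idx <- max.col(as.matrix(df[, -1]), ties.method = "first")
print(idx)  # 3 6 3 6 2 2

# Look up the column names by position
res <- data.frame(user_id = df$user_id, name = names(df)[-1][idx])
print(res)
```

This only works under the one-1-per-row assumption: a row of all zeros would still return a position (column 1 with `ties.method = "first"`), so it may be worth checking `rowSums(df[, -1]) == 1` first.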
Answer 3 (score: 3)
A solution using the tidyverse.
library(tidyverse)
dat2 <- dat %>%
  mutate(ID = 1:n()) %>%
  gather(Column, Value, -user_id, -ID) %>%
  filter(Value == 1) %>%
  arrange(ID) %>%
  select(-Value, -ID) %>%
  as.data.frame()
dat2
# user_id Column
# 1 54449024717783 3
# 2 117592134783793 6
# 3 187145545782493 3
# 4 245003020993334 6
# 5 332625230637592 2
# 6 336336752713947 2
Data
dat <- read.table(text = " user_id 1 2 3 4 5 6 7 8 9
1 54449024717783 0 0 1 0 0 0 0 0 0
2 117592134783793 0 0 0 0 0 1 0 0 0
3 187145545782493 0 0 1 0 0 0 0 0 0
4 245003020993334 0 0 0 0 0 1 0 0 0
5 332625230637592 0 1 0 0 0 0 0 0 0
6 336336752713947 0 1 0 0 0 0 0 0 0",
header = TRUE, stringsAsFactors = FALSE)
library(tidyverse)
# as_tibble() replaces the deprecated as.tibble()
dat <- as_tibble(dat) %>%
  setNames(sub("X", "", names(.))) %>%
  mutate(user_id = as.character(user_id))
Answer 4 (score: 3)
Another base R solution:
df$ind = apply(df[,-1]>0,1,which)
df[,c("user_id","ind")]
Output:
user_id ind
1 5.444902e+13 3
2 1.175921e+14 6
3 1.871455e+14 3
4 2.450030e+14 6
5 3.326252e+14 2
6 3.363368e+14 2
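The `apply(..., 1, which)` call returns a plain vector only because every row has exactly one match. A small sketch on a made-up matrix (the names `m`, `single` and `multi` are illustrative) shows what changes when that assumption breaks:

```r
# Toy matrix: row 1 has one hit, row 2 has two
m <- rbind(c(0, 0, 1),
           c(1, 1, 0))

# One hit per row: apply() simplifies the result to an integer vector
single <- apply(m[1, , drop = FALSE] > 0, 1, which)
print(single)

# Differing numbers of hits per row: apply() falls back to a list
multi <- apply(m > 0, 1, which)
print(is.list(multi))
print(multi[[2]])
```

If rows with several 1s are possible, the new column becomes a list column, so a long-format approach may be easier to work with.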
Answer 5 (score: 1)
For the sake of completeness, here is also a data.table solution, which uses melt() to reshape from wide to long format:
library(data.table)
melt(setDT(DF), id = "user_id")[value == 1L][order(user_id), !"value"]
user_id variable
1: 54449024717783 3
2: 117592134783793 6
3: 187145545782493 3
4: 245003020993334 6
5: 332625230637592 2
6: 336336752713947 2
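This relies on the fact that the sample dataset is already sorted by user_id in ascending order.
If the dataset has a different order that should be preserved in the final result, that order has to be remembered by introducing a temporary row id: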
melt(setDT(DF), id = "user_id")[, rn := rowid(variable)][value == 1L][
order(rn), !c("rn", "value")]
Alternatively,
melt(setDT(DF), id = "user_id")[, rn := rowid(variable)][, setorder(.SD, rn)][
value == 1L, !c("rn", "value")]
Data
library(data.table)
DF <- fread(
"i user_id 1 2 3 4 5 6 7 8 9
1 54449024717783 0 0 1 0 0 0 0 0 0
2 117592134783793 0 0 0 0 0 1 0 0 0
3 187145545782493 0 0 1 0 0 0 0 0 0
4 245003020993334 0 0 0 0 0 1 0 0 0
5 332625230637592 0 1 0 0 0 0 0 0 0
6 336336752713947 0 1 0 0 0 0 0 0 0"
, drop = 1L)[, lapply(.SD, as.integer), by = user_id]