Get the column index from each row and merge it with the original data.frame

Time: 2018-01-01 09:19:12

Tags: r dataframe

I have the following data.frame:

                   user_id 1 2 3 4 5 6 7 8 9
1           54449024717783 0 0 1 0 0 0 0 0 0
2          117592134783793 0 0 0 0 0 1 0 0 0
3          187145545782493 0 0 1 0 0 0 0 0 0
4          245003020993334 0 0 0 0 0 1 0 0 0
5          332625230637592 0 1 0 0 0 0 0 0 0
6          336336752713947 0 1 0 0 0 0 0 0 0

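What I want to do is create a single column (and drop columns 1:9) that holds the name of the column where the value is 1. Each user has exactly one column with the value 1.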

If I run the following:

rowSums(users_cluster[, -1], dims = 1)

it sums all the values in each row, but what I need is the column name instead.


6 Answers

Answer 0 (score: 8)

A base R solution:

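# Assumes exactly one 1 per row; t() makes which() walk the rows in order.
# (i - 1) %% k + 1 keeps a 1 in the last column at k instead of 0.
data.frame(user_id = df[, 1],
           name = (which(t(df[, -1] == 1)) - 1) %% (ncol(df) - 1) + 1)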

#           user_id name
# 1  54449024717783    3
# 2 117592134783793    6
# 3 187145545782493    3
# 4 245003020993334    6
# 5 332625230637592    2
# 6 336336752713947    2
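The modulo step turns the row-wise positions returned by which() into column numbers. A minimal base R sketch of that arithmetic, using a small hypothetical data frame df_demo (not from the question) whose last user has the 1 in the final column: a plain pos %% k maps that position to 0, while (pos - 1) %% k + 1 keeps every result in the range 1 to k. Like the answer, it assumes exactly one 1 per row.

```r
# Hypothetical demo data: 3 users, 4 indicator columns, exactly one 1 per row;
# the third user's 1 sits in the last column
df_demo <- data.frame(user_id = c(101, 102, 103),
                      `1` = c(0, 1, 0), `2` = c(1, 0, 0),
                      `3` = c(0, 0, 0), `4` = c(0, 0, 1),
                      check.names = FALSE)

k   <- ncol(df_demo) - 1               # number of indicator columns
pos <- which(t(df_demo[, -1] == 1))    # positions of the 1s, walking row by row

naive <- pos %% k                      # a 1 in the last column becomes 0
fixed <- (pos - 1) %% k + 1            # always lands in 1..k

data.frame(user_id = df_demo$user_id, naive = naive, fixed = fixed)
```

The question's sample data never has a 1 in column 9, which is why the plain modulo happens to give the right answer there.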

Answer 1 (score: 5)

Here is another base R option:

Answer 2 (score: 5)

Another option is max.col() from base R, since the OP specified that each user has only one column with the value 1.
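A minimal sketch of the max.col() approach, written here as an illustration rather than as the answerer's exact code. It rebuilds the question's data with read.table() (check.names = FALSE keeps the column names "1" to "9"), takes the position of each row's maximum, and looks that position up in the column names. It assumes exactly one 1 per row; ties.method = "first" is set so a row of all zeros would deterministically get column 1 instead of a random one.

```r
# Data from the question, with the column names "1".."9" kept as-is
df <- read.table(text = "user_id 1 2 3 4 5 6 7 8 9
54449024717783 0 0 1 0 0 0 0 0 0
117592134783793 0 0 0 0 0 1 0 0 0
187145545782493 0 0 1 0 0 0 0 0 0
245003020993334 0 0 0 0 0 1 0 0 0
332625230637592 0 1 0 0 0 0 0 0 0
336336752713947 0 1 0 0 0 0 0 0 0",
                 header = TRUE, check.names = FALSE)

# max.col() returns, for each row, the column position of the row maximum;
# with exactly one 1 per row that is the position of the 1
pos <- max.col(as.matrix(df[-1]), ties.method = "first")

# Translate positions into the indicator column names
result <- data.frame(user_id = df$user_id,
                     name = names(df)[-1][pos],
                     stringsAsFactors = FALSE)
result
```

Because the name column comes from names(df), this returns the actual column labels, which matters if the indicator columns are not simply numbered 1 to 9.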

Answer 3 (score: 3)

A solution using the tidyverse.

library(tidyverse)
dat2 <- dat %>%
  mutate(ID = 1:n()) %>%
  gather(Column, Value, -user_id, -ID) %>%
  filter(Value == 1) %>%
  arrange(ID) %>%
  select(-Value, -ID) %>%
  as.data.frame()
dat2
#           user_id Column
# 1  54449024717783      3
# 2 117592134783793      6
# 3 187145545782493      3
# 4 245003020993334      6
# 5 332625230637592      2
# 6 336336752713947      2

Data

dat <- read.table(text = "                  user_id 1 2 3 4 5 6 7 8 9
1           54449024717783 0 0 1 0 0 0 0 0 0
2          117592134783793 0 0 0 0 0 1 0 0 0
3          187145545782493 0 0 1 0 0 0 0 0 0
4          245003020993334 0 0 0 0 0 1 0 0 0
5          332625230637592 0 1 0 0 0 0 0 0 0
6          336336752713947 0 1 0 0 0 0 0 0 0",
                  header = TRUE, stringsAsFactors = FALSE)

library(tidyverse)

dat <- as_tibble(dat) %>%
  setNames(sub("X", "", names(.))) %>%
  mutate(user_id = as.character(user_id))

Answer 4 (score: 3)

Another base R solution:

# which() gives the position of the value > 0 in each row (one per row here)
df$ind <- apply(df[, -1] > 0, 1, which)
df[, c("user_id", "ind")]

Output:

       user_id ind
1 5.444902e+13   3
2 1.175921e+14   6
3 1.871455e+14   3
4 2.450030e+14   6
5 3.326252e+14   2
6 3.363368e+14   2
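The apply(..., which) call relies on every row having exactly one value greater than 0. If some rows had none or several, which() would return vectors of different lengths and apply() would return a list instead of a vector. A hedged base R sketch that always yields one string per row, shown on a small hypothetical data frame df_multi (not from the question), collapses the matching column names of each row:

```r
# Hypothetical data: user 1 has two 1s, user 2 has none, user 3 has one
df_multi <- data.frame(user_id = c(1, 2, 3),
                       `1` = c(1, 0, 0), `2` = c(1, 0, 1), `3` = c(0, 0, 0),
                       check.names = FALSE)

# Logical matrix whose column names are the indicator names "1".."3"
hits <- df_multi[, -1] > 0

# Each row r arrives as a named logical vector; keep the names where r is TRUE
# and join them with commas ("" when the row has no match)
df_multi$ind <- apply(hits, 1, function(r) paste(names(r)[r], collapse = ","))
df_multi[, c("user_id", "ind")]
```

With the question's data, where each user has exactly one 1, this produces the same values as the answer, only as column names (character) rather than positions.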

Answer 5 (score: 1)

For completeness, here is also a data.table solution, which uses melt() to reshape from wide to long format:

library(data.table)
melt(setDT(DF), id = "user_id")[value == 1L][order(user_id), !"value"]

           user_id variable
1:  54449024717783        3
2: 117592134783793        6
3: 187145545782493        3
4: 245003020993334        6
5: 332625230637592        2
6: 336336752713947        2

This relies on the fact that the sample dataset is already sorted by user_id in ascending order.

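If the dataset has a different order that must be preserved in the final result, that order has to be recorded by introducing a temporary row ID: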

melt(setDT(DF), id = "user_id")[, rn := rowid(variable)][value == 1L][
  order(rn), !c("rn", "value")]

Alternatively,

melt(setDT(DF), id = "user_id")[, rn := rowid(variable)][, setorder(.SD, rn)][
  value == 1L, !c("rn", "value")]

Data

library(data.table)
DF <- fread(
  "i                   user_id 1 2 3 4 5 6 7 8 9
  1           54449024717783 0 0 1 0 0 0 0 0 0
  2          117592134783793 0 0 0 0 0 1 0 0 0
  3          187145545782493 0 0 1 0 0 0 0 0 0
  4          245003020993334 0 0 0 0 0 1 0 0 0
  5          332625230637592 0 1 0 0 0 0 0 0 0
  6          336336752713947 0 1 0 0 0 0 0 0 0"
, drop = 1L)[, lapply(.SD, as.integer), by = user_id]