Question

I have a data.frame like this:

> df<-data.frame(ID=1:5, Age30_40=c(1,0,0,1,0),Age40_50=c(0,1,1,0,0), Age50_60=c(0,0,0,0,1))
> df
  ID Age30_40 Age40_50 Age50_60
1  1        1        0        0
2  2        0        1        0
3  3        0        1        0
4  4        1        0        0
5  5        0        0        1

I want to convert this wide data to long data so that the output looks like this:

ID  AgeRange
1   Age30_40
2   Age40_50
3   Age40_50
4   Age30_40
5   Age50_60

However, so far all I have managed to do is:

> library(tidyr)
> gather(df, AgeRange, InThisRange, Age30_40:Age50_60)
   ID AgeRange InThisRange
1   1 Age30_40   1
2   2 Age30_40   0
3   3 Age30_40   0
4   4 Age30_40   1
5   5 Age30_40   0
6   1 Age40_50   0
7   2 Age40_50   1
8   3 Age40_50   1
9   4 Age40_50   0
10  5 Age40_50   0
11  1 Age50_60   0
12  2 Age50_60   0
13  3 Age50_60   0
14  4 Age50_60   0
15  5 Age50_60   1

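Since this is a simplified example, I could loop over every entry and, by checking the boolean variable InThisRange, keep only the single non-zero AgeRange entry for each ID. For a large data frame, though, that is definitely not an efficient solution. Is there a simpler way to achieve this?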

Answer 1

We can use max.col from base R (assuming there is exactly one 1 per row):

cbind(df[1], AgeRange = names(df)[-1][max.col(df[-1])])
#  ID AgeRange
#1  1 Age30_40
#2  2 Age40_50
#3  3 Age40_50
#4  4 Age30_40
#5  5 Age50_60
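
A note on the "exactly one 1 per row" assumption: max.col breaks ties at random by default, so a row with several 1s, or with none at all, would get an arbitrary column. The following is a minimal sketch of a more defensive variant, assuming you want the left-most column on ties and NA for rows with no 1. The sixth row and the names flags, idx and result are hypothetical and added only for illustration.

```r
# Question's data plus one hypothetical all-zero row (ID 6) for illustration
df <- data.frame(ID = 1:6,
                 Age30_40 = c(1, 0, 0, 1, 0, 0),
                 Age40_50 = c(0, 1, 1, 0, 0, 0),
                 Age50_60 = c(0, 0, 0, 0, 1, 0))

flags <- as.matrix(df[-1])                    # indicator columns only
idx <- max.col(flags, ties.method = "first")  # left-most maximum, not a random one
idx[rowSums(flags) == 0] <- NA                # no 1 in the row: no category

result <- data.frame(ID = df$ID,
                     AgeRange = colnames(flags)[idx],
                     stringsAsFactors = FALSE)
result
```

With this variant the all-zero row ends up with a missing AgeRange instead of a randomly chosen one.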

Answer 2

An apply-based solution that works on the column names can make this dynamic:

#Columns to select to change in long format
cols <- grepl("^Age.*",names(df))

cbind(df[!cols], AgeRange = apply(df[cols],1,function(x)names(x[x!=0][1])))
#   ID AgeRange
# 1  1 Age30_40
# 2  2 Age40_50
# 3  3 Age40_50
# 4  4 Age30_40
# 5  5 Age50_60

Note: when several columns have a 1 in the same row, the above solution only takes the left-most of those columns.
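
To see the left-most behaviour concretely, here is a small sketch on a hypothetical data frame df2 (not part of the original question) whose first row has a 1 in two columns:

```r
# Hypothetical data: row 1 has a 1 in both Age columns
df2 <- data.frame(ID = 1:2,
                  Age30_40 = c(1, 0),
                  Age40_50 = c(1, 1))

cols <- grepl("^Age.*", names(df2))

# Same expression as above: keep the name of the first non-zero entry per row
age_range <- unname(apply(df2[cols], 1, function(x) names(x[x != 0][1])))

res2 <- cbind(df2[!cols], AgeRange = age_range, stringsAsFactors = FALSE)
res2
```

Row 1 is assigned Age30_40 because it is the first non-zero column from the left; Age40_50 is silently dropped for that row.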

Answer 3

How about this?

Output is:

  AgeRange ID another_col
1 Age30_40  1         101
2 Age40_50  2         102
3 Age40_50  3         103
4 Age30_40  4         104
5 Age50_60  5         105

Sample data:

df <- structure(list(ID = 1:5, Age30_40 = c(1, 0, 0, 1, 0), Age40_50 = c(0, 
1, 1, 0, 0), Age50_60 = c(0, 0, 0, 0, 1), another_col = 101:105), .Names = c("ID", 
"Age30_40", "Age40_50", "Age50_60", "another_col"), row.names = c(NA, 
-5L), class = "data.frame")

Note: for illustration, I added one more column (i.e. another_col).
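
One way to get output of this shape is to gather the indicator columns and then keep only the rows flagged 1. This is a minimal sketch, assuming tidyr and dplyr are installed. It is not necessarily the code this answer originally used; the pipeline steps and the name long are assumptions.

```r
library(tidyr)
library(dplyr)

# Sample data with the extra column another_col
df <- data.frame(ID = 1:5,
                 Age30_40 = c(1, 0, 0, 1, 0),
                 Age40_50 = c(0, 1, 1, 0, 0),
                 Age50_60 = c(0, 0, 0, 0, 1),
                 another_col = 101:105)

long <- df %>%
  gather(AgeRange, InThisRange, Age30_40:Age50_60) %>%  # wide to long
  filter(InThisRange == 1) %>%                          # keep the flagged range only
  select(AgeRange, ID, another_col) %>%                 # drop the indicator column
  arrange(ID)                                           # restore the original row order
long
```

Because the filter is vectorised, this avoids looping over entries, and unlike the max.col and apply approaches it keeps every flagged range if an ID happens to have more than one.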

Converting from wide to long format for mutually exclusive categories
