Create combinations of measurements joined by underscores

Time: 2017-01-04 20:38:46

Tags: r dataframe data.table dplyr

I have a data frame df1:

ID <- c("A","B","C")
Measurement <- c("Length","Height","Breadth")
df1 <- data.frame(ID,Measurement)

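I am trying to create every combination of the measurements, joined by underscores, and add the results as new rows under the ID "ALL".

This is my desired output: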

   ID           Measurement
    A                Length
    B                Height
    C               Breadth
  ALL Length_Height_Breadth
  ALL Length_Breadth_Height
  ALL Breadth_Height_Length
  ALL Breadth_Length_Height
  ALL Height_Length_Breadth
  ALL Height_Breadth_Length

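Similarly, when the "Measurement" column contains only repeated values, I want to eliminate the underscore and keep the single distinct name.

For example: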

ID <- c("A","B")
Measurement <- c("Length","Length")
df2 <- data.frame(ID,Measurement)

Then I would like the desired output to be:

   ID           Measurement
    A                Length
    B                Length
  ALL                Length

What I am trying is completely wrong:

df1$ID <- paste(df1$Measurement, df1$Measurement, sep="_")

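Can someone point me in the right direction to achieve the output above?

I would like to see how this is done programmatically rather than by typing the actual measurement names. I intend to apply the logic to a larger dataset with many measurement names, so a general solution would be much appreciated.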

1 answer:

Answer 0: (score: 2)

We can use the permn function from the combinat package:

library(combinat)
sol_1 <- sapply(permn(unique(df1$Measurement)), 
                FUN = function(x) paste(x, collapse = '_'))
rbind.data.frame(df1, data.frame('ID' = 'All', 'Measurement' = sol_1))

#    ID           Measurement
# 1   A                Length
# 2   B                Height
# 3   C               Breadth
# 4 All Length_Height_Breadth
# 5 All Length_Breadth_Height
# 6 All Breadth_Length_Height
# 7 All Breadth_Height_Length
# 8 All Height_Breadth_Length
# 9 All Height_Length_Breadth

sol_2 <- sapply(permn(unique(df2$Measurement)), 
                FUN = function(x) paste(x, collapse = '_'))
rbind.data.frame(df2, data.frame('ID' = 'All', 'Measurement' = sol_2))

#    ID Measurement
# 1   A      Length
# 2   B      Length
# 3 All      Length
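
For readers who would rather not depend on an extra package, the same idea can be sketched in base R. The helper names all_perms and add_all_rows below are hypothetical (they are not part of combinat or any other package), and the sketch assumes the data frame has columns named ID and Measurement, as in the question. It uses the label "ALL" from the question's desired output.

```r
# Example data from the question.
df1 <- data.frame(ID = c("A", "B", "C"),
                  Measurement = c("Length", "Height", "Breadth"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c("A", "B"),
                  Measurement = c("Length", "Length"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: every ordering of the elements of x, as a list.
all_perms <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (i in seq_along(x)) {
    # Put x[i] first, then append each ordering of the remaining elements.
    for (p in all_perms(x[-i])) {
      out[[length(out) + 1]] <- c(x[i], p)
    }
  }
  out
}

# Hypothetical helper: append one row per ordering of the unique measurements.
add_all_rows <- function(df, id_label = "ALL") {
  meas <- unique(as.character(df$Measurement))
  joined <- vapply(all_perms(meas), paste, character(1), collapse = "_")
  rbind(
    data.frame(ID = as.character(df$ID),
               Measurement = as.character(df$Measurement),
               stringsAsFactors = FALSE),
    data.frame(ID = id_label, Measurement = joined, stringsAsFactors = FALSE)
  )
}

res1 <- add_all_rows(df1)  # 3 original rows plus 6 "ALL" rows
res2 <- add_all_rows(df2)  # 2 original rows plus a single "ALL" row: "Length"
```

Taking unique() first is what handles the df2 case: a single distinct measurement has exactly one ordering, so paste has nothing to join and no underscore appears.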

Credit where credit is due: Generating all distinct permutations of a list

We can also use the permutations function from the gtools package (HT @joel.wilson):

library(gtools)
unique_meas <- as.character(unique(df1$Measurement))
apply(permutations(length(unique_meas), length(unique_meas), unique_meas),
      1, FUN = function(x) paste(x, collapse = '_'))

# "Breadth_Height_Length" "Breadth_Length_Height" 
# "Height_Breadth_Length" "Height_Length_Breadth"
# "Length_Breadth_Height" "Length_Height_Breadth"
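
One caveat for the larger dataset mentioned in the question: both permn and permutations enumerate every ordering, so n distinct measurements produce n! strings. Below is a minimal sketch of a size check to run before generating them. The threshold max_rows is an arbitrary assumption chosen for illustration, not a recommendation from either package.

```r
# Example data from the question.
df1 <- data.frame(ID = c("A", "B", "C"),
                  Measurement = c("Length", "Height", "Breadth"))

n_meas <- length(unique(df1$Measurement))
n_rows <- factorial(n_meas)   # number of permutation rows that would be created

max_rows <- 1e5               # assumed limit, chosen only for illustration
if (n_rows > max_rows) {
  stop("Too many permutations to enumerate: ", n_rows)
}
```

With 10 distinct measurements the count is already 3,628,800, so a check like this is worth doing before calling either function on real data.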