How to omit reciprocals of words in a data.frame using R

Time: 2016-01-27 07:12:30

Tags: r combinations subset

I have been searching the web for an answer, but can't seem to get close...

I have a set of tickers and use expand.grid() to find their combinations:

# TICKERS
A <- c("AIR", "AFAP", "AAL", "CECE", "ASA", "AVX")
# FIND COMBINATIONS
B <- expand.grid(A,A,stringsAsFactors=FALSE)

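So now I would like to omit the reciprocals, that is, pairs that contain the same two tickers in reversed order. For example:

Rows 2 and 7 are reciprocals, and I only want to keep one of those combinations, not both.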

head(B,10)
   Var1 Var2
1   AIR  AIR
2  AFAP  AIR
3   AAL  AIR
4  CECE  AIR
5   ASA  AIR
6   AVX  AIR
7   AIR AFAP
8  AFAP AFAP
9   AAL AFAP
10 CECE AFAP

2 answers:

Answer 0: (score: 3)

Using the OP's initial output, we can sort each row of 'B' using apply with MARGIN=1, then get a logical index of the non-duplicated rows of 'd1' with duplicated, and use it to subset 'B':

d1 <- as.data.frame(t(apply(B, 1, sort)))
B1 <- B[!duplicated(d1),]
head(B1, 10)
#   Var1 Var2
#1   AIR  AIR
#2  AFAP  AIR
#3   AAL  AIR
#4  CECE  AIR
#5   ASA  AIR
#6   AVX  AIR
#8  AFAP AFAP
#9   AAL AFAP
#10 CECE AFAP
#11  ASA AFAP

Another compact option is to use data.table.
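The answer mentions data.table as a compact alternative. A minimal sketch of what such an approach might look like follows; it is an assumption for illustration and not necessarily the answerer's code. It builds an order-independent key for each row with pmin and pmax, so that ("AIR", "AFAP") and ("AFAP", "AIR") map to the same key, and keeps only the first row per key. The names B_dt and B2 are hypothetical.

```r
library(data.table)

# TICKERS and their full grid, as in the question
A <- c("AIR", "AFAP", "AAL", "CECE", "ASA", "AVX")
B <- expand.grid(A, A, stringsAsFactors = FALSE)

# Copy into a data.table so the original 'B' is left untouched
B_dt <- as.data.table(B)

# pmin/pmax give the alphabetically smaller/larger ticker of each row,
# so reversed pairs share the same (smaller, larger) key
B2 <- B_dt[!duplicated(data.table(pmin(Var1, Var2), pmax(Var1, Var2)))]

nrow(B2)  # 21: 15 distinct pairs plus the 6 self-pairs such as AIR/AIR
```

Because pmin and pmax are vectorised, this avoids the row-by-row apply call, which may matter for a long ticker list.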

Answer 1: (score: 3)

Use the gtools package instead:

library(gtools)
A <- c("AIR", "AFAP", "AAL", "CECE", "ASA", "AVX")

combinations(length(A), 2, A, repeats.allowed = FALSE)

#       [,1]   [,2]  
#  [1,] "AAL"  "AFAP"
#  [2,] "AAL"  "AIR" 
#  [3,] "AAL"  "ASA" 
#  [4,] "AAL"  "AVX" 
#  [5,] "AAL"  "CECE"
#  [6,] "AFAP" "AIR" 
#  [7,] "AFAP" "ASA" 
#  [8,] "AFAP" "AVX" 
#  [9,] "AFAP" "CECE"
# [10,] "AIR"  "ASA" 
# [11,] "AIR"  "AVX" 
# [12,] "AIR"  "CECE"
# [13,] "ASA"  "AVX" 
# [14,] "ASA"  "CECE"
# [15,] "AVX"  "CECE"
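If installing gtools is not an option, base R's combn gives the same 15 unordered pairs; the sketch below is an alternative and not part of the original answer. Note that, like the gtools call above, it leaves out self-pairs such as AIR/AIR, which the expand.grid approach keeps. The names pairs and with_self are hypothetical.

```r
# TICKERS, as in the question
A <- c("AIR", "AFAP", "AAL", "CECE", "ASA", "AVX")

# combn returns one combination per column, so transpose to one pair per row
pairs <- t(combn(A, 2))
nrow(pairs)  # 15, i.e. choose(6, 2)

# Optionally append the self-pairs (AIR/AIR, AFAP/AFAP, ...)
with_self <- rbind(pairs, unname(cbind(A, A)))
nrow(with_self)  # 21
```

Unlike combinations, combn keeps the pairs in the order in which the tickers appear in A rather than sorting them alphabetically.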