Find the frequency of all factor-level combinations for all column combinations

Posted: 2019-10-07 03:48:58

Tags: r dataframe

I have a data frame with n variables, all of which are factors. Now I want to select m columns from this data frame (m < n) …

I searched, but I only found how to get the frequency of factor combinations when specific columns are chosen. In my case, since m < n …

Here is our data; every variable is a factor.

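# stringsAsFactors = TRUE keeps every column a factor (R 4.0.0 changed the default to FALSE)
company <- data.frame("country" = c("USA", "China", "France", "Germany"),
                      "category" = c("C-corp", "S-corp", "C-corp", "LLC"),
                      "Type" = c("Public", "Private", "Private", "Private"),
                      "Profit" = c("High", "High", "High", "Low"),
                      stringsAsFactors = TRUE)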

Now I want to select 2 columns (m = 2) and find the frequency of every factor-level combination for every possible pair of columns.

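In this case I could have "country = USA & category = S-corp", "country = USA & category = C-corp", "country = China & category = LLC". But I could also pick other columns and get "country = USA & Profit = Low", "country = China & Type = Public". I want to know the frequencies of all of these combinations.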

Edit: my expected output is:

country = USA, category = C-corp  freq 1
country = USA, category = S-corp  freq 0
country = USA, category = LLC  freq 0
country = China, category = LLC  freq 0
country = France, category = C-corp  freq 1
country = USA, Type = Public  freq 1
country = China, Type = Public  freq 0
Type = Private, Profit = High  freq 2
Type = Public, category = LLC  freq 0
Type = Private, Profit = Low  freq 1

If I select 2 columns, I need all possible column combinations; order does not matter.
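A minimal sketch of one way to produce this output, assuming all columns are factors as in the data above: `combn()` enumerates the unordered sets of m column names, and `table()` on factor columns counts every level combination, including unobserved ones (frequency 0). The helper name `pair_freqs` and the label format are hypothetical choices for illustration.

```r
# Sample data from the question; stringsAsFactors = TRUE makes every column a factor
company <- data.frame(country  = c("USA", "China", "France", "Germany"),
                      category = c("C-corp", "S-corp", "C-corp", "LLC"),
                      Type     = c("Public", "Private", "Private", "Private"),
                      Profit   = c("High", "High", "High", "Low"),
                      stringsAsFactors = TRUE)

# Hypothetical helper: frequency of every level combination for every set of m columns
pair_freqs <- function(df, m = 2) {
  # Unordered column sets, e.g. c("country", "category"), c("country", "Type"), ...
  col_sets <- combn(names(df), m, simplify = FALSE)
  res <- lapply(col_sets, function(cols) {
    # table() over factors includes unobserved combinations with Freq = 0
    tab <- as.data.frame(table(df[cols]))
    # Build labels such as "country = USA, category = C-corp"
    labels <- apply(tab[cols], 1, function(v) paste(cols, "=", v, collapse = ", "))
    data.frame(combination = labels, freq = tab$Freq, stringsAsFactors = FALSE)
  })
  do.call(rbind, res)
}

result <- pair_freqs(company, 2)
head(result)
```

Each unordered pair of columns appears once, in the column order of the data frame, so "Type = Public, category = LLC" is reported as "category = LLC, Type = Public".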

2 answers:

Answer 0 (score: 0)

You can do this with nested loops around the table() function:


It's ugly and produces a lot of duplicates, but for your purposes it is quick and easy.
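The answerer's code is not shown here. One possible reading of the nested-loop idea, sketched under the assumption that `company` is the factor data frame from the question (the names `vars` and `tables` are illustrative, not from the answer):

```r
company <- data.frame(country  = c("USA", "China", "France", "Germany"),
                      category = c("C-corp", "S-corp", "C-corp", "LLC"),
                      Type     = c("Public", "Private", "Private", "Private"),
                      Profit   = c("High", "High", "High", "Low"),
                      stringsAsFactors = TRUE)

vars <- names(company)
tables <- list()
# Outer loop over each column, inner loop only over the columns after it (i < j)
for (i in seq_len(length(vars) - 1)) {
  for (j in (i + 1):length(vars)) {
    cols <- c(vars[i], vars[j])
    # Two-way contingency table for this pair; zero counts are kept for factors
    tables[[paste(cols, collapse = " x ")]] <- table(company[cols])
  }
}

tables[["Type x Profit"]]
```

Starting the inner loop at `i + 1` visits each unordered pair of columns exactly once, which avoids the duplicates that a loop over all (i, j) pairs would produce.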

Answer 1 (score: 0)

The combination part sounds like expand.grid():

expand.grid(company[, 1:2])

   country category
1      USA   C-corp
2    China   C-corp
3   France   C-corp
4  Germany   C-corp
5      USA   S-corp
6    China   S-corp
7   France   S-corp
8  Germany   S-corp
9      USA   C-corp
10   China   C-corp
11  France   C-corp
12 Germany   C-corp
13     USA      LLC
14   China      LLC
15  France      LLC
16 Germany      LLC
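The output above lists C-corp twice (rows 1-4 and 9-12) because expand.grid() expands the raw column values rather than the distinct ones. A minimal sketch that passes only the distinct values of each column, assuming the same `company` data, so each country/category pair appears once:

```r
company <- data.frame(country  = c("USA", "China", "France", "Germany"),
                      category = c("C-corp", "S-corp", "C-corp", "LLC"),
                      Type     = c("Public", "Private", "Private", "Private"),
                      Profit   = c("High", "High", "High", "Low"),
                      stringsAsFactors = TRUE)

# unique() drops the repeated "C-corp" before expanding
combos <- expand.grid(lapply(company[, 1:2], unique))
nrow(combos)  # 4 countries * 3 categories = 12
```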

# or if you want 4 columns with all countries, do a cross join:

merge(company[, 1, drop = FALSE], company[, -1], by = NULL)

# or if you want 4 columns with all possible results, do expand.grid without subsetting:

expand.grid(company)
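To make the sizes of these two alternatives concrete, a short sketch (same `company` data) that counts their rows. `merge(..., by = NULL)` returns the Cartesian product of its two arguments, while `expand.grid(company)` crosses every value of every column, repeats included:

```r
company <- data.frame(country  = c("USA", "China", "France", "Germany"),
                      category = c("C-corp", "S-corp", "C-corp", "LLC"),
                      Type     = c("Public", "Private", "Private", "Private"),
                      Profit   = c("High", "High", "High", "Low"),
                      stringsAsFactors = TRUE)

# Cross join: each country paired with each row of the other 3 columns
cross <- merge(company[, 1, drop = FALSE], company[, -1], by = NULL)
# Full expansion of all 4 columns
full <- expand.grid(company)

nrow(cross)         # 4 * 4 = 16 rows, 4 columns
nrow(full)          # 4^4 = 256 rows
nrow(unique(full))  # 4 * 3 * 2 * 2 = 48 distinct level combinations
```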

The second part sounds like table(). You can run it directly on the company data.frame:

table(company)

, , Type = Private, Profit = High

         category
country   C-corp LLC S-corp
  China        0   0      1
  France       1   0      0
  Germany      0   0      0
  USA          0   0      0

, , Type = Public, Profit = High

         category
country   C-corp LLC S-corp
  China        0   0      0
  France       0   0      0
  Germany      0   0      0
  USA          1   0      0

, , Type = Private, Profit = Low

         category
country   C-corp LLC S-corp
  China        0   0      0
  France       0   0      0
  Germany      0   1      0
  USA          0   0      0

, , Type = Public, Profit = Low

         category
country   C-corp LLC S-corp
  China        0   0      0
  France       0   0      0
  Germany      0   0      0
  USA          0   0      0
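Since the question only needs pairs of columns, the 4-way table above can be summed down to every two-way table with margin.table(). A sketch under the same assumption (factor columns in `company`); the names `dim_pairs`, `pair_tables` and `long` are illustrative:

```r
company <- data.frame(country  = c("USA", "China", "France", "Germany"),
                      category = c("C-corp", "S-corp", "C-corp", "LLC"),
                      Type     = c("Public", "Private", "Private", "Private"),
                      Profit   = c("High", "High", "High", "Low"),
                      stringsAsFactors = TRUE)

tab <- table(company)  # 4-way table: country x category x Type x Profit

# All unordered pairs of dimension indices: (1,2), (1,3), ..., (3,4)
dim_pairs <- combn(length(dim(tab)), 2, simplify = FALSE)
# Sum the 4-way table down to each pair of dimensions
pair_tables <- lapply(dim_pairs, function(p) margin.table(tab, p))
names(pair_tables) <- vapply(dim_pairs,
                             function(p) paste(names(dimnames(tab))[p], collapse = " x "),
                             character(1))

# Long format with a Freq column; zero counts are included
long <- as.data.frame(pair_tables[["Type x Profit"]])
long
```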