How do I count similar occurrences of combinations in a data frame?

Asked: 2016-07-18 06:58:05

Tags: r

I'm a beginner. I have loaded a well-known data set into R and now want to run a few experiments with it. This is what I have so far.

I have a data frame named battles. Its structure, as printed by str(), is shown further down.

What I want to know is how many losses and wins each king has had so far across the whole of GOT.

str(battles)

'data.frame':   38 obs. of  25 variables:
 $ name              : Factor w/ 38 levels "Battle at the Mummer's Ford",..: 13 1 7 14 18 10 25 5 3 17 ...
 $ year              : int  298 298 298 298 298 298 298 299 299 299 ...
 $ battle_number     : int  1 2 3 4 5 6 7 8 9 10 ...
 $ attacker_king     : Factor w/ 5 levels "","Balon/Euron Greyjoy",..: 3 3 3 4 4 4 3 2 2 2 ...
 $ defender_king     : Factor w/ 7 levels "","Balon/Euron Greyjoy",..: 6 6 6 3 3 3 6 6 6 6 ...
 $ attacker_1        : Factor w/ 11 levels "Baratheon","Bolton",..: 10 10 10 11 11 11 10 9 9 9 ...
 $ attacker_2        : Factor w/ 8 levels "","Bolton","Frey",..: 1 1 1 1 8 8 1 1 1 1 ...
 $ attacker_3        : Factor w/ 3 levels "","Giants","Mormont": 1 1 1 1 1 1 1 1 1 1 ...
 $ attacker_4        : Factor w/ 2 levels "","Glover": 1 1 1 1 1 1 1 1 1 1 ...
 $ defender_1        : Factor w/ 13 levels "","Baratheon",..: 12 2 12 8 8 8 6 11 11 11 ...
 $ defender_2        : Factor w/ 3 levels "","Baratheon",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ defender_3        : logi  NA NA NA NA NA NA ...
 $ defender_4        : logi  NA NA NA NA NA NA ...
 $ attacker_outcome  : Factor w/ 3 levels "","loss","win": 3 3 3 2 3 3 3 3 3 3 ...
 $ battle_type       : Factor w/ 5 levels "","ambush","pitched battle",..: 3 2 3 3 2 2 3 3 5 2 ...
 $ major_death       : int  1 1 0 1 1 0 0 0 0 0 ...
 $ major_capture     : int  0 0 1 1 1 0 0 0 0 0 ...
 $ attacker_size     : int  15000 NA 15000 18000 1875 6000 NA NA 1000 264 ...
 $ defender_size     : int  4000 120 10000 20000 6000 12625 NA NA NA NA ...
 $ attacker_commander: Factor w/ 32 levels "","Asha Greyjoy",..: 8 6 9 22 16 18 6 30 2 28 ...
 $ defender_commander: Factor w/ 29 levels "","Amory Lorch",..: 7 4 10 28 12 14 15 1 1 1 ...
 $ summer            : int  1 1 1 1 1 1 1 1 1 1 ...
 $ location          : Factor w/ 28 levels "","Castle Black",..: 8 13 17 9 27 17 4 12 5 23 ...
 $ region            : Factor w/ 7 levels "Beyond the Wall",..: 7 5 5 5 5 5 5 3 3 3 ...
 $ note              : Factor w/ 6 levels "","Greyjoy's troop number based on the Battle of Deepwood Motte, in which Asha had 1000 soldier on 30 longships. That comes out to"| __truncated__,..: 1 1 1 1 1 1 1 1 1 2 ...
  

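I also need two columns, named "number of wins" and "number of losses", for each attacker king.

Note: please forgive me if my question breaks Stack Overflow's question policy in any way; this is my first question about R.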

2 answers:

Answer 0: (score: 3)

You can use table from the base package:

table(battles$attacker_king, battles$attacker_outcome)

#                           loss win
#  Balon/Euron Greyjoy         0   7
#  Joffrey/Tommen Baratheon    1  13
#  Robb Stark                  2   8
#  Stannis Baratheon           2   2
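If the counts are needed as named columns rather than as a table object, one possible sketch is shown below. It assumes a small invented stand-in called toy (the rows are made up, not taken from the real battles data) and borrows the column names NoWins and NoLoss from the other answer. Empty strings appear as their own level in the cross-tabulation, so they are dropped explicitly.

```r
# Toy stand-in for the battles data; the rows are invented for illustration.
toy <- data.frame(
  attacker_king    = c("Robb Stark", "Robb Stark", "Stannis Baratheon", "", "Robb Stark"),
  attacker_outcome = c("win", "loss", "win", "win", ""),
  stringsAsFactors = TRUE
)

# Cross-tabulate kings against outcomes; "" shows up as a separate row and column.
counts <- table(toy$attacker_king, toy$attacker_outcome)

# Keep only named kings and the two real outcomes, then convert to a data frame.
keep_rows <- rownames(counts) != ""
result <- as.data.frame.matrix(counts[keep_rows, c("win", "loss"), drop = FALSE])
names(result) <- c("NoWins", "NoLoss")
result
```

The kings end up as row names here; if a regular column is preferred, something like result$attacker_king <- rownames(result) would add one.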

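Answer 1: (score: 2)

One option is dplyr. After grouping by 'attacker_king', we summarise the output to create two columns ('NoWins', 'NoLoss'), each the sum of a logical vector that tests for "win" or "loss". If needed, we then filter out the blank elements in 'attacker_king'.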

library(dplyr)
battles %>%
      group_by(attacker_king) %>%
      summarise(NoWins = sum(attacker_outcome == "win"),
                 NoLoss = sum(attacker_outcome == "loss")) %>%
      filter(nzchar(attacker_king))
#            attacker_king NoWins NoLoss
#                 <chr>  <int>  <int>
#1      Balon/Euron Greyjoy      7      0
#2 Joffrey/Tommen Baratheon     13      1
#3               Robb Stark      8      2
#4        Stannis Baratheon      2      2
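The same "sum of a logical vector" idea can be sketched without dplyr. The block below is only an illustration on an invented toy data frame (not the real battles data), using base R's aggregate; the helper columns NoWins and NoLoss are created first so that the formula interface can pick them up by name.

```r
# Toy stand-in for the battles data; the rows are invented for illustration.
toy <- data.frame(
  attacker_king    = c("Robb Stark", "Robb Stark", "Stannis Baratheon", "", "Robb Stark"),
  attacker_outcome = c("win", "loss", "win", "win", ""),
  stringsAsFactors = FALSE
)

# Drop rows that have no attacking king.
named <- toy[toy$attacker_king != "", ]

# TRUE counts as 1 when summed, so summing a comparison counts the matches.
named$NoWins <- named$attacker_outcome == "win"
named$NoLoss <- named$attacker_outcome == "loss"

# One row per king, with the two logical columns summed within each group.
wins_losses <- aggregate(cbind(NoWins, NoLoss) ~ attacker_king,
                         data = named, FUN = sum)
wins_losses
```

A battle with a blank outcome contributes to neither column, which matches how the comparisons behave in the dplyr version.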

Or we can use dplyr/tidyr. After grouping, we get the frequency counts with tally, filter (as above), and then use spread (from tidyr) to convert from 'long' to 'wide' format.

library(tidyr)
battles %>%
     group_by(attacker_king, attacker_outcome) %>%
     tally() %>% 
     filter(nzchar(attacker_king) & nzchar(attacker_outcome)) %>% 
     spread(attacker_outcome, n)
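To make the 'long' to 'wide' step concrete, here is a rough base R sketch of the same pipeline on an invented toy data frame (not the real battles data). It uses as.data.frame on a table for the long counts and reshape for the widening; the column name n is chosen to mirror what tally produces, and the resulting n.loss and n.win names come from reshape's defaults.

```r
# Toy stand-in for the battles data; the rows are invented for illustration.
toy <- data.frame(
  attacker_king    = c("Robb Stark", "Robb Stark", "Stannis Baratheon", "", "Robb Stark"),
  attacker_outcome = c("win", "loss", "win", "win", ""),
  stringsAsFactors = FALSE
)

# Long format: one row per (king, outcome) combination with its count.
long <- as.data.frame(table(toy$attacker_king, toy$attacker_outcome),
                      stringsAsFactors = FALSE)
names(long) <- c("attacker_king", "attacker_outcome", "n")

# Remove combinations that involve a blank king or a blank outcome.
long <- long[long$attacker_king != "" & long$attacker_outcome != "", ]

# Wide format: one row per king, one count column per outcome.
wide <- reshape(long, idvar = "attacker_king",
                timevar = "attacker_outcome", direction = "wide")
wide
```

Unlike tally on grouped data, the table-based long format includes zero counts for combinations that never occur, so no missing values appear after widening.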

Or use dcast from data.table. This is more convenient because dcast also has a fun.aggregate argument, so we can specify a function (here, length) while reshaping to 'wide' format.

library(data.table)
dcast(setDT(battles), attacker_king~attacker_outcome, length)[nzchar(attacker_king)
                        ][, -2, with = FALSE]
#                attacker_king loss win
#1:      Balon/Euron Greyjoy    0   7
#2: Joffrey/Tommen Baratheon    1  13
#3:               Robb Stark    2   8
#4:        Stannis Baratheon    2   2

Or use table from base R:
table(battles[c("attacker_king", "attacker_outcome")])[-1,-1]
#                          attacker_outcome
#  attacker_king              loss win
#  Balon/Euron Greyjoy         0   7
#  Joffrey/Tommen Baratheon    1  13
#  Robb Stark                  2   8
#  Stannis Baratheon           2   2