Frequency table from two filters and two summarises in dplyr

Time: 2018-04-20 19:53:44

Tags: r dplyr data.table

How can I combine the two dplyr pipelines below into a single one?

The output should be a table (or data frame) grouped by year, with two frequency columns, one for each filter. These are the two pipelines:

df %>% group_by(year) %>% filter(MIAPRFCD_J8 == 1 | MIAPRFCD_55 == 1) %>% summarise(Freq = n())

df %>% group_by(year) %>% filter(sum == 1 | (MIAPRFCD_J8 == 1 & MIAPRFCD_55 == 1)) %>% summarise(reason_lv = n())

Here is the sample data:

df <- read.table(header = TRUE, text = 'Act year    MIAPRFCD_J8 MIAPRFCD_55 sum
1   2015    1   0   1
2   2016    1   0   1
3   2016    0   1   2
6   2016    1   1   3
7   2016    0   0   2
9   2015    1   0   1
11  2015    1   0   1
12  2015    0   1   2
15  2014    0   1   1
20  2014    1   0   1
60  2013    1   0   1') 

Thanks in advance!

1 answer:

Answer 0: (score: 1)

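Now that you have included the data, this is easy to solve. Here are two possible options. Both give you the desired output; which one you pick is mostly a matter of style.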

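Option 1: build two filtered and summarised data frames, then use inner_join to join them by year. (You could also build these data frames inline in the arguments to inner_join, but that is less clear.)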

library(tidyverse)

df <- read.table(header = TRUE,
    text = 'Act year    MIAPRFCD_J8 MIAPRFCD_55 sum
    1   2015    1   0   1
    2   2016    1   0   1
    3   2016    0   1   2
    6   2016    1   1   3
    7   2016    0   0   2
    9   2015    1   0   1
    11  2015    1   0   1
    12  2015    0   1   2
    15  2014    0   1   1
    20  2014    1   0   1
    60  2013    1   0   1') 

# option 1: two dataframes, then join
freq_df <- df %>%
    group_by(year) %>%
    filter(MIAPRFCD_J8 == 1 | MIAPRFCD_55 == 1) %>%
    summarise(Freq = n())

reason_df <- df %>%
    group_by(year) %>%
    filter(sum == 1 | (MIAPRFCD_J8 == 1 & MIAPRFCD_55 == 1)) %>%
    summarise(reason_lv = n())

inner_join(freq_df, reason_df, by = "year")
#> # A tibble: 4 x 3
#>    year  Freq reason_lv
#>   <int> <int>     <int>
#> 1  2013     1         1
#> 2  2014     2         2
#> 3  2015     4         3
#> 4  2016     3         2
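
One caveat with option 1, offered as a side note rather than as part of the original answer: inner_join keeps only the years that appear in both summaries, so a year in which no row passes one of the filters would drop out of the result. A minimal sketch of a variant, assuming you would rather keep such years with a count of 0, uses full_join and coalesce (both from dplyr). The names combined, freq_df and reason_df are just illustrative. On the sample data every year passes both filters, so the counts are the same as in the table above.

```r
library(dplyr)

df <- read.table(header = TRUE,
    text = 'Act year    MIAPRFCD_J8 MIAPRFCD_55 sum
    1   2015    1   0   1
    2   2016    1   0   1
    3   2016    0   1   2
    6   2016    1   1   3
    7   2016    0   0   2
    9   2015    1   0   1
    11  2015    1   0   1
    12  2015    0   1   2
    15  2014    0   1   1
    20  2014    1   0   1
    60  2013    1   0   1')

# Count rows per year that pass the first filter
freq_df <- df %>%
    filter(MIAPRFCD_J8 == 1 | MIAPRFCD_55 == 1) %>%
    group_by(year) %>%
    summarise(Freq = n())

# Count rows per year that pass the second filter
reason_df <- df %>%
    filter(sum == 1 | (MIAPRFCD_J8 == 1 & MIAPRFCD_55 == 1)) %>%
    group_by(year) %>%
    summarise(reason_lv = n())

# full_join keeps years present in either summary;
# coalesce turns the resulting NA counts into 0 (assumption: 0 is wanted)
combined <- full_join(freq_df, reason_df, by = "year") %>%
    mutate(Freq = coalesce(Freq, 0L),
           reason_lv = coalesce(reason_lv, 0L))

print(combined)
```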

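Option 2: add Boolean variables that flag whether each observation should count toward the frequency calculation and whether it should count toward the reason calculation. Dummy variables help here because the two conditions are not mutually exclusive.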

# option 2: binary variables
df %>%
    mutate(getFreq = (MIAPRFCD_J8 == 1 | MIAPRFCD_55 == 1)) %>%
    mutate(getReason = (sum == 1 | (MIAPRFCD_J8 == 1 & MIAPRFCD_55 == 1))) %>%
    group_by(year) %>%
    summarise(Freq = sum(getFreq), reason_lv = sum(getReason))
#> # A tibble: 4 x 3
#>    year  Freq reason_lv
#>   <int> <int>     <int>
#> 1  2013     1         1
#> 2  2014     2         2
#> 3  2015     4         3
#> 4  2016     3         2
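
Option 2 works because summing a logical vector counts its TRUE values. As a sketch of a more compact variant (not from the original answer), the conditions can go directly inside summarise, without the intermediate columns. One difference from the filter-based version is worth guarding against: filter drops rows where the condition is NA, while sum would return NA, so na.rm = TRUE is added here on the assumption that missing values should simply not be counted. The name compact is illustrative.

```r
library(dplyr)

df <- read.table(header = TRUE,
    text = 'Act year    MIAPRFCD_J8 MIAPRFCD_55 sum
    1   2015    1   0   1
    2   2016    1   0   1
    3   2016    0   1   2
    6   2016    1   1   3
    7   2016    0   0   2
    9   2015    1   0   1
    11  2015    1   0   1
    12  2015    0   1   2
    15  2014    0   1   1
    20  2014    1   0   1
    60  2013    1   0   1')

# Each sum() counts the rows in the year for which its condition is TRUE.
# The call sum(...) resolves to the base function even though the data
# also has a column named sum, as in option 2 of the answer.
compact <- df %>%
    group_by(year) %>%
    summarise(
        Freq = sum(MIAPRFCD_J8 == 1 | MIAPRFCD_55 == 1, na.rm = TRUE),
        reason_lv = sum(sum == 1 | (MIAPRFCD_J8 == 1 & MIAPRFCD_55 == 1),
                        na.rm = TRUE)
    )

print(compact)
```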

Created on 2018-04-23 by the reprex package (v0.2.0).