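tidyverse - dplyr summarise not working as expected

Date: 2018-07-02 20:00:02

Tags: r dplyr tidyverse

We are analysing columns in a SQL Server environment. We extract the column names and data types, then run a simple pipe to check whether columns that share a name have mixed data types across different tables.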

library(tidyverse)
DF = data.frame(COLUMN_NAME = c("PARTYID","PARTYID","AGE","AGE","SALESID","SALES"), 
                DATA_TYPE = c("char","tinyint","int","smallint","varchar","numeric"))
DF %>% group_by(COLUMN_NAME) %>% 
           summarise(mixedTypes = (any(grepl("char", DATA_TYPE)) & 
                                  !(all(grepl("char", DATA_TYPE)))))

All I get back is:

  mixedTypes
1       TRUE

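But I believe I should get back a subset of the data.frame, with the two columns plus a new column named mixedTypes.

Update: someone suggested using conflicts(), but I don't know enough to interpret the output with detail = TRUE: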

$.GlobalEnv
[1] "df"

$`package:forcats`
[1] "%>%" "%>%" "%>%" "%>%" "%>%"

$`package:purrr`
[1] "%>%"       "%>%"       "compact"   "%>%"       "%>%"       "set_names" "%>%"      

$`package:tidyr`
[1] "%>%"     "%>%"     "%>%"     "%>%"     "extract" "%>%"    

$`package:plyr`
 [1] "compact"     "arrange"     "count"       "desc"        "failwith"    "id"          "mutate"      "rename"      "summarise"  
[10] "summarize"   "is.discrete" "summarize"  

$`package:stringr`
[1] "%>%" "%>%" "%>%" "%>%" "%>%"

$`package:tibble`
 [1] "add_row"       "as_data_frame" "as_tibble"     "data_frame"    "data_frame_"   "frame_data"    "glimpse"       "lst"          
 [9] "lst_"          "tbl_sum"       "tibble"        "tribble"       "trunc_mat"     "type_sum"     

$`package:magrittr`
[1] "%>%"       "%>%"       "%>%"       "%>%"       "extract"   "set_names" "%>%"      

$`package:dplyr`
 [1] "%>%"           "%>%"           "%>%"           "%>%"           "%>%"           "add_row"       "arrange"       "as_data_frame"
 [9] "as_tibble"     "count"         "data_frame"    "data_frame_"   "desc"          "failwith"      "frame_data"    "glimpse"      
[17] "id"            "lst"           "lst_"          "mutate"        "rename"        "summarise"     "summarize"     "tbl_sum"      
[25] "tibble"        "tribble"       "trunc_mat"     "type_sum"      "src"           "summarize"     "coalesce"      "filter"       
[33] "lag"           "intersect"     "setdiff"       "setequal"      "union"        

$`package:Hmisc`
[1] "summarize"   "is.discrete" "src"         "summarize"   "format.pval" "units"      

$`package:ggplot2`
[1] "Position"

$`package:MyPackage`
[1] "coalesce" "HeatMap" 

$`package:stats`
[1] "df"     "filter" "lag"   

$`package:methods`
[1] "body<-"    "kronecker"

$`package:base`
 [1] "body<-"      "format.pval" "HeatMap"     "intersect"   "kronecker"   "Position"    "setdiff"     "setequal"    "union"      
[10] "units"   

1 answer:

Answer 0 (score: 1)

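As mentioned in the comments, the problem is that plyr's version of summarise was loaded after dplyr, so when you call summarise you get the wrong one. You should try loading plyr first (or, better yet, not load it at all), but you can also stay safe by explicitly stating which version of summarise you want.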
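To check which package an unqualified summarise() call resolves to in a given session, something like the following sketch may help. It assumes both dplyr and plyr are installed and deliberately attaches plyr last to mimic the situation in the question; the variable name owner is only illustrative.

```r
library(dplyr)
library(plyr)  # attached after dplyr, reproducing the load order described above

# find() returns every attached environment that defines the name, in
# search-path order; an unqualified call uses the first entry.
find("summarise")

# Name of the namespace that owns the function a bare summarise() resolves to.
owner <- environmentName(environment(summarise))
owner
```

If owner is "plyr" rather than "dplyr", the grouped pipeline is being handed to plyr's summarise, which does not know about dplyr groups.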

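library(tidyverse)
DF = data.frame(COLUMN_NAME = c("PARTYID","PARTYID","AGE","AGE","SALESID","SALES"), 
                DATA_TYPE = c("char","tinyint","int","smallint","varchar","numeric"))

# bad: plyr::summarise() ignores the grouping and returns a single row
DF %>% group_by(COLUMN_NAME) %>% 
           plyr::summarise(mixedTypes = (any(grepl("char", DATA_TYPE)) & 
                                         !(all(grepl("char", DATA_TYPE)))))

# good: dplyr::summarise() returns one row per COLUMN_NAME
DF %>% group_by(COLUMN_NAME) %>% 
           dplyr::summarise(mixedTypes = (any(grepl("char", DATA_TYPE)) & 
                                          !(all(grepl("char", DATA_TYPE)))))

This is the best approach if you really do need both plyr and dplyr loaded. It also sidesteps the other important conflicts between the two packages (the conflicts() output in the question shows mutate, arrange, count and rename masked in the same way). Still, it is best to avoid loading both.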
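If plyr has already been attached in a running session, one possible workaround is to detach it so that the bare summarise() falls through to dplyr again. This is only a sketch under the assumption that nothing else in the session relies on plyr being attached; the name res is illustrative.

```r
library(dplyr)
library(plyr)  # reproduce the problematic load order

# Remove plyr from the search path if it is attached. Its namespace stays
# loaded, so explicit plyr::summarise() calls would still work.
if ("package:plyr" %in% search()) detach("package:plyr")

DF <- data.frame(
  COLUMN_NAME = c("PARTYID", "PARTYID", "AGE", "AGE", "SALESID", "SALES"),
  DATA_TYPE   = c("char", "tinyint", "int", "smallint", "varchar", "numeric")
)

# With plyr detached, the unqualified summarise() is dplyr's and respects groups.
res <- DF %>%
  group_by(COLUMN_NAME) %>%
  summarise(mixedTypes = any(grepl("char", DATA_TYPE)) &
                         !all(grepl("char", DATA_TYPE)))
res
```

With the example data, res should contain one row per COLUMN_NAME, and only PARTYID (char in one table, tinyint in another) is flagged as mixed.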