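We are analyzing columns in a SQL Server environment. We extract the column names and data types, then run a simple piped statement to check whether columns with the same name in different tables have mixed data types.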
library(tidyverse)
DF = data.frame(COLUMN_NAME = c("PARTYID","PARTYID","AGE","AGE","SALESID","SALES"),
                DATA_TYPE = c("char","tinyint","int","smallint","varchar","numeric"))
DF %>% group_by(COLUMN_NAME) %>%
  summarise(mixedTypes = (any(grepl("char", DATA_TYPE)) &
                          !(all(grepl("char", DATA_TYPE)))))
All I get back is
  mixedTypes
1       TRUE
But I believe I should be getting back a subset of the data.frame that includes both columns along with a new column named mixedTypes.
Update: someone suggested using conflicts, but I don't know enough to interpret the output with detail=TRUE:
$.GlobalEnv
[1] "df"
$`package:forcats`
[1] "%>%" "%>%" "%>%" "%>%" "%>%"
$`package:purrr`
[1] "%>%" "%>%" "compact" "%>%" "%>%" "set_names" "%>%"
$`package:tidyr`
[1] "%>%" "%>%" "%>%" "%>%" "extract" "%>%"
$`package:plyr`
[1] "compact" "arrange" "count" "desc" "failwith" "id" "mutate" "rename" "summarise"
[10] "summarize" "is.discrete" "summarize"
$`package:stringr`
[1] "%>%" "%>%" "%>%" "%>%" "%>%"
$`package:tibble`
[1] "add_row" "as_data_frame" "as_tibble" "data_frame" "data_frame_" "frame_data" "glimpse" "lst"
[9] "lst_" "tbl_sum" "tibble" "tribble" "trunc_mat" "type_sum"
$`package:magrittr`
[1] "%>%" "%>%" "%>%" "%>%" "extract" "set_names" "%>%"
$`package:dplyr`
[1] "%>%" "%>%" "%>%" "%>%" "%>%" "add_row" "arrange" "as_data_frame"
[9] "as_tibble" "count" "data_frame" "data_frame_" "desc" "failwith" "frame_data" "glimpse"
[17] "id" "lst" "lst_" "mutate" "rename" "summarise" "summarize" "tbl_sum"
[25] "tibble" "tribble" "trunc_mat" "type_sum" "src" "summarize" "coalesce" "filter"
[33] "lag" "intersect" "setdiff" "setequal" "union"
$`package:Hmisc`
[1] "summarize" "is.discrete" "src" "summarize" "format.pval" "units"
$`package:ggplot2`
[1] "Position"
$`package:MyPackage`
[1] "coalesce" "HeatMap"
$`package:stats`
[1] "df" "filter" "lag"
$`package:methods`
[1] "body<-" "kronecker"
$`package:base`
[1] "body<-" "format.pval" "HeatMap" "intersect" "kronecker" "Position" "setdiff" "setequal" "union"
[10] "units"
Answer 0 (score: 1)
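As noted in the comments, the problem is that the plyr version of summarise was loaded after dplyr, so a bare call to summarise picks up the wrong one. You should first try loading plyr before dplyr (or better still, not loading it at all), but you can also play it safe by explicitly stating which version of summarise you want.
library(tidyverse)
DF = data.frame(COLUMN_NAME = c("PARTYID","PARTYID","AGE","AGE","SALESID","SALES"),
                DATA_TYPE = c("char","tinyint","int","smallint","varchar","numeric"))
# bad: plyr::summarise ignores the grouping from group_by() and returns a single row
DF %>% group_by(COLUMN_NAME) %>%
  plyr::summarise(mixedTypes = (any(grepl("char", DATA_TYPE)) &
                                !(all(grepl("char", DATA_TYPE)))))
# good: dplyr::summarise returns one row per COLUMN_NAME
DF %>% group_by(COLUMN_NAME) %>%
  dplyr::summarise(mixedTypes = (any(grepl("char", DATA_TYPE)) &
                                 !(all(grepl("char", DATA_TYPE)))))
If you really do need both plyr and dplyr loaded, this is the best way to do it, and you should also watch out for other important conflicts between the two packages (the conflicts output in the question lists several names under both, such as mutate and arrange). But it is better to avoid loading both at once.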
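To check which summarise a bare call would actually use, one option is to ask R directly. The following is a minimal sketch, assuming only dplyr is installed and attached (plyr is deliberately not loaded); the names found_in, active_pkg and result are illustrative and not from the original post. utils::find() lists the search-path entries that define a name, and the first entry is the one a bare call resolves to. The conflicts output follows the same search-path order, which is why package:plyr appearing before package:dplyr in the question means plyr's summarise masks dplyr's.

```r
library(dplyr)

DF <- data.frame(
  COLUMN_NAME = c("PARTYID", "PARTYID", "AGE", "AGE", "SALESID", "SALES"),
  DATA_TYPE   = c("char", "tinyint", "int", "smallint", "varchar", "numeric")
)

# Search-path entries that define `summarise`, in masking order.
# The first element is the version a bare `summarise(...)` call will use.
found_in <- find("summarise")

# Package whose namespace owns the `summarise` currently visible.
active_pkg <- environmentName(environment(summarise))

# Namespacing the call makes the result independent of load order.
result <- DF %>%
  group_by(COLUMN_NAME) %>%
  dplyr::summarise(
    mixedTypes = any(grepl("char", DATA_TYPE)) & !all(grepl("char", DATA_TYPE))
  )

print(found_in)
print(active_pkg)
print(result)
```

With the grouping respected, the result has one row per COLUMN_NAME, and only PARTYID (char in one table, tinyint in another) is flagged as mixed. If active_pkg reports plyr in your own session, that is the masking described above.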