base::identical() returns TRUE, but the data frames are different

Time: 2015-09-30 19:43:26

Tags: r character-encoding dplyr

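I ran into a strange problem in dplyr (possibly a bug?), and while debugging it I hit something even stranger.

The dplyr part of the code already has an issue filed, but please help me figure out why identical() does not detect the difference.

The code below (copied from the issue I created on dplyr's GitHub) shows the problem with the Swedish letters (å, ä, ö, Å, Ä, Ö), along with an example where base::identical(x, y) returns TRUE even though the data frames x and y are different.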

# Script to show how dplyr::select() breaks dplyr::group_by() with Swedish names
library(dplyr)

# Create data frame, column 1's name contains ä (the specifically Swedish letters are åäöÅÄÖ)
my_df <- data.frame(användarnamn = letters[1:4], my_numvalues = 1:4,
                    my_text = c("stop","break","my","code"),
                    extra_col = LETTERS[1:4])

# Use dplyr::select() to subset columns, then dplyr::group_by().
# group_by() fails on Swedish column names if the df is subsetted with select().
# If not subsetted, or subsetted with [,1:3], everything works

my_df %>% select(1:3) %>% group_by(my_numvalues)   # This works
my_df %>% select(1:3) %>% group_by(användarnamn)   # This fails
my_df[,1:3] %>% group_by(användarnamn)             # This works
my_df %>% group_by(användarnamn)                   # This works

# Same thing, but step by step
my_df_selected <- select(my_df, 1:3)
group_by(my_df_selected, användarnamn)             # This fails
group_by(my_df_selected, my_numvalues)             # This works

# and by %>% 
my_df_selected %>% group_by(användarnamn)          # This fails
my_df_selected %>% group_by(my_numvalues)          # This works

# The names of the original df and the subsetted one are identical
identical(names(my_df)[1:3],names(my_df_selected))
# The function base::make.names() doesn't change the name, it's already valid
identical(names(my_df_selected), make.names(names(my_df_selected)))

# copy to a new df to rename
my_df_selected_renamed <- my_df_selected
# rename the df with its own old names, passed through make.names()
names(my_df_selected_renamed) <- make.names(names(my_df_selected_renamed))

# The original subsetted df and the renamed df are identical
# according to base::identical()
identical(my_df_selected, my_df_selected_renamed)

# Here's the strange thing, it works now! Why??? I REALLY don't understand!
my_df_selected_renamed %>% group_by(användarnamn)  # This works now!

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=Swedish_Sweden.1252  LC_CTYPE=Swedish_Sweden.1252   
[3] LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C                   
[5] LC_TIME=Swedish_Sweden.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.4.3

loaded via a namespace (and not attached):
[1] lazyeval_0.1.10 magrittr_1.5    R6_2.1.1        assertthat_0.1        parallel_3.2.2 
[6] DBI_0.3.1       tools_3.2.2     Rcpp_0.12.1  
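
One possible explanation for the TRUE result, offered as a sketch and not confirmed anywhere in this post: the documentation for identical() says that character strings are regarded as identical if they are in different marked encodings but would agree when translated to UTF-8. If select() returns the column name with a different encoding mark than the original (an assumption about dplyr 0.4.3 under the Windows-1252 locale shown above), then identical() would still report TRUE while code that compares raw bytes could behave differently. The names name_utf8, name_latin1 and same_bytes below are hypothetical and exist only for this illustration.

```r
# Hypothetical illustration, not taken from the original script.
# Build the same text twice, with two different encoding marks.
name_utf8   <- "anv\u00e4ndarnamn"                              # \u escape gives a UTF-8 marked string
name_latin1 <- iconv(name_utf8, from = "UTF-8", to = "latin1")  # same text, marked as latin1

Encoding(name_utf8)     # "UTF-8"
Encoding(name_latin1)   # "latin1"

# identical() compares the strings after translation, so the marks are ignored
identical(name_utf8, name_latin1)                         # TRUE

# The underlying bytes differ: ä is two bytes in UTF-8 and one byte in latin1
identical(charToRaw(name_utf8), charToRaw(name_latin1))   # FALSE

# A stricter check (hypothetical helper): same encoding marks and same raw bytes
same_bytes <- function(x, y) {
  identical(Encoding(x), Encoding(y)) &&
    identical(lapply(x, charToRaw), lapply(y, charToRaw))
}

same_bytes(name_utf8, name_latin1)                        # FALSE
```

Applied to the script above, same_bytes(names(my_df_selected), names(my_df_selected_renamed)) would show whether the two sets of column names really differ at the byte level; whether they do on this particular setup is not verified here.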

0 answers:

No answers yet.