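I ran into a strange problem in dplyr (possibly a bug?), and while debugging it I hit something even stranger.
The dplyr part of the code already has an issue filed, but please help me figure out why identical() does not detect the difference.
The code (copied from the issue I created on dplyr's GitHub) shows the problem with Swedish letters (å, ä, ö, Å, Ä, Ö), as well as an example where base::identical(x, y) returns TRUE even though the data frames x and y behave differently.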
# Script to show how dplyr::select() breaks dplyr::group_by() with Swedish names
library(dplyr)
# Create data frame; column 1's name contains ä (the specific Swedish letters are åäöÅÄÖ)
my_df <- data.frame(användarnamn = letters[1:4], my_numvalues = 1:4,
                    my_text = c("stop","break","my","code"),
                    extra_col = LETTERS[1:4])
# Use dplyr::select() to subset columns, then dplyr::group_by()
# group_by() fails on Swedish column names if the df is subsetted with select().
# If not subsetted, or subsetted with [,1:3], everything works
my_df %>% select(1:3) %>% group_by(my_numvalues) # This works
my_df %>% select(1:3) %>% group_by(användarnamn) # This fails
my_df[,1:3] %>% group_by(användarnamn) # This works
my_df %>% group_by(användarnamn) # This works
# Same thing, but step by step
my_df_selected <- select(my_df, 1:3)
group_by(my_df_selected, användarnamn) # This fails
group_by(my_df_selected, my_numvalues) # This works
# and by %>%
my_df_selected %>% group_by(användarnamn) # This fails
my_df_selected %>% group_by(my_numvalues) # This works
# The names of the original df and the selected df are identical
identical(names(my_df)[1:3],names(my_df_selected))
# The function base::make.names() doesn't change the name, it's already valid
identical(names(my_df_selected), make.names(names(my_df_selected)))
# copy to a new df to rename
my_df_selected_renamed <- my_df_selected
# rename the df with its own old names, passed through make.names()
names(my_df_selected_renamed) <- make.names(names(my_df_selected_renamed))
# The original subsetted df and the renamed df are identical
# according to base::identical()
identical(my_df_selected, my_df_selected_renamed)
# Here's the strange thing, it works now! Why??? I REALLY don't understand!
my_df_selected_renamed %>% group_by(användarnamn) # This works now!
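One way to look beneath identical() here is to compare the declared encoding and the raw bytes of the column names. According to ?identical, character strings in different marked encodings are regarded as identical if they agree once translated to UTF-8, so two names can print the same and compare equal while being stored differently. The sketch below illustrates this with a hand-built string (x_latin1 and x_utf8 are hypothetical names, not part of the script above); whether select() actually changes the encoding of the names in this session is an assumption that would still need checking.

```r
# Hypothetical illustration: the same text under two declared encodings.
# "\xe4" is the latin1 byte for ä.
x_latin1 <- "anv\xe4ndarnamn"
Encoding(x_latin1) <- "latin1"

# Convert the latin1 string to UTF-8 (ä becomes a two-byte sequence).
x_utf8 <- enc2utf8(x_latin1)

# identical() compares the strings after translation, so it sees no difference.
same_for_identical <- identical(x_latin1, x_utf8)

# The declared encodings differ ...
declared <- Encoding(c(x_latin1, x_utf8))

# ... and so do the underlying bytes.
same_bytes <- identical(charToRaw(x_latin1), charToRaw(x_utf8))

print(same_for_identical)
print(declared)
print(same_bytes)
```

The same checks could be run on the real objects with Encoding(names(my_df_selected)) and lapply(names(my_df_selected), charToRaw), compared against the results for my_df_selected_renamed.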
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=Swedish_Sweden.1252 LC_CTYPE=Swedish_Sweden.1252
[3] LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C
[5] LC_TIME=Swedish_Sweden.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.4.3
loaded via a namespace (and not attached):
[1] lazyeval_0.1.10 magrittr_1.5 R6_2.1.1 assertthat_0.1 parallel_3.2.2
[6] DBI_0.3.1 tools_3.2.2 Rcpp_0.12.1