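I have some code that loops over all columns and strips out specific strings with a gsub() statement I defined (see below). Oddly, the tbl_df class from the dplyr package makes gsub behave strangely.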
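Unless I first explicitly convert my column with as.data.frame, the gsub statement does not work properly and returns a completely wrong value (see below). What causes this behavior?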
I worked around the problem by converting the tbl_df object with as.data.frame, storing the result, and then using that inside gsub.
My code (run from the debugger, hence the Browse[1]> prompts):
Browse[1]> x[,1]
Source: local data frame [70 x 1]
Symbol
(chr)
1 AAK.ST
2 ABB.ST
3 ALFA.ST
4 ALIV-SDB.ST
5 AOI.ST
6 ATCO-A.ST
7 AXFO.ST
8 AXIS.ST
9 AZN.ST
10 BALD-B.ST
.. ...
Browse[1]> gsub(pattern = '[-.](.*)$', replacement = '', x = x[,1])
[1] "c(\"AAK" # Wrong behaviour
Browse[1]> gsub(pattern = '[-.](.*)$', replacement = '', x = as.data.frame(x[,1]))
[1] "c(\"AAK" # Still wrong behaviour
Browse[1]> y <- as.data.frame(x[,1])
Browse[1]> gsub(pattern = '[-.](.*)$', replacement = '', x = y[,1]) # Now it's right(!)
[1] "AAK" "ABB" "ALFA" "ALIV" "AOI" "ATCO" "AXFO" "AXIS" "AZN" "BALD" "BETS" "BILL" "BOL" "CAST" "COMH" "EKTA" "ELUX" "ENQ" "ERIC" "FABG" "GETI" "HEXA" "HM" "HOLM" "HPOL" "HUFV"
[27] "HUSQ" "ICA" "IJ" "INDT" "INDU" "INVE" "JM" "KINV" "LATO" "LIFCO" "LOOM" "LUMI" "LUND" "LUPE" "MEDA" "MELK" "MIC" "MTG" "NCC" "NDA" "NIBE" "NOBI" "ORI" "PEAB" "RATO" "SAAB"
[53] "SAND" "SCA" "SEB" "SECU" "SHB" "SKA" "SKF" "SOBI" "SSAB" "STE" "SWED" "SWMA" "TEL2" "TIEN" "TLSN" "TREL" "VOLV" "WALL"
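A minimal base-R sketch of what appears to be happening, assuming the cause is that `[` on a tbl_df does not drop a single selected column to a vector. That behavior is emulated here with `drop = FALSE` on a plain data frame so dplyr is not needed, and the three-row data is hypothetical. gsub() coerces a non-character `x` with as.character(), and on a one-column data frame this yields a single deparsed string such as `c("AAK.ST", ...)`, which the pattern then truncates at its first `.`:

```r
# Hypothetical toy data standing in for the question's 70-row Symbol column
df <- data.frame(Symbol = c("AAK.ST", "ABB.ST", "ALIV-SDB.ST"),
                 stringsAsFactors = FALSE)

# Assumption: a tbl_df keeps x[, 1] as a one-column table; drop = FALSE mimics that
one_col <- df[, 1, drop = FALSE]

# gsub() calls as.character() on a non-character x; for a data frame (a list)
# this deparses the whole column into one string: 'c("AAK.ST", "ABB.ST", ...)'
bad <- gsub(pattern = "[-.](.*)$", replacement = "", x = one_col)

# Extracting the column itself as a character vector avoids the coercion
good <- gsub(pattern = "[-.](.*)$", replacement = "", x = df[[1]])

print(bad)   # "c(\"AAK"
print(good)  # "AAK"  "ABB"  "ALIV"
```

Under this assumption, `as.data.frame(x[,1])` still fails because it is still a data frame, while `y[,1]` works because `[, 1]` on a plain data.frame drops to a character vector. `x[[1]]` or `x$Symbol` should return the vector directly for either class.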