How do you replace empty cells with 0?

Asked: 2013-09-11 20:29:41

Tags: r

I need to replace empty cells with zero (0) in R. I have a data frame like this:

dput(DF)

structure(list(CHANNEL = structure(c(1L, 1L, 1L), .Label = "Native BlackBerry App", class = "factor"), 
    DATE = structure(c(1L, 1L, 1L), .Label = "01/01/2011", class = "factor"), 
    HOUR = structure(c(3L, 1L, 2L), .Label = c("1:00am-2:00am", 
    "2:00am-3:00am", "Midnight-1:00am"), class = "factor"), UNIQUE_USERS = structure(c(1L, 
    1L, 1L), .Label = "", class = "factor"), LOGON_VOLUME = structure(c(1L, 
    1L, 1L), .Label = "", class = "factor")), .Names = c("CHANNEL", 
"DATE", "HOUR", "UNIQUE_USERS", "LOGON_VOLUME"), row.names = c(NA, 
-3L), class = "data.frame")

I have this function:

sapply(df, function (x) 
     as.numeric(gsub("(^ +)|( +$)", "0", x))) 

Instead of working, it gives me these warnings:

[ reached getOption("max.print") -- omitted 422793 rows ]
Warning messages:
1: In FUN(X[[4L]], ...) : NAs introduced by coercion
2: In FUN(X[[4L]], ...) : NAs introduced by coercion
3: In FUN(X[[4L]], ...) : NAs introduced by coercion
4: In FUN(X[[4L]], ...) : NAs introduced by coercion

Update: when I apply this function to df:

sapply(df, function (x) gsub("(^ +)|( +$)", "0", x) )

I get this:

     CHANNEL                 DATE         HOUR              UNIQUE_USERS LOGON_VOLUME
[1,] "Native BlackBerry App" "01/01/2011" "Midnight-1:00am" ""           ""          
[2,] "Native BlackBerry App" "01/01/2011" "1:00am-2:00am"   ""           ""          
[3,] "Native BlackBerry App" "01/01/2011" "2:00am-3:00am"   ""           ""  

1 answer:

Answer 0: (score: 4)

You defined an anonymous function in your sapply call but never used that function's argument.

sapply(df, function (x) gsub("(^ +)|( +$)", "0", x) ) #===> change df to x

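You are also coercing everything to numeric, which produces NA for every non-numeric string. Because each column of a data.frame is an atomic vector, it can hold only one type of data, so the common data type for all the elements ends up being character.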
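To make the coercion behaviour concrete, here is a minimal sketch. The vectors `x` and `mixed` are made up for illustration and are not part of the question's data.

```r
# Text that does not look like a number becomes NA when coerced to numeric.
x <- c("12", "", "Midnight-1:00am")
as_num <- suppressWarnings(as.numeric(x))
print(as_num)        # 12 NA NA

# An atomic vector holds one type, so mixing a number with text
# silently promotes everything to character.
mixed <- c(0, "Midnight-1:00am")
print(class(mixed))  # "character"
```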

Perhaps you meant to do this...

sapply( df , gsub , pattern = "^\\s*$" , replacement = 0 )

     CHANNEL                 DATE         HOUR              UNIQUE_USERS LOGON_VOLUME
[1,] "Native BlackBerry App" "01/01/2011" "Midnight-1:00am" "0"          "0"         
[2,] "Native BlackBerry App" "01/01/2011" "1:00am-2:00am"   "0"          "0"         
[3,] "Native BlackBerry App" "01/01/2011" "2:00am-3:00am"   "0"          "0"  
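Note that the result of that call is a character matrix, so the zeros are the string "0" rather than numbers. A small sketch of this, using a made-up two-column data frame called `demo` (an assumption, not the asker's data), plus one way to keep the data.frame shape by assigning through `demo[]`:

```r
# Made-up data for illustration; factor columns mimic the question's data.
demo <- data.frame(
  HOUR = c("Midnight-1:00am", "1:00am-2:00am"),
  UNIQUE_USERS = c("", " "),
  stringsAsFactors = TRUE
)

# sapply simplifies the per-column results into a character matrix.
m <- sapply(demo, gsub, pattern = "^\\s*$", replacement = "0")
print(is.matrix(m))
print(typeof(m))

# Assigning into demo[] with lapply keeps a data.frame;
# the columns become character vectors.
demo[] <- lapply(demo, gsub, pattern = "^\\s*$", replacement = "0")
str(demo)
```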

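If you convert to integer after using gsub, you will still get NA for any column that contains anything other than character representations of numbers. If you need to change whole columns, you can check whether a whole column is empty and, if so, replace it with zero. You cannot have character elements and numeric elements in the same column.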

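# Count the blank (empty or whitespace-only) cells in each column
len <- colSums( sapply( df , grepl , pattern = "^\\s*$" ) )
# len > 0 selects every column with at least one blank cell (here, the fully
# blank ones); use len == nrow(df) to require that the whole column is blank
df[ , len > 0 ] <- rep( 0 , nrow(df) )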
#                CHANNEL       DATE            HOUR UNIQUE_USERS LOGON_VOLUME
#1 Native BlackBerry App 01/01/2011 Midnight-1:00am            0            0
#2 Native BlackBerry App 01/01/2011   1:00am-2:00am            0            0
#3 Native BlackBerry App 01/01/2011   2:00am-3:00am            0            0
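If some columns hold a mix of blanks and real numbers, a per-column helper may be a gentler option than overwriting the whole column. The following is only a sketch under stated assumptions: `blank_to_zero` and the `demo` data frame are hypothetical names introduced here, and NA cells are treated the same as blank ones.

```r
# Hypothetical helper: replace blank cells with 0, but only in columns whose
# non-blank values all look numeric. Other columns are returned unchanged.
blank_to_zero <- function(col) {
  txt <- trimws(as.character(col))       # factor -> character, strip spaces
  blank <- is.na(txt) | txt == ""        # assumption: NA counts as blank
  nums <- suppressWarnings(as.numeric(txt[!blank]))
  if (anyNA(nums)) return(col)           # real text present: leave as is
  out <- numeric(length(txt))            # numeric vector of zeros
  out[!blank] <- nums
  out
}

# Made-up data for illustration
demo <- data.frame(
  HOUR = c("Midnight-1:00am", "1:00am-2:00am", "2:00am-3:00am"),
  UNIQUE_USERS = c("", "15", " "),
  LOGON_VOLUME = c("", "", ""),
  stringsAsFactors = TRUE
)

demo[] <- lapply(demo, blank_to_zero)
str(demo)
```

With this approach HOUR stays a factor, while UNIQUE_USERS and LOGON_VOLUME become numeric columns with zeros where the cells were blank.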