How to replace empty cells with NA in R?

Date: 2016-10-18 04:43:48

Tags: html r xml na

I'm new to R and have been working through some examples, but I can't manage to change all the empty cells to NA.

library(XML)
theurl <- "http://www.pro-football-reference.com/teams/sfo/1989.htm"
table <- readHTMLTable(theurl)
table

Thanks.

2 answers:

Answer 0 (score: 2)

readHTMLTable returns a list of two tables, so you need to process each list element, which you can do with lapply:
table <- lapply(table, function(x){
     x[x == ""] <- NA
     return(x)
})
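Because the page in the question may no longer be reachable, here is a small offline sketch of the same lapply pattern. The list `tables` is made-up toy data standing in for what readHTMLTable() returned at the time (a list of data frames with factor columns); the names and values are illustrative only.

```r
# Toy stand-in (hypothetical data) for readHTMLTable() output:
# a list of data frames whose factor columns contain empty strings.
tables <- list(
  team_stats = data.frame(
    Player = c("Team Stats", "Lg Rank Offense"),
    Ply    = c("1021", ""),
    stringsAsFactors = TRUE
  ),
  other = data.frame(
    Player = c("Player A", ""),
    Yds    = c("100", "200"),
    stringsAsFactors = TRUE
  )
)

# Visit each data frame in the list and overwrite "" cells with NA.
tables <- lapply(tables, function(x) {
  x[x == ""] <- NA
  x
})

tables$team_stats
```

The columns keep their factor class, which is why the printed output shows `<NA>` rather than `NA`; the now-unused "" level is still there and can be dropped with droplevels().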


table$team_stats
           Player  PF  Yds  Ply  Y/P TO FL 1stD  Cmp Att  Yds TD Int NY/A 1stD Att  Yds TD Y/A 1stD  Pen  Yds 1stPy
1      Team Stats 442 6268 1021  6.1 25 14  350  339 483 4302 35  11  8.1  209 493 1966 14 4.0  124  109  922    17
2      Opp. Stats 253 4618  979  4.7 37 16  283  316 564 3235 15  21  5.3  178 372 1383  9 3.7   76   75  581    29
3 Lg Rank Offense   1    1 <NA> <NA>  2 10    1 <NA>  20    2  1   1    1 <NA>  13   10 12  13 <NA> <NA> <NA>  <NA>
4 Lg Rank Defense   3    4 <NA> <NA> 11  9    9 <NA>  25   11  3   9    5 <NA>   1    3  3   8 <NA> <NA> <NA>  <NA>

Answer 1 (score: 2)

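You have a list of data.frames of factors, but the actual data are mostly numeric. Converting each column to its appropriate type with type.convert inserts the NAs for you automatically, because blank fields in numeric columns are treated as missing: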

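df_list <- lapply(table, function(x){
    # factor -> character, then let type.convert() pick integer/double/character;
    # blank cells in columns that become numeric turn into NA
    x[] <- lapply(x, function(y){type.convert(as.character(y), as.is = TRUE)})
    x
})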
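A minimal offline check of the type.convert route, on hypothetical toy data: blank cells in columns that turn out numeric come back as NA, while a column that stays character keeps its "" values, since there only `na.strings` (default "NA") marks a value as missing.

```r
# Hypothetical toy data frame with factor columns, some cells blank.
df <- data.frame(
  Player = c("Team Stats", "Lg Rank Offense", ""),
  Ply    = c("1021", "", "979"),
  YP     = c("6.1", "", "4.7"),
  stringsAsFactors = TRUE
)

# Factor -> character -> best-fitting type, column by column.
df[] <- lapply(df, function(y) type.convert(as.character(y), as.is = TRUE))

str(df)
# Ply becomes integer and YP double, each with NA where the cell was blank;
# Player stays character, so its blank cell is still "".
```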

df_list[[1]][, 1:18]
##            Player  PF  Yds  Ply Y/P TO FL 1stD Cmp Att Yds.1 TD Int NY/A 1stD.1 Att.1 Yds.2 TD.1
## 1      Team Stats 442 6268 1021 6.1 25 14  350 339 483  4302 35  11  8.1    209   493  1966   14
## 2      Opp. Stats 253 4618  979 4.7 37 16  283 316 564  3235 15  21  5.3    178   372  1383    9
## 3 Lg Rank Offense   1    1   NA  NA  2 10    1  NA  20     2  1   1  1.0     NA    13    10   12
## 4 Lg Rank Defense   3    4   NA  NA 11  9    9  NA  25    11  3   9  5.0     NA     1     3    3

Or, more concisely, though it takes several packages:

library(tidyverse)    # for purrr functions and readr::type_convert
library(janitor)      # for clean_names

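# dmap() was later moved out of purrr (into purrrlyr); with current packages,
# dplyr's mutate(across(everything(), as.character)) does the same job
df_list <- map(table, ~.x %>% clean_names() %>% mutate(across(everything(), as.character)) %>% type_convert())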

df_list[[1]]
## # A tibble: 4 × 23
##            player    pf   yds   ply   y_p    to    fl x1std   cmp   att yds_2    td   int  ny_a
##             <chr> <int> <int> <int> <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <dbl>
## 1      Team Stats   442  6268  1021   6.1    25    14   350   339   483  4302    35    11   8.1
## 2      Opp. Stats   253  4618   979   4.7    37    16   283   316   564  3235    15    21   5.3
## 3 Lg Rank Offense     1     1    NA    NA     2    10     1    NA    20     2     1     1   1.0
## 4 Lg Rank Defense     3     4    NA    NA    11     9     9    NA    25    11     3     9   5.0
## # ... with 9 more variables: x1std_2 <int>, att_2 <int>, yds_3 <int>, td_2 <int>, y_a <dbl>,
## #   x1std_3 <int>, pen <int>, yds_4 <int>, x1stpy <int>
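Neither approach above treats cells holding only spaces as empty, and the type.convert route keeps "" in columns that stay character. A small base-R sketch covering both, assuming whitespace-only cells should also count as empty (for example in data from other sources); the helper `blanks_to_na` and the toy data are hypothetical.

```r
# Hypothetical helper, not from either answer: blank or whitespace-only
# cells become NA in every column, then type.convert() picks each type.
blanks_to_na <- function(x) {
  x[] <- lapply(x, function(y) {
    y <- as.character(y)                 # factor -> character
    y[which(trimws(y) == "")] <- NA      # which() skips cells already NA
    type.convert(y, as.is = TRUE)
  })
  x
}

# Toy input (hypothetical), shaped like one element of the scraped list.
toy <- list(data.frame(
  Player = c("Team Stats", "   "),
  Ply    = c("1021", ""),
  stringsAsFactors = TRUE
))

res <- lapply(toy, blanks_to_na)
str(res[[1]])
```

Non-blank values are left untrimmed; only the blank test uses trimws().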