Partially merging two datasets and filling in NAs in R

Time: 2016-01-24 18:04:41

Tags: r merge dplyr na missing-data

I have two datasets.

a = the original dataset, with thousands of observations of different weather events

   STATE       EVTYPE
1     AL WINTER STORM
2     AL      TORNADO
3     AL    TSTM WIND
4     AL    TSTM WIND
5     AL    TSTM WIND
6     AL         HAIL
7     AL    HIGH WIND
8     AL    TSTM WIND
9     AL    TSTM WIND
10    AL    TSTM WIND

b = a dictionary table containing the standard spelling for certain weather events.

                    EVTYPE       evmatch
1    HIGH SURF ADVISORY          <NA>
2         COASTAL FLOOD COASTAL FLOOD
3           FLASH FLOOD   FLASH FLOOD
4             LIGHTNING     LIGHTNING
5             TSTM WIND          <NA>
6       TSTM WIND (G45)          <NA>

The two are merged into df_new by EVTYPE:

library(dplyr)
df_new <- left_join(a, b, by = c("EVTYPE"))

   STATE       EVTYPE           evmatch
1     AL WINTER STORM      WINTER STORM
2     AL      TORNADO              NA
3     AL    TSTM WIND THUNDERSTORM WIND
4     AL    TSTM WIND THUNDERSTORM WIND
5     AL    TSTM WIND THUNDERSTORM WIND
6     AL         HAIL              NA
7     AL    HIGH WIND         HIGH WIND
8     AL    TSTM WIND THUNDERSTORM WIND
9     AL    TSTM WIND THUNDERSTORM WIND
10    AL    TSTM WIND THUNDERSTORM WIND
11    AL   HEAVY RAIN                NA
12    AL  FLASH FLOOD                NA
13    AL    TSTM WIND THUNDERSTORM WIND
14    AL   HEAVY RAIN                NA
15    AL    TSTM WIND THUNDERSTORM WIND

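Filling in the missing NAs

As you can see, df_new$evmatch contains NAs. How can I merge the datasets so that every NA in evmatch is filled with the corresponding value from EVTYPE? For example:

Desired output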

   STATE       EVTYPE           evmatch
1     AL WINTER STORM      WINTER STORM
2     AL      TORNADO           TORNADO
3     AL    TSTM WIND THUNDERSTORM WIND
4     AL    TSTM WIND THUNDERSTORM WIND
5     AL    TSTM WIND THUNDERSTORM WIND
6     AL         HAIL              HAIL
7     AL    HIGH WIND         HIGH WIND
8     AL    TSTM WIND THUNDERSTORM WIND
9     AL    TSTM WIND THUNDERSTORM WIND
10    AL    TSTM WIND THUNDERSTORM WIND
11    AL   HEAVY RAIN        HEAVY RAIN
12    AL  FLASH FLOOD       FLASH FLOOD
13    AL    TSTM WIND THUNDERSTORM WIND
14    AL   HEAVY RAIN        HEAVY RAIN
15    AL    TSTM WIND THUNDERSTORM WIND

1 answer:

Answer 0: (score: 1)

The answers given in the comments on the question:

1: Using base R

Method 1:

df_new$evmatch <- with(df_new, ifelse(is.na(evmatch), EVTYPE, evmatch))

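Method 2:

df_new$evmatch[is.na(df_new$evmatch)] <- df_new$EVTYPE[is.na(df_new$evmatch)]

Note: make sure both variables are character vectors, otherwise the results will be wrong (with factors, ifelse() returns the underlying integer codes rather than the labels). Use as.character to convert them if needed.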
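
To make the note above concrete, here is a small sketch of what goes wrong with factor columns and how converting to character avoids it. The data frame df and its three rows are made up for illustration and are not the asker's data.

```r
# Toy data (hypothetical values, for illustration only)
df <- data.frame(
  EVTYPE  = c("TORNADO", "TSTM WIND", "HAIL"),
  evmatch = c(NA, "THUNDERSTORM WIND", NA),
  stringsAsFactors = TRUE   # force factors to reproduce the pitfall
)

# With factor columns, ifelse() drops the factor attributes,
# so the result holds integer codes instead of the event names
bad <- with(df, ifelse(is.na(evmatch), EVTYPE, evmatch))

# Convert both columns to character first, then fill the NAs
df$EVTYPE  <- as.character(df$EVTYPE)
df$evmatch <- as.character(df$evmatch)
good <- with(df, ifelse(is.na(evmatch), EVTYPE, evmatch))
```

Here bad is not a character vector, while good contains the event names with the NAs replaced by EVTYPE.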

2: Using data.table

library(data.table)
setDT(df_new)[is.na(evmatch), evmatch := EVTYPE]

3: Using dplyr

library(dplyr)
# Replace each NA in evmatch with the EVTYPE value from the same row
df_new <- df_new %>%
  mutate(evmatch = ifelse(is.na(evmatch), EVTYPE, evmatch))
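
As an end-to-end sketch of the dplyr route, the join and the fill can be done in one pipeline. This variant uses dplyr's coalesce(), which returns the first non-missing value at each position; it is an alternative to ifelse(), not something the original answer used. The small tables a and b below are made-up stand-ins for the real data, and both columns are assumed to be character.

```r
library(dplyr)

# Toy versions of the two tables (hypothetical values, for illustration only)
a <- data.frame(
  STATE  = "AL",
  EVTYPE = c("WINTER STORM", "TORNADO", "TSTM WIND", "HAIL"),
  stringsAsFactors = FALSE
)
b <- data.frame(
  EVTYPE  = c("WINTER STORM", "TSTM WIND"),
  evmatch = c("WINTER STORM", "THUNDERSTORM WIND"),
  stringsAsFactors = FALSE
)

# Join on EVTYPE, then keep evmatch where it exists
# and fall back to EVTYPE where the dictionary has no entry
df_new <- a %>%
  left_join(b, by = "EVTYPE") %>%
  mutate(evmatch = coalesce(evmatch, EVTYPE))
```

With this toy input, TORNADO and HAIL have no dictionary entry, so their evmatch is taken from EVTYPE, while TSTM WIND is mapped to THUNDERSTORM WIND.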