I have some empty cells in one column of my data frame. The data looks like this:
LoanID PaymentMonth Country
112345 201301 {null}
112345 201402 {null}
112345 201403 UK
234567 201301 US
234567 201302 {null}
234567 201303 {null}
I need to replace the nulls within each loan ID. The desired result looks like this:
LoanID PaymentMonth Country
112345 201301 UK
112345 201402 UK
112345 201403 UK
234567 201301 US
234567 201302 US
234567 201303 US
How can I solve this?
Answer 0 (score: 1)
Using the tidyverse:
library(tidyr)
library(dplyr)
df %>%
mutate(Country = case_when(Country == '{null}' ~ NA_character_,
TRUE ~ Country)) %>%
group_by(LoanID) %>%
fill(Country, .direction = 'up') %>%
fill(Country, .direction = 'down')
#> Source: local data frame [6 x 3]
#> Groups: LoanID [2]
#>
#> LoanID PaymentMonth Country
#> <int> <int> <chr>
#> 1 112345 201301 UK
#> 2 112345 201402 UK
#> 3 112345 201403 UK
#> 4 234567 201301 US
#> 5 234567 201302 US
#> 6 234567 201303 US
Data:
df <- read.table(text = 'LoanID PaymentMonth Country
112345 201301 {null}
112345 201402 {null}
112345 201403 UK
234567 201301 US
234567 201302 {null}
234567 201303 {null}', header = TRUE, stringsAsFactors = FALSE)
Alternatively, if possible, clean the input data when reading it in, which lets you drop the mutate step:
df <- read.table(text = 'LoanID PaymentMonth Country
112345 201301 {null}
112345 201402 {null}
112345 201403 UK
234567 201301 US
234567 201302 {null}
234567 201303 {null}', header = TRUE, na.strings = '{null}')
df %>%
group_by(LoanID) %>%
fill(Country, .direction = 'up') %>%
fill(Country, .direction = 'down')
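With a newer tidyr (1.0.0 or later, as far as I know), the two fill() calls can be collapsed into one with .direction = "updown", and na_if() from dplyr can stand in for the case_when step. A minimal sketch, assuming Country is a character column and that {null} is a literal string marker:

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question; Country is read as character
df <- read.table(text = 'LoanID PaymentMonth Country
112345 201301 {null}
112345 201402 {null}
112345 201403 UK
234567 201301 US
234567 201302 {null}
234567 201303 {null}', header = TRUE, stringsAsFactors = FALSE)

filled <- df %>%
  mutate(Country = na_if(Country, "{null}")) %>%  # turn the marker into NA
  group_by(LoanID) %>%
  fill(Country, .direction = "updown") %>%        # fill upwards first, then downwards
  ungroup()                                       # drop the grouping for later steps

filled$Country
```

The ungroup() at the end is optional; it only keeps later operations from silently running per LoanID.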
Answer 1 (score: 0)
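Assuming that 'Country' is of class character and {null} is a string, we can replace it with NA and then use na.locf from zoo to replace the missing values with the adjacent non-NA element: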
library(zoo)
df1$Country[df1$Country=="{null}"] <- NA
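# Fill forward, then backward, within each LoanID; na.rm = FALSE keeps the
# vector length unchanged even if a group has no known Country at all
df1$Country <- with(df1, ave(Country, LoanID, FUN = function(x)
na.locf(na.locf(x, na.rm = FALSE), fromLast = TRUE, na.rm = FALSE)))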
df1
# LoanID PaymentMonth Country
#1 112345 201301 UK
#2 112345 201402 UK
#3 112345 201403 UK
#4 234567 201301 US
#5 234567 201302 US
#6 234567 201303 US
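If adding zoo as a dependency is not desirable, the same forward-then-backward fill can be approximated in base R. This is only a sketch: fill_both is a hypothetical helper name, not part of any package, and it assumes the missing values have already been turned into NA.

```r
# Hypothetical helper: carry the last known value forward, then let any
# leading NAs borrow from the first known value
fill_both <- function(x) {
  known <- which(!is.na(x))
  if (length(known) == 0L) return(x)   # nothing to fill from
  # index of the last known value at or before each position (0 if none yet)
  idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  # positions before the first known value point at that first value
  idx[idx == 0L] <- known[1L]
  x[idx]
}

df1 <- data.frame(
  LoanID = c(112345L, 112345L, 112345L, 234567L, 234567L, 234567L),
  PaymentMonth = c(201301L, 201402L, 201403L, 201301L, 201302L, 201303L),
  Country = c("{null}", "{null}", "UK", "US", "{null}", "{null}"),
  stringsAsFactors = FALSE
)

df1$Country[df1$Country == "{null}"] <- NA
df1$Country <- ave(df1$Country, df1$LoanID, FUN = fill_both)
df1$Country
```

A group with no known Country is returned unchanged, so it stays NA instead of raising an error.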
Based on the comments, we can also group by 'LoanID' and update the 'Country' column with the first non-'{null}' element:
library(dplyr)
df1 %>%
group_by(LoanID) %>%
mutate(Country = Country[Country!= "{null}"][1L])
# LoanID PaymentMonth Country
# <int> <int> <chr>
#1 112345 201301 UK
#2 112345 201402 UK
#3 112345 201403 UK
#4 234567 201301 US
#5 234567 201302 US
#6 234567 201303 US
Data:
df1 <- structure(list(LoanID = c(112345L, 112345L, 112345L, 234567L,
234567L, 234567L), PaymentMonth = c(201301L, 201402L, 201403L,
201301L, 201302L, 201303L), Country = c("{null}", "{null}", "UK",
"US", "{null}", "{null}")), .Names = c("LoanID", "PaymentMonth",
"Country"), class = "data.frame", row.names = c(NA, -6L))
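One caveat about taking the first non-'{null}' element per group: it overwrites every row of a loan, so it is only safe when each LoanID has at most one distinct known Country. A rough check for that assumption, sketched with dplyr (conflicts and n_countries are names made up for this example):

```r
library(dplyr)

df1 <- data.frame(
  LoanID = c(112345L, 112345L, 112345L, 234567L, 234567L, 234567L),
  PaymentMonth = c(201301L, 201402L, 201403L, 201301L, 201302L, 201303L),
  Country = c("{null}", "{null}", "UK", "US", "{null}", "{null}"),
  stringsAsFactors = FALSE
)

# Loans that carry more than one distinct known Country
conflicts <- df1 %>%
  filter(Country != "{null}") %>%
  group_by(LoanID) %>%
  summarise(n_countries = n_distinct(Country)) %>%
  filter(n_countries > 1)

nrow(conflicts)  # number of loans with conflicting countries
```

If any loans are flagged, a positional fill such as the fill() or na.locf approaches is probably the safer choice, since those only touch the missing rows.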