Conditional statement doesn't agree with list indexing in a loop

Date: 2018-02-09 15:10:23

Tags: r function

My data looks like this:

df <- data.frame(pop = c("Spades", "Spades", "Spades", "Clubs", "Clubs", "Clubs", "Diamonds", "Diamonds", "Hearts", "Hearts"),
            type = c("Ace", "Two", "Three", "Ace", "Two", "Three", "Ace", "Two", "King", "Queen"),
            V1 = c(4, 3, NA, 7, NA, NA, 5, 12, NA, NA),
            V2 = c(16, 23, NA, 15, NA, NA, 8, 19, NA, NA))

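I need to impute NAs as 0, but only in very specific cases. For each pop (population) and type, the data (V1, V2, etc.) must contain either all NAs or all numbers. In this example, the Spades pop is missing data in V1 and V2 for the Spades-Three row, while Spades-Ace and Spades-Two have data. So V1 and V2 for Spades-Three need to change from NA to 0. The same applies to the Clubs pop. Hearts, where every value is NA, must stay as it is.

The resulting dataset should look like this: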

df2 <- data.frame(pop = c("Spades", "Spades", "Spades", "Clubs", "Clubs", "Clubs", "Diamonds", "Diamonds", "Hearts", "Hearts"),
            type = c("Ace", "Two", "Three", "Ace", "Two", "Three", "Ace", "Two", "King", "Queen"),
            V1 = c(4, 3, 0, 7, 0, 0, 5, 12, NA, NA),
            V2 = c(16, 23, 0, 15, 0, 0, 8, 19, NA, NA))

I can perform this imputation with this code:

library(dplyr)  # filter() below comes from dplyr

ID <- unique(df$pop)

for (i in 1:length(ID)) {
   dftemp <- filter(df, pop == paste(ID[i]))
   # Number of unique categories for a pop-type combination
   num_type <- length(dftemp$type)
   # Number of NA's in that combination for V1
   num_na <- sum(is.na(dftemp$V1) == TRUE)
   print(num_type)
   print(num_na)
   if (num_na < num_type && num_na > 0) {
     # print(paste(ID[i]))
     df$V1[with(df, pop == paste(ID[i]) & is.na(V1))] <- 0
     df$V2[with(df, pop == paste(ID[i]) & is.na(V2))] <- 0
   }
}

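My problem is scaling this up. I need to do this for many more columns, so I would like to put the column names in a list that I can pass through a loop. But for some reason, inside the if block above, changing

df$V1[with(df, pop == paste(ID[i]) & is.na(V1))] <- 0

to

df[newlist[k]][with(df, pop == paste(ID[i]) & is.na(newlist[k]))] <- 0

(where newlist <- c("V1", "V2", "V3", "V4"), etc.) makes the pop == paste(ID[i]) condition stop working. If I specify pop == "Spades" it works, but obviously that is even less efficient than the old approach.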

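The end goal is a function that I can pass the df name and a list of columns to, but I find myself stuck on this issue.

My current attempt at writing the function looks like this: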

imputezero <- function(df, columnlist) {
  for (i in 1:length(ID)) {
    for (x in 1:length(columnlist)) {
      dftemp <- filter(df, pop == paste(ID[i]))
      num_type <- length(dftemp$type)
      num_na <- sum(is.na(dftemp[collist[x]]) == TRUE)
      if (num_na < num_type && num_na > 0) {
        df[columnlist[x]][with(df, pop == paste(ID[i]) & is.na(df[columnlist[x]]))] <- 0
        return(df)
      }
    }
  }
}

list_status <- c("V1", "V2")
test_df <- imputezero(df, list_status)

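So how can I get df[columnlist[x]][with(df, pop == paste(ID[i]) & is.na(df[columnlist[x]]))] <- 0 to work?

If my general approach is completely wrong, or if there is a way to cut out all the noise, I welcome any feedback as well.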

2 answers:

Answer 0: (score: 0)

You can use mutate_at from dplyr to achieve this; it scales to any number of columns.

If I understand correctly, you can do this:

df %>%
  group_by(pop) %>%
  mutate_at(.funs = funs(ifelse(is.na(.) & sum(is.na(.)) != n(), 0, .)), 
            .vars = vars(-type))
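
In newer dplyr releases (1.0.0 and later), `funs()` and `mutate_at()` are superseded by `across()`. A minimal sketch of the same idea follows, assuming the columns to fill are listed in a character vector; `cols` and `result` are names made up here, not part of the original answer.

```r
library(dplyr)

df <- data.frame(pop = c("Spades", "Spades", "Spades", "Clubs", "Clubs", "Clubs", "Diamonds", "Diamonds", "Hearts", "Hearts"),
                 type = c("Ace", "Two", "Three", "Ace", "Two", "Three", "Ace", "Two", "King", "Queen"),
                 V1 = c(4, 3, NA, 7, NA, NA, 5, 12, NA, NA),
                 V2 = c(16, 23, NA, 15, NA, NA, 8, 19, NA, NA))

cols <- c("V1", "V2")  # hypothetical: the columns to impute

result <- df %>%
  group_by(pop) %>%
  # Within each pop, set an NA to 0 only if the column is not entirely NA there
  mutate(across(all_of(cols),
                ~ ifelse(is.na(.x) & !all(is.na(.x)), 0, .x))) %>%
  ungroup()
```

Passing the column names through `all_of()` means the list of columns can be extended without touching the rest of the pipeline.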

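Answer 1: (score: 0)

I changed the if in the function so that it skips the iteration when num_na equals num_type or when num_na is zero. Otherwise it runs the line df[columnlist[x]][with(df, pop == paste(ID[i]) & is.na(df[columnlist[x]]))] <- 0. I also moved return(df) to the end of the function, so it no longer returns after the first replacement. This seems to work.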

# Requires dplyr for filter()
imputezero <- function(df, columnlist) {
  ID <- unique(df$pop)  # computed here so the function does not depend on a global ID
  for (i in seq_along(ID)) {
    for (x in seq_along(columnlist)) {
      dftemp <- filter(df, pop == paste(ID[i]))
      num_type <- length(dftemp$type)
      num_na <- sum(is.na(dftemp[columnlist[x]]))
      # Skip groups that are entirely NA or contain no NA at all
      if (num_na == num_type || num_na == 0) {
        next
      }
      df[columnlist[x]][with(df, pop == paste(ID[i]) & is.na(df[columnlist[x]]))] <- 0
    }
  }
  return(df)
}
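
For comparison, here is a base R sketch of the same rule that avoids the explicit loop over pop values. It assumes the grouping column is called pop, as in the question; `impute_zero_base` is a name invented for this illustration. It also touches on the indexing point from the question: `is.na(newlist[k])` tests the string "V1" itself, which is never NA, whereas `df[[col]]` extracts the column that the string names.

```r
impute_zero_base <- function(df, cols) {
  for (col in cols) {
    values <- df[[col]]          # the column whose name is stored in `col`
    missing <- is.na(values)
    # Share of NAs within each pop, repeated on every row of that pop
    na_share <- ave(as.numeric(missing), df$pop, FUN = mean)
    # Fill only where the pop mixes NAs and numbers (share below 1)
    values[missing & na_share < 1] <- 0
    df[[col]] <- values
  }
  df
}

df <- data.frame(pop = c("Spades", "Spades", "Spades", "Clubs", "Clubs", "Clubs", "Diamonds", "Diamonds", "Hearts", "Hearts"),
                 type = c("Ace", "Two", "Three", "Ace", "Two", "Three", "Ace", "Two", "King", "Queen"),
                 V1 = c(4, 3, NA, 7, NA, NA, 5, 12, NA, NA),
                 V2 = c(16, 23, NA, 15, NA, NA, 8, 19, NA, NA))

string_is_na <- is.na("V1")  # FALSE: this checks the name, not the column
test_df <- impute_zero_base(df, c("V1", "V2"))
```

Because `ave()` returns one value per row, the group-level check and the row-level NA check combine into a single logical index per column.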