Loop function for comparing columns

Time: 2015-12-04 20:41:51

Tags: r

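I have a very large dataset with 400 string and numeric variables. I want to compare each pair of related columns: 3 and 4, 5 and 6, and so on. That is, I compare the third variable (.x) with the fourth (.y), the fifth with the sixth, the seventh with the eighth, and so on, in the following way: if (.y) is NA, replace the NA with the value from the corresponding row of (.x). For example, if number.y is NA, we replace the NA with the corresponding value in number.x, which would be 5. Likewise, if day.y is NA, we replace the NA in day.y with the corresponding value from day.x, which is 3. How can I write a loop function to do this?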

A<-c(1,2,3,4,5,6,7,NA,NA,5,5,6)
B<-c(3,4,5,6,1,2,7,6,7,NA,NA,6)
number.x<-c(1,2,3,4,5,6,7,NA,NA,5,5,6)
number.y<-c(3,4,5,6,1,2,7,6,7,NA,NA,6)
day.x<-c(1,3,4,5,6,7,8,1,NA,3,5,3)
day.y<-c(4,5,6,7,8,7,8,1,2,3,5,NA)
school.x<-c("a","b","b","c","n","f","h","NA","F","G","z","h")
school.y<-c("a","b","b","c","m","g","h","NA","NA","G","H","T")
city.x<- c(1,2,3,7,5,8,7,5,6,7,5,1)
city.y<- c(1,2,3,5,5,7,7,NA,NA,3,4,5)
df<-data.frame(A,B,number.x,number.y,day.x,day.y,school.x,school.y,city.x,city.y)

2 Answers:

Answer 0 (score: 1)

Here is a hacky approach to your problem. It requires that the columns to be compared sit next to each other in pairs.

library(dplyr)

# Pairs start at column 3 (number.x); columns 1-2 (A, B) are not compared
start_group <- seq(3, length(df), by = 2)
df2 <- data.frame(id = 1:nrow(df))
for(i in start_group){

  j <- i + 1

  dnames <- df[, c(i, j)] %>%
    names

  # paste() coerces both columns to character, so the filled result holds strings
  df_ <- data.frame(col1 = df[, i],
                    col2 = df[, j]) %>%
    mutate(col1 = ifelse(is.na(col1), col2 %>% paste, col1 %>% paste)) %>%
    mutate(col2 = ifelse(is.na(col2), col1 %>% paste, col2 %>% paste))

  names(df_) <- dnames

  df2 <- cbind(df2, df_)

}
df2[, -1]

   number.x number.y day.x day.y school.x school.y city.x city.y
1         1        3     1     4        a        a      1      1
2         2        4     3     5        b        b      2      2
3         3        5     4     6        b        b      3      3
4         4        6     5     7        c        c      7      5
5         5        1     6     8        n        m      5      5
6         6        2     7     7        f        g      8      7
7         7        7     8     8        h        h      7      7
8         6        6     1     1       NA       NA      5      5
9         7        7     2     2        F        F      6      6
10        5        5     3     3        G        G      7      3
11        5        5     5     5        z        H      5      4
12        6        6     3     3        h        T      1      5
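
As a possible variation on the loop above, the following is a minimal base R sketch that fills each adjacent pair with ifelse() instead of paste(), so numeric columns stay numeric. It assumes the pairs start at column 3 and that none of the columns are factors; the toy data and the name filled are illustrative only and are not part of the original answer.

```r
# Toy data (assumption): two unpaired columns followed by two .x/.y pairs
df <- data.frame(
  A        = c(1, 2, NA),
  B        = c(3, NA, 5),
  number.x = c(1, NA, 5),
  number.y = c(3, 6, NA),
  day.x    = c(NA, 1, 3),
  day.y    = c(2, 1, NA)
)

filled <- df
# Walk the adjacent pairs starting at column 3: (3, 4), (5, 6), ...
for (i in seq(3, ncol(filled) - 1, by = 2)) {
  x <- filled[[i]]
  y <- filled[[i + 1]]
  filled[[i]]     <- ifelse(is.na(x), y, x)  # fill .x gaps from .y
  filled[[i + 1]] <- ifelse(is.na(y), x, y)  # fill .y gaps from the original .x
}

filled
```

Because both assignments read from the saved x and y vectors, a row that is NA in both columns simply stays NA, and columns A and B are left untouched.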

Answer 1 (score: 0)

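Consider the following base R solution. Essentially, it loops through a distinct list of column stem names (number, day, school, city) and replaces NA values in the .x columns with the corresponding values from the .y columns, and vice versa. Note: the school columns need to be converted from factor to character, and one row has NA in both the .x and .y columns.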
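
To make the two ideas in this description concrete, here is a small sketch of how the column stems can be derived and why a check for the literal string "NA" is needed in addition to is.na(). The vectors cols and school are illustrative stand-ins for the question's data, not part of the original answer.

```r
# Column names as in the question's data frame
cols <- c("A", "B", "number.x", "number.y", "day.x", "day.y",
          "school.x", "school.y", "city.x", "city.y")

# Drop the two unpaired columns, then strip a trailing ".x" or ".y"
stems <- unique(sub("[.][xy]$", "", cols[3:length(cols)]))
stems  # "number" "day" "school" "city"

# A literal "NA" string is not a missing value, so is.na() alone misses it
school <- c("a", "NA", NA)
is.na(school)                     # FALSE FALSE TRUE
is.na(school) | school %in% "NA"  # FALSE TRUE TRUE
```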

# CONVERT FACTORS TO CHARACTER (NOTE: THE "NA" ENTRIES HERE ARE LITERAL STRINGS, NOT MISSING VALUES)
df[,c('school.x', 'school.y')] <- 
  sapply(df[,c('school.x', 'school.y')], as.character)

# SET UP FINAL DF
finaldf <- df

# OBTAIN UNIQUE LIST OF COLUMNS STEM (W/O x AND y SUFFIXES)
distinctcols <- unique(gsub("[.][xy]$", "", names(df)[3:ncol(df)]))

# LOOP THROUGH COLUMN STEM REPLACING NA VALUES
for (col in distinctcols) {
  # REPLACE NA .x COLUMN VALUES
  finaldf[is.na(finaldf[paste0(col,'.x')])|finaldf[paste0(col,'.x')]=="NA",
     paste0(col,'.x')] <- 
  finaldf[is.na(finaldf[paste0(col,'.x')])|finaldf[paste0(col,'.x')]=="NA",
     paste0(col,'.y')]

  # REPLACE NA .y COLUMN VALUES       
  finaldf[is.na(finaldf[paste0(col,'.y')])|finaldf[paste0(col,'.y')]=="NA", 
     paste0(col,'.y')] <- 
  finaldf[is.na(finaldf[paste0(col,'.y')])|finaldf[paste0(col,'.y')]=="NA",
     paste0(col,'.x')]    
}

Output

    number.x    number.y    day.x   day.y   school.x    school.y    city.x  city.y
1          1           3        1       4          a           a         1       1
2          2           4        3       5          b           b         2       2
3          3           5        4       6          b           b         3       3
4          4           6        5       7          c           c         7       5
5          5           1        6       8          n           m         5       5
6          6           2        7       7          f           g         8       7
7          7           7        8       8          h           h         7       7
8          6           6        1       1         NA          NA         5       5
9          7           7        2       2          F           F         6       6
10         5           5        3       3          G           G         7       3
11         5           5        5       5          z           H         5       4
12         6           6        3       3          h           T         1       5
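
For a dataset with hundreds of columns, the same stem-based idea could be wrapped in a function that pairs columns by name rather than by position. The sketch below is one way to do that under some assumptions: fill_pairs is a hypothetical helper name, string columns are character rather than factor, and both real NA and the literal string "NA" count as missing.

```r
# Hypothetical helper: fill gaps in every name-matched .x/.y pair
fill_pairs <- function(data) {
  x_cols <- grep("[.]x$", names(data), value = TRUE)
  for (x_col in x_cols) {
    y_col <- sub("[.]x$", ".y", x_col)
    if (!y_col %in% names(data)) next  # skip .x columns without a partner

    x <- data[[x_col]]
    y <- data[[y_col]]
    # Treat real NA and the literal string "NA" as missing
    x_missing <- is.na(x) | x %in% "NA"
    y_missing <- is.na(y) | y %in% "NA"

    data[[x_col]][x_missing] <- y[x_missing]  # fill .x from .y
    data[[y_col]][y_missing] <- x[y_missing]  # fill .y from the original .x
  }
  data
}

# Toy data (assumption): one numeric pair and one character pair
toy <- data.frame(
  number.x = c(1, NA, 5),
  number.y = c(NA, 6, 5),
  school.x = c("a", "NA", NA),
  school.y = c("NA", "b", "c"),
  stringsAsFactors = FALSE
)
res <- fill_pairs(toy)
res
```

Matching by name means the pairs do not have to be adjacent, and any column without a .x/.y partner is left as it is.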