Copy values down based on the values in another column of an R data frame

Time: 2020-04-10 07:28:01

Tags: r

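I wrote some code that loops over the xlsx files in a folder. At one stage of the loop, the data frame looks like the one shown below. What I want to achieve is to copy the values of column B down alongside the values of column A. That is: copy the value in column B down until the group in column A changes. If a group in column A has no value, leave it blank. This should produce the second data frame.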

'A' 'B' 'C'    'D'  'E'
 1  50  'ABCD'  10  20
 1      'JNHF'
 1      'edfw'
 2  100 'b984'
 2      'abcd'
 2      'abcd'
 3      'abcd'  24
 3      'b984'
 4 25   'JNHF'
 4      'JNHF'
 4      'b984'

The result should look like this:

'A' 'B' 'C'    'D' 'E' 
 1  50  'ABCD' 10  20
 1  50  'JNHF' 10  20
 1  50  'edfw' 10  20
 2  100 'b984'
 2  100 'abcd'
 2  100 'abcd'
 3      'abcd' 24
 3      'b984' 24
 4  25  'JNHF'
 4  25  'JNHF'
 4  25  'b984'

To do this, I wrote the following code.

 names <- c('B','D','E')

 for(j in 1:length(names)){
  for(i in 2:nrow(df)){
    if(df[,names[j]][i] == '' & df[,names[1]][i] == df[,names[1]][i-1] ){
        df[,numbers[j]][i] <- df[,names[j]][i-1] 
     }
    }
 }  

The code returns:

 Error in if (df[, names[j]][i] == "" & df[, names[1]][i] == df[, names[1]][i -  : 
   argument is of length zero

How can I fix this?

3 answers:

Answer 0 (score: 0)

Replace the blanks with NA, then use tidyr::fill within each group of A:

library(dplyr)

df %>% mutate_at(vars(names), na_if, "") %>% group_by(A) %>% tidyr::fill(names)

#       A B     C    
#   <int> <chr> <fct>
# 1     1 50    ABCD 
# 2     1 50    JNHF 
# 3     1 50    edfw 
# 4     2 100   b984 
# 5     2 100   abcd 
# 6     2 100   abcd 
# 7     3 NA    abcd 
# 8     3 NA    b984 
# 9     4 25    JNHF 
#10     4 25    JNHF 
#11     4 25    b984 
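
The same idea can be written in current dplyr syntax. This is only a sketch: `mutate_at()` has been superseded by `across()` since dplyr 1.0.0, and `all_of()` makes it explicit that the columns come from a character vector. The sample data frame `df` and the vector name `fill_cols` are assumptions, reconstructed from the table in the question with plain column names A to E.

```r
library(dplyr)
library(tidyr)

# Sample data reconstructed from the question (assumed: plain column names,
# blanks stored as empty strings)
df <- data.frame(
  A = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L),
  B = c("50", "", "", "100", "", "", "", "", "25", "", ""),
  C = c("ABCD", "JNHF", "edfw", "b984", "abcd", "abcd",
        "abcd", "b984", "JNHF", "JNHF", "b984"),
  D = c("10", "", "", "", "", "", "24", "", "", "", ""),
  E = c("20", "", "", "", "", "", "", "", "", "", ""),
  stringsAsFactors = FALSE
)

fill_cols <- c("B", "D", "E")  # hypothetical name for the columns to fill

filled <- df %>%
  mutate(across(all_of(fill_cols), ~ na_if(.x, ""))) %>%  # blanks become NA
  group_by(A) %>%
  fill(all_of(fill_cols), .direction = "down") %>%        # fill within each group of A
  ungroup()
```

Groups that have no value at all keep NA in the filled columns. If blanks are preferred in the final result, `tidyr::replace_na()` can turn them back into empty strings.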

Answer 1 (score: 0)

Base R solution (using the data provided by @RonakShah, thank you):

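# Convert factors to character strings: clean_df => data.frame
clean_df <- data.frame(lapply(df, function(w){if(is.factor(w)){as.character(w)}else{w}}), 
                       stringsAsFactors = FALSE)

# Treat blank strings as missing so that they can be filled
clean_df[clean_df == ""] <- NA

# Replace missing values with values filled downwards grouping by A: stdout
# (plain indexing instead of ifelse(), which returns only one value per group)
data.frame(lapply(clean_df, function(x){
        return(ave(x, clean_df$A, FUN = function(z){
          idx <- cumsum(!is.na(z))       # number of non-missing values seen so far
          c(NA, z[!is.na(z)])[idx + 1]   # idx == 0 (nothing to copy yet) gives NA
          }
        )
      )
    }
  ),
  stringsAsFactors = FALSE
)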
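
The `cumsum(!is.na(z))` indexing behind a fill-down of this kind is easier to follow on a single vector. The following is a small sketch and not part of the original answer; the function name `fill_down` is hypothetical.

```r
# Fill missing values with the last non-missing value seen so far
fill_down <- function(z) {
  idx <- cumsum(!is.na(z))       # how many non-missing values have been seen
  c(NA, z[!is.na(z)])[idx + 1]   # idx == 0 (nothing seen yet) selects the leading NA
}

fill_down(c("50", NA, NA))       # "50" "50" "50"
fill_down(c(NA, "24", NA))       # NA   "24" "24"
fill_down(c(NA, NA))             # NA   NA
```

Prepending an NA and shifting the index by one keeps the result the same length as the input even when a group starts with missing values.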

Answer 2 (score: 0)

@Timminator, with small changes to the names variable and the for loop, as shown below:

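names <- c("'B'","'D'","'E'")
# The column names include the literal quotes, matching the input data
for(j in seq_along(names)){
  for(i in 2:nrow(df)){
    # Copy from the row above only if the cell is blank and column 1 (the group) is unchanged
    if(df[i,names[j]] == '' && df[i,1] == df[i-1,1]){
       df[i,names[j]] <- df[i-1,names[j]] 
    }
  }
}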

This gives the desired output:

> df
   'A' 'B'    'C' 'D' 'E'
1    1  50 'ABCD'  10  20
2    1  50 'JNHF'  10  20
3    1  50 'edfw'  10  20
4    2 100 'b984'        
5    2 100 'abcd'        
6    2 100 'abcd'        
7    3     'abcd'  24    
8    3     'b984'  24    
9    4  25 'JNHF'        
10   4  25 'JNHF'        
11   4  25 'b984' 
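
The loop can also be wrapped in a small function so that it can be reused for each file. This is a sketch under assumptions: the function name `fill_blanks_by_group` and its arguments are hypothetical, and `seq_len(nrow(df))[-1]` is used instead of `2:nrow(df)` because the latter counts backwards (2, 1) when the data frame has only one row.

```r
fill_blanks_by_group <- function(df, group_col, fill_cols) {
  for (col in fill_cols) {
    for (i in seq_len(nrow(df))[-1]) {   # rows 2..n; empty when n < 2
      same_group <- df[i, group_col] == df[i - 1, group_col]
      if (df[i, col] == "" && same_group) {
        df[i, col] <- df[i - 1, col]     # copy the value from the row above
      }
    }
  }
  df
}

# Small made-up example with quoted column names like those in the answer's data
small <- data.frame(
  "'A'" = c(1L, 1L, 2L, 2L),
  "'B'" = c("50", "", "", "7"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
res <- fill_blanks_by_group(small, "'A'", "'B'")
res[["'B'"]]   # "50" "50" ""   "7"
```

The blank in row 3 stays blank because the group in the first column changes there, which matches the behaviour asked for in the question.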

The following data was used as input:

df<- structure(list("'A'" = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L,4L), 
                    "'B'" = c("50", "", "", "100", "", "", "", "", "25", "", ""), 
                    "'C'" = structure(c(2L, 7L, 6L, 1L, 4L, 4L, 4L, 5L, 3L, 7L, 5L
                    ), .Label = c("'b984'", "'ABCD'", "'JNHF'", "'abcd'", 
                                  "'b984'", "'edfw'", "'JNHF'"), class = "factor"), 
                    "'D'" = c("10", "", "", "", "", "", "24", "", "", "", ""), 
                    "'E'" = c("20","", "", "", "", "", "", "", "", "", "")), row.names = c(NA,-11L), class = "data.frame")