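I wrote some code that loops over the xlsx files in a folder. At one stage of the loop, the data frame looks like the one shown below. What I want to do is copy the values in column B down alongside the values in column A. That is: copy the column B value down until the group in column A changes. If a group in column A has no value, leave it blank. This should produce the second data frame.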
'A'  'B'  'C'     'D'  'E'
1    50   'ABCD'  10   20
1         'JNHF'
1         'edfw'
2    100  'b984'
2         'abcd'
2         'abcd'
3         'abcd'  24
3         'b984'
4    25   'JNHF'
4         'JNHF'
4         'b984'
The result would look like this:
'A'  'B'  'C'     'D'  'E'
1    50   'ABCD'  10   20
1    50   'JNHF'  10   20
1    50   'edfw'  10   20
2    100  'b984'
2    100  'abcd'
2    100  'abcd'
3         'abcd'  24
3         'b984'  24
4    25   'JNHF'
4    25   'JNHF'
4    25   'b984'
To do this, I wrote the following code.
names <- c('B','D','E')
for(j in 1:length(names)){
  for(i in 2:nrow(df)){
    if(df[,names[j]][i] == '' & df[,names[1]][i] == df[,names[1]][i-1] ){
      df[,numbers[j]][i] <- df[,names[j]][i-1]
    }
  }
}
The code returns:
Error in if (df[, names[j]][i] == "" & df[, names[1]][i] == df[, names[1]][i - :
argument is of length zero
How can I fix this?
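One way an error with this message can arise, shown as a minimal sketch rather than a diagnosis of the code above: `if()` fails when its condition has length zero, which happens when a comparison is made against `NULL`. The `toy` data frame and its quoted column name are assumptions made purely for illustration.

```r
# A toy data frame whose column name contains literal quote characters (assumption)
toy <- data.frame(x = c("50", ""), stringsAsFactors = FALSE)
names(toy) <- "'B'"

# `$` returns NULL when no column matches the requested name
missing_col <- toy$B

# Comparing NULL with a string yields logical(0), not TRUE or FALSE
comparison <- missing_col[2] == ""

# if() cannot evaluate a zero-length condition, so it raises an error
msg <- tryCatch(
  if (comparison) "filled",
  error = function(e) conditionMessage(e)
)
print(msg)
```

Checking `length()` of each operand before the `if()` is a quick way to find out which lookup came back empty.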
Answer 0: (score: 0)
Replace the blanks with NA, then use tidyr::fill to carry the values down within each group of A.
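library(dplyr)
# `names` holds the names of the columns to fill, as defined in the question
df %>%
  mutate(across(all_of(names), ~ na_if(.x, ""))) %>%  # blank strings become NA
  group_by(A) %>%
  tidyr::fill(all_of(names))                          # fill NA downwards within each group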
# A B C
# <int> <chr> <fct>
# 1 1 50 ABCD
# 2 1 50 JNHF
# 3 1 50 edfw
# 4 2 100 b984
# 5 2 100 abcd
# 6 2 100 abcd
# 7 3 NA abcd
# 8 3 NA b984
# 9 4 25 JNHF
#10 4 25 JNHF
#11 4 25 b984
Answer 1: (score: 0)
Base R solution (using the data provided by @RonakShah, thanks):
# Convert factors to character strings: clean_df => data.frame
clean_df <- data.frame(lapply(df, function(w) if (is.factor(w)) as.character(w) else w),
                       stringsAsFactors = FALSE)
# Replace blank strings with values filled downwards, grouping by A: stdout
data.frame(lapply(clean_df, function(x) {
  # Treat blank strings as missing so that they can be filled
  x[!is.na(x) & x == ""] <- NA
  ave(x, clean_df$A, FUN = function(z) {
    # Index 1 is NA (nothing seen yet); index k + 1 is the k-th non-missing value
    c(NA, z[!is.na(z)])[cumsum(!is.na(z)) + 1]
  })
}), stringsAsFactors = FALSE)
Answer 2: (score: 0)
@Timminator, with a slight modification to the names variable and the for loop, as follows:
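# The column names in df include the literal single quotes
names <- c("'B'","'D'","'E'")
#
for(j in 1:length(names)){
  for(i in 2:nrow(df)){
    # Fill a blank cell only when the row above belongs to the same group (column 1)
    if(df[i,names[j]] == '' & df[i,1] == df[i-1,1]){
      df[i,names[j]] <- df[i-1,names[j]]
    }
  }
}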
we get the following desired output
> df
   'A' 'B'    'C' 'D' 'E'
1    1  50 'ABCD'  10  20
2    1  50 'JNHF'  10  20
3    1  50 'edfw'  10  20
4    2 100 'b984'
5    2 100 'abcd'
6    2 100 'abcd'
7    3     'abcd'  24
8    3     'b984'  24
9    4  25 'JNHF'
10   4  25 'JNHF'
11   4  25 'b984'
using the following data as input
df<- structure(list("'A'" = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L,4L),
"'B'" = c("50", "", "", "100", "", "", "", "", "25", "", ""),
"'C'" = structure(c(2L, 7L, 6L, 1L, 4L, 4L, 4L, 5L, 3L, 7L, 5L
), .Label = c("'b984'", "'ABCD'", "'JNHF'", "'abcd'",
"'b984'", "'edfw'", "'JNHF'"), class = "factor"),
"'D'" = c("10", "", "", "", "", "", "24", "", "", "", ""),
"'E'" = c("20","", "", "", "", "", "", "", "", "", "")), row.names = c(NA,-11L), class = "data.frame")
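For comparison, the same group-wise fill can be sketched without an explicit row loop by using base R's ave. This is only an illustration under stated assumptions: the data frame below is a simplified copy of the input (plain column names, and C stored as character rather than factor), and `fill_down` and `cols` are hypothetical names introduced here, not part of any answer above.

```r
# Simplified copy of the input data (assumption: unquoted column names, no factors)
df <- data.frame(
  A = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L),
  B = c("50", "", "", "100", "", "", "", "", "25", "", ""),
  C = c("ABCD", "JNHF", "edfw", "b984", "abcd", "abcd",
        "abcd", "b984", "JNHF", "JNHF", "b984"),
  D = c("10", "", "", "", "", "", "24", "", "", "", ""),
  E = c("20", "", "", "", "", "", "", "", "", "", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: carry the last non-blank value forward within one group.
# Index 1 is "" (nothing seen yet); index k + 1 is the k-th non-blank value.
fill_down <- function(z) {
  seen <- z != ""
  c("", z[seen])[cumsum(seen) + 1]
}

cols <- c("B", "D", "E")  # columns to fill; C is left untouched
for (col in cols) {
  # ave() applies fill_down separately to each group of A
  df[[col]] <- ave(df[[col]], df$A, FUN = fill_down)
}

print(df)
```

Groups with no value at all (for example group 3 in column B) stay blank, because the lookup vector starts with an empty string.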