In R

Time: 2017-02-12 12:13:13

Tags: r

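I have several xlsx files in one directory, all with the same structure (i.e. columns A, B, C); each file holds one day's data. I need to import all of the data into R and find the differences between each day and the next.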

files <- list.files(pattern = "\\.xlsx$")   # pattern is a regex: escape the dot, anchor the end
for (i in seq_along(files)) {
    assign(paste("Day", i, sep = "."), read.xlsx(files[i]))
}

I can't figure out how to work with the imported data. For example:

Day.1 <- data.frame(Day.1)
Day.1$A <- as.character(Day.1$A)
Day.2 <- data.frame(Day.2)
Day.2$A <- as.character(Day.2$A)
anti_join (Day.1, Day.2)

This code works, but how do I do the same thing with a variable?

Day.[i] <- data.frame(Day.[i])
Day.[i]$A <- as.character(Day.[i]$A)
Day.[i+1] <- data.frame(Day.[i+1])
Day.[i+1]$A <- as.character(Day.[i+1]$A)
anti_join (Day.[i], Day.[i+1])

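I also tried importing all the files into a single data frame, but I run into a similar problem when it comes to using the resulting data: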

file.list <- list.files(pattern = "\\.xlsx$")
days.list <- lapply(file.list, read_excel)
days <- rbindlist(days.list, idcol = "id")
days <- data.frame(days)
days$B <- as.character(days$B)

But I don't know how to turn this:

day1 <- filter(days, id==1)
day2 <- filter(days, id==2)
diff1 <- anti_join (day1, day2, by=c("B", "C"))

into something that uses a counter variable (i):

day(i) <- filter(days, id==(i))
day(i+1) <- filter(days, id==(i+1))
diff1 <- anti_join (day1, day2, by=c("B", "C"))

1 answer:

Answer 0: (score: 1)

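Consider using base R's Map (a wrapper for mapply) across two lists of data frames, the days and the days + 1, which serve as the left-hand and right-hand sides of dplyr::anti_join respectively. Naturally, the last day has no following day to compare against.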
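The pairing itself can be seen in isolation with a short base R sketch. The day labels here are hypothetical stand-ins for the data frames; only Map, head and tail are used.

```r
# Hypothetical day labels standing in for the list of data frames
days <- c("Mon", "Tue", "Wed", "Thu")

left  <- head(days, -1)   # every day except the last
right <- tail(days, -1)   # every day except the first

# Map calls the function on (left[1], right[1]), (left[2], right[2]), ...
pairs <- Map(function(x, y) paste(x, "vs", y), left, right)

unname(unlist(pairs))
# "Mon vs Tue" "Tue vs Wed" "Wed vs Thu"
```

Four days yield three comparisons, which is why the last day never appears on the left-hand side.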

library(xlsx)
library(dplyr)

file.list <- list.files(pattern = "\\.xlsx$")   # RETURNED IN ALPHABETICAL ORDER
df.list <- lapply(file.list, function(f){
    read.xlsx(f, 1, stringsAsFactors = FALSE)
})

left_days <- head(df.list, -1)     # SUBSET OUT LAST DAY
right_days <- tail(df.list, -1)    # SUBSET OUT FIRST DAY

# WITHOUT ARGS (JOINS ON ALL COMMON COLUMNS)
anti_join_list <- Map(anti_join, left_days, right_days)

# WITH ARGS
anti_join_list <- Map(function(x,y) anti_join(x, y, by=c("B", "C")), left_days, right_days)
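To try the approach without any xlsx files, here is a minimal sketch on made-up data. The three data frames and the "Day.i_vs_Day.j" result names are assumptions for illustration only; the join columns B and C follow the question.

```r
library(dplyr)

# Hypothetical data: three "days" sharing the columns A, B, C
df.list <- list(
  data.frame(A = c("a", "b", "c"), B = c("x", "y", "z"), C = 1:3,
             stringsAsFactors = FALSE),
  data.frame(A = c("a", "b"), B = c("x", "y"), C = 1:2,
             stringsAsFactors = FALSE),
  data.frame(A = c("a", "d"), B = c("x", "w"), C = c(1L, 4L),
             stringsAsFactors = FALSE)
)

left_days  <- head(df.list, -1)   # days 1 and 2
right_days <- tail(df.list, -1)   # days 2 and 3

# Rows of each day that have no match (on B and C) in the following day
anti_join_list <- Map(function(x, y) anti_join(x, y, by = c("B", "C")),
                      left_days, right_days)

# Label each result with the pair of days it compares (naming is illustrative)
names(anti_join_list) <- paste0("Day.", seq_along(left_days),
                                "_vs_Day.", seq_along(right_days) + 1)

anti_join_list[["Day.1_vs_Day.2"]]   # the row (c, z, 3), missing from day 2
```

If the days are already stacked in one data frame with an id column, as in the question's second attempt, split(days, days$id) should give an equivalent list to feed into the same Map call, assuming the ids follow the day order.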