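In R, I want to read each csv file in my directory as a data frame, one at a time, and compute a sum for each item.
For example, in the path /data I have the following 4 files: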
View_Mag_2018_03_01
View_Mag_2018_03_02
View_Mag_2018_03_03
View_Mag_2018_03_04
Each file contains a data frame like this:
place number
1 chamber1 1
2 chamber2 1
3 chamber3 2
4 chamber4 4
5 chamber1 1
6 chamber3 3
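From these I want to create 4 data frames (chamber1, chamber2, chamber3, chamber4), each holding the sum of the second column and, in the first column, the date extracted from the csv file name:
Example for the chamber1 df:
date sum
1 01/03/2018 2
Example for the chamber2 df:
date sum
1 01/03/2018 1
And so on for the 4 data frames, with every file in the directory adding one row to each of them.
Thanks for your help.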
Answer 0 (score: 0)
Try this:
# need the file vector (import with list.files)
files <- c("View_Mag_2018_03_01.csv",
"View_Mag_2018_03_02.csv",
"View_Mag_2018_03_03.csv",
"View_Mag_2018_03_04.csv")
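Then we convert the file names to dates:
library(tidyverse)
# drop the ".csv" extension first, otherwise the day field is "01.csv" and as.Date() returns NA
dates <- tibble(value = tools::file_path_sans_ext(files)) %>%
  separate(value, c("View", "Mag", "Y", "M", "D"), sep = "_") %>%
  select(Y, M, D) %>%
  unite(date, c("D", "M", "Y")) %>%          # joined with "_" by default
  mutate(date = as.Date(date, "%d_%m_%Y")) %>%
  as.data.frame()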
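As an alternative, the date can probably be pulled out of the file name with base R alone. This is only a sketch, assuming every name ends in YYYY_MM_DD.csv; example_files, stamp and parsed are illustrative names, not part of the code above:

```r
# Illustrative file names following the View_Mag_YYYY_MM_DD.csv pattern.
example_files <- c("View_Mag_2018_03_01.csv", "View_Mag_2018_03_04.csv")

# Keep only the trailing YYYY_MM_DD part, dropping the prefix and the extension.
stamp <- sub("^.*_([0-9]{4}_[0-9]{2}_[0-9]{2})\\.csv$", "\\1", basename(example_files))

# Parse the stamp as a Date, then print it as day/month/year as in the question.
parsed <- as.Date(stamp, format = "%Y_%m_%d")
format(parsed, "%d/%m/%Y")
# [1] "01/03/2018" "04/03/2018"
```

Keeping the column as a Date (parsed) rather than a formatted string makes later sorting and plotting easier; format() is only needed for display.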
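Now we read all the files and build one combined df:
df <- vector("list", length(files))   # one summary table per file
for (i in seq_along(files)) {
  df[[i]] <- read_csv(files[i]) %>%
    select(place, number) %>%
    group_by(place) %>%
    summarise(sum = sum(number)) %>%
    mutate(date = dates[i, 1])        # date parsed from the file name
}
result <- bind_rows(df)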
Now we split it into 4 data frames:
chamber1 <- result %>% filter(place=="chamber1") %>% select(date, sum)
chamber2 <- result %>% filter(place=="chamber2") %>% select(date, sum)
chamber3 <- result %>% filter(place=="chamber3") %>% select(date, sum)
chamber4 <- result %>% filter(place=="chamber4") %>% select(date, sum)
Now you have one df per chamber.
This should work; let me know how it goes.
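The files vector at the start of this answer can probably be built with list.files instead of being typed by hand. A minimal sketch, assuming a hypothetical directory data_dir (a temporary one is created here so the code runs on its own) and file names of the form View_Mag_YYYY_MM_DD.csv:

```r
# Hypothetical directory; created under tempdir() purely for illustration.
data_dir <- file.path(tempdir(), "view_mag_demo")
dir.create(data_dir, showWarnings = FALSE)
invisible(file.create(file.path(data_dir, sprintf("View_Mag_2018_03_%02d.csv", 1:4))))

# List only the csv files that follow the View_Mag_YYYY_MM_DD naming scheme.
found <- list.files(data_dir,
                    pattern = "^View_Mag_[0-9]{4}_[0-9]{2}_[0-9]{2}\\.csv$")
found
```

With full.names = TRUE, list.files returns complete paths that can be passed straight to read_csv; in that case apply basename() before parsing the dates out of the names.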
Answer 1 (score: 0)
First, let's make it reproducible:
df0 <- read.table(text=" place number
1 chamber1 1
2 chamber2 1
3 chamber3 2
4 chamber4 4
5 chamber1 1
6 chamber3 3")
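files <- c(
  'View_Mag_2018_03_01',
  'View_Mag_2018_03_02',
  'View_Mag_2018_03_03',
  'View_Mag_2018_03_04')
# path is reused below when reading the files back
path <- "Path/data"
dir.create(path, recursive = TRUE)
# write the same sample data frame to every file
sapply(file.path(path, files), write.csv, x = df0)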
Solution
We read the data.frames into a list, adding a date column taken from the file name.
dfs <- lapply(file.path(path,files),
function(x) cbind(date=as.Date(substr(x,nchar(x)-9,nchar(x)),format="%Y_%m_%d"),
read.csv(x)))
Then we combine all the data frames into one:
big_df <- do.call(rbind,dfs)
And aggregate to get our result:
agg <- aggregate(number ~ date +place,big_df,sum)
# date place number
# 1 2018-03-01 chamber1 2
# 2 2018-03-02 chamber1 2
# 3 2018-03-03 chamber1 2
# 4 2018-03-04 chamber1 2
# 5 2018-03-01 chamber2 1
# 6 2018-03-02 chamber2 1
# 7 2018-03-03 chamber2 1
# 8 2018-03-04 chamber2 1
# 9 2018-03-01 chamber3 5
# 10 2018-03-02 chamber3 5
# 11 2018-03-03 chamber3 5
# 12 2018-03-04 chamber3 5
# 13 2018-03-01 chamber4 4
# 14 2018-03-02 chamber4 4
# 15 2018-03-03 chamber4 4
# 16 2018-03-04 chamber4 4
Then you can split it as needed:
splitdf <- split(agg,agg$place)
# $chamber1
# date place number
# 1 2018-03-01 chamber1 2
# 2 2018-03-02 chamber1 2
# 3 2018-03-03 chamber1 2
# 4 2018-03-04 chamber1 2
#
# $chamber2
# date place number
# 5 2018-03-01 chamber2 1
# 6 2018-03-02 chamber2 1
# 7 2018-03-03 chamber2 1
# 8 2018-03-04 chamber2 1
#
# $chamber3
# date place number
# 9 2018-03-01 chamber3 5
# 10 2018-03-02 chamber3 5
# 11 2018-03-03 chamber3 5
# 12 2018-03-04 chamber3 5
#
# $chamber4
# date place number
# 13 2018-03-01 chamber4 4
# 14 2018-03-02 chamber4 4
# 15 2018-03-03 chamber4 4
# 16 2018-03-04 chamber4 4
You could use assign with lapply to get separate data.frames, but you probably shouldn't.
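To illustrate both options, here is a rough sketch on a small stand-in for agg (two chambers, two dates). The name chambers and the date/sum reshaping are assumptions made to match the layout asked for in the question:

```r
# Small stand-in for the aggregated result shown above.
agg <- data.frame(
  date   = as.Date(c("2018-03-01", "2018-03-02", "2018-03-01", "2018-03-02")),
  place  = c("chamber1", "chamber1", "chamber2", "chamber2"),
  number = c(2, 2, 1, 1)
)
splitdf <- split(agg, agg$place)

# Reshape every element to the date/sum layout requested in the question.
chambers <- lapply(splitdf, function(d) {
  data.frame(date = format(d$date, "%d/%m/%Y"), sum = d$number)
})

# Preferred: access one chamber by name, no separate variable needed.
chambers$chamber1

# If separate variables are really wanted, assign() can create them;
# envir must be given, otherwise the objects vanish with the function call.
invisible(lapply(names(chambers), function(nm) {
  assign(nm, chambers[[nm]], envir = globalenv())
}))
```

Keeping everything in the list means further steps can still be applied to all chambers with one lapply call, which is the main reason to avoid assign.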
Cleanup
# a directory is only removed when recursive = TRUE
unlink("Path", recursive = TRUE)