Splitting a large CSV file into weekly and daily windows

Time: 2017-09-26 11:52:42

Tags: r csv dataframe split nested-loops

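I am a beginner R user. I have a large CSV file containing weekly and daily information, and I want to read it once and then store it in smaller files (windows) so that I can process them separately.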

The original file is a data frame with a week column (integer), a day column (character, Monday through Friday), and other attributes.

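As I said, I want to store the file into W1, W2, W3, ..., Wn (the number of weeks depends on the data and I do not know it in advance, but it is between 10 and 11). I also want to store the information for each day as D1, D2, D3, D4, D5.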

I tried the following code, but it does not work as I expected.

myclasses = read.csv("C:/myfile.csv") 
i=1
weekdays <- list('Monday','Tuesday','Wednesday','Thursday','Friday')
for (i <= myclasses$Week_number)
{
tmp1 <- paste("W", i, sep = "")
assign(tmp1, myclasses %>% filter(Week_number == i))
j = 'Monday'
for (j in weekdays)
{
tmp2 <- paste("D", j, sep = "")
assign(tmp2, myclasses %>% filter(Week_number == i,Day == j )) 
}
i = (i +1)
}

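I also tried a loop, but it ended up creating a large number of files. To be clear, I want to process the day windows until the window for the first week is complete, then do the same for the second week until its window is complete, and so on.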

Can you help me?

1 answer:

Answer 0: (score: 1)

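I took a stab at this based on the input you showed in your comment above. This should let you save the pieces either as separate CSV files or as data frames.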

# This assumes the first column holds the row names/IDs.
dt<-read.table(text="
Week_number Day hour Hour.Min W_interval interval Sensor_Location 
1 1 Monday 7 07:00:00 HS peak S3 
2 1 Monday 7 07:00:00 HS peak S1 
3 1 Monday 7 07:00:00 HS peak S2 
4 1 Monday 7 07:00:00 HS peak S1
5 2 Monday 7 07:00:00 HS peak S1
6 2 Tuesday 7 07:00:00 HS peak S1", header=T)

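# Convert Day to its weekday number (Monday = 1 ... Friday = 5).
# Explicit levels keep the order fixed: a bare as.numeric() on a character
# column returns NA, and default factor levels are alphabetical.
dt$Day <- as.integer(factor(dt$Day,
                            levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")))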

# Split on the Week_number and Day columns; drop = TRUE skips
# week/day combinations that have no rows.
dt.split1 <- split(dt, list(dt$Week_number, dt$Day), drop = TRUE)

# Save each piece as W<week>D<day>.csv in the current working directory.
# Names from split() look like "1.2" (week.day); splitting on the dot keeps
# two-digit week numbers intact, which taking only the first character would not.
invisible(lapply(names(dt.split1), function(nm) {
  parts <- strsplit(nm, ".", fixed = TRUE)[[1]]
  write.csv(dt.split1[[nm]],
            file = paste0("W", parts[1], "D", parts[2], ".csv"),
            row.names = FALSE)
}))
# Example output file: W1D1.csv

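# Alternatively, if you want them as data frames in the global environment.
# The objects are named like 1.1 (week.day), so refer to them with backticks, e.g. `1.1`.
list2env(dt.split1, envir = .GlobalEnv)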
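
If the goal is to work through all the day windows of one week before moving on to the next week, a nested list may be easier to handle than many separate objects. The following is only a minimal sketch under some assumptions: the sample data is made up, and `weeks`, `day_levels` and `row_counts` are hypothetical names used for illustration. `nrow` stands in for whatever per-day processing is actually needed.

```r
# Small made-up sample with the same Week_number and Day columns as the question.
dt <- data.frame(
  Week_number = c(1, 1, 1, 2, 2),
  Day = c("Monday", "Monday", "Tuesday", "Monday", "Friday"),
  Sensor_Location = c("S3", "S1", "S2", "S1", "S1"),
  stringsAsFactors = FALSE
)

# Fix the weekday order explicitly so the day windows come out Monday to Friday.
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
dt$Day <- factor(dt$Day, levels = day_levels)

# Outer split by week, inner split by day; drop = TRUE skips days with no rows.
weeks <- lapply(split(dt, dt$Week_number),
                function(w) split(w, w$Day, drop = TRUE))
names(weeks) <- paste0("W", names(weeks))

# Process every day window of a week before moving on to the next week.
# row_counts is a hypothetical per-day summary; replace nrow with real processing.
row_counts <- lapply(weeks, function(days) vapply(days, nrow, integer(1)))
```

With this layout, `weeks$W1$Monday` is the data frame for Monday of week 1, and the number of weeks does not need to be known in advance.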