Split a dataset by day and save each day as a data frame

Posted: 2014-03-26 14:00:24

Tags: r

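I have a dataset containing two months of data (February and March). How can I split the data by day into 59 subsets (28 days for February and 31 for March) and save each one as a data frame? Ideally, each data frame would be named after its date, i.e. 20140201, 20140202, etc.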

    df <- structure(list(text = structure(c(4L, 6L, 5L, 2L, 8L, 1L), .Label = c(" Terpilih Jadi Maskapai dengan Pelayanan Kabin Pesawat cont", 
    "booking number ZEPLTQ I want to cancel their flight because they can not together  with my wife and kids", 
    "Can I change for the traveler details because i choose wrongly for the Mr or Ms part", 
    "cant do it with cards either", "Coming back home AK", "gotta try PNNL", 
    "Jadwal penerbangan medanjktsblm tangalmasi ada kah", "Me and my Tart would love to flyLoveisintheAir", 
    "my flight to Bangkok onhas been rescheduled I couldnt perform seat selection now", 
    "Pls checks his case as money is not credited to my bank acctThanks\n\nCASLTP", 
    "Processing fee Whatt", "Tacloban bound aboardto get them boats Boats boats boats Tacloban HeartWork", 
    "thanks I chatted with ask twice last week and told the same thing"
    ), class = "factor"), created = structure(c(1L, 1L, 2L, 2L, 3L, 
    3L), .Label = c("1/2/2014", "2/2/2014", "5/2/2014", "6/2/2014"
    ), class = "factor")), .Names = c("text", "created"), row.names = c(NA, 
    6L), class = "data.frame")
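For reference, here is a minimal base-R sketch of the split the question describes. It assumes the `created` strings are day/month/year, which matches February/March data such as "1/2/2014". It rebuilds the six rows from the `dput()` output above and adds one hypothetical March row (not part of the question's data) to show that 1 February and 1 March end up in different groups. The result is a single named list with one data frame per day, keyed `yyyymmdd`.

```r
# The six rows from the question's dput() output, plus one
# HYPOTHETICAL March row (not in the question) to show Feb/Mar separation.
df <- data.frame(
  text = c("cant do it with cards either",
           "gotta try PNNL",
           "Coming back home AK",
           "booking number ZEPLTQ I want to cancel their flight because they can not together  with my wife and kids",
           "Me and my Tart would love to flyLoveisintheAir",
           " Terpilih Jadi Maskapai dengan Pelayanan Kabin Pesawat cont",
           "hypothetical March row"),
  created = c("1/2/2014", "1/2/2014", "2/2/2014", "2/2/2014",
              "5/2/2014", "5/2/2014", "1/3/2014"),
  stringsAsFactors = FALSE
)

# Parse as day/month/year (%m is the month; %M would be minutes)
df$day <- as.Date(df$created, format = "%d/%m/%Y")

# Split into a named list with one data frame per calendar day,
# keyed "yyyymmdd", e.g. by_day[["20140201"]]
by_day <- split(df, format(df$day, "%Y%m%d"))

names(by_day)
# "20140201" "20140202" "20140205" "20140301"
by_day[["20140201"]]
```

If separate objects are really wanted, `list2env(by_day, envir = .GlobalEnv)` would create them, but names such as `20140201` are not syntactic R names and would have to be written with backticks afterwards, which is one reason to keep the list, as the answer below also suggests.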

1 answer:

Answer 0 (score: 1)

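You don't need to output multiple data frames. You only need to select/subset the rows by a grouping key built from the 'created' field. Here are two ways to do that; option 1 is simpler if you won't need date arithmetic later: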

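# 1. Leave 'created' as a string and use text substitution to build a grouping key
#    that keeps the first and last fields, e.g. "1/2/2014" -> "1/2014".
#    Caution: these dates are day/month/year, so this key is day + year; to split by
#    calendar day, group by the full date instead, e.g. split(df, df$created, drop = TRUE)
df$created_mthyr <- gsub( '([0-9]+/)[0-9]+/([0-9]+)', '\\1\\2', df$created )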

# 2. If you need to do arbitrary Date arithmetic, convert the 'created' field to a Date object.
# This needs an explicit format string; the dates are day/month/year
# (%m is the month; %M would be minutes)
df$created <- as.Date(df$created, format = '%d/%m/%Y')

# Now you can either a) split the data frame into a list of data frames, one per key:
split(df, df$created_mthyr)
# and, if you want to assign the pieces to 3 separate data frames:

df1 <- split(df, df$created_mthyr)[[1]]
df2 <- split(df, df$created_mthyr)[[2]]
df5 <- split(df, df$created_mthyr)[[3]]

# ...or b) use Split-Apply-Combine to run an arbitrary command on each subset. This is very powerful; see the plyr/ddply documentation for examples.
library(plyr)
df1 <- dlply(df, .(created_mthyr))[[1]]
df2 <- dlply(df, .(created_mthyr))[[2]]
df5 <- dlply(df, .(created_mthyr))[[3]]

# The output looks like this (printed without step 2, so 'created' is still a string).
# Strictly speaking, you may not want to keep the 'created' and 'created_mthyr' columns:
> df1
#                          text  created created_mthyr
#1 cant do it with cards either 1/2/2014        1/2014
#2               gotta try PNNL 1/2/2014        1/2014

> df2
#                                                                                                      text
#3                                                                                       Coming back home AK
#4 booking number ZEPLTQ I want to cancel their flight because they can not together  with my wife and kids
#   created created_mthyr
#3 2/2/2014        2/2014
#4 2/2/2014        2/2014
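The split-apply-combine pattern from option b) can also be written in base R without plyr. The sketch below is an illustration, not the answer's own code: it assumes day/month/year dates, uses four of the question's rows, and the per-day statistics (`n`, `mean_chars`) are arbitrary examples of the command applied to each piece.

```r
# Four rows taken from the question's data
df <- data.frame(
  text = c("cant do it with cards either", "gotta try PNNL",
           "Coming back home AK",
           "Me and my Tart would love to flyLoveisintheAir"),
  created = c("1/2/2014", "1/2/2014", "2/2/2014", "5/2/2014"),
  stringsAsFactors = FALSE
)
df$day <- as.Date(df$created, format = "%d/%m/%Y")

# Split: one data frame per day
pieces <- split(df, df$day)

# Apply: summarise each piece (row count and mean text length)
results <- lapply(pieces, function(d) {
  data.frame(day = d$day[1], n = nrow(d), mean_chars = mean(nchar(d$text)))
})

# Combine: stack the per-day summaries into one data frame
summary_by_day <- do.call(rbind, results)
rownames(summary_by_day) <- NULL
summary_by_day
```

For a single statistic, `aggregate()` or `tapply()` would be shorter; the explicit split/lapply/rbind form just makes the three steps visible.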