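I am struggling with something quite basic: sorting a data frame by a time format (month, or in this case "%B-%y"). My goal is to calculate various monthly statistics, starting with the sum.
The relevant part of the data frame looks like this *(this part is fine and in line with my goal; I include it here to show where the problem might originate)*: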
> tmp09
Instrument AccountValue monthYear ExitTime
1 JPM 6997 april-07 2007-04-10
2 JPM 7261 mei-07 2007-05-29
3 JPM 7545 juli-07 2007-07-18
4 JPM 7614 juli-07 2007-07-19
5 JPM 7897 augustus-07 2007-08-22
10 JPM 7423 november-07 2007-11-02
11 KFT 6992 mei-07 2007-05-14
12 KFT 6944 mei-07 2007-05-21
13 KFT 7069 juli-07 2007-07-09
14 KFT 6919 juli-07 2007-07-16
# Order on the exit time, which corresponds with 'monthYear'
> tmp09.sorted <- tmp09[order(tmp09$ExitTime),]
> tmp09.sorted
Instrument AccountValue monthYear ExitTime
1 JPM 6997 april-07 2007-04-10
11 KFT 6992 mei-07 2007-05-14
12 KFT 6944 mei-07 2007-05-21
2 JPM 7261 mei-07 2007-05-29
13 KFT 7069 juli-07 2007-07-09
14 KFT 6919 juli-07 2007-07-16
3 JPM 7545 juli-07 2007-07-18
4 JPM 7614 juli-07 2007-07-19
5 JPM 7897 augustus-07 2007-08-22
10 JPM 7423 november-07 2007-11-02
So far so good: sorting on ExitTime works. The trouble starts when I calculate the totals per month and then try to sort that output:
# Calculate the total results per month
> Tmp09Totals <- tapply(tmp09.sorted$AccountValue, tmp09.sorted$monthYear, sum)
> Tmp09Totals <- data.frame(Tmp09Totals)
> Tmp09Totals
Tmp09Totals
april-07 6997
augustus-07 7897
juli-07 29147
mei-07 21197
november-07 7423
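How can I sort this output in chronological order?
Besides various attempts to convert monthYear into another date format, I have already tried order, sort, sort.list, sort_df, reshape, and computing the sum with tapply, lapply, sapply, and aggregate. Even rewriting the rownames (by giving them a number from 1 to length(tmp09.sorted2$AccountValue)) did not work. I also tried giving each month-year a different ID, based on what I learned in another question, but R has trouble distinguishing between the different month-year values there as well.
The correct order for this output would be april-07, mei-07, juli-07, augustus-07, november-07: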
apr-07 6997
mei-07 21197
jul-07 29147
aug-07 7897
nov-07 7423
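Answer 0 (score: 9)
It would be easier to use separate Month and Year factors, with their levels in the correct order, and to call tapply on the combination of the two variables, e.g.: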
## The Month factor
tmp09 <- within(tmp09,
Month <- droplevels(factor(strftime(ExitTime, format = "%B"),
levels = month.name)))
## for @Jura25's locale, we can't use the in built English constant
## instead, we can use this solution, from ?month.name:
## format(ISOdate(2000, 1:12, 1), "%B"))
tmp09 <- within(tmp09,
Month <- droplevels(factor(strftime(ExitTime, format = "%B"),
levels = format(ISOdate(2000, 1:12, 1), "%B"))))
##
## And the Year factor
tmp09 <- within(tmp09, Year <- factor(strftime(ExitTime, format = "%Y")))
This gives us (in my locale):
> head(tmp09)
Instrument AccountValue monthYear ExitTime Month Year
1 JPM 6997 april-07 2007-04-10 April 2007
2 JPM 7261 mei-07 2007-05-29 May 2007
3 JPM 7545 juli-07 2007-07-18 July 2007
4 JPM 7614 juli-07 2007-07-19 July 2007
5 JPM 7897 augustus-07 2007-08-22 August 2007
10 JPM 7423 november-07 2007-11-02 November 2007
Then use tapply with the two factors:
> with(tmp09, tapply(AccountValue, list(Month, Year), sum))
2007
April 6997
May 21197
July 29147
August 7897
November 7423
Or via aggregate:
> with(tmp09, aggregate(AccountValue, list(Month = Month, Year = Year), sum))
Month Year x
1 April 2007 6997
2 May 2007 21197
3 July 2007 29147
4 August 2007 7897
5 November 2007 7423
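If the month labels do not need to be human-readable, a locale-independent variant of the same idea is to group on a numeric year-month key. The following is only a minimal sketch on a small made-up subset of the data (the object names tmp, ym and totals are illustrative, not from the question): a "%Y-%m" key sorts chronologically as plain text, so tapply returns the totals in time order without any month-name levels.

```r
# Small illustrative subset of the question's data (assumed, not the full table)
tmp <- data.frame(
  AccountValue = c(6997, 7261, 7545, 7897, 6992),
  ExitTime = as.Date(c("2007-04-10", "2007-05-29", "2007-07-18",
                       "2007-08-22", "2007-05-14"))
)

# "%Y-%m" yields keys such as "2007-04"; these contain no month names,
# so they do not depend on the LC_TIME locale and sort chronologically
ym <- format(tmp$ExitTime, "%Y-%m")

# tapply converts ym to a factor whose levels are the sorted unique keys
totals <- tapply(tmp$AccountValue, ym, sum)
totals
```

The trade-off is that the row names become "2007-04" rather than "april-07"; the factor approach above keeps the month names.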
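Answer 1 (score: 4)
Try using the "yearmon" class in the zoo package, since it sorts appropriately. Below we create a sample data frame DF and then add a YearMonth column of class "yearmon". Finally we perform the aggregation. The actual processing is just the last two lines (the rest only creates the sample data frame).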
Lines <- "Instrument AccountValue monthYear ExitTime
JPM 6997 april-07 2007-04-10
JPM 7261 mei-07 2007-05-29
JPM 7545 juli-07 2007-07-18
JPM 7614 juli-07 2007-07-19
JPM 7897 augustus-07 2007-08-22
JPM 7423 november-07 2007-11-02
KFT 6992 mei-07 2007-05-14
KFT 6944 mei-07 2007-05-21
KFT 7069 juli-07 2007-07-09
KFT 6919 juli-07 2007-07-16"
library(zoo)
DF <- read.table(textConnection(Lines), header = TRUE)
DF$YearMonth <- as.yearmon(DF$ExitTime)
aggregate(AccountValue ~ YearMonth + Instrument, DF, sum)
This gives the following:
> aggregate(AccountValue ~ YearMonth + Instrument, DF, sum)
YearMonth Instrument AccountValue
1 Apr 2007 JPM 6997
2 May 2007 JPM 7261
3 Jul 2007 JPM 15159
4 Aug 2007 JPM 7897
5 Nov 2007 JPM 7423
6 May 2007 KFT 13936
7 Jul 2007 KFT 13988
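A slightly different approach, with different output, uses read.zoo directly. It produces one column per instrument and one row per year/month. We read in the columns with appropriate classes, using "NULL" in colClasses for the monthYear column since we won't use that column. We also specify that the time index is the 3rd of the remaining columns and that the input should be split into columns by column 1. FUN = as.yearmon indicates that we want the time index converted from "Date" class to "yearmon" class, and we aggregate everything using sum.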
z <- read.zoo(textConnection(Lines), header = TRUE, index = 3,
split = 1, colClasses = c("character", "numeric", "NULL", "Date"),
FUN = as.yearmon, aggregate = sum)
The resulting zoo object looks like this:
> z
JPM KFT
Apr 2007 6997 NA
May 2007 7261 13936
Jul 2007 15159 13988
Aug 2007 7897 NA
Nov 2007 7423 NA
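We may prefer to keep it as a zoo object to take advantage of the other facilities in zoo, or we could convert it to a data frame: data.frame(Time = time(z), coredata(z)) makes the time a separate column, while as.data.frame(z) also works and uses the row names for the time.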
Answer 2 (score: 3)
You can reorder the factor levels with the reorder function.
tmp09$monthYear <- reorder(tmp09$monthYear, as.numeric(as.Date(tmp09$ExitTime)))
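The trick is to use the numeric representation of the date, i.e. the number of days since 1970-01-01 (see ?Date). reorder computes the mean of that number within each level (mean is its default summary function) and orders the levels by it.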
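To show the effect on the monthly totals, here is a minimal sketch on a small made-up subset of the question's data (the names tmp and totals are illustrative). Once the levels of monthYear are ordered by mean exit date, tapply returns the sums in that same order.

```r
# Small illustrative subset of the question's data (assumed, not the full table)
tmp <- data.frame(
  AccountValue = c(6997, 7261, 7545, 7897, 7423, 6992),
  monthYear = c("april-07", "mei-07", "juli-07", "augustus-07",
                "november-07", "mei-07"),
  ExitTime = c("2007-04-10", "2007-05-29", "2007-07-18", "2007-08-22",
               "2007-11-02", "2007-05-14")
)

# Order the levels by the mean exit date (days since 1970-01-01) per level
tmp$monthYear <- reorder(factor(tmp$monthYear),
                         as.numeric(as.Date(tmp$ExitTime)))
levels(tmp$monthYear)

# tapply follows the level order, so the totals come out chronologically
totals <- tapply(tmp$AccountValue, tmp$monthYear, sum)
totals
```

Because the ordering comes from ExitTime and not from parsing the month names, this does not depend on the locale.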
Answer 3 (score: 1)
Edit: I initially misunderstood the question. First copy the data given in the question to the clipboard, then:
> tmp09 <- read.table(file="clipboard", header=TRUE)
> Sys.setlocale(category="LC_TIME", locale="Dutch_Belgium.1252")
[1] "Dutch_Belgium.1252"
# create POSIXlt variable from monthYear
> tmp09$d <- strptime(paste("2007", tmp09$monthYear, sep="-"), "%Y-%B-%d")
# create ordered factor
> tmp09$dFac <- droplevels(cut(tmp09$d, breaks="month", ordered=TRUE))
> tmp09[order(tmp09$d), ]
Instrument AccountValue monthYear ExitTime d dFac
1 JPM 6997 april-07 2007-04-10 2007-04-07 2007-04-01
2 JPM 7261 mei-07 2007-05-29 2007-05-07 2007-05-01
11 KFT 6992 mei-07 2007-05-14 2007-05-07 2007-05-01
12 KFT 6944 mei-07 2007-05-21 2007-05-07 2007-05-01
3 JPM 7545 juli-07 2007-07-18 2007-07-07 2007-07-01
4 JPM 7614 juli-07 2007-07-19 2007-07-07 2007-07-01
13 KFT 7069 juli-07 2007-07-09 2007-07-07 2007-07-01
14 KFT 6919 juli-07 2007-07-16 2007-07-07 2007-07-01
5 JPM 7897 augustus-07 2007-08-22 2007-08-07 2007-08-01
10 JPM 7423 november-07 2007-11-02 2007-11-07 2007-11-01
> Tmp09Totals <- tapply(tmp09$AccountValue, tmp09$dFac, sum)
> Tmp09Totals
2007-04-01 2007-05-01 2007-07-01 2007-08-01 2007-11-01
6997 21197 29147 7897 7423
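Since ExitTime already holds full dates, the same cut idea can presumably be applied to it directly, which avoids parsing the Dutch month names and therefore the locale setting. This is only a sketch on a small made-up subset (the names exit, value, month_start and totals are illustrative):

```r
# Small illustrative subset of the question's data (assumed, not the full table)
exit  <- as.Date(c("2007-04-10", "2007-05-29", "2007-07-18",
                   "2007-11-02", "2007-05-14"))
value <- c(6997, 7261, 7545, 7423, 6992)

# cut() bins each date into its calendar month; the levels are labelled
# with the first day of the month and are already in chronological order.
# droplevels() removes months with no observations (e.g. June).
month_start <- droplevels(cut(exit, breaks = "month"))

totals <- tapply(value, month_start, sum)
totals
```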
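Answer 4 (score: 1)
It looks like the main question is how to sort a series of Month-Year strings chronologically. The simplest way is to prepend "01-" to each Month-Year string and sort the results as regular dates. So take your final data frame Tmp09Totals and do this: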
monYear <- rownames(Tmp09Totals)
sortedMonYear <- format(sort( as.Date( paste('01-', monYear, sep = ''),
'%d-%B-%y')),
'%B-%y')
Tmp09Totals[ sortedMonYear, , drop = FALSE]
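Note that "%B" is parsed according to the current LC_TIME locale, so the code above only works when R's locale matches the language of the month names. If changing the locale is not an option, one alternative is an explicit lookup table. The sketch below is an assumption-laden illustration: dutch_months and sort_month_year are hypothetical names introduced here, and the two-digit years are compared as plain integers, which is only valid within a single century.

```r
# Hypothetical lookup table of Dutch month names in calendar order
dutch_months <- c("januari", "februari", "maart", "april", "mei", "juni",
                  "juli", "augustus", "september", "oktober", "november",
                  "december")

# Sort "month-yy" strings by (year, month position) without using the locale
sort_month_year <- function(x, months = dutch_months) {
  parts <- strsplit(x, "-", fixed = TRUE)
  month <- match(vapply(parts, `[`, character(1), 1), months)
  year  <- as.integer(vapply(parts, `[`, character(1), 2))
  x[order(year, month)]
}

keys   <- c("april-07", "augustus-07", "juli-07", "mei-07", "november-07")
sorted <- sort_month_year(keys)
sorted
```

The sorted keys can then be used to index the totals in the same way, e.g. Tmp09Totals[sorted, , drop = FALSE].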
Answer 5 (score: 0)
An old post, but a data.table approach is worth adding:
Read in the data and set the locale as described by @caracal:
> Sys.setlocale(category="LC_TIME", locale="Dutch_Belgium.1252")
[1] "Dutch_Belgium.1252"
> tmp09 <- read.table(file="clipboard", header=TRUE)
> tmp09$ExitTime <- as.Date(tmp09$ExitTime)
Aggregate the data as required:
> require(data.table)
> data.table(tmp09)[,
+ .(Tmp09Total = sum(AccountValue)),
+ by = .(Date = format(ExitTime, "%B-%y"))]
Date Tmp09Total
1: april-07 6997
2: mei-07 21197
3: juli-07 29147
4: augustus-07 7897
5: november-07 7423