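I have an Excel report that is published every day, and I need to summarize it and provide trend analysis. The report contains a list of work items, each with a created date and a work item type. How can I count the work items created in 2011 and in 2012? Also, how can I get counts by work item type? So far I have been able to load the Excel data and get the row count by doing the following: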
library(gdata)
wi20121812 = read.xls("WorkItemReport20121812.xls")
nrow(wi20121812)
Sample data
> dput(head(workItemReport2))
structure(list(DocType = structure(c(6L, 7L, 6L, 6L, 8L, 6L), .Label = c("TYPE10WI",
"TYPE11WI", "TYPE12WI", "TYPE13WI", "TYPE14WI", "TYPE1WI", "TYPE2WI",
"TYPE3WI", "TYPE4WI", "TYPE5WI", "TYPE6WI", "TYPE7WI", "TYPE8WI",
"TYPE9WI"), class = "factor"), CreatedDate = structure(c(7L,
22L, 146L, 181L, 153L, 191L), .Label = c("1/10/12 15:43 AM/PM ",
"1/10/12 16:06 AM/PM ", "1/10/12 5:28 AM/PM ", "1/10/12 5:56 AM/PM ",
"1/11/12 19:51 AM/PM ", "1/11/12 5:26 AM/PM ", "1/12/11 21:58 AM/PM ",
"1/12/12 11:08 AM/PM ", "1/12/12 5:41 AM/PM ", "1/12/12 9:56 AM/PM ",
"1/13/12 14:01 AM/PM ", "1/13/12 15:08 AM/PM ", "1/13/12 15:11 AM/PM ",
"1/13/12 8:51 AM/PM ", "1/16/12 10:27 AM/PM ", "1/16/12 10:28 AM/PM ",
"1/16/12 16:37 AM/PM ", "1/16/12 7:52 AM/PM ", "1/18/12 15:02 AM/PM ",
"1/18/12 16:03 AM/PM ", "1/18/12 16:13 AM/PM ", "1/19/11 19:23 AM/PM ",
"1/20/12 10:48 AM/PM ", "1/20/12 12:23 AM/PM ", "1/20/12 8:38 AM/PM ",
"1/23/12 5:53 AM/PM ", "1/24/12 15:18 AM/PM ", "1/24/12 8:23 AM/PM ",
"1/24/12 8:58 AM/PM ", "1/25/12 11:38 AM/PM ", "1/25/12 5:28 AM/PM ",
"1/26/12 13:48 AM/PM ", "1/26/12 15:53 AM/PM ", "1/26/12 15:58 AM/PM ",
"1/26/12 16:13 AM/PM ", "1/26/12 16:18 AM/PM ", "1/26/12 7:33 AM/PM ",
"1/27/12 7:48 AM/PM ", "1/3/12 17:48 AM/PM ", "1/3/12 18:33 AM/PM ",
"1/3/12 9:07 AM/PM ", "1/30/12 11:22 AM/PM ", "1/30/12 22:52 AM/PM ",
"1/30/12 23:10 AM/PM ", "1/31/12 19:54 AM/PM ", "1/31/12 20:39 AM/PM ",
"1/31/12 5:42 AM/PM ", "1/31/12 9:42 AM/PM ", "1/4/12 14:02 AM/PM ",
"1/4/12 9:52 AM/PM ", "1/5/12 13:42 AM/PM ", "1/5/12 17:42 AM/PM ",
....
....
"9/6/12 9:02 AM/PM ", "9/7/12 11:48 AM/PM ", "9/7/12 12:58 AM/PM ",
"9/7/12 13:52 AM/PM ", "9/7/12 15:07 AM/PM ", "9/7/12 15:12 AM/PM ",
"9/7/12 15:22 AM/PM ", "9/7/12 15:47 AM/PM ", "9/7/12 15:52 AM/PM ",
"9/7/12 8:42 AM/PM ", "9/7/12 9:32 AM/PM ", "9/8/11 23:43 AM/PM "
), class = "factor")), .Names = c("DocType", "CreatedDate"), row.names = c(NA,
6L), class = "data.frame")
>
Answer 0: (score: 1)
One part of your question is still unanswered, and it is quite simple: "how to get counts by work item type".
res <- table(wi20121812[, "WorkItemType"])
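This gives you a simple table telling you how often each WorkItemType occurs. (In the sample data above the type column appears as DocType, so use whichever name your data frame actually has.) If you need proportions rather than absolute counts, run prop.table() on the result: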
prop.table(res)
Or do both in one step:
res <- prop.table(table(wi20121812[, "WorkItemType"]))
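As a small worked instance of the two calls above, here is a sketch that uses the six DocType values visible in the posted head() output. The data frame name `wi` and the result names are made up for illustration; in practice you would pass the data frame returned by read.xls.

```r
# Hypothetical stand-in for the loaded report: the six DocType values
# shown in the dput(head(...)) output of the question.
wi <- data.frame(
  DocType = c("TYPE1WI", "TYPE2WI", "TYPE1WI", "TYPE1WI", "TYPE3WI", "TYPE1WI")
)

# Absolute count per work item type
type_counts <- table(wi$DocType)

# Share of each type; the shares sum to 1
type_share <- prop.table(type_counts)

print(type_counts)
print(round(type_share, 2))
```

Wrapping the counts in sort(type_counts, decreasing = TRUE) lists the most frequent types first, which can be handy for a daily summary.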
Answer 1: (score: 0)
You can use ddply from the plyr package:
res = ddply(df, "year", summarise, amount = length(year))
Or use count from the same package (easier):
res = count(df, "year")
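where df is the data.frame containing your data, and year is the name of a column holding a categorical variable that gives the year in which each row was created.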
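The report as posted has no year column, only CreatedDate strings such as "1/12/11 21:58 AM/PM ". A minimal sketch of one way to derive it is below, assuming the strings are month/day/two-digit-year (levels such as "1/30/12" suggest this). The date strings are copied from the factor levels in the question, but their pairing with DocType values here is invented for illustration.

```r
# Hypothetical stand-in for the loaded report. CreatedDate is a factor,
# as it is in the question's dput output.
df <- data.frame(
  DocType = c("TYPE1WI", "TYPE2WI", "TYPE1WI", "TYPE3WI"),
  CreatedDate = c("1/12/11 21:58 AM/PM ", "1/19/11 19:23 AM/PM ",
                  "9/7/12 15:07 AM/PM ", "9/8/11 23:43 AM/PM "),
  stringsAsFactors = TRUE
)

# Convert the factor to character, then parse only the date part.
# as.Date() ignores trailing characters once the format is matched,
# so the time and the literal "AM/PM" text are skipped.
created <- as.Date(as.character(df$CreatedDate), format = "%m/%d/%y")

# Four-digit year as a character column, e.g. "2011"
df$year <- format(created, "%Y")

# Work items created per year
year_counts <- table(df$year)
print(year_counts)
```

Once df$year exists, the count(df, "year") call above should apply as well, and table(df$year, df$DocType) gives a year-by-type cross-tabulation.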