Aggregating by conditioning on values in two columns

Date: 2013-12-27 06:45:03

Tags: r aggregate

Here is the input from my sample data file (the actual data contains 40K records):

structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("build", "client"), class = "factor"), 
V2 = c(9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 
9L, 9L), V3 = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("kwadmin", "kwbuildproject", 
"plugin.msvs"), class = "factor"), V4 = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("11/17/2013", 
"11/18/2013", "11/19/2013"), class = "factor"), V5 = structure(c(5L, 
5L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 1L, 2L), .Label = c("INPRKUL1", 
"MUSTMAT1", "nzarvan", "semaols5", "USBVO-builduser"), class = "factor"), 
V6 = structure(c(5L, 5L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 
3L, 3L, 4L, 2L, 1L), .Label = c("fi-l-7001180", "in-l-kbxi012108", 
"nznpe-l-w700029", "sevst-l-0008645", "usbvo-w-0078540"), class = "factor")), .Names = c("V1", 
"V2", "V3", "V4", "V5", "V6"), class = "data.frame", row.names = c(NA, 
-15L))

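I want to use the aggregate function to do three things: count the number of unique users based on the user name field; look up, for each user name, the dates on which that user logged in; and count the features, along with their names, that a given user used on a given date.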

Below is what I have tried. I am not sure whether it is correct, since I am new to R. Any help would be appreciated.

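dat <- read.csv("test.csv")
names(dat) <- c("Catagory", "Feature_Version", "Feature", "Date", "User_Name", "Host_Name")
# Replace missing dates with 0 (this does not parse as a date below, so these rows end up NA)
dat$Date[is.na(dat$Date)] <- 0
# Parse dates such as 11/17/2013
dat$Date <- as.Date(dat$Date, "%m/%d/%Y")
# Drop rows that contain missing values
dat <- na.omit(dat)
# Attempted aggregation (not working):
#agg<-aggregate(cbind(Date)~User_Name,FUN=mean,by=list(unique(tolower(dat$User_Name))))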

Please forgive me if this is a silly question.

1 Answer:

Answer 0 (score: 0)

I am not sure what output you are looking for. Here is my interpretation of your question.

Count the unique user names:

nlevels(dat$User_Name)
# [1] 5
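
Note that nlevels() only counts correctly when User_Name is a factor and has no unused levels. Since R 4.0, read.csv() reads strings as character by default, in which case nlevels() returns 0. A minimal sketch of a count that works for either type is shown below; the user_name vector is a hand-built copy of the sample column, and n_users and n_users_ci are illustrative names, not part of the original code. The second count lowercases the names first, on the assumption (suggested by the tolower() call in the question) that the same user may appear with different casing.

```r
# Hand-built copy of the User_Name column from the 15-row sample (an assumption
# made so the example is self-contained; real data would come from dat$User_Name).
user_name <- c(rep("USBVO-builduser", 6), rep("nzarvan", 6),
               "semaols5", "INPRKUL1", "MUSTMAT1")

# Number of distinct user names; works for character and factor input alike.
n_users <- length(unique(user_name))

# Case-insensitive variant: names that differ only in casing count once.
n_users_ci <- length(unique(tolower(user_name)))

print(n_users)
print(n_users_ci)
```

For the sample data both counts are 5, matching the nlevels() result above.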

Count the unique features per user and per date:

aggregate(Feature ~ User_Name + Date, dat, function(x) length(unique(x)))
#         User_Name       Date Feature
# 1         nzarvan 2013-11-17       1
# 2 USBVO-builduser 2013-11-17       2
# 3         nzarvan 2013-11-18       1
# 4        INPRKUL1 2013-11-19       1
# 5        MUSTMAT1 2013-11-19       1
# 6        semaols5 2013-11-19       1
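
The question also asks for the names of the features, not only their count. One possible way to get them, sketched here under the assumption that a comma-separated string per user and date is an acceptable output, is to reuse the same aggregate() call with a function that pastes the unique values together. The data frame below is a hand-built copy of the three relevant sample columns, and feature_names is an illustrative name.

```r
# Hand-built copy of the relevant sample columns (an assumption made so the
# example is self-contained; in practice use the cleaned `dat` from above).
dat <- data.frame(
  Feature   = c(rep("kwbuildproject", 3), rep("kwadmin", 3), rep("plugin.msvs", 9)),
  Date      = as.Date(c(rep("11/17/2013", 9), rep("11/18/2013", 3),
                        rep("11/19/2013", 3)), "%m/%d/%Y"),
  User_Name = c(rep("USBVO-builduser", 6), rep("nzarvan", 6),
                "semaols5", "INPRKUL1", "MUSTMAT1"),
  stringsAsFactors = FALSE
)

# For each user and date, collapse the distinct feature names into one string.
feature_names <- aggregate(
  Feature ~ User_Name + Date, dat,
  function(x) paste(sort(unique(x)), collapse = ", ")
)

print(feature_names)
```

The result has one row per user and date, like the count table above, with the Feature column holding the names instead of the number of features.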