R: Creating a new column in a dataset

Time: 2017-12-12 14:14:49

Tags: r data-mining

I have a transaction dataset with the variables listed below. You can download it here: https://yadi.sk/d/BIXivmVJ34Akbn

The downloadable file is slightly different: instead of id, the column there is the customer ID.

id, mcc_code - transaction code, tr_datetime, tr_type - transaction type, amount, term_id - terminal ID, gender.

I want to create a new column, trans_count, holding the number of transactions per person (id) per day. How can I do this? Thanks a lot.

Here I split the date and time into separate columns.

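library(readr)  # read_csv()
library(tidyr)  # separate()

trans_train <- read_csv("~/shared/minor3_2017/3-SecondYear-ML/hw_data/transactions_train.csv")
# Split tr_datetime ("day time") into two columns, then store the day number as an integer
trans_train <- separate(trans_train, col = tr_datetime, into = c("day", "time"), sep = " ")
trans_train$day <- as.integer(trans_train$day)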

dput(head(trans_train)) 

Output

structure(list(day = c(0L, 0L, 0L, 0L, 0L, 0L), time = c("03:16:05", 
"11:36:09", "11:37:11", "12:20:45", "12:36:57", "13:53:33"), 
mcc_code = c(6011L, 5499L, 5411L, 5912L, 5499L, 4814L), tr_type = c(2010L, 
1010L, 1010L, 1010L, 1010L, 1030L), amount = c(-950, -13.5, 
-271.43, -134, -544, -100), term_id = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_
), id = c(1726L, 1726L, 1726L, 1726L, 1726L, 1726L)), .Names = c("day", 
"time", "mcc_code", "tr_type", "amount", "term_id", "id"), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

1 Answer:

Answer 0: (score: 0)

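I don't know of a concise way to add a column in the way you describe. However, if you want to create a new summary table instead, you can use: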

library(dplyr)

trans_train %>%
  group_by(day, id) %>%
  summarize(transactions_per_day_per_customer = n())
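
If the count is needed as a column on the original rows rather than in a summary table, one possible approach is to swap summarize() for mutate(), which keeps every row and repeats the group size on each one. This is a minimal sketch and not part of the original answer; the small trans_train table below uses made-up values with the same column names as the dput output, and trans_count is the column name taken from the question.

```r
library(dplyr)

# Hypothetical stand-in for trans_train (made-up rows, real column names)
trans_train <- tibble(
  id     = c(1726L, 1726L, 1726L, 42L, 42L),
  day    = c(0L, 0L, 1L, 0L, 0L),
  amount = c(-950, -13.5, -271.43, -134, -544)
)

# mutate() keeps all rows; n() is the number of rows in the current (id, day) group
trans_train <- trans_train %>%
  group_by(id, day) %>%
  mutate(trans_count = n()) %>%
  ungroup()  # drop the grouping so later operations are not affected by it

print(trans_train)
```

With the real data, only the group_by/mutate/ungroup pipeline would be needed; the row order and the other columns stay as they were.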