Below is the code I am trying to run. It comes from a Coursera course. I cannot get the sqldf function to run in this code:
data = read.delim(file = 'purchases.txt', header = FALSE, sep = '\t', dec = '.')
str(data)
colnames(data) = c('customer_id', 'purchase_amount', 'date_of_purchase')
data$date_of_purchase = as.Date(data$date_of_purchase, "%Y-%m-%d")
data$days_since = as.numeric(difftime(time1 = "2016-01-01",
                                      time2 = data$date_of_purchase,
                                      units = "days"))
head(data)
summary(data)
library(sqldf)
customers = sqldf("SELECT customer_id,
                          MIN(days_since) AS 'recency',
                          COUNT(*) AS 'frequency',
                          AVG(purchase_amount) AS 'amount'
                   FROM data GROUP BY 1")
Answer 0 (score: 1)
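The sqldf package must be installed in R before it can be loaded with the library() function; only after it is loaded is the sqldf() function available.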
To install sqldf in R, use the install.packages() function.
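Since the package only has to be installed once, the install step can be guarded so that rerunning a script does not reinstall it every time. The following is a minimal sketch of that idea, not something from the original code: `load_or_install` is a hypothetical helper name, and the CRAN mirror URL is an assumption that can be swapped for any mirror.

```r
# Hypothetical helper (not part of base R or sqldf): install a package
# only when it is missing, then attach it to the search path.
load_or_install <- function(pkg, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE (instead of raising an error)
  # when the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # assumed mirror; non-interactive sessions need one set explicitly
    install.packages(pkg, repos = repos)
  }
  # character.only = TRUE lets library() accept the name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}
```

Under these assumptions, a single call to load_or_install("sqldf") at the top of the script would stand in for the separate install.packages() and library() calls.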
Here is a fully reproducible version of the OP's code, including the install.packages() call that installs sqldf:
textFile <- "
001,42.5,2017-01-01
001,38.7,2017-05-02
002,47.9,2017-06-05"
# commented out original data read section
# data = read.delim(file = 'purchases.txt', header = FALSE, sep = '\t', dec = '.')
# str(data)
# replace with inline data and read.csv()
data <- read.csv(text=textFile,header=FALSE,stringsAsFactors=FALSE)
colnames(data) = c('customer_id', 'purchase_amount', 'date_of_purchase')
data$date_of_purchase = as.Date(data$date_of_purchase, "%Y-%m-%d")
data$days_since = as.numeric(difftime(time1 = "2016-01-01",
                                      time2 = data$date_of_purchase,
                                      units = "days"))
head(data)
summary(data)
# only need to run install.packages() once
install.packages("sqldf")
library(sqldf)
customers = sqldf("SELECT customer_id,
                          MIN(days_since) AS 'recency',
                          COUNT(*) AS 'frequency',
                          AVG(purchase_amount) AS 'amount'
                   FROM data GROUP BY 1")
customers