I have the following code in R:
base_historicals <- sqldf("
select
date,
case when channel is null then 'Unknown' else channel end as channel,
phone,
count(*) as sales
from
transaction_phones as t
left join
acquisition as a
on
t.transaction_id = a.transaction_id
where
phone != 'droid'
group by
date,
case when channel is null then 'Unknown' else channel end,
phone
")
# Reshape the historicals to be what we need for HTS
base_historicals <- melt(base_historicals, id.vars = c("phone", "channel", "date"), measure.vars = "sales")
base_historicals <- cast(base_historicals, date ~ phone + channel, sum, add.missing = TRUE)
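I expected this to give me a data frame with one column per phone/channel combination, one row per date, and the corresponding sales figure in each cell. I use the add.missing argument because I want reshape to assume that any missing phone/channel/date combination should have its cells filled with zero.
Strangely, I get columns for the droid phone, even though I explicitly filtered it out of my data frame!
Checking with sqldf confirms that it isn't there: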
sqldf('select distinct phone from base_historicals')
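Run right after the first sqldf call, this query returns the list of all phones, and droid is not among them. But as soon as I call the reshape functions, it appears!
How on earth does reshape know about a value that doesn't even exist in the data frame? And how can I stop this behavior?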
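One thing that may be worth checking (an assumption, not something the output above confirms): if phone is stored as a factor, filtering rows does not remove the unused droid level, whereas a select distinct query lists only the values actually present. The sketch below uses a made-up toy vector, not the real transaction_phones data, to show how the two views can disagree.

```r
# Toy data (hypothetical values, not the real transaction_phones table).
phones <- factor(c("droid", "iphone", "pixel", "iphone"))

# Filter out droid, mirroring the WHERE clause in the query.
kept <- phones[phones != "droid"]

# Values actually present after filtering: droid is gone.
present_values <- sort(unique(as.character(kept)))

# Factor levels after filtering: droid is still recorded as a level.
retained_levels <- levels(kept)

# droplevels() discards levels that no longer occur in the data.
cleaned_levels <- levels(droplevels(kept))

print(present_values)
print(retained_levels)
print(cleaned_levels)
```

If levels(base_historicals$phone) shows the same mismatch, calling droplevels(base_historicals) before melt might be enough to keep the extra columns from appearing, though whether add.missing actually relies on factor levels is not verified here.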