我从Excel阅读销售交易表,我有兴趣知道每个地点出售第一批商品后1小时内的销售数量。而且,我想他们有多少是通过卡与现金/ 设A为销售报告,我想创建B。
A=
item Location Time Payment
X Canada 10:03:18 CreditC
X Canada 10:08:38 Cash
X Canada 10:24:46 Cash
X Canada 11:16:35 Cash
X US 10:00:16 Cash
X US 11:52:12 CreditC
Y Canada 2:08:38 CreditC
Y Canada 4:01:48 Cash
Y US 13:32:02 CreditC
Y US 14:07:03 Cash
item location first sale count CreditCard Cash
X Canada 10:03:18 3 1 2
X US 10:00:16 1 0 1
Y Canada 2:08:38 1 1 0
Y US 13:32:02 2 1 1
我做了这个,这给了我第6行和第6行的错误9.我写了一些可以完成这项工作的变通办法,但我想知道最好的办法是什么。
#group the transactions within the time interval
df['start'] = pd.to_datetime(df['Time'])
grouped = df.groupby(['item', 'Location', 'Time'])
df['end'] = (grouped['start'].transform(lambda grp: grp.min()+pd.Timedelta(minutes=interval)))
df['count'] = (df['start'] < df['end'])
df['CreditCard'] = (df.Payment.map(len) == 7 and df['start'] < df['end'])
Summary = pd.DataFrame(grouped['count'].sum()).reset_index()
Summary['CreditCard']=pd.Sereis(grouped['CreditCard'].sum(), index=Summary.index)
答案 0 :(得分:0)
您可以使用pd.crosstab
生成频率表:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Location': ['Canada', 'Canada', 'Canada', 'Canada', 'US', 'US', 'Canada', 'Canada', 'US', 'US'], 'Payment': ['CreditC', 'Cash', 'Cash', 'Cash', 'Cash', 'CreditC', 'CreditC', 'Cash', 'CreditC', 'Cash'], 'Time': ['10:03:18', '10:08:38', '10:24:46', '11:16:35', '10:00:16', '11:52:12', '2:08:38', '4:01:48', '13:32:02', '14:07:03'], 'item': ['X', 'X', 'X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y']})
df['start'] = pd.to_datetime(df['Time'])
grouped = df.groupby(['item', 'Location'])
interval = 60
df['end'] = (grouped['start'].transform(lambda grp: grp.min()+pd.Timedelta(minutes=interval)))
# isolate just the rows where the transaction occurs within an hour of first sale
df2 = df.loc[(df['start'] < df['end'])]
result = pd.crosstab(index=[df2['item'], df2['Location']], columns=[df2['Payment']])
result['count'] = result['Cash'] + result['CreditC']
result['first sale'] = grouped['Time'].first()
产量
Payment Cash CreditC count first sale
item Location
X Canada 2 1 3 10:03:18
US 1 0 1 10:00:16
Y Canada 0 1 1 2:08:38
US 1 1 2 13:32:02
答案 1 :(得分:0)
interval = 60 # minutes
df.sort_values('Time', inplace=True)
gb = df.groupby(['item', 'Location'], sort=False).apply(
lambda group: group[group.Time <=
group.Time.iat[0] + pd.Timedelta(minutes=interval)].Payment)
gb = gb.reset_index().groupby(['item', 'Location']).Payment.value_counts()
gb = gb.unstack('Payment').fillna(0)
gb['count'] = gb.sum(axis=1)
>>> gb
Payment Cash CreditC count
item Location
X Canada 2 1 3
US 1 0 1
Y Canada 0 1 1
US 1 1 2
答案 2 :(得分:0)
import datetime as dt
def first_hour(x):
start = x.iloc[0]['Time']
end = start + dt.timedelta(hours=1)
df = x[(start <= x.Time) & (x.Time <= end)].groupby('Payment').count().T
df['count'] = df.sum()
df['first sale'] = start
return df.iloc[[0]]
B = A.groupby(['item', 'Location']).apply(first_hour).fillna(0)
B = B.reset_index()[['item', 'Location', 'first sale', 'count', 'CreditC', 'Cash']]
item Location first sale count CreditC Cash
0 X Canada 2016-05-09 10:03:18 0.0 1.0 2.0
1 X US 2016-05-09 10:00:16 0.0 0.0 1.0
2 Y Canada 2016-05-09 02:08:38 0.0 1.0 0.0
3 Y US 2016-05-09 13:32:02 0.0 1.0 1.0