I have the following data:
date qty p_id type
2014-08-04 21:04:00 3 a inward
2014-08-04 22:04:00 3 a outward
2014-08-04 21:04:00 10 b inward
2014-08-04 10:04:00 5 b outward
2014-10-04 21:04:00 40 c inward
2014-11-04 21:04:00 5 c outward
2014-10-05 21:04:00 10 c inward
2014-09-05 21:04:00 4 b outward
Here is the code I have tried so far. It does not look efficient, and the resulting data is not right.
import pandas as pd

df = pd.DataFrame({
'date': ['2014-08-04 21:04:00','2014-08-04 22:04:00','2014-08-04 21:04:00','2014-08-04 10:04:00','2014-10-04 21:04:00','2014-11-04 21:04:00','2014-10-05 21:04:00','2014-09-05 21:04:00'],
'p_id' :['a','a','b','b','c','c','c','b'],
'qty' :[3,3,10,5,40,5,10,4],
'type' :['inward','outward','inward','outward','inward','outward','inward','outward']
})
inward = df['type'] == 'inward'
outward = df['type'] == 'outward'
df.date = pd.to_datetime(df.date)
df.set_index('date', inplace=True)
# 'type' already holds the strings 'inward' and 'outward'; mapping {0: ..., 1: ...} here would turn every value into NaN
df.groupby(['p_id', 'type']).resample('D')['qty'].sum().unstack(1, fill_value=0)
df1 = df.groupby(['p_id', 'type']).resample('D')['qty'].sum().unstack(1, fill_value=0).reset_index()
df1.sort_values(['date', 'p_id'])
df1['opening'] = df1['closing'] = 0
for i in range(1, len(df1)):
    df1.loc[i, 'opening'] = df1.loc[i-1, 'closing']
    df1.loc[i, 'closing'] = (df1.loc[i, 'inward'] + df1.loc[i, 'opening']) - df1.loc[i, 'outward']
I am trying to get the following result, but have not succeeded.
Date open inward outward close p_id
2014-08-04 0 3 3 0 a
2014-08-04 0 10 5 5 b
2014-08-04 0 40 5 35 c
2014-08-05 5 0 4 1 b
2014-08-05 35 10 0 45 c
2014-08-06 1 0 0 1 b
2014-08-06 45 0 0 45 c
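Answer 0 (score: 1)
The question is not entirely clear, but I think the following code should get you on the right track. Everything is commented, so it should be clear what is happening.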
import pandas as pd
df = pd.DataFrame({
'date': ['2014-08-04 21:04:00','2014-08-04 22:04:00','2014-08-04 21:04:00','2014-08-04 10:04:00','2014-10-04 21:04:00','2014-11-04 21:04:00','2014-10-05 21:04:00','2014-09-05 21:04:00'],
'p_id' :['a','a','b','b','c','c','c','b'],
'qty' :[3,3,10,5,40,5,10,4],
'type' :['inward','outward','inward','outward','inward','outward','inward','outward']
})
# change datetime strings to datetime objects
df.date = pd.to_datetime(df.date)
# change the datetime to date
df.date = df.date.dt.date
# pivot so that each movement type becomes a column, summing quantities per (p_id, date)
df = pd.pivot_table(data=df, columns="type", values="qty", index=["p_id","date"], aggfunc="sum")
# replace nans with zeros
df = df.fillna(0)
# move multiindex back to the columns and start a new, default index
df = df.reset_index()
# add the opening and closing calculation (not efficient, but not the problematic part after all)
df["opening"]=0
df["closing"]=0
for i in range(1, len(df)):
    df.loc[i, 'opening'] = df.loc[i-1, 'closing']
    df.loc[i, 'closing'] = (df.loc[i, 'inward'] + df.loc[i, 'opening']) - df.loc[i, 'outward']
# change the order of columns and index to the desired output layout
df = df[["date","inward","outward","opening","closing","p_id"]]
df = df.set_index("date")
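print(df)
This should get you most of the way to what you want. Note that the loop carries the balance from one product into the next (c opens at 1.0, which is b's closing value):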
type inward outward opening closing p_id
date
2014-08-04 3.0 3.0 0.0 0.0 a
2014-08-04 10.0 5.0 0.0 5.0 b
2014-09-05 0.0 4.0 5.0 1.0 b
2014-10-04 40.0 0.0 1.0 41.0 c
2014-10-05 10.0 0.0 41.0 51.0 c
2014-11-04 0.0 5.0 51.0 46.0 c
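The running balance above is not reset per product. If each p_id is meant to start from zero, one possible approach is a grouped cumulative sum instead of the row-by-row loop. The following is only a sketch under that assumption; the names `daily` and `net` are made up for illustration, and it treats closing as opening + inward - outward, as in the question's loop.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2014-08-04 21:04:00", "2014-08-04 22:04:00", "2014-08-04 21:04:00",
             "2014-08-04 10:04:00", "2014-10-04 21:04:00", "2014-11-04 21:04:00",
             "2014-10-05 21:04:00", "2014-09-05 21:04:00"],
    "p_id": ["a", "a", "b", "b", "c", "c", "c", "b"],
    "qty": [3, 3, 10, 5, 40, 5, 10, 4],
    "type": ["inward", "outward", "inward", "outward",
             "inward", "outward", "inward", "outward"],
})

# Keep only the calendar day of each movement (time set to midnight)
df["date"] = pd.to_datetime(df["date"]).dt.normalize()

# One row per (p_id, date), one column per movement type, quantities summed
daily = (
    df.pivot_table(index=["p_id", "date"], columns="type",
                   values="qty", aggfunc="sum", fill_value=0)
      .reset_index()
      .sort_values(["p_id", "date"])
      .reset_index(drop=True)
)
daily.columns.name = None

# Assumption: every product starts with an opening balance of 0.
# closing = running total of (inward - outward) within each product
net = daily["inward"] - daily["outward"]
daily["closing"] = net.groupby(daily["p_id"]).cumsum()
# opening = the balance before that day's movements
daily["opening"] = daily["closing"] - net

print(daily[["date", "opening", "inward", "outward", "closing", "p_id"]])
```

With the sample data, product c then opens at 0 on 2014-10-04 rather than inheriting b's balance.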
Answer 1 (score: 0)
I'm not sure about your result table (specifically how open and close are defined).
import datetime as dt
import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
import pandas as pd
TESTDATA=StringIO("""date;qty;p_id;type
2014-08-04 21:04:00;3;a;inward
2014-08-04 22:04:00;3;a;outward
2014-08-04 21:04:00;10;b;inward
2014-08-04 10:04:00;5;b;outward
2014-10-04 21:04:00;40;c;inward
2014-11-04 21:04:00;5;c;outward
2014-10-05 21:04:00;10;c;inward
2014-09-05 21:04:00;4;b;outward""")
df = pd.read_csv(TESTDATA, sep=";")
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.strftime('%Y-%m-%d')
df = pd.pivot_table(df, columns=['type'], values = ['qty'], index=['date', 'p_id'])
df.reset_index( inplace=True, drop=False)
df.columns = ['date', 'p_id', 'inward', 'outward']
df.fillna(0, inplace=True)
df
This gives:
date p_id inward outward
0 2014-08-04 a 3.0 3.0
1 2014-08-04 b 10.0 5.0
2 2014-09-05 b 0.0 4.0
3 2014-10-04 c 40.0 0.0
4 2014-10-05 c 10.0 0.0
5 2014-11-04 c 0.0 5.0
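The desired result in the question also lists days on which nothing moved (inward and outward both 0). A rough sketch of one way to get such rows from the table above is to reindex each product onto a gap-free daily calendar before computing the balances. It assumes the calendar should run from each product's first to its last movement and that every product opens at 0; `fill_days`, `table` and `daily` are hypothetical names, not part of the answer.

```python
import pandas as pd

# The pivoted table shown above, rebuilt by hand so the sketch is self-contained
table = pd.DataFrame({
    "date": pd.to_datetime(["2014-08-04", "2014-08-04", "2014-09-05",
                            "2014-10-04", "2014-10-05", "2014-11-04"]),
    "p_id": ["a", "b", "b", "c", "c", "c"],
    "inward": [3.0, 10.0, 0.0, 40.0, 10.0, 0.0],
    "outward": [3.0, 5.0, 4.0, 0.0, 0.0, 5.0],
})

def fill_days(group):
    """Reindex one product's rows onto a daily calendar (hypothetical helper)."""
    days = pd.date_range(group["date"].min(), group["date"].max(), freq="D")
    # Days without any movement get inward = outward = 0
    return group.set_index("date")[["inward", "outward"]].reindex(days, fill_value=0.0)

# Fill each product separately, then stack the pieces back together
parts = {p_id: fill_days(g) for p_id, g in table.groupby("p_id")}
daily = pd.concat(parts, names=["p_id", "date"]).reset_index()

# Assumption: closing = opening + inward - outward, starting from 0 per product
net = daily["inward"] - daily["outward"]
daily["closing"] = net.groupby(daily["p_id"]).cumsum()
daily["opening"] = daily["closing"] - net

print(daily[daily["p_id"] == "b"].head())
```

On a day without movement, opening and closing are equal, so the balance is simply carried forward.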