I'm trying to join two Series with pd.concat([a, b], axis=1), but the result is a DataFrame filled with NaN. Here is what I mean:
Series
by_status = odr.set_index('order_status')
g = by_status.groupby(['dt', 'product_id'])
payed_orders = g.size()
payed_orders.name = 'payed_orders'
refund_g = by_status.loc[[1,2,3], :].groupby(['dt', 'product_id'])
refund_orders = refund_g.size()
refund_orders.name = 'refund_orders'
# I'm going to concat refund_orders and payed_orders
>>> payed_orders.head()
dt          product_id
2015-01-15  10001          1
            10007          1
            10016         14
            10022          1
            10023          1
Name: payed_orders, dtype: int64
>>> refund_orders.head()
dt          product_id
2015-01-15  10007         1
            10016         4
            10030         1
2015-01-16  10007         3
            10008         1
Name: refund_orders, dtype: int64
>>> pd.concat([payed_orders.head(), refund_orders.head()], axis=1, ignore_index=False)
                       payed_orders  refund_orders
dt         product_id
2015-01-15 10001                NaN            NaN
           10007                NaN            NaN
           10016                NaN            NaN
           10022                NaN            NaN
           10023                NaN            NaN
           10030                NaN            NaN
2015-01-16 10007                NaN            NaN
           10008                NaN            NaN
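I don't think I'm making an obvious mistake, but I really can't figure this out. Please help.
P.S. The code was copied from an IPython notebook, so please excuse the odd formatting.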
I also tried passing ignore_index=True; here is what happened:
>>> pd.concat([payed_orders.tail(), refund_orders.tail()], axis=1, ignore_index=True)
                         0    1
dt         product_id
2015-09-07 1000081     NaN  NaN
           1000084     NaN  NaN
           1000094     NaN  NaN
           1000096     NaN  NaN
           1000124     NaN  NaN
           1000131     NaN  NaN
           1000132     NaN  NaN
           1000133     NaN  NaN
           1000134     NaN  NaN
           1000137     NaN  NaN
So here are two Series that don't concatenate properly:
>>> a4.head().to_dict()
{'actual_suborders': {(datetime.date(2015, 1, 15), 10001): 1,
                      (datetime.date(2015, 1, 15), 10016): 10,
                      (datetime.date(2015, 1, 15), 10022): 1,
                      (datetime.date(2015, 1, 15), 10023): 1,
                      (datetime.date(2015, 1, 15), 10024): 1}}
>>> a5.head().to_dict()
{'refund_suborders': {(datetime.date(2015, 1, 15), 10007): 1,
                      (datetime.date(2015, 1, 15), 10016): 4,
                      (datetime.date(2015, 1, 15), 10030): 1,
                      (datetime.date(2015, 1, 16), 10007): 4,
                      (datetime.date(2015, 1, 16), 10008): 1}}
>>> pd.concat([a4.head(), a5.head()], axis=1)
                       actual_suborders  refund_suborders
dt         product_id
2015-01-15 10001                    NaN               NaN
           10007                    NaN               NaN
           10016                    NaN               NaN
           10022                    NaN               NaN
           10023                    NaN               NaN
           10024                    NaN               NaN
           10030                    NaN               NaN
2015-01-16 10007                    NaN               NaN
           10008                    NaN               NaN
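Thanks to everyone who takes the time to look at this; this is a great community.
I have serialized the Series above and uploaded them to Evernote, together with the code to load and concat them: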
https://www.evernote.com/l/AH4AdfgOJJROuZSfGfDR_jZvA0zEpIHgyq0
Answer 0 (score: 2)
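To get this to work, I had to build the set of unique values from the concatenation of each Series' old index. Then, when concatenating, I passed that set as the argument to join_axes: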
import datetime as dt  # the code below refers to the module as `dt`
import pandas as pd
s1 = pd.Series([1, 10, 1, 1, 1],
               name='actual_suborders',
               index=[(dt.date(2015, 1, 15), 10001),
                      (dt.date(2015, 1, 15), 10016),
                      (dt.date(2015, 1, 15), 10022),
                      (dt.date(2015, 1, 15), 10023),
                      (dt.date(2015, 1, 15), 10024)])
s2 = pd.Series([1, 4, 1, 4, 1],
               name='refund_suborders',
               index=[(dt.date(2015, 1, 15), 10007),
                      (dt.date(2015, 1, 15), 10016),
                      (dt.date(2015, 1, 15), 10030),
                      (dt.date(2015, 1, 16), 10007),
                      (dt.date(2015, 1, 16), 10008)])
# Unique (date, product_id) tuples that appear in either Series
idx = set(pd.concat([s1.reset_index()['index'],
                     s2.reset_index()['index']],
                    ignore_index=True))
>>> pd.concat([s1, s2], axis=1, join_axes=[idx])
                     actual_suborders  refund_suborders
(2015-01-15, 10022)                 1               NaN
(2015-01-15, 10001)                 1               NaN
(2015-01-15, 10023)                 1               NaN
(2015-01-16, 10008)               NaN                 1
(2015-01-15, 10030)               NaN                 1
(2015-01-15, 10016)                10                 4
(2015-01-15, 10007)               NaN                 1
(2015-01-16, 10007)               NaN                 4
(2015-01-15, 10024)                 1               NaN
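Note that join_axes was deprecated in pandas 0.25 and removed in 1.0, so the call above only runs on older versions. A rough equivalent for newer versions is to take the union of the two indexes and reindex each Series against it. This is only a sketch: it assumes both Series carry a real MultiIndex with levels named dt and product_id, as in the question, and full_index and aligned are names made up for the example.

```python
import datetime as dt
import pandas as pd

names = ['dt', 'product_id']
s1 = pd.Series([1, 10, 1, 1, 1], name='actual_suborders',
               index=pd.MultiIndex.from_tuples(
                   [(dt.date(2015, 1, 15), 10001),
                    (dt.date(2015, 1, 15), 10016),
                    (dt.date(2015, 1, 15), 10022),
                    (dt.date(2015, 1, 15), 10023),
                    (dt.date(2015, 1, 15), 10024)], names=names))
s2 = pd.Series([1, 4, 1, 4, 1], name='refund_suborders',
               index=pd.MultiIndex.from_tuples(
                   [(dt.date(2015, 1, 15), 10007),
                    (dt.date(2015, 1, 15), 10016),
                    (dt.date(2015, 1, 15), 10030),
                    (dt.date(2015, 1, 16), 10007),
                    (dt.date(2015, 1, 16), 10008)], names=names))

# Every (dt, product_id) pair that appears in either Series.
full_index = s1.index.union(s2.index)

# Reindex both Series onto the shared index; pairs missing from one
# Series show up as NaN in that column.
aligned = pd.DataFrame({s1.name: s1.reindex(full_index),
                        s2.name: s2.reindex(full_index)})
print(aligned)
```

With a proper MultiIndex on both sides, a plain pd.concat([s1, s2], axis=1) should give the same outer-joined result on current pandas, so the explicit union is mostly useful when you want to control the target index yourself.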
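Also, your index seems to have been changed somewhere. Your by_status.groupby(['dt', 'product_id']) operation should produce a MultiIndex, but the a4.head() and a5.head() results pasted above suggest that it was turned into plain tuple pairs somewhere along the way. I suspect this may be the underlying problem.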
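One way to check this suspicion is to look at the index type directly and, if it turns out to be a flat index of tuples, rebuild a MultiIndex from it. The snippet below is a sketch on toy data, not the asker's actual objects; flat and rebuilt are illustrative names, and tupleize_cols=False is used only to force the flat tuple index for the demonstration.

```python
import datetime as dt
import pandas as pd

pairs = [(dt.date(2015, 1, 15), 10001),
         (dt.date(2015, 1, 15), 10016)]

# tupleize_cols=False keeps the tuples as plain Python objects instead of
# letting pandas convert them into a MultiIndex.
flat_index = pd.Index(pairs, tupleize_cols=False)
flat = pd.Series([1, 10], index=flat_index, name='actual_suborders')
print(type(flat.index).__name__, flat.index.nlevels)

# Rebuild a two-level MultiIndex from the tuples and name the levels.
rebuilt = flat.copy()
rebuilt.index = pd.MultiIndex.from_tuples(flat.index,
                                          names=['dt', 'product_id'])
print(type(rebuilt.index).__name__, rebuilt.index.nlevels)
```

Running isinstance(a4.index, pd.MultiIndex) on the real data would show quickly whether the index needs this kind of repair.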
Edit:
I don't understand why concat doesn't work, but I managed to get the result you want using merge.
First, reset the index. Then merge the DataFrames on dt and product_id:
a4.reset_index(inplace=True)
a5.reset_index(inplace=True)
>>> a4.merge(a5, on=['dt', 'product_id'], how='outer')
           dt  product_id  actual_suborders  refund_suborders
0  2015-01-15       10001                 1               NaN
1  2015-01-15       10016                10                 4
2  2015-01-15       10022                 1               NaN
3  2015-01-15       10023                 1               NaN
4  2015-01-15       10024                 1               NaN
5  2015-01-15       10007               NaN                 1
6  2015-01-15       10030               NaN                 1
7  2015-01-16       10007               NaN                 4
8  2015-01-16       10008               NaN                 1
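For reference, here is a self-contained version of the merge approach that can be run without the original data. It is a sketch under a few assumptions: a4 and a5 are rebuilt by hand as one-column DataFrames from the values in the question, reset_index() is used without inplace=True so the originals stay untouched, and the final fillna(0) step is an optional extra that treats a missing count as zero.

```python
import datetime as dt
import pandas as pd

keys = ['dt', 'product_id']
d15, d16 = dt.date(2015, 1, 15), dt.date(2015, 1, 16)

# Stand-ins for the asker's a4 and a5, indexed by (dt, product_id).
a4 = pd.DataFrame({'dt': [d15] * 5,
                   'product_id': [10001, 10016, 10022, 10023, 10024],
                   'actual_suborders': [1, 10, 1, 1, 1]}).set_index(keys)
a5 = pd.DataFrame({'dt': [d15, d15, d15, d16, d16],
                   'product_id': [10007, 10016, 10030, 10007, 10008],
                   'refund_suborders': [1, 4, 1, 4, 1]}).set_index(keys)

# Move the index levels into ordinary columns, then outer-merge on them.
merged = a4.reset_index().merge(a5.reset_index(), on=keys, how='outer')

# Put the keys back into the index so the result looks like the inputs.
result = merged.set_index(keys).sort_index()

# Optional: treat "no rows for this pair" as a count of zero.
counts = result.fillna(0).astype(int)
print(counts)
```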