我正在使用Pandas在Python中工作,并且遇到以下问题。我有一个数据框,其中每个日期都有大量行描述加密货币数据。到达最后一个日期后,将在同一列中为另一种加密货币启动一个新的时间序列。我正在寻找一种操纵数据帧的方法,以便对于每个token_date,所有加密货币数据都显示在一行中,以便行的总数等于token_date的总数
当前df如下:
token_id token_caption token_date token_price_usd token_marketcap_usd
64 WAN Wanchain 2019-06-24 0.3817 40414601.0
64 WAN Wanchain 2019-07-01 0.3644 38683920.0
64 WAN Wanchain 2019-07-08 0.3557 37759781.0
64 WAN Wanchain 2019-07-15 0.2625 27824362.0
64 WAN Wanchain 2019-07-22 0.2545 27036722.0
...
57 MAID 2017-07-24 0.3775 170824959.0
57 MAID 2017-07-31 0.2917 132012254.0
57 MAID 2017-08-07 0.3589 162410652.0
57 MAID 2017-08-14 0.3763 170283706.0
57 MAID 2017-08-21 0.4615 208873303.0
...
我正在寻找实现以下目的的代码:(列拆分将执行约100次,最终约有201列)
token_date WAN Wanchain - Price WAN Wanchain - Marketcap ... MAID - Price MAID - Marketcap...
2019-06-24 0.3817 40414601.0 xxx xxx
2019-07-01 0.3644 38683920.0 xxx xxx
2019-07-08 0.3557 37759781.0 xxx xxx
...
感谢您的帮助。我是关于Python的初学者,对如何实现此目标一无所知。
谢谢!
答案 0 :(得分:2)
如果您将索引设置为['token_date', 'token_caption']
并解开标题以使其成为一列,那么您将得到一个非常干净的MultiIndex
列,其中包含您要查找的内容:
In [144]: df
Out[144]:
token_id token_caption token_date token_price_usd token_marketcap_usd
0 64 WAN Wanchain 2019-06-24 0.3817 40414601.0
1 64 WAN Wanchain 2019-07-01 0.3644 38683920.0
2 64 WAN Wanchain 2019-07-08 0.3557 37759781.0
3 64 WAN Wanchain 2019-07-15 0.2625 27824362.0
4 64 WAN Wanchain 2019-07-22 0.2545 27036722.0
5 57 MAID 2019-06-24 0.3775 170824959.0
6 57 MAID 2019-07-01 0.2917 132012254.0
7 57 MAID 2019-07-08 0.3589 162410652.0
8 57 MAID 2019-07-15 0.3763 170283706.0
9 57 MAID 2019-07-22 0.4615 208873303.0
In [145]: df.set_index(["token_date", "token_caption"])[["token_price_usd", "token_marketcap_usd"]].unstack().swaplevel(axis=1)
Out[145]:
token_caption MAID WAN Wanchain MAID WAN Wanchain
token_price_usd token_price_usd token_marketcap_usd token_marketcap_usd
token_date
2019-06-24 0.3775 0.3817 170824959.0 40414601.0
2019-07-01 0.2917 0.3644 132012254.0 38683920.0
2019-07-08 0.3589 0.3557 162410652.0 37759781.0
2019-07-15 0.3763 0.2625 170283706.0 27824362.0
2019-07-22 0.4615 0.2545 208873303.0 27036722.0
答案 1 :(得分:1)
为什么不使用pivot
:
提供数据
token_id token_caption token_date token_price_usd token_marketcap_usd
64 WAN_Wanchain 2019-06-24 0.3817 40414601.0
64 WAN_Wanchain 2019-07-01 0.3644 38683920.0
64 WAN_Wanchain 2019-07-08 0.3557 37759781.0
64 WAN_Wanchain 2019-07-15 0.2625 27824362.0
64 WAN_Wanchain 2019-07-22 0.2545 27036722.0
57 MAID 2019-06-24 0.3775 170824959.0
57 MAID 2019-07-01 0.2917 132012254.0
57 MAID 2019-07-08 0.3589 162410652.0
57 MAID 2019-07-15 0.3763 170283706.0
57 MAID 2019-07-22 0.4615 208873303.0
请注意,我重复了日期,所以要匹配一些内容
df.pivot("token_date", "token_caption", ["token_price_usd", "token_marketcap_usd"])
给予
token_price_usd token_marketcap_usd
token_caption MAID WAN_Wanchain MAID WAN_Wanchain
token_date
2019-06-24 0.3775 0.3817 170824959.0 40414601.0
2019-07-01 0.2917 0.3644 132012254.0 38683920.0
2019-07-08 0.3589 0.3557 162410652.0 37759781.0
2019-07-15 0.3763 0.2625 170283706.0 27824362.0
2019-07-22 0.4615 0.2545 208873303.0 27036722.0
答案 2 :(得分:1)
我使用ivot_table并构造新的列名称:
df=df.pivot_table(index="token_date",columns="token_caption",values=["token_price_usd","token_marketcap_usd"])
token_marketcap_usd token_price_usd
token_caption MAID WAN Wanchain MAID WAN Wanchain
token_date
2017-07-24 170824959.0 NaN 0.3775 NaN
2017-07-31 132012254.0 NaN 0.2917 NaN
2017-08-07 162410652.0 NaN 0.3589 NaN
2017-08-14 170283706.0 NaN 0.3763 NaN
2017-08-21 208873303.0 NaN 0.4615 NaN
2019-06-24 NaN 40414601.0 NaN 0.3817
2019-07-01 NaN 38683920.0 NaN 0.3644
2019-07-08 NaN 37759781.0 NaN 0.3557
2019-07-15 NaN 27824362.0 NaN 0.2625
2019-07-22 NaN 27036722.0 NaN 0.2545
df.columns=[ lev2+" - "+lev1.split("_")[1].title() for lev1,lev2 in df.columns]
df.reindex(sorted(df.columns.values,reverse=True) ,axis=1)
WAN Wanchain - Price WAN Wanchain - Marketcap MAID - Price MAID - Marketcap
token_date
2017-07-24 NaN NaN 0.3775 170824959.0
2017-07-31 NaN NaN 0.2917 132012254.0
2017-08-07 NaN NaN 0.3589 162410652.0
2017-08-14 NaN NaN 0.3763 170283706.0
2017-08-21 NaN NaN 0.4615 208873303.0
2019-06-24 0.3817 40414601.0 NaN NaN
2019-07-01 0.3644 38683920.0 NaN NaN
2019-07-08 0.3557 37759781.0 NaN NaN
2019-07-15 0.2625 27824362.0 NaN NaN
2019-07-22 0.2545 27036722.0 NaN NaN
最后,您可以应用“ reset_index”。