Here is a simple piece of code that creates a DataFrame with two-level columns.
import pandas as pd
import numpy as np
dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.random.randn(6,4), columns=dates)
df1 = pd.DataFrame({'E' : ["foo1","foo2","foo3","foo4","foo5","foo6"]})
df2=pd.concat(dict(L0 = df, L1 = df1),axis=1)
df2
Output:
                   L0                                           \
  2013-01-01 00:00:00 2013-01-02 00:00:00 2013-01-03 00:00:00
0            0.530496           -1.536075           -0.592824
1            0.614626            0.146761            1.799287
2           -0.398504           -0.863021           -0.208724
3            0.901720            0.717144            1.504012
4           -0.570248           -0.967722           -0.478540
5            2.225644            2.452121           -0.131774

                        L1
  2013-01-04 00:00:00    E
0            1.293738 foo1
1            1.469431 foo2
2           -2.084461 foo3
3           -0.199157 foo4
4           -1.627641 foo5
5           -1.970185 foo6
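I have these three questions. Please help:
1) How can I rearrange the columns so that the dates are in descending order?
2) How can I show only the date (rather than the full timestamp) in the column headers?
3) If you write df2 to a CSV, it creates a blank row. I read some Q&As which said this is a bug in the multi-level output. Has it been fixed? If not, what is the best way to remove the row?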
Answer 0 (score: 1)
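Assuming you can fix things while df2 is being constructed, the first two problems can be solved by sorting the columns of df and then converting the column labels to strings: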
df = df.sort_index(ascending=False, axis=1)
df.columns = df.columns.format()
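As far as I know, `Index.format` is deprecated in newer pandas releases (I believe since 2.2), so the line above may emit a warning or stop working depending on your version. A minimal sketch of an alternative, assuming the columns are a `DatetimeIndex` as in the question, uses `DatetimeIndex.strftime` with an explicit format string:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.random.randn(6, 4), columns=dates)

# Sort the datetime columns in descending order (question 1).
df = df.sort_index(ascending=False, axis=1)

# Convert the datetime labels to date-only strings (question 2).
# strftime returns an Index of strings; the '%Y-%m-%d' format is my choice,
# not something the original answer specifies.
df.columns = df.columns.strftime('%Y-%m-%d')

print(list(df.columns))
# ['2013-01-04', '2013-01-03', '2013-01-02', '2013-01-01']
```

Sorting has to happen before the conversion only if you want true chronological order; with the ISO-style format used here, the strings happen to sort the same way as the dates.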
With the current version of pandas, 0.21.0 (dev),
df2.to_csv('/tmp/test.csv')
creates a CSV without a blank row. If you try it with the latest stable release, 0.20.3, I think you will get the same result (see below).
For example,
import pandas as pd
import numpy as np
dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.random.randn(6,4), columns=dates)
df1 = pd.DataFrame({'E' : ["foo1","foo2","foo3","foo4","foo5","foo6"]})
df = df.sort_index(ascending=False, axis=1)
df.columns = df.columns.format()
df2 = pd.concat(dict(L0=df, L1=df1),axis=1)
df2.to_csv('/tmp/test.csv')
creates /tmp/test.csv with the contents
,L0,L0,L0,L0,L1
,2013-01-04,2013-01-03,2013-01-02,2013-01-01,E
0,0.02140012949846106,0.26277798576234707,0.3417048534674754,-0.2415864990096712,foo1
1,1.5529608360704856,0.04473119120484416,0.2563552549068564,-0.7234609815350183,foo2
2,0.3197702495146119,-0.4796536804964018,-1.0049744963838612,0.039249748655535384,foo3
3,-1.5129389373140296,-0.2528463527601262,-0.22930219559242235,-0.6661663277403631,foo4
4,0.03756426242171489,0.20880577998533037,1.0229358239647364,0.6539470866256701,foo5
5,-1.8477638391042324,-0.8315712350681457,-0.0743680147471108,0.8503850287138673,foo6
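If you are stuck on an older pandas and want to check for (or strip) the extra row yourself, something like the following might help. It is only a sketch: it writes to an in-memory buffer instead of /tmp/test.csv, and it assumes the unwanted row is either empty or contains nothing but commas, which I have not verified against the affected versions. The `strftime` call stands in for `format()` from the answer.

```python
import io

import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.random.randn(6, 4), columns=dates)
df1 = pd.DataFrame({'E': ["foo1", "foo2", "foo3", "foo4", "foo5", "foo6"]})
df = df.sort_index(ascending=False, axis=1)
df.columns = df.columns.strftime('%Y-%m-%d')
df2 = pd.concat(dict(L0=df, L1=df1), axis=1)

# Write the CSV into a string buffer so it can be inspected line by line.
buf = io.StringIO()
df2.to_csv(buf)
lines = buf.getvalue().splitlines()

# Positions of rows that are empty or consist only of commas (assumed shape
# of the unwanted row).
blank_rows = [i for i, line in enumerate(lines) if line.strip(',').strip() == '']
print(blank_rows)

# Fallback: drop any such rows before saving the text to disk.
cleaned = '\n'.join(line for i, line in enumerate(lines) if i not in blank_rows)
```

Note that this filter would also drop a legitimate data row whose fields are all empty, so it is only safe when the index is written out, as it is here.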
By the way, you might also want to consider this format, which seems more compact:
import pandas as pd
import numpy as np
dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.random.randn(6,4), columns=dates)
df1 = pd.DataFrame({'E' : ["foo1","foo2","foo3","foo4","foo5","foo6"]})
df = df.T
df.columns = df1['E']
print(df)
yields
E               foo1      foo2      foo3      foo4      foo5      foo6
2013-01-01  0.166074  0.398726 -0.410202  0.397486 -0.811873  0.462652
2013-01-02  0.406810 -0.313234  0.062569 -0.140924 -1.087162  1.600549
2013-01-03 -0.573118  1.331461 -0.115200 -1.934654 -1.427441 -0.889541
2013-01-04 -0.919885 -1.197192 -0.476039  1.186531  1.013803  0.400977