I have a DataFrame like this:
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
          np.array(['2016-01', '2016-02', '2016-01', '2016-02', '2016-01', '2016-02', '2016-01', '2016-02'])]
df = pd.DataFrame(np.ceil(np.random.randn(8, 4)), index=arrays)
df.rename(columns={0:'m1',1:'m2',2:'m3',3:'m4'},inplace=True)
                  m1   m2   m3   m4
bar one 2016-01 -0.0  1.0  3.0  2.0
    two 2016-02  1.0  1.0  1.0  2.0
baz one 2016-01 -1.0 -1.0  2.0  1.0
    two 2016-02  1.0  2.0  1.0  2.0
foo one 2016-01  1.0 -0.0 -0.0 -0.0
    two 2016-02 -2.0 -0.0 -0.0 -0.0
qux one 2016-01 -0.0 -0.0 -1.0  1.0
    two 2016-02 -0.0 -0.0  1.0 -0.0
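Say I want to replace every 2016 with 2017 in the date labels for m2 and m4 only, so that the 2016 rows hold values for m1 and m3 but not for m2 and m4. The 2017 rows would then hold values for m2 and m4 but not for m1 and m3. Something similar to this DataFrame: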
                  m1   m2   m3   m4
bar one 2016-01 -0.0  0.0  3.0  0.0
    two 2016-02  1.0  0.0  1.0  0.0
    one 2017-01  0.0  1.0  0.0  2.0
    two 2017-02  0.0  1.0  0.0  2.0
baz one 2016-01 -1.0  0.0  2.0  0.0
    two 2016-02  1.0  0.0  1.0  0.0
    one 2017-01  0.0 -1.0  0.0  1.0
    two 2017-02  0.0  2.0  0.0  2.0
I have tried to unstack() the DataFrame and rename each column, but this doesn't seem to work and I'm not sure why.
df = df.unstack()
df.unstack()['m2'] = df.unstack()['m2'].rename(columns = lambda t: t.replace('2016','2017'))
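One likely reason the attempt above has no visible effect: each call to `df.unstack()` builds a brand-new DataFrame, so an assignment into `df.unstack()[...]` modifies a temporary object that is discarded right away. The sketch below illustrates this with a small deterministic frame (the `np.arange` data and the names `first` and `second` are assumptions made for the illustration, not part of the original post).

```python
import numpy as np
import pandas as pd

# Small deterministic stand-in for the frame in the question (assumed data).
arrays = [
    np.array(['bar', 'bar', 'baz', 'baz']),
    np.array(['one', 'two', 'one', 'two']),
    np.array(['2016-01', '2016-02', '2016-01', '2016-02']),
]
df = pd.DataFrame(np.arange(16, dtype=float).reshape(4, 4),
                  index=arrays, columns=['m1', 'm2', 'm3', 'm4'])

# Two calls to unstack() return two separate objects.
first = df.unstack()
second = df.unstack()
print(first is second)

# Writing into one result leaves the other result, and df itself, untouched.
first.iloc[0, 0] = 99.0
print(second.iloc[0, 0])
print(df.iloc[0, 0])
```

To keep a change, the unstacked frame would have to be stored in a variable first and modified through that variable.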
Answer 0 (score: 1)
import numpy as np
import pandas as pd
np.random.seed(2017)
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
np.array(['2016-01', '2016-02', '2016-01', '2016-02', '2016-01', '2016-02', '2016-01', '2016-02'])]
df = pd.DataFrame(np.ceil(np.random.randn(8, 4)), index=arrays)
df.rename(columns={0:'m1',1:'m2',2:'m3',3:'m4'},inplace=True)
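# Copy m2 and m4, then relabel the date level of the copy from 2016 to 2017.
df2 = df[['m2', 'm4']].copy()
df2.index = pd.MultiIndex.from_arrays(
    [df.index.get_level_values(i) for i in [0, 1]]
    + [df.index.get_level_values(-1).str.replace('2016', '2017')])
# Stack the m1/m3 rows on top of the relabeled m2/m4 rows; missing cells become 0.
# Selecting df.columns keeps the m1..m4 order regardless of pandas version.
result = pd.concat([df[['m1', 'm3']], df2], axis=0).fillna(0)[df.columns]
result = result.sort_index(level=[0, 2, 1])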
print(result)
This converts
                  m1   m2   m3   m4
bar one 2016-01 -1.0 -0.0  1.0  1.0
    two 2016-02 -0.0 -0.0 -0.0 -0.0
baz one 2016-01  1.0 -0.0 -1.0 -0.0
    two 2016-02 -1.0  1.0  1.0 -0.0
foo one 2016-01 -0.0 -0.0 -1.0 -1.0
    two 2016-02  2.0 -0.0 -0.0 -0.0
qux one 2016-01  1.0  2.0 -0.0  2.0
    two 2016-02  1.0  1.0 -0.0 -0.0
to
                  m1   m2   m3   m4
bar one 2016-01 -1.0  0.0  1.0  0.0
    two 2016-02 -0.0  0.0 -0.0  0.0
    one 2017-01  0.0 -0.0  0.0  1.0
    two 2017-02  0.0 -0.0  0.0 -0.0
baz one 2016-01  1.0  0.0 -1.0  0.0
    two 2016-02 -1.0  0.0  1.0  0.0
    one 2017-01  0.0 -0.0  0.0 -0.0
    two 2017-02  0.0  1.0  0.0 -0.0
foo one 2016-01 -0.0  0.0 -1.0  0.0
    two 2016-02  2.0  0.0 -0.0  0.0
    one 2017-01  0.0 -0.0  0.0 -1.0
    two 2017-02  0.0 -0.0  0.0 -0.0
qux one 2016-01  1.0  0.0 -0.0  0.0
    two 2016-02  1.0  0.0 -0.0  0.0
    one 2017-01  0.0  2.0  0.0  2.0
    two 2017-02  0.0  1.0  0.0 -0.0
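The approach in this answer could be wrapped in a small helper so it can be reused for other column sets or years. This is only a sketch: the function name `shift_columns_to_year` and its parameters are hypothetical, it assumes the date strings sit in the last index level, and the `np.arange` data is made up so the result is easy to check by eye.

```python
import numpy as np
import pandas as pd

def shift_columns_to_year(df, cols, old="2016", new="2017"):
    """Hypothetical helper: move `cols` onto new rows whose last index
    level has `old` replaced by `new`; every other cell is filled with 0."""
    moved = df[cols].copy()
    # Rebuild the index: keep all levels but the last, relabel the last one.
    moved.index = pd.MultiIndex.from_arrays(
        [df.index.get_level_values(i) for i in range(df.index.nlevels - 1)]
        + [df.index.get_level_values(-1).str.replace(old, new, regex=False)]
    )
    kept = df.drop(columns=cols)
    out = pd.concat([kept, moved], axis=0).fillna(0)
    # Restore the original column order, then sort by group, date, label.
    return out[df.columns].sort_index(level=[0, 2, 1])

# Assumed sample data, smaller than the frame in the question.
arrays = [
    np.array(['bar', 'bar', 'baz', 'baz']),
    np.array(['one', 'two', 'one', 'two']),
    np.array(['2016-01', '2016-02', '2016-01', '2016-02']),
]
df = pd.DataFrame(np.arange(16, dtype=float).reshape(4, 4),
                  index=arrays, columns=['m1', 'm2', 'm3', 'm4'])

result = shift_columns_to_year(df, ['m2', 'm4'])
print(result)
```

The sort order `[0, 2, 1]` is hard-coded for a three-level index like the one in the question; a frame with a different number of levels would need a different list.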
Answer 1 (score: 0)
I'm not sure I understand your question, but here is what I did and the output I got.
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
          np.array(['2016-01', '2016-02', '2016-01', '2016-02', '2016-01', '2016-02', '2016-01', '2016-02'])]
df = pd.DataFrame(np.ceil(np.random.randn(8, 4)), index=arrays)
df.rename(columns={0:'m1',1:'m2',2:'m3',3:'m4'},inplace=True)
# Turn the index levels into ordinary columns, then relabel the date column.
df = df.reset_index()
df['level_2'] = df['level_2'].str.replace("2016","2017")
This gives me the output:
  level_0 level_1  level_2   m1   m2   m3   m4
0     bar     one  2017-01 -0.0 -1.0 -0.0 -0.0
1     bar     two  2017-02 -0.0 -1.0  2.0  2.0
2     baz     one  2017-01 -2.0  1.0 -0.0  1.0
3     baz     two  2017-02 -0.0  1.0 -1.0  2.0
4     foo     one  2017-01  1.0 -0.0 -1.0 -0.0
5     foo     two  2017-02 -1.0 -2.0  1.0 -0.0
6     qux     one  2017-01  1.0  1.0 -0.0  1.0
7     qux     two  2017-02  1.0 -1.0  2.0 -1.0
If you can tell me what you expect based on this, I'll revise my answer.
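For what it's worth, the `reset_index` route might be extended to produce the layout asked for in the question: keep one copy of the rows for 2016 with m2 and m4 zeroed, make a second copy relabeled to 2017 with m1 and m3 zeroed, and stack the two. This is a sketch under assumptions: the `np.arange` data and the names `flat`, `rows_2016`, and `rows_2017` are illustrative, and the `level_0` to `level_2` column names rely on the index levels being unnamed.

```python
import numpy as np
import pandas as pd

# Assumed sample data, smaller than the frame in the question.
arrays = [
    np.array(['bar', 'bar', 'baz', 'baz']),
    np.array(['one', 'two', 'one', 'two']),
    np.array(['2016-01', '2016-02', '2016-01', '2016-02']),
]
df = pd.DataFrame(np.arange(16, dtype=float).reshape(4, 4),
                  index=arrays, columns=['m1', 'm2', 'm3', 'm4'])

# Unnamed index levels become columns level_0, level_1, level_2.
flat = df.reset_index()

# 2016 rows keep m1/m3 only; 2017 rows keep m2/m4 only.
rows_2016 = flat.assign(m2=0.0, m4=0.0)
rows_2017 = flat.assign(
    level_2=flat['level_2'].str.replace('2016', '2017', regex=False),
    m1=0.0,
    m3=0.0,
)

result = (
    pd.concat([rows_2016, rows_2017], ignore_index=True)
    .set_index(['level_0', 'level_1', 'level_2'])
    .sort_index(level=[0, 2, 1])
)
print(result)
```

Unlike the original frame, the resulting index levels carry the names `level_0`, `level_1`, and `level_2`; `result.index.names = [None, None, None]` would clear them if that matters.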