How to force index names onto a pandas Series

Time: 2017-05-12 00:33:32

Tags: python pandas

I have a DataFrame in pandas:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.array([[2,4,4],[4,3,3],[5,9,1]]),columns=['A','B','C'])
>>> df
   A  B  C
0  2  4  4
1  4  3  3
2  5  9  1 

I then put the stacked output of this df into a variable:

sta=df.stack()

This basically makes sta a Series holding the stacked values of the original; the Series now has no names on its index.

Expected:

 head1 head2 
0  A    2
   B    4
   C    4
1  A    4
   B    3
   C    3
2  A    5
   B    9
   C    1

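1. How can I force header names onto the resulting Series?
2. How can I choose which columns of df should become my index? Does my old index carry over to the stacked variable?
3. sta is a Series; is there a way to convert it to a DataFrame?
4. Can I choose which columns of df get stacked? For example, stack only columns B and C and leave A intact?

Thanks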

2 Answers:

Answer 0: (score: 0)

Is this what you are after? You can name the columns whatever you like.

import pandas as pd

# Assumes df has the columns Rank, NOC, Gold, Silver, Bronze and Total.

# Step 1
# Clean up the values in NOC and Rank, starting with Total.
# Flag the rows where Total is null.

df.loc[:, 'Total2'] = df['Total'].isnull()

# Step 2
# For the flagged rows, shift the values one column to the right
# (Rank through Bronze move into NOC through Total).
# .loc replaces .ix, which has been removed from pandas.

df.loc[df['Total2'], 'Total'] = df['Bronze']
df.loc[df['Total2'], 'Bronze'] = df['Silver']
df.loc[df['Total2'], 'Silver'] = df['Gold']
df.loc[df['Total2'], 'Gold'] = df['NOC']
df.loc[df['Total2'], 'NOC'] = df['Rank']

# Step 3
# Clean up the Rank column: keep only the numeric values, then forward-fill the gaps.

df['Rank2'] = pd.to_numeric(df['Rank'], errors='coerce')
df['fill_forward'] = df['Rank2'].ffill()
del df['Rank']
del df['Rank2']
del df['Total2']
df = df.rename(columns={'fill_forward': 'Rank'})

Answer 1: (score: 0)

1. Use rename_axis, or assign the new names to index.names:

sta = sta.rename_axis(['head1','head2'])
print (sta)
head1  head2
0      A        2
       B        4
       C        4
1      A        4
       B        3
       C        3
2      A        5
       B        9
       C        1
dtype: int32

sta.index.names = ['head1','head2']
print (sta)
head1  head2
0      A        2
       B        4
       C        4
1      A        4
       B        3
       C        3
2      A        5
       B        9
       C        1
dtype: int32

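2. The new index is built from the index and the columns of the original DataFrame, which together form a MultiIndex. The old index carries over as the first level and the column labels become the second level.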
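As a rough illustration of point 2, the sketch below inspects the levels of the stacked index and shows one possible way to pick a column as the outer level: calling set_index before stack. The variable name sta_a is made up for this example, and using set_index here is an assumption about what the question is after, not something the original answer spelled out.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[2, 4, 4], [4, 3, 3], [5, 9, 1]]),
                  columns=['A', 'B', 'C'])
sta = df.stack()

# The stacked Series has a two-level MultiIndex with no names yet.
print(sta.index.nlevels)                                # 2
print(list(sta.index.names))                            # [None, None]

# Level 0 is the old row index, level 1 holds the old column labels.
print(sta.index.get_level_values(0).unique().tolist())  # [0, 1, 2]
print(sta.index.get_level_values(1).unique().tolist())  # ['A', 'B', 'C']

# To use column A as the outer level instead of the default 0..2 index,
# move it into the index before stacking (sta_a is a hypothetical name).
sta_a = df.set_index('A').stack()
print(sta_a)
```

With set_index('A'), only B and C are stacked, and the first index level is named A because set_index keeps the column name.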

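3. To get a DataFrame, use to_frame and pass the name for the new column (a separate variable is used so that the original df stays available for the next step):

df_stacked = sta.to_frame('Stacked')
print (df_stacked)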
     Stacked
0 A        2
  B        4
  C        4
1 A        4
  B        3
  C        3
2 A        5
  B        9
  C        1

4. If you only need to stack some of the columns, select them before stacking:

sta1 = df[['B','C']].stack()
print (sta1)
0  B    4
   C    4
1  B    3
   C    3
2  B    9
   C    1
dtype: int32
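The selection above drops column A entirely. If the goal is to keep A alongside the stacked B and C values, one alternative (swapped in here for illustration, not part of the original answer) is DataFrame.melt, which keeps the id_vars columns as they are and unpivots the rest. The names head2, stacked and kept are just placeholders chosen for this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[2, 4, 4], [4, 3, 3], [5, 9, 1]]),
                  columns=['A', 'B', 'C'])

# Keep A as an ordinary column and unpivot only B and C.
# melt lists every B row first and then every C row.
kept = df.melt(id_vars='A', value_vars=['B', 'C'],
               var_name='head2', value_name='stacked')
print(kept)
```

Note that melt returns the rows column by column rather than row by row as stack does; a sort_values call on A and head2 can restore a stack-like order if that matters.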

If you need a new DataFrame built from the stacked data, use reset_index to turn the MultiIndex levels into columns:

df = sta.rename_axis(['head1','head2']).reset_index(name='stacked')
print (df)
   head1 head2  stacked
0      0     A        2
1      0     B        4
2      0     C        4
3      1     A        4
4      1     B        3
5      1     C        3
6      2     A        5
7      2     B        9
8      2     C        1

Or assign the new column names directly:

df = sta.reset_index()
df.columns = ['head1','head2', 'stacked']
print (df)
   head1 head2  stacked
0      0     A        2
1      0     B        4
2      0     C        4
3      1     A        4
4      1     B        3
5      1     C        3
6      2     A        5
7      2     B        9
8      2     C        1
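As a small check on the two variants above, the sketch below shows the default column labels that reset_index produces when the index levels are unnamed, which is why the columns get renamed, and then compares both results. The variable names plain, by_axis and by_columns are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[2, 4, 4], [4, 3, 3], [5, 9, 1]]),
                  columns=['A', 'B', 'C'])
sta = df.stack()

# With unnamed index levels, reset_index falls back to generic labels.
plain = sta.reset_index()
print(list(plain.columns))         # ['level_0', 'level_1', 0]

# Variant 1: name the levels first, then name the value column.
by_axis = sta.rename_axis(['head1', 'head2']).reset_index(name='stacked')

# Variant 2: reset first, then overwrite all column labels at once.
by_columns = sta.reset_index()
by_columns.columns = ['head1', 'head2', 'stacked']

# Both variants should produce the same DataFrame.
print(by_axis.equals(by_columns))
```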