Question

I have a DataFrame in pandas:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.array([[2,4,4],[4,3,3],[5,9,1]]),columns=['A','B','C'])
>>> df
   A  B  C
0  2  4  4
1  4  3  3
2  5  9  1

After putting the stacked output of this df into a variable:

sta=df.stack()

This basically makes sta a Series holding the stacked values of the original; the Series now has no index names.

Expected:

 head1 head2 
0  A    2
   B    4
   C    4
1  A    4
   B    3
   C    3
2  A    5
   B    9
   C    1

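1. How can I force header names onto the resulting Series?

2. How can I choose which columns of df should become my index? Does my old index carry over to the stacked variable?

3. sta is a Series; is there a way to convert it to a DataFrame?

4. Can I choose which columns of df to stack? For example, stack only columns B and C and leave A intact?

Thanks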

Answer 1

Is this what you are after? You can name the columns whatever you like.

import pandas as pd

# Step 1
# Clean up the values in NOC and Rank, starting with Total.
# Flag the rows where Total is null.

df['Total2'] = df['Total'].isnull()

# Step 2
# For rows where Total is NaN, shift the values one column to the right
# (Bronze -> Total, Silver -> Bronze, Gold -> Silver, NOC -> Gold, Rank -> NOC).
# .loc replaces .ix, which was removed in pandas 1.0.

df.loc[df['Total2'], 'Total'] = df['Bronze']
df.loc[df['Total2'], 'Bronze'] = df['Silver']
df.loc[df['Total2'], 'Silver'] = df['Gold']
df.loc[df['Total2'], 'Gold'] = df['NOC']
df.loc[df['Total2'], 'NOC'] = df['Rank']

# Step 3
# Clean up the Rank column: keep only numeric values, then forward-fill the gaps.

df['Rank2'] = pd.to_numeric(df['Rank'], errors='coerce')
df['fill_forward'] = df['Rank2'].ffill()
del df['Rank']
del df['Rank2']
del df['Total2']
df = df.rename(columns={'fill_forward': 'Rank'})

Answer 2

1. Use rename_axis, or assign the new names to index.names:

sta = sta.rename_axis(['head1','head2'])
print (sta)
head1  head2
0      A        2
       B        4
       C        4
1      A        4
       B        3
       C        3
2      A        5
       B        9
       C        1
dtype: int32

sta.index.names = ['head1','head2']
print (sta)
head1  head2
0      A        2
       B        4
       C        4
1      A        4
       B        3
       C        3
2      A        5
       B        9
       C        1
dtype: int32
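
The two approaches differ in one practical way, sketched below under the assumption of a reasonably recent pandas: rename_axis returns a new Series and leaves the original untouched, while assigning to index.names changes the existing Series in place. The variable names renamed, names_before, names_renamed and names_after are made up for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[2, 4, 4], [4, 3, 3], [5, 9, 1]]), columns=['A', 'B', 'C'])
sta = df.stack()

# rename_axis returns a renamed copy; sta itself keeps its unnamed levels.
renamed = sta.rename_axis(['head1', 'head2'])
names_before = list(sta.index.names)
names_renamed = list(renamed.index.names)

# Assigning to index.names modifies sta in place.
sta.index.names = ['head1', 'head2']
names_after = list(sta.index.names)

print(names_before, names_renamed, names_after)
```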

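2. The new index is built from the index and the columns of the original DataFrame, which gives a MultiIndex. The old index carries over as the first level and the column names become the second level.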
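
A minimal sketch of point 2, rebuilding the question's df: it inspects both levels of the stacked index and then shows one possible way to choose the index yourself, by calling set_index before stack. Using column A as the index is only an illustration, and the names level0, level1, sta_by_a and value are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[2, 4, 4], [4, 3, 3], [5, 9, 1]]), columns=['A', 'B', 'C'])

# Default stacking: the old row index becomes level 0, the column labels level 1.
sta = df.stack()
level0 = sta.index.get_level_values(0).tolist()
level1 = sta.index.get_level_values(1).tolist()

# Choosing the index first: promote column A to the index, then stack B and C.
sta_by_a = df.set_index('A').stack()
value = sta_by_a.loc[(2, 'B')]  # the B value on the row where A == 2

print(level0)
print(level1)
print(sta_by_a)
```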

3. To get a DataFrame, use to_frame with a name for the new column:

df1 = sta.to_frame('Stacked')
print (df1)
     Stacked
0 A        2
  B        4
  C        4
1 A        4
  B        3
  C        3
2 A        5
  B        9
  C        1

4. If you only need some of the columns, select them before stacking:

sta1 = df[['B','C']].stack()
print (sta1)
0  B    4
   C    4
1  B    3
   C    3
2  B    9
   C    1
dtype: int32
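
If A should stay as an ordinary column next to the stacked B and C values, one alternative to stack is DataFrame.melt. This is a sketch of a different technique from the one above, not part of the original answer; the names long_df, head2 and stacked are chosen only for illustration. Note that melt orders the rows column by column (all B rows, then all C rows) rather than row by row as stack does.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[2, 4, 4], [4, 3, 3], [5, 9, 1]]), columns=['A', 'B', 'C'])

# Keep A as an identifier column and unpivot only B and C into long format.
long_df = df.melt(id_vars='A', value_vars=['B', 'C'],
                  var_name='head2', value_name='stacked')
print(long_df)
```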

If you need a new stacked DataFrame, use reset_index to turn the MultiIndex levels into columns:

df = sta.rename_axis(['head1','head2']).reset_index(name='stacked')
print (df)
   head1 head2  stacked
0      0     A        2
1      0     B        4
2      0     C        4
3      1     A        4
4      1     B        3
5      1     C        3
6      2     A        5
7      2     B        9
8      2     C        1

Or assign the new column names directly:

df = sta.reset_index()
df.columns = ['head1','head2', 'stacked']
print (df)
   head1 head2  stacked
0      0     A        2
1      0     B        4
2      0     C        4
3      1     A        4
4      1     B        3
5      1     C        3
6      2     A        5
7      2     B        9
8      2     C        1

How to force-assign an index on a pandas Series
