I have a DataFrame in pandas:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.array([[2,4,4],[4,3,3],[5,9,1]]),columns=['A','B','C'])
>>> df
   A  B  C
0  2  4  4
1  4  3  3
2  5  9  1
After putting the stacked output of this df into a variable:
sta=df.stack()
This basically makes sta a Series of the stacked original values, but the index levels of that Series now have no names.
Expected:
head1  head2
0      A        2
       B        4
       C        4
1      A        4
       B        3
       C        3
2      A        5
       B        9
       C        1
1. How can I force header names onto the resulting Series?
3. sta is a Series; is there a way to convert it to a DataFrame?
Thanks
Answer 0 (score: 0)
Is this what you are after? You can name the columns whatever you like.
import pandas as pd
# df is assumed to be an existing DataFrame with the columns
# Rank, NOC, Gold, Silver, Bronze and Total
# Step 1
# Clean up the values in NOC and Rank, starting with the Total column.
# Flag the rows where Total is null in a helper column, Total2
df.loc[:, 'Total2'] = df['Total'].isnull()
# Step 2
# For the rows where Total is NaN, shift each value one column to the right
# (Rank -> NOC -> Gold -> Silver -> Bronze -> Total).
# .loc replaces .ix, which was removed in pandas 1.0
df.loc[df['Total2'], 'Total'] = df['Bronze']
df.loc[df['Total2'], 'Bronze'] = df['Silver']
df.loc[df['Total2'], 'Silver'] = df['Gold']
df.loc[df['Total2'], 'Gold'] = df['NOC']
df.loc[df['Total2'], 'NOC'] = df['Rank']
# Step 3
# Clean up the Rank column: keep only the numeric values, then forward-fill the gaps
df['Rank2'] = pd.to_numeric(df['Rank'], errors='coerce')
df['fill_forward'] = df['Rank2'].ffill()
del df['Rank']
del df['Rank2']
del df['Total2']
df = df.rename(columns={'fill_forward': 'Rank'})
Answer 1 (score: 0)
1. Use rename_axis, or assign the new names to index.names:
sta = sta.rename_axis(['head1','head2'])
print (sta)
head1  head2
0      A        2
       B        4
       C        4
1      A        4
       B        3
       C        3
2      A        5
       B        9
       C        1
sta.index.names = ['head1','head2']
print (sta)
head1  head2
0      A        2
       B        4
       C        4
1      A        4
       B        3
       C        3
2      A        5
       B        9
       C        1
dtype: int32
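The two options above differ in one practical way: rename_axis returns a new Series and leaves sta untouched, while assigning to index.names changes sta in place. A minimal sketch of that difference, reusing the df from the question (the names named, names_before, names_copy and names_after are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[2, 4, 4], [4, 3, 3], [5, 9, 1]]), columns=['A', 'B', 'C'])
sta = df.stack()

# rename_axis returns a new Series; sta itself keeps its unnamed index levels.
named = sta.rename_axis(['head1', 'head2'])
names_before = list(sta.index.names)   # names on sta after rename_axis
names_copy = list(named.index.names)   # names on the returned Series

# Assigning to index.names modifies sta directly.
sta.index.names = ['head1', 'head2']
names_after = list(sta.index.names)

print(names_before, names_copy, names_after)
```

So with rename_axis the result has to be assigned back (as in the answer), otherwise the names are lost.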
2. The new index is built from the index and the columns of the original DataFrame, which gives a MultiIndex.
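To make point 2 concrete, here is a hedged sketch that rebuilds the same index by hand with pd.MultiIndex.from_product. It assumes the original df has no missing values (stack can drop NaN entries, in which case the product would be longer than the stacked Series); manual and same_labels are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[2, 4, 4], [4, 3, 3], [5, 9, 1]]), columns=['A', 'B', 'C'])
sta = df.stack()

# Pair every row label with every column label, naming both levels at once.
manual = pd.MultiIndex.from_product([df.index, df.columns], names=['head1', 'head2'])

# Compare the (row, column) tuples of both indexes.
same_labels = list(sta.index) == list(manual)

# The hand-built index can replace the stacked one, which also sets the names.
sta.index = manual
print(same_labels)
print(sta)
```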
3. To get a DataFrame, use to_frame with the new column name:
# use a new name so the original df stays available for the next step
df1 = sta.to_frame('Stacked')
print (df1)
     Stacked
0 A        2
  B        4
  C        4
1 A        4
  B        3
  C        3
2 A        5
  B        9
  C        1
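As a quick check on what to_frame produces, the sketch below confirms that the result is a one-column DataFrame that keeps the two-level index, and that unstack on that column restores the original wide layout. The variable names frame and wide are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[2, 4, 4], [4, 3, 3], [5, 9, 1]]), columns=['A', 'B', 'C'])
sta = df.stack()

# One column named 'Stacked'; the MultiIndex of sta is kept as the row index.
frame = sta.to_frame('Stacked')

# Unstacking the single column moves the inner index level back to the columns.
wide = frame['Stacked'].unstack()

print(frame.shape)
print(wide)
```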
4. If you need to select only some columns before stacking:
sta1 = df[['B','C']].stack()
print (sta1)
0  B    4
   C    4
1  B    3
   C    3
2  B    9
   C    1
dtype: int32
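If the Series has already been stacked, the same subset can also be taken afterwards by filtering on the inner index level. This is an alternative sketch, not part of the original answer; before, keep and after are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[2, 4, 4], [4, 3, 3], [5, 9, 1]]), columns=['A', 'B', 'C'])
sta = df.stack()

# Option from the answer: pick the columns first, then stack.
before = df[['B', 'C']].stack()

# Alternative: stack everything, then keep rows whose inner label is B or C.
keep = sta.index.get_level_values(1).isin(['B', 'C'])
after = sta[keep]

print(after)
```

Selecting the columns first is usually the simpler choice, since it avoids building the full stacked Series.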
If you need a new DataFrame from the stacked Series, use reset_index to turn the MultiIndex levels into columns:
df = sta.rename_axis(['head1','head2']).reset_index(name='stacked')
print (df)
   head1 head2  stacked
0      0     A        2
1      0     B        4
2      0     C        4
3      1     A        4
4      1     B        3
5      1     C        3
6      2     A        5
7      2     B        9
8      2     C        1
Or assign the new column names afterwards:
df = sta.reset_index()
df.columns = ['head1','head2', 'stacked']
print (df)
   head1 head2  stacked
0      0     A        2
1      0     B        4
2      0     C        4
3      1     A        4
4      1     B        3
5      1     C        3
6      2     A        5
7      2     B        9
8      2     C        1
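The two reset_index routes should give the same result. A small self-contained sketch that runs both on the question's data and compares them (a and b are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[2, 4, 4], [4, 3, 3], [5, 9, 1]]), columns=['A', 'B', 'C'])
sta = df.stack()

# Route 1: name the index levels first, then name the value column in reset_index.
a = sta.rename_axis(['head1', 'head2']).reset_index(name='stacked')

# Route 2: reset the unnamed index, then overwrite all column names at once.
b = sta.reset_index()
b.columns = ['head1', 'head2', 'stacked']

print(a.values.tolist() == b.values.tolist())
```

Route 1 keeps everything in one chained expression, while route 2 relies on the column order, so it needs care if the number of index levels changes.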