Python pandas pivot multiindex

Time: 2015-05-12 22:53:49

Tags: python pandas pivot multi-index

Input

I have the following file `input.txt`:

D E F G H
a 1 b 1 4
a 1 c 1 5
b 2 c 2 6

Desired output

How can I create a new data frame that uses columns D and E as an index? I want a triangular matrix that looks something like this:

   a1 b1 c1 b2 c2
a1  0  4  5  0  0
b1     0  0  0  0
c1        0  0  0
b2           0  6
c2              0

1st attempt

I am importing the data frame and trying to do a pivot like this:

import pandas as pd
df1 = pd.read_csv(
    'input.txt', index_col=[0,1], delim_whitespace=True,
    usecols=['D','E','F','G','H'])
df2 = df1.pivot(index=['D', 'E'], columns=['F','G'], values='H')

`df1` looks like this:

     F  G  H
D E
a 1  b  1  4
  1  c  1  5
b 2  c  2  6

`df1.index` looks like this:

MultiIndex(levels=[['a', 'b'], [1, 2]], labels=[[0, 0, 1], [0, 0, 1]], names=['D', 'E'])

`df2` fails to be generated and I get this error message:

`KeyError: "['D' 'E'] not in index"`

2nd attempt

I thought I had solved it like this:

import pandas as pd
df = pd.read_csv(
    'input.txt', delim_whitespace=True,
    usecols=['D','E','F','G','H'],
    dtype={'D':str, 'E':str, 'F':str, 'G':str, 'H':float},
    )
pivot = pd.pivot_table(df, values='H', index=['D','E'], columns=['F','G'])

`pivot` looks like this:

F     b   c
G     1   1   2
D E
a 1   4   5 NaN
b 2 NaN NaN   6

But when I try to do this to convert it to a symmetric matrix:

pivot.add(df.T, fill_value=0).fillna(0)

Then I get this error:

ValueError: cannot join with no level specified and no overlapping names

3rd attempt and solution

I found a solution here. It is also what @Moritz suggested, but I'm new to pandas and didn't understand his comment. I did this:

import pandas as pd
# D and E are read as ordinary columns (no index_col);
# with index_col=[0,1] the lookup df1['D'] below would raise a KeyError.
df1 = pd.read_csv(
    'input.txt', delim_whitespace=True,
    usecols=['D','E','F','G','H'],
    dtype={'D':str, 'E':str, 'F':str, 'G':str, 'H':float}
    )
# Concatenate the string columns into single labels such as 'a1' and 'b1'.
df1['DE'] = df1['D']+df1['E']
df1['FG'] = df1['F']+df1['G']
df2 = df1.pivot(index='DE', columns='FG', values='H')

This data frame is generated:

FG   b1   c1   c2
DE
a1    4    5  NaN
b2  NaN  NaN    6

Afterwards I do `df3 = df2.add(df2.T, fill_value=0).fillna(0)` to convert the triangular matrix to a symmetric matrix. Is generating new columns really the easiest way to accomplish what I want? My reason for doing all of this is that I want to generate a heat map with matplotlib and hence need the data to be in matrix form. The final matrix/dataframe looks like this:
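For reference, here is a minimal sketch of the whole pipeline from the question, without the helper columns. It rests on a few assumptions that are not in the original post: the three sample rows are inlined as the string `RAW` so the sketch runs without `input.txt`, the function name `symmetric_matrix` is hypothetical, and `sep=r"\s+"` is used in place of `delim_whitespace=True`, which newer pandas releases deprecate. The `KeyError` in the first attempt appears to come from `index_col=[0,1]` moving D and E into the index, so they are no longer available as columns; the sketch therefore reads them as ordinary columns, lets `pivot_table` build the MultiIndex axes, and flattens those into single labels before mirroring.

```python
import io

import pandas as pd

# The three sample rows from the question, inlined (assumption: same layout as input.txt).
RAW = """D E F G H
a 1 b 1 4
a 1 c 1 5
b 2 c 2 6
"""


def symmetric_matrix(text):
    """Hypothetical helper: build a symmetric label-by-label matrix of H values."""
    # Read D and E as ordinary string columns (no index_col) so they stay addressable by name.
    df = pd.read_csv(
        io.StringIO(text),
        sep=r"\s+",
        dtype={"D": str, "E": str, "F": str, "G": str, "H": float},
    )
    # pivot_table accepts lists for index and columns and returns MultiIndex axes.
    tri = pd.pivot_table(df, values="H", index=["D", "E"], columns=["F", "G"])
    # Flatten each (letter, number) tuple into one label such as 'a1'.
    tri.index = ["".join(key) for key in tri.index]
    tri.columns = ["".join(key) for key in tri.columns]
    # Give rows and columns the same sorted label set, then mirror across the diagonal.
    labels = sorted(set(tri.index) | set(tri.columns))
    tri = tri.reindex(index=labels, columns=labels)
    return tri.add(tri.T, fill_value=0).fillna(0)


df3 = symmetric_matrix(RAW)
print(df3)
```

Whether this is simpler than concatenating columns is a judgment call. It avoids adding `DE` and `FG` to the data frame, but it still has to flatten the MultiIndex, because adding a frame to its transpose only aligns when row and column labels share one flat label space.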

0 answers:

No answers