Get column values from a multi-index pandas DataFrame

Time: 2016-10-19 14:41:57

Tags: python pandas

I have a DataFrame with MultiIndex columns that looks like this:

      1              2
Panning   sec  Panning  sec
   None   5.0     None  0.0
   None   6.0     None  1.0
Panning   7.0     None  2.0
   None   8.0  Panning  3.0
   None   9.0     None  4.0
Panning  10.0     None  5.0

I am iterating over the rows and collecting the index wherever a Panning column holds the value 'Panning':

ide = []
for index, row in dfs.iterrows():
    # Note: a non-empty list is always truthy, so this branch runs for every row
    if [row[:, 'Panning'][row[:, 'Panning'] == 'Panning']]:
        ide.append(row[:, 'Panning'][row[:, 'Panning'] == 'Panning'].index.tolist())

print(ide)

When I run the code above, I get the output

[[],[],[1],[2],[],[1]]

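which shows, for each row, the index (the top-level column label) where the value is Panning.

Now I also want the corresponding sec value. For example, row 3 has the value Panning, and I want the sec value 7.0 together with the index 1. The output I would like is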

[[],[],[1,7.0],[2,3.0],[],[1,10]]

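Basically, I need the output to combine the index of each Panning value with the adjacent value in the sec column.

3 Answers:

Answer 0: (score: 2)

Consider the pd.DataFrame df from the Setup reference below

Method 1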

  • xs to take a cross-section of the 'Panning' columns
  • any(axis=1) to check whether any value in each row matches
df.loc[df.xs('Panning', axis=1, level=1).eq('Panning').any(axis=1)]
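
The two steps in Method 1 can be looked at separately. The following is a minimal sketch, assuming the question's data is built inline rather than through the Setup reference; the names `panning`, `row_has_panning` and `selected` are illustrative only and do not come from the answer.

```python
import pandas as pd

# Same data as the question, built inline so the sketch runs on its own.
columns = pd.MultiIndex.from_product([[1, 2], ['Panning', 'sec']])
df = pd.DataFrame(
    [
        [None, 5.0, None, 0.0],
        [None, 6.0, None, 1.0],
        ['Panning', 7.0, None, 2.0],
        [None, 8.0, 'Panning', 3.0],
        [None, 9.0, None, 4.0],
        ['Panning', 10.0, None, 5.0],
    ],
    columns=columns,
)

# Step 1: cross-section of every 'Panning' column; the result has columns 1 and 2.
panning = df.xs('Panning', axis=1, level=1)

# Step 2: True for each row where at least one of those columns equals 'Panning'.
row_has_panning = panning.eq('Panning').any(axis=1)

# Keep only the matching rows of the original frame.
selected = df.loc[row_has_panning]
print(selected.index.tolist())  # [2, 3, 5]
```

The boolean Series is indexed like the rows of df, so it can be passed straight to df.loc.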

Method 2

  • stack
  • query
  • unstack
df.stack(0).query('Panning == "Panning"').stack().unstack([-2, -1])

Return only sec

df.xs('sec', axis=1, level=1)[df.xs('Panning', axis=1, level=1).eq('Panning').any(axis=1)]
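
The same two cross-sections can be combined to produce the list-of-lists the question asks for. This is a sketch under assumptions, not part of the original answer: the data is built inline, and `is_panning`, `matched_sec` and `result` are illustrative names. It keeps a sec value only where the Panning column under the same top-level label equals 'Panning'.

```python
import pandas as pd

# Same data as the question, built inline so the sketch runs on its own.
columns = pd.MultiIndex.from_product([[1, 2], ['Panning', 'sec']])
df = pd.DataFrame(
    [
        [None, 5.0, None, 0.0],
        [None, 6.0, None, 1.0],
        ['Panning', 7.0, None, 2.0],
        [None, 8.0, 'Panning', 3.0],
        [None, 9.0, None, 4.0],
        ['Panning', 10.0, None, 5.0],
    ],
    columns=columns,
)

# Both cross-sections have the top-level labels (1 and 2) as their columns.
is_panning = df.xs('Panning', axis=1, level=1).eq('Panning')
sec = df.xs('sec', axis=1, level=1)

# Keep a sec value only where the matching Panning cell is 'Panning'; NaN elsewhere.
matched_sec = sec.where(is_panning)

# For each row, flatten the surviving (label, sec) pairs into one list.
result = [
    [item for label, value in row.dropna().items() for item in (label, value)]
    for _, row in matched_sec.iterrows()
]
print(result)  # [[], [], [1, 7.0], [2, 3.0], [], [1, 10.0]]
```

If a row had 'Panning' under both top-level labels, this sketch would put both pairs in that row's list, for example [1, x, 2, y].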

Setup reference

from io import StringIO
import pandas as pd

txt = """None    5.0        None        0.0
None    6.0        None        1.0
Panning  7.0        None        2.0 
None    8.0        Panning     3.0
None    9.0        None        4.0
Panning  10.0      None        5.0"""

df = pd.read_csv(StringIO(txt), sep=r'\s+', header=None)

df.columns = pd.MultiIndex.from_product([[1, 2], ['Panning', 'sec']])
df

Answer 1: (score: 1)

You can use:

print (dfs)
         1              2     
   Panning   sec  Panning  sec
0     None   5.0     None  0.0
1     None   6.0     None  1.0
2  Panning   7.0     None  2.0
3     None   8.0  Panning  3.0
4     None   9.0     None  4.0
5  Panning  10.0     None  5.0

Loop solution

ide=[]
for index,row in dfs.iterrows():
    if (row[:, 'Panning'] == 'Panning').any():
        idx1 = row[:, 'Panning'][row[:, 'Panning'] == 'Panning'].index.tolist()
        idx2 = row.loc[(idx1, 'sec')].values.tolist()[0]
        idx1.append(idx2)
        ide.append(idx1)
    else:
        ide.append([])

print (ide)
[[], [], ['1', 7.0], ['2', 3.0], [], ['1', 10.0]]

Stacked solution

stacked = dfs.stack(0).reset_index(level=1)
mask = stacked['Panning'] == 'Panning'
L = stacked[mask].reindex(dfs.index).drop('Panning', axis=1).fillna('').values.tolist()
print (L)
[['', ''], ['', ''], ['1', 7.0], ['2', 3.0], ['', ''], ['1', 10.0]]

print ([x if not x == ['', ''] else [] for x in L])
[[], [], ['1', 7.0], ['2', 3.0], [], ['1', 10.0]]

Explanation

#stack the top level of the column MultiIndex into the row index
#then turn that new index level (level 1) into a column
stacked = dfs.stack(0).reset_index(level=1)
print (stacked)
  level_1  Panning   sec
0       1     None   5.0
0       2     None   0.0
1       1     None   6.0
1       2     None   1.0
2       1  Panning   7.0
2       2     None   2.0
3       1     None   8.0
3       2  Panning   3.0
4       1     None   9.0
4       2     None   4.0
5       1  Panning  10.0
5       2     None   5.0
#boolean indexing
#http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
mask = stacked['Panning'] == 'Panning'
print (mask)
0    False
0    False
1    False
1    False
2     True
2    False
3    False
3     True
4    False
4    False
5     True
5    False
Name: Panning, dtype: bool

print (stacked[mask])
  level_1  Panning   sec
2       1  Panning   7.0
3       2  Panning   3.0
5       1  Panning  10.0
#reindex by original index, remove column Panning
print (stacked[mask].reindex(dfs.index).drop('Panning', axis=1))
  level_1   sec
0     NaN   NaN
1     NaN   NaN
2       1   7.0
3       2   3.0
4     NaN   NaN
5       1  10.0

#replace NaN with '' and generate a list of lists
L = stacked[mask].reindex(dfs.index).drop('Panning', axis=1).fillna('').values.tolist()
print (L)
[['', ''], ['', ''], ['1', 7.0], ['2', 3.0], ['', ''], ['1', 10.0]]

#replace the ['', ''] placeholders with empty lists
print ([x if not x == ['', ''] else [] for x in L])
[[], [], ['1', 7.0], ['2', 3.0], [], ['1', 10.0]]
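
As a variation on the stacked approach, the fillna('') step and the later clean-up of ['', ''] placeholders can be avoided by collecting the matches in a dictionary first. This is only a sketch under assumptions: the frame is built inline with integer top-level labels (so the result holds 1 and 2 rather than the strings '1' and '2' shown above), `hits`, `found` and `result` are illustrative names, and each row is assumed to contain at most one 'Panning'.

```python
import pandas as pd

# Same data as the question, built inline so the sketch runs on its own.
columns = pd.MultiIndex.from_product([[1, 2], ['Panning', 'sec']])
dfs = pd.DataFrame(
    [
        [None, 5.0, None, 0.0],
        [None, 6.0, None, 1.0],
        ['Panning', 7.0, None, 2.0],
        [None, 8.0, 'Panning', 3.0],
        [None, 9.0, None, 4.0],
        ['Panning', 10.0, None, 5.0],
    ],
    columns=columns,
)

# Move the top column level into the rows, then expose it as the 'level_1' column.
stacked = dfs.stack(0).reset_index(level=1)

# Rows of the stacked frame where the Panning cell matched.
hits = stacked[stacked['Panning'] == 'Panning']

# Map each original row label to [top-level label, sec value].
# Assumption: at most one match per row; a second match would overwrite the first.
found = {
    row_label: [top_label, sec]
    for row_label, top_label, sec in zip(hits.index, hits['level_1'], hits['sec'])
}

# Rows without a match get an empty list.
result = [found.get(row_label, []) for row_label in dfs.index]
print(result)
```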

Answer 2: (score: 0)

df.iterrows() yields each row as a Series. If you need the row's original index label, read the name attribute of that Series, like this:

for index,row in df.iterrows():
    print(row.name)
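
To see this on a frame whose index is not the default 0, 1, 2, here is a small self-contained sketch. The frame and its labels 'a', 'b', 'c' are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical frame with a non-default index, so the labels are easy to spot.
df = pd.DataFrame({'sec': [5.0, 6.0, 7.0]}, index=['a', 'b', 'c'])

names = []
for index, row in df.iterrows():
    # The row Series carries its index label in .name, the same value as `index`.
    assert row.name == index
    names.append(row.name)

print(names)  # ['a', 'b', 'c']
```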