I have a MultiIndex DataFrame like this:
      1              2
Panning   sec  Panning   sec
   None   5.0     None   0.0
   None   6.0     None   1.0
Panning   7.0     None   2.0
   None   8.0  Panning   3.0
   None   9.0     None   4.0
Panning  10.0     None   5.0
I'm iterating over the rows and collecting the index wherever the Panning column contains the value 'Panning':
ide=[]
for index,row in dfs.iterrows():
    if [row[:, 'Panning'][row[:, 'Panning'] == 'Panning']]:
        ide.append(row[:, 'Panning'][row[:, 'Panning'] == 'Panning'].index.tolist())
print(ide)
If I run the code above, I get the output
[[],[],[1],[2],[],[1]]
which gives the index wherever the value is Panning.
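Now I also want the corresponding sec value. For example, the third row has the value Panning, so I want the sec value 7.0 together with the index 1. I want the output to be
[[],[],[1,7.0],[2,3.0],[],[1,10]]
Basically, I need the output to combine the index of each Panning value with the matching value from the sec column next to it.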
Answer 0 (score: 2)
Consider the pd.DataFrame df defined in the Setup reference below.
Method 1
xs (cross-section) plus any(axis=1) (checks whether any value in each row matches)
df.loc[df.xs('Panning', axis=1, level=1).eq('Panning').any(axis=1)]
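A runnable sketch of Method 1 that keeps the intermediate steps visible. It rebuilds the question's data inline and stores 'None' as a plain string; that is an assumption, since the question does not say whether those cells are strings or missing values:

```python
import pandas as pd

# Question data; 'None' is kept as an ordinary string (assumption)
cols = pd.MultiIndex.from_product([[1, 2], ["Panning", "sec"]])
df = pd.DataFrame(
    [["None", 5.0, "None", 0.0],
     ["None", 6.0, "None", 1.0],
     ["Panning", 7.0, "None", 2.0],
     ["None", 8.0, "Panning", 3.0],
     ["None", 9.0, "None", 4.0],
     ["Panning", 10.0, "None", 5.0]],
    columns=cols,
)

# Cross-section: every (group, 'Panning') column, with level 1 dropped
panning = df.xs("Panning", axis=1, level=1)
# Boolean frame, True where a cell equals 'Panning'
hits = panning.eq("Panning")
# Keep the rows in which at least one group matched
rows = df.loc[hits.any(axis=1)]
print(rows.index.tolist())  # [2, 3, 5]
```

The columns of `hits` are the top-level labels (1 and 2), which is exactly the "index" the question wants to report.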
Method 2
stack, then query, then unstack
df.stack(0).query('Panning == "Panning"').stack().unstack([-2, -1])
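The first two steps of Method 2 already carry what the question asks for: after stack(0) each (row, group) pair becomes one row with Panning and sec columns, so the index of the query result holds the row and group labels and its sec column holds the matching values. A sketch on the question's data, rebuilt inline with 'None' stored as a string (an assumption). Recent pandas versions may emit a FutureWarning about the stack implementation here; it does not change this result:

```python
import pandas as pd

# Question data; 'None' is kept as an ordinary string (assumption)
cols = pd.MultiIndex.from_product([[1, 2], ["Panning", "sec"]])
df = pd.DataFrame(
    [["None", 5.0, "None", 0.0],
     ["None", 6.0, "None", 1.0],
     ["Panning", 7.0, "None", 2.0],
     ["None", 8.0, "Panning", 3.0],
     ["None", 9.0, "None", 4.0],
     ["Panning", 10.0, "None", 5.0]],
    columns=cols,
)

# One row per (original row, group); the columns become 'Panning' and 'sec'
stacked = df.stack(0)
# Keep only the (row, group) pairs whose Panning cell is 'Panning'
matches = stacked.query('Panning == "Panning"')
print(matches.index.tolist())   # [(2, 1), (3, 2), (5, 1)]
print(matches["sec"].tolist())  # [7.0, 3.0, 10.0]
```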
To return only the sec columns:
df.xs('sec', axis=1, level=1)[df.xs('Panning', axis=1, level=1).eq('Panning').any(axis=1)]
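If the goal is the exact list format from the question, the two xs cross-sections can be walked row by row in parallel. A sketch that reports at most one group per row (the first match, found with argmax); this is an assumption, since the question does not say what should happen when both groups contain 'Panning' in the same row. The data is rebuilt inline with 'None' stored as a string (also an assumption):

```python
import pandas as pd

# Question data; 'None' is kept as an ordinary string (assumption)
cols = pd.MultiIndex.from_product([[1, 2], ["Panning", "sec"]])
df = pd.DataFrame(
    [["None", 5.0, "None", 0.0],
     ["None", 6.0, "None", 1.0],
     ["Panning", 7.0, "None", 2.0],
     ["None", 8.0, "Panning", 3.0],
     ["None", 9.0, "None", 4.0],
     ["Panning", 10.0, "None", 5.0]],
    columns=cols,
)

mask = df.xs("Panning", axis=1, level=1).eq("Panning")  # rows x groups, bool
sec = df.xs("sec", axis=1, level=1)                      # rows x groups, float
groups = mask.columns.to_numpy()                         # array([1, 2])

result = []
for hit_row, sec_row in zip(mask.to_numpy(), sec.to_numpy()):
    if hit_row.any():
        j = hit_row.argmax()  # position of the first group that says 'Panning'
        result.append([groups[j].item(), sec_row[j].item()])
    else:
        result.append([])

print(result)  # [[], [], [1, 7.0], [2, 3.0], [], [1, 10.0]]
```

Both cross-sections share the same column order (1, 2), which is what makes pairing them by position safe.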
Setup (reference)
# Python 3: StringIO lives in the io module
from io import StringIO
import pandas as pd
txt = """None 5.0 None 0.0
None 6.0 None 1.0
Panning 7.0 None 2.0
None 8.0 Panning 3.0
None 9.0 None 4.0
Panning 10.0 None 5.0"""
# sep=r'\s+' replaces the deprecated delim_whitespace=True
df = pd.read_csv(StringIO(txt), sep=r'\s+', header=None)
df.columns = pd.MultiIndex.from_product([[1, 2], ['Panning', 'sec']])
df
Answer 1 (score: 1)
您可以使用:
print (dfs)
         1              2
   Panning   sec  Panning   sec
0     None   5.0     None   0.0
1     None   6.0     None   1.0
2  Panning   7.0     None   2.0
3     None   8.0  Panning   3.0
4     None   9.0     None   4.0
5  Panning  10.0     None   5.0
Loop solution:
ide=[]
for index,row in dfs.iterrows():
    # does any group in this row have 'Panning'?
    if (row[:, 'Panning'] == 'Panning').any():
        # group labels whose Panning cell matches
        idx1 = row[:, 'Panning'][row[:, 'Panning'] == 'Panning'].index.tolist()
        # sec value of the first matching group
        idx2 = row.loc[(idx1, 'sec')].values.tolist()[0]
        idx1.append(idx2)
        ide.append(idx1)
    else:
        ide.append([])
print (ide)
[[], [], ['1', 7.0], ['2', 3.0], [], ['1', 10.0]]
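The printed output shows the group labels as strings ('1', '2'), so dfs in this answer apparently had string labels in the top column level. A sketch of the same loop on such a frame (the string labels are an assumption inferred from that output), using Series.xs and a direct (group, 'sec') lookup instead of slicing with row[:, 'Panning']:

```python
import pandas as pd

# Top-level labels as strings, matching the '1'/'2' in the printed output (assumption)
cols = pd.MultiIndex.from_product([["1", "2"], ["Panning", "sec"]])
dfs = pd.DataFrame(
    [["None", 5.0, "None", 0.0],
     ["None", 6.0, "None", 1.0],
     ["Panning", 7.0, "None", 2.0],
     ["None", 8.0, "Panning", 3.0],
     ["None", 9.0, "None", 4.0],
     ["Panning", 10.0, "None", 5.0]],
    columns=cols,
)

ide = []
for _, row in dfs.iterrows():
    pan = row.xs("Panning", level=1)    # group label -> 'Panning' or 'None'
    hits = pan[pan == "Panning"].index  # groups that matched in this row
    if len(hits):
        group = hits[0]                 # first matching group only
        ide.append([group, row[(group, "sec")]])
    else:
        ide.append([])

print(ide)  # [[], [], ['1', 7.0], ['2', 3.0], [], ['1', 10.0]]
```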
Stacked solution:
stacked = dfs.stack(0).reset_index(level=1)
mask = stacked['Panning'] == 'Panning'
L = stacked[mask].reindex(dfs.index).drop('Panning', axis=1).fillna('').values.tolist()
print (L)
[['', ''], ['', ''], ['1', 7.0], ['2', 3.0], ['', ''], ['1', 10.0]]
print ([x if not x == ['', ''] else [] for x in L])
[[], [], ['1', 7.0], ['2', 3.0], [], ['1', 10.0]]
Explanation:
#stack the top level of the column MultiIndex into the row index
#move index level 1 (the former top column level) into a column
stacked = dfs.stack(0).reset_index(level=1)
print (stacked)
level_1 Panning sec
0 1 None 5.0
0 2 None 0.0
1 1 None 6.0
1 2 None 1.0
2 1 Panning 7.0
2 2 None 2.0
3 1 None 8.0
3 2 Panning 3.0
4 1 None 9.0
4 2 None 4.0
5 1 Panning 10.0
5 2 None 5.0
#boolean indexing
#http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
mask = stacked['Panning'] == 'Panning'
print (mask)
0 False
0 False
1 False
1 False
2 True
2 False
3 False
3 True
4 False
4 False
5 True
5 False
Name: Panning, dtype: bool
print (stacked[mask])
level_1 Panning sec
2 1 Panning 7.0
3 2 Panning 3.0
5 1 Panning 10.0
#reindex by the original index and drop the Panning column
print (stacked[mask].reindex(dfs.index).drop('Panning', axis=1))
level_1 sec
0 NaN NaN
1 NaN NaN
2 1 7.0
3 2 3.0
4 NaN NaN
5 1 10.0
#replace NaN with '' and convert to a list of lists
L = stacked[mask].reindex(dfs.index).drop('Panning', axis=1).fillna('').values.tolist()
print (L)
[['', ''], ['', ''], ['1', 7.0], ['2', 3.0], ['', ''], ['1', 10.0]]
#turn the ['', ''] placeholders into empty lists
print ([x if not x == ['', ''] else [] for x in L])
[[], [], ['1', 7.0], ['2', 3.0], [], ['1', 10.0]]
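The fillna('') / ['', ''] round trip can also be avoided by looking each original row up in the filtered frame and emitting [] when it is absent. A sketch, assuming at most one match per row (with two matches, hits.loc[i] would return a DataFrame rather than a Series), again on a frame with string top-level labels as implied by the printed output (assumption):

```python
import pandas as pd

# Top-level labels as strings, matching the '1'/'2' in the printed output (assumption)
cols = pd.MultiIndex.from_product([["1", "2"], ["Panning", "sec"]])
dfs = pd.DataFrame(
    [["None", 5.0, "None", 0.0],
     ["None", 6.0, "None", 1.0],
     ["Panning", 7.0, "None", 2.0],
     ["None", 8.0, "Panning", 3.0],
     ["None", 9.0, "None", 4.0],
     ["Panning", 10.0, "None", 5.0]],
    columns=cols,
)

# Same first steps as the stacked solution
stacked = dfs.stack(0).reset_index(level=1)
hits = stacked[stacked["Panning"] == "Panning"].drop(columns="Panning")

# One entry per original row: [group, sec] if it matched, else []
result = [hits.loc[i].tolist() if i in hits.index else [] for i in dfs.index]
print(result)  # [[], [], ['1', 7.0], ['2', 3.0], [], ['1', 10.0]]
```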
Answer 2 (score: 0)
df.iterrows() yields (index, Series) pairs. If you need the original index label from the row itself, use the name attribute of that Series, like this:
for index,row in df.iterrows():
    print(row.name)
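A quick check that the label yielded by iterrows and row.name are the same value. The small frame and its string index are made up for illustration:

```python
import pandas as pd

# Hypothetical frame with a non-default index so the labels are easy to see
df = pd.DataFrame({"sec": [5.0, 6.0, 7.0]}, index=["a", "b", "c"])

names = []
for index, row in df.iterrows():
    # row is a Series; its name is the row's index label
    assert row.name == index
    names.append(row.name)

print(names)  # ['a', 'b', 'c']
```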