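I'm working with a multi-index DataFrame that looks like this:
(Sorry for writing null instead of NaN.)
What is the most efficient way to find the highlighted patterns?
I'd like to get a result like this:
Thanks in advance for any insight!
For anyone who wants to play with it: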
from io import StringIO
import pandas as pd
df1_text = """ A B C
STAND1 CH1 NaN NaN NaN
STAND1 CH2 NaN 11.2 NaN
STAND1 CH3 12.4 7.0 NaN
STAND1 CH4 10.2 2.0 NaN
STAND2 CH1 NaN 2.5 NaN
STAND2 CH2 NaN 11.2 NaN
STAND2 CH3 NaN NaN 6.3
STAND2 CH4 NaN NaN 23.5
STAND3 CH1 NaN NaN NaN
STAND3 CH2 12.3 NaN NaN
STAND3 CH3 5.3 4.5 NaN
STAND3 CH4 7.2 25.6 NaN"""
# sep=r"\s+" splits on whitespace; delim_whitespace is deprecated in recent pandas
df1 = pd.read_csv(StringIO(df1_text), sep=r"\s+")
Answer 0 (score: 1)
Here is one approach. In short, you can use
import numpy as np
df2 = df1.swaplevel(0, 1).unstack().notnull()
print(pd.Series(np.dot(df2.index, df2)).value_counts())
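The first line creates df2, which lines up the channels (as rows) against 9 columns of boolean indicators marking the non-null cells, e.g.: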
#          A                    B                    C
#     STAND1 STAND2 STAND3 STAND1 STAND2 STAND3 STAND1 STAND2 STAND3
# CH1  False  False  False  False   True  False  False  False  False
# CH2  False  False   True   True   True  False  False  False  False
# CH3   True  False   True   True  False   True  False   True  False
# CH4   True  False   True   True  False   True  False   True  False
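To see how that first line gets there, here is a minimal sketch that splits it into its three steps. It rebuilds the question's data with the DataFrame constructor rather than read_csv, and the intermediate names `swapped` and `wide` are mine, purely for illustration.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Same values as the question's table, built directly (an assumption made
# here to keep the sketch independent of text parsing).
index = pd.MultiIndex.from_product(
    [["STAND1", "STAND2", "STAND3"], ["CH1", "CH2", "CH3", "CH4"]]
)
df1 = pd.DataFrame(
    {
        "A": [nan, nan, 12.4, 10.2, nan, nan, nan, nan, nan, 12.3, 5.3, 7.2],
        "B": [nan, 11.2, 7.0, 2.0, 2.5, 11.2, nan, nan, nan, nan, 4.5, 25.6],
        "C": [nan, nan, nan, nan, nan, nan, 6.3, 23.5, nan, nan, nan, nan],
    },
    index=index,
)

swapped = df1.swaplevel(0, 1)  # index is now (channel, stand)
wide = swapped.unstack()       # innermost level (stand) moves into the columns
df2 = wide.notna()             # True wherever a value is present

print(df2.shape)                      # 4 channels x 9 (letter, stand) columns
print(df2[("A", "STAND1")].tolist())  # non-null pattern of one column
```

Each column of df2 is then the presence pattern for one (letter, stand) pair, which is what the second step turns into a label.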
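The goal of the second step is to replace each column of df2 with a string that represents the event. Using the fact that Python strings can be multiplied by integers (and that True and False behave as 1 and 0), we get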
np.dot(['CH1', 'CH2', 'CH3', 'CH4'], [True, True, False, False]) <==>
'CH1' * True + 'CH2' * True + 'CH3' * False + 'CH4' * False <==>
'CH1' * 1 + 'CH2' * 1 + 'CH3' * 0 + 'CH4' * 0 <==>
'CH1' + 'CH2' <==>
'CH1CH2'
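The same arithmetic can be checked in plain Python, without NumPy. This is only a sketch of the string-repetition idea; the names `channels`, `mask` and `event` are illustrative.

```python
channels = ["CH1", "CH2", "CH3", "CH4"]
mask = [True, True, False, False]

# bool is a subclass of int, so "CH1" * True is "CH1" and "CH3" * False is "".
event = "".join(ch * flag for ch, flag in zip(channels, mask))
print(event)
```

Channels whose flag is False contribute an empty string, so only the non-null channels survive in the label.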
This approach has two cosmetic flaws: it omits the commas between channels, and it includes an "empty" event.
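If those cosmetics matter, one possible variant (not the np.dot trick, but an explicit str.join per column) is sketched below. It assumes the same data as the question, rebuilt with the DataFrame constructor, and the names `events` and `counts` are illustrative.

```python
import numpy as np
import pandas as pd

nan = np.nan
index = pd.MultiIndex.from_product(
    [["STAND1", "STAND2", "STAND3"], ["CH1", "CH2", "CH3", "CH4"]]
)
df1 = pd.DataFrame(
    {
        "A": [nan, nan, 12.4, 10.2, nan, nan, nan, nan, nan, 12.3, 5.3, 7.2],
        "B": [nan, 11.2, 7.0, 2.0, 2.5, 11.2, nan, nan, nan, nan, 4.5, 25.6],
        "C": [nan, nan, nan, nan, nan, nan, 6.3, 23.5, nan, nan, nan, nan],
    },
    index=index,
)
df2 = df1.swaplevel(0, 1).unstack().notna()

# For every (letter, stand) column, join the channels whose cell is non-null.
events = df2.apply(lambda col: ", ".join(col.index[col.to_numpy()]))

# Drop the columns with no data at all, then count each remaining pattern.
counts = events[events != ""].value_counts()
print(counts)
```

This is likely slower than np.dot on a large frame because it calls a Python function per column, so treat it as a readability trade-off rather than a faster replacement.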
Full example:
from io import StringIO
import numpy as np
import pandas as pd
df1_text = """ A B C
STAND1 CH1 NaN NaN NaN
STAND1 CH2 NaN 11.2 NaN
STAND1 CH3 12.4 7.0 NaN
STAND1 CH4 10.2 2.0 NaN
STAND2 CH1 NaN 2.5 NaN
STAND2 CH2 NaN 11.2 NaN
STAND2 CH3 NaN NaN 6.3
STAND2 CH4 NaN NaN 23.5
STAND3 CH1 NaN NaN NaN
STAND3 CH2 12.3 NaN NaN
STAND3 CH3 5.3 4.5 NaN
STAND3 CH4 7.2 25.6 NaN"""
# sep=r"\s+" splits on whitespace; delim_whitespace is deprecated in recent pandas
df1 = pd.read_csv(StringIO(df1_text), sep=r"\s+")
# solution
df2 = df1.swaplevel(0, 1).unstack().notnull()
print(pd.Series(np.dot(df2.index, df2)).value_counts())
# In [559]: df1.swaplevel(0, 1).unstack().notnull()
# Out[559]:
#          A                    B                    C
#     STAND1 STAND2 STAND3 STAND1 STAND2 STAND3 STAND1 STAND2 STAND3
# CH1  False  False  False  False   True  False  False  False  False
# CH2  False  False   True   True   True  False  False  False  False
# CH3   True  False   True   True  False   True  False   True  False
# CH4   True  False   True   True  False   True  False   True  False
# In [560]: np.dot(df2.index, df2)
# Out[560]:
# array(['CH3CH4', '', 'CH2CH3CH4', 'CH2CH3CH4', 'CH1CH2', 'CH3CH4', '',
# 'CH3CH4', ''], dtype=object)
# In [561]: pd.Series(np.dot(df2.index, df2)).value_counts()
# Out[561]:
# CH3CH4       3
#              3
# CH2CH3CH4    2
# CH1CH2       1
# dtype: int64