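The problem looks like this:
I have a DataFrame left with a 2-level MultiIndex, representing events tpc that take place at a point onset within a time region mc. Every event takes place in a layer defined by (staff, voice):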
            mc onset  staff  voice  tpc dynamics chords
section ix
0       0    0     0      2      1    0      NaN    NaN
        1    0     0      2      1    0      NaN    NaN
        2    0     0      1      1    0      NaN    NaN
        3    0     0      1      1    4      NaN    NaN
        4    0     0      1      1    1      NaN    NaN
        5    0     0      1      1    0      NaN    NaN
        6    0   3/4      2      2    1      NaN    NaN
        7    0   3/4      2      1    1      NaN    NaN
Then there is a DataFrame right with other events ('dynamics', 'chords') that need to be filled into left:
   mc onset  staff  voice dynamics chords
0   0     0      1      1        f    NaN
1   0     0      1      1      NaN      I
2   0   1/2      2      1        p    NaN
3   0   3/4      1      1      NaN     I6
4   0   3/4      2      1      NaN    I64
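All events from right need to appear in left:
- If there are events in left happening simultaneously in the same layer, fill in the respective columns of left for these events (i.e., join on ['mc', 'onset', 'staff', 'voice']; e.g. rows 0, 1 and 4).
- Otherwise, if there are events in left happening simultaneously in the same staff, fill in the respective columns of left for these events (i.e., join on ['mc', 'onset', 'staff']; e.g. row 4).
- Otherwise, if there are events in left happening simultaneously, fill in the respective columns of left for these events (i.e., join on ['mc', 'onset']; e.g. row 3).
- If no events in left happen simultaneously, issue a warning and keep the event for further processing (e.g. row 2).
- If two events of the same type happen simultaneously in right, issue a warning and concatenate the values (e.g. rows 3 and 4).

The expected result:
     mc onset  staff  voice  tpc dynamics chords
0 0   0     0      2      1    0      NaN    NaN
  1   0     0      2      1    0      NaN    NaN
  2   0     0      1      1    0        f      I
  3   0     0      1      1    4        f      I
  4   0     0      1      1    1        f      I
  5   0     0      1      1    0        f      I
  6   0   3/4      2      2    1      NaN     I6
  7   0   3/4      2      1    1      NaN  I6I64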
WARNING: These events could not be attached:
   mc onset  staff  voice dynamics chords
2   0   1/2      2      1        p    NaN
WARNING: These events are simultaneous:
   mc onset  staff  voice dynamics chords
3   0   3/4      1      1      NaN     I6
4   0   3/4      2      1      NaN    I64
Since I want to avoid iterating over right, I tried the following:
left_features = ['mc', 'onset', 'staff', 'voice']
right_features = ['dynamics', 'chords']
join_on = [['mc', 'onset', 'staff', 'voice'], ['mc', 'onset', 'staff'], ['mc', 'onset']]
for on in join_on:
    match = right[on + right_features].merge(left[left_features], on=on, left_index=True)
    left_ix = match.index
    left.loc[left_ix, match.columns] = match
    # left.loc[left_ix].fillna(match, inplace=True)
    right_ix = right.merge(left[left_features], on=on, right_index=True).index
    right.drop(right_ix, inplace=True)
    if len(right) == 0:
        break
if len(right) > 0:
    print("WARNING: These events could not be attached:")
    print(right)
This approach does not work, because after the first merge, match looks like this:
     mc onset  staff  voice dynamics chords  tpc
0 2   0     0      1      1        f    NaN    0
  3   0     0      1      1        f    NaN    4
  4   0     0      1      1        f    NaN    1
  5   0     0      1      1        f    NaN    0
  2   0     0      1      1      NaN      I    0
  3   0     0      1      1      NaN      I    4
  4   0     0      1      1      NaN      I    1
  5   0     0      1      1      NaN      I    0
  7   0   3/4      2      1      NaN    I64    1
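Since the index of match is not unique, assigning match to left does not fully work (dynamics are missing from the result), and the commented-out approach using fillna silently does nothing. On top of that, I have to perform the same merge twice: once with left_index to assign to the correct rows of left, and once with right_index to drop the matched rows from right.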
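A likely reason why the commented-out fillna line has no effect: left.loc[left_ix] with a list of labels returns a new DataFrame, so fillna(..., inplace=True) fills that temporary object and left itself is never touched. The following is a minimal sketch with made-up data (df and fill are hypothetical stand-ins for left and match, with a unique index); assigning the filled result back is one way around it.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for `left` and `match`
df = pd.DataFrame({'dynamics': [np.nan, np.nan, 'x']}, index=[0, 1, 2])
fill = pd.DataFrame({'dynamics': ['f', 'f']}, index=[0, 1])

# .loc with a list of labels returns a new object, so this fills a
# temporary DataFrame that is discarded right away; df stays as it was.
df.loc[[0, 1]].fillna(fill, inplace=True)
still_missing = int(df['dynamics'].isna().sum())

# Assigning the filled selection back does modify df.
df.loc[[0, 1], ['dynamics']] = df.loc[[0, 1], ['dynamics']].fillna(fill)
```

Note that this only explains the silent no-op; it does not solve the non-unique index of match.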
Faced with these problems, I preprocessed right before joining, in order to unite simultaneous events into a single row:
def unite_vals(df):
    r = pd.Series(index=right_features)
    for col in right_features:
        u = df[col][df[col].notna()].unique()
        if len(u) > 1:
            r[col] = ''.join(str(val) for val in u)
            print(f"WARNING:Two simultaneous events in row {df.iloc[0].name}")
        elif len(u) == 1:
            r[col] = u[0]
    return r

left_features = ['mc', 'onset', 'staff', 'voice']
right_features = ['dynamics', 'chords']
on = ['mc', 'onset']
right = right.groupby(on).apply(unite_vals).reset_index()
match = right.merge(left[left_features], on=on, left_index=True)
left_ix = match.index
left.loc[left_ix, match.columns] = match
# left.loc[left_ix].fillna(match, inplace=True)
right_ix = right.merge(left[left_features], on=on, right_index=True).index
right.drop(right_ix, inplace=True)
if len(right) > 0:
    print("WARNING: These events could not be attached:")
    print(right)
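As a side note, the uniting step alone can probably be written more compactly with groupby(...).agg. This is only a sketch under assumptions: join_values is a hypothetical helper, floats stand in for the Fraction onsets, and the warning about simultaneous events is left out. Like unite_vals, it discards staff and voice.

```python
import numpy as np
import pandas as pd

right = pd.DataFrame({
    'mc': [0, 0, 0, 0, 0],
    'onset': [0.0, 0.0, 0.5, 0.75, 0.75],   # floats stand in for the Fractions
    'dynamics': ['f', np.nan, 'p', np.nan, np.nan],
    'chords': [np.nan, 'I', np.nan, 'I6', 'I64'],
})

def join_values(col):
    """Concatenate the distinct non-null values of one column within a group."""
    vals = col.dropna().unique()
    return ''.join(vals) if len(vals) else np.nan

# One row per (mc, onset); each feature column is reduced separately.
united = (right.groupby(['mc', 'onset'])[['dynamics', 'chords']]
               .agg(join_values)
               .reset_index())
```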
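(For some reason unknown to me, the commented-out approach using fillna again had no effect, and the problem of performing the same merge twice persists.) The result is something I could live with, but it does not distinguish between the layers of right, so it looks like this: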
     mc onset  staff  voice  tpc dynamics chords
0 0   0     0      2      1    0        f      I
  1   0     0      2      1    0        f      I
  2   0     0      1      1    0        f      I
  3   0     0      1      1    4        f      I
  4   0     0      1      1    1        f      I
  5   0     0      1      1    0        f      I
  6   0   3/4      2      2    1      NaN  I6I64
  7   0   3/4      2      1    1      NaN  I6I64
WARNING:Two simultaneous events at:
   mc onset
3   0   3/4
WARNING: These events could not be attached:
   mc onset dynamics chords
1   0   1/2        p    NaN
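The duplicated merge could probably be avoided by merging once with how='left' and indicator=True while carrying the row labels of both frames along as ordinary columns. The sketch below uses small made-up frames rather than the data above; left_ix and right_ix are hypothetical column names introduced for illustration, and with the MultiIndex of the real left the label column would have to be built differently.

```python
import pandas as pd

left = pd.DataFrame({'mc': [0, 0, 0], 'onset': [0.0, 0.0, 0.75], 'tpc': [0, 4, 1]})
right = pd.DataFrame({'mc': [0, 0], 'onset': [0.0, 0.5], 'dynamics': ['f', 'p']})

on = ['mc', 'onset']
# Keep each frame's row labels as ordinary columns so they survive the merge.
l = left[on].rename_axis('left_ix').reset_index()
r = right.rename_axis('right_ix').reset_index()

# A single left join: the '_merge' column tells which rows of right found a partner.
merged = r.merge(l, on=on, how='left', indicator=True)

matched = merged[merged['_merge'] == 'both']
# One row per (row of right, row of left) pair
pairs = matched[['right_ix', 'left_ix']].astype(int)
# Labels of the rows of right that could not be attached
unmatched_right_ix = merged.loc[merged['_merge'] == 'left_only', 'right_ix'].unique()
```

Here pairs says which rows of left each row of right should fill, and unmatched_right_ix lists the rows of right to warn about, both from the same merge.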
How is this usually solved?
Here is the source code for reproduction:
import pandas as pd
import numpy as np
from fractions import Fraction
left_dict = {'mc': {(0, 0): 0,
                    (0, 1): 0,
                    (0, 2): 0,
                    (0, 3): 0,
                    (0, 4): 0,
                    (0, 5): 0,
                    (0, 6): 0,
                    (0, 7): 0},
             'onset': {(0, 0): Fraction(0, 1),
                       (0, 1): Fraction(0, 1),
                       (0, 2): Fraction(0, 1),
                       (0, 3): Fraction(0, 1),
                       (0, 4): Fraction(0, 1),
                       (0, 5): Fraction(0, 1),
                       (0, 6): Fraction(3, 4),
                       (0, 7): Fraction(3, 4)},
             'staff': {(0, 0): 2,
                       (0, 1): 2,
                       (0, 2): 1,
                       (0, 3): 1,
                       (0, 4): 1,
                       (0, 5): 1,
                       (0, 6): 2,
                       (0, 7): 2},
             'voice': {(0, 0): 1,
                       (0, 1): 1,
                       (0, 2): 1,
                       (0, 3): 1,
                       (0, 4): 1,
                       (0, 5): 1,
                       (0, 6): 2,
                       (0, 7): 1},
             'tpc': {(0, 0): 0,
                     (0, 1): 0,
                     (0, 2): 0,
                     (0, 3): 4,
                     (0, 4): 1,
                     (0, 5): 0,
                     (0, 6): 1,
                     (0, 7): 1},
             'dynamics': {(0, 0): np.nan,
                          (0, 1): np.nan,
                          (0, 2): np.nan,
                          (0, 3): np.nan,
                          (0, 4): np.nan,
                          (0, 5): np.nan,
                          (0, 6): np.nan,
                          (0, 7): np.nan},
             'chords': {(0, 0): np.nan,
                        (0, 1): np.nan,
                        (0, 2): np.nan,
                        (0, 3): np.nan,
                        (0, 4): np.nan,
                        (0, 5): np.nan,
                        (0, 6): np.nan,
                        (0, 7): np.nan}}
left = pd.DataFrame.from_dict(left_dict)

right_dict = {'mc': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
              'onset': {0: Fraction(0, 1),
                        1: Fraction(0, 1),
                        2: Fraction(1, 2),
                        3: Fraction(3, 4),
                        4: Fraction(3, 4)},
              'staff': {0: 1, 1: 1, 2: 2, 3: 1, 4: 2},
              'voice': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
              'dynamics': {0: 'f', 1: np.nan, 2: 'p', 3: np.nan, 4: np.nan},
              'chords': {0: np.nan, 1: 'I', 2: np.nan, 3: 'I6', 4: 'I64'}}
right = pd.DataFrame.from_dict(right_dict)

attempt1 = True
if attempt1:
    left_features = ['mc', 'onset', 'staff', 'voice', 'tpc']
    right_features = ['dynamics', 'chords']
    join_on = [['mc', 'onset', 'staff', 'voice'], ['mc', 'onset', 'staff'], ['mc', 'onset']]
    for on in join_on:
        match = right[on + right_features].merge(left[left_features], on=on, left_index=True)
        left_ix = match.index
        left.loc[left_ix, match.columns] = match
        #left.loc[left_ix].fillna(match, inplace=True)
        right_ix = right.merge(left[left_features], on=on, right_index=True).index
        right.drop(right_ix, inplace=True)
        if len(right) == 0:
            break
    if len(right) > 0:
        print("WARNING: These events could not be attached:")
        print(right)
    print(left)
else:
    def unite_vals(df):
        r = pd.Series(index=right_features)
        for col in right_features:
            u = df[col][df[col].notna()].unique()
            if len(u) > 1:
                r[col] = ''.join(str(val) for val in u)
                print("WARNING:Two simultaneous events at:")
                print(df.iloc[:1][['mc', 'onset']])
            elif len(u) == 1:
                r[col] = u[0]
        return r

    left_features = ['mc', 'onset', 'staff', 'voice']
    right_features = ['dynamics', 'chords']
    on = ['mc', 'onset']
    right = right.groupby(on).apply(unite_vals).reset_index()
    match = right.merge(left[left_features], on=on, left_index=True)
    left_ix = match.index
    left.loc[left_ix, match.columns] = match
    # left.loc[left_ix].fillna(match, inplace=True)
    right_ix = right.merge(left[left_features], on=on, right_index=True).index
    right.drop(right_ix, inplace=True)
    if len(right) > 0:
        print("WARNING: These events could not be attached:")
        print(right)
    print(left)
Answer 0 (score: 0)
It turned out that the simplest way to solve my problem was to use a loop:
isnan = lambda num: num != num  # NaN is the only value that is not equal to itself
right_features = ['dynamics', 'chords']
for i, r in right.iterrows():
    # events of left that happen at the same time as r
    same_os = left.loc[(left.mc == r.mc) & (left.onset == r.onset)]
    if len(same_os) > 0:
        same_staff = same_os.loc[same_os.staff == r.staff]
        same_voice = same_staff.loc[same_staff.voice == r.voice]
        # fall back from the most specific match to the least specific one
        if len(same_voice) > 0:
            fill = same_voice
        elif len(same_staff) > 0:
            fill = same_staff
        else:
            fill = same_os

        for f in right_features:
            if not isnan(r[f]):
                F = left.loc[fill.index, f]
                notna = F.notna()
                if notna.any():
                    # a value is already present: concatenate instead of overwriting
                    print(f"WARNING:Feature existed and was concatenated: {F[notna]}")
                    left.loc[F[notna].index, f] += r[f]
                    left.loc[F[~notna].index, f] = r[f]
                else:
                    left.loc[fill.index, f] = r[f]
    else:
        print(f"WARNING:Event could not be attached: {r}")
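For comparison, the same fallback logic can probably be expressed without iterrows, as a cascade of merges that tries the most specific keys first and passes only the still-unmatched rows of right on to the next level. This is a sketch under assumptions, not the method of this answer: attach, _pos and _rix are hypothetical names, floats stand in for the Fraction onsets, the MultiIndex of left is replaced by a plain index, and values already present in left are overwritten rather than concatenated.

```python
import numpy as np
import pandas as pd

def attach(left, right, features, key_levels):
    """Return (copy of left with `features` filled in, unmatched rows of right)."""
    # Object dtype so that strings can be written into the all-NaN columns.
    filled = left.astype({f: object for f in features})
    # Row positions of left as an ordinary column, so they survive the merge.
    lookup = left.reset_index(drop=True).rename_axis('_pos').reset_index()
    remaining = right.rename_axis('_rix')
    hits = []
    for on in key_levels:                       # most specific keys first
        m = remaining.reset_index().merge(lookup[on + ['_pos']], on=on)
        if not m.empty:
            hits.append(m)
            # Matched rows of right do not take part in the next level.
            remaining = remaining.drop(index=m['_rix'].unique())
    if hits:
        # Restore the row order of right so the concatenation order is stable.
        hits = pd.concat(hits, ignore_index=True).sort_values('_rix', kind='stable')
        for f in features:
            vals = hits.dropna(subset=[f]).groupby('_pos')[f].agg(''.join)
            if vals.empty:
                continue
            filled.iloc[vals.index.to_numpy(), filled.columns.get_loc(f)] = vals.to_numpy()
    return filled, remaining.rename_axis(None)

left = pd.DataFrame({
    'mc': 0,
    'onset': [0, 0, 0, 0, 0, 0, .75, .75],
    'staff': [2, 2, 1, 1, 1, 1, 2, 2],
    'voice': [1, 1, 1, 1, 1, 1, 2, 1],
    'tpc': [0, 0, 0, 4, 1, 0, 1, 1],
    'dynamics': np.nan,
    'chords': np.nan,
})
right = pd.DataFrame({
    'mc': 0,
    'onset': [0, 0, .5, .75, .75],
    'staff': [1, 1, 2, 1, 2],
    'voice': [1, 1, 1, 1, 1],
    'dynamics': ['f', np.nan, 'p', np.nan, np.nan],
    'chords': [np.nan, 'I', np.nan, 'I6', 'I64'],
})

levels = [['mc', 'onset', 'staff', 'voice'], ['mc', 'onset', 'staff'], ['mc', 'onset']]
filled, unmatched = attach(left, right, ['dynamics', 'chords'], levels)
```

Whether this is preferable to the loop depends on the size of right; for a handful of events the loop is easier to read and already handles concatenation onto existing values.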