I have the following SQL query, which finds overlaps between begin and end for a particular note_id:

select a.*, b.*
from test.analytical_cui_mipacq_concepts_new a
inner join test.analytical_cui_mipacq_concepts_new b on (
( b.begin>=a.begin and b.begin<=a.end )
or
( b.begin<=a.begin and b.end>=a.begin )
)
where ((a.system='metamap' and b.system!=a.system) or (a.system='metamap' and b.system=a.system and a.id_ != b.id_ and a.note_id = b.note_id))

This takes all day to run. I am trying to convert it to a pandas merge, following this thread: pandas-join-dataframe-with-condition

So far I have come up with the following (new is my original dataframe, note_id is how I identify a particular individual, and id_ is the primary key in the database table):

a = new.copy()
b = new.copy()
b.columns
b = b.rename(index=str, columns={'end':'end_x', 'begin': 'begin_x', 'cui': 'cui_x',
'old_cui': 'old_cui_x', 'type': 'type_x',
'polarity': 'polarity_x', 'id_':'id_x'})
c = a.merge(b, how='inner', on=['note_id'])
print(len(a), len(b), len(c))
c.loc[(((c.begin >= c.begin_x) & (c.begin <= c.end_x))
| ((c.begin<=b.begin_x) & (c.end>=c.begin_x))) &
(((c.system=='metamap') & (c.system!=c.system_x))
| ((c.system_x=='metamap') & (c.system==c.system_x)
& (c.id_ != c.id_x) & (c.note_id == c.note_id_x)))]

When I run this, I get the following error, and even after Googling I am not sure what it means:

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-e8c0d060f2a0> in <module>()
32 print(len(a), len(b), len(c))
33 c.loc[(((c.begin >= c.begin_x) & (c.begin <= c.end_x))
---> 34 | ((c.begin<=b.begin_x) & (c.end>=c.begin_x))) &
35 (((c.system=='metamap') & (c.system!=c.system_x))
36 | ((c.system_x=='metamap') & (c.system==c.system_x)
/anaconda3/lib/python3.7/site-packages/pandas/core/ops.py in wrapper(self, other, axis)
1674
1675 elif isinstance(other, ABCSeries) and not self._indexed_same(other):
-> 1676 raise ValueError("Can only compare identically-labeled "
1677 "Series objects")
1678
ValueError: Can only compare identically-labeled Series objects
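
A possible reading of the traceback: the line it points at compares c.begin with b.begin_x, and b (the pre-merge copy) does not carry the same index labels as the merged frame c, which is the situation this ValueError describes. The sketch below builds the mask only from columns of c. It is a minimal illustration under stated assumptions, not a verified fix for the real table: the toy rows are made up, only the column names come from the post, system is renamed to system_x explicitly (the rename in the post does not cover it), and the merge on note_id follows the pandas attempt above, whereas the SQL only requires equal note_id in its second branch.

```python
import pandas as pd

# Hypothetical toy rows; only the column names are taken from the post.
new = pd.DataFrame({
    "id_":     [1, 2, 3, 4, 5],
    "note_id": [10, 10, 10, 10, 20],
    "system":  ["metamap", "ctakes", "metamap", "metamap", "ctakes"],
    "begin":   [0, 3, 20, 4, 0],
    "end":     [5, 8, 25, 6, 5],
})

# Right-hand copy ("b" in the SQL): rename every non-key column,
# including system, so the merged frame has unambiguous names.
b = new.rename(columns={"id_": "id_x", "system": "system_x",
                        "begin": "begin_x", "end": "end_x"})

# Self-join within each note; unsuffixed columns play the role of "a".
c = new.merge(b, how="inner", on="note_id")

# Both masks are built from columns of c only, so every Series shares
# c's index and the comparisons are between identically labeled objects.
overlap = (
    ((c["begin_x"] >= c["begin"]) & (c["begin_x"] <= c["end"]))
    | ((c["begin_x"] <= c["begin"]) & (c["end_x"] >= c["begin"]))
)
pairing = (c["system"] == "metamap") & (
    (c["system_x"] != c["system"])
    | ((c["system_x"] == c["system"]) & (c["id_x"] != c["id_"]))
)

result = c.loc[overlap & pairing]
print(result[["note_id", "id_", "id_x", "begin", "end", "begin_x", "end_x"]])
```

Merging on note_id first keeps the intermediate frame to the pairs within each note, which is usually much smaller than a full cross join; whether that matches the intent of the first branch of the SQL where clause depends on the data.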