Converting a complex SQL join to a pandas merge

Date: 2019-03-08 02:37:23

Tags: python pandas join merge

I have the following SQL query, which finds overlaps between begin and end for a given note_id:

select a.*, b.*
from test.analytical_cui_mipacq_concepts_new a
inner join test.analytical_cui_mipacq_concepts_new b
  on ( ( b.begin >= a.begin and b.begin <= a.end )
       or ( b.begin <= a.begin and b.end >= a.begin ) )
where ( (a.system = 'metamap' and b.system != a.system)
     or (a.system = 'metamap' and b.system = a.system
         and a.id_ != b.id_ and a.note_id = b.note_id) )

It takes a whole day to run. I am trying to convert it to a pandas merge, following this thread: pandas-join-dataframe-with-condition

So far I have come up with the following (new is my original dataframe, note_id is how I identify a specific individual, and id_ is the pk in the database table):

a = new.copy()
b = new.copy()
b.columns

b = b.rename(index=str, columns={'end':'end_x', 'begin': 'begin_x', 'cui': 'cui_x', 
                                 'old_cui': 'old_cui_x', 'type': 'type_x', 
                                 'polarity': 'polarity_x', 'id_':'id_x'}) 

c = a.merge(b, how='inner', on=['note_id'])

print(len(a), len(b), len(c))
c.loc[(((c.begin >= c.begin_x) & (c.begin <= c.end_x)) 
       | ((c.begin<=b.begin_x) & (c.end>=c.begin_x))) &
      (((c.system=='metamap') &  (c.system!=c.system_x)) 
       | ((c.system_x=='metamap') & (c.system==c.system_x) 
          & (c.id_ != c.id_x) & (c.note_id == c.note_id_x)))]

When I run this, I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-e8c0d060f2a0> in <module>()
     32 print(len(a), len(b), len(c))
     33 c.loc[(((c.begin >= c.begin_x) & (c.begin <= c.end_x)) 
---> 34        | ((c.begin<=b.begin_x) & (c.end>=c.begin_x))) &
     35       (((c.system=='metamap') &  (c.system!=c.system_x)) 
     36        | ((c.system_x=='metamap') & (c.system==c.system_x) 

/anaconda3/lib/python3.7/site-packages/pandas/core/ops.py in wrapper(self, other, axis)
   1674 
   1675         elif isinstance(other, ABCSeries) and not self._indexed_same(other):
-> 1676             raise ValueError("Can only compare identically-labeled "
   1677                              "Series objects")
   1678 

ValueError: Can only compare identically-labeled Series objects

Even after googling it, I am not sure what this means.

The data looks like this:

[sample data not preserved in the source]
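
A note on the error, offered as a sketch and not a definitive answer: pandas raises this ValueError when two Series with different indexes are compared. Line 34 of the traceback compares c.begin with b.begin_x, a column of b rather than of the merged frame c, so the two operands carry different labels. Using only columns of c in the mask should avoid it. Separately, because system is not renamed, the merge would likely produce system_x and system_y (the default suffixes) and a single note_id column, so c.system and c.note_id_x would probably fail next. The example below uses invented toy data and hypothetical suffixes _a and _b. It also assumes that overlaps are only wanted within the same note_id, which is stricter than the first branch of the SQL where clause.

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are invented.
new = pd.DataFrame({
    "id_":     [1, 2, 3, 4],
    "note_id": [10, 10, 10, 20],
    "system":  ["metamap", "ctakes", "metamap", "metamap"],
    "begin":   [0, 3, 4, 0],
    "end":     [5, 8, 9, 5],
})

# Self-merge on note_id. The suffixes label the left (a) and right (b) copy
# of every non-key column, so no manual renaming is needed.
c = new.merge(new, how="inner", on="note_id", suffixes=("_a", "_b"))

# Interval overlap, mirroring the SQL "on" clause. Every Series here comes
# from c, so all operands share the same index.
overlaps = (
    ((c["begin_b"] >= c["begin_a"]) & (c["begin_b"] <= c["end_a"]))
    | ((c["begin_b"] <= c["begin_a"]) & (c["end_b"] >= c["begin_a"]))
)

# System filter, mirroring the SQL "where" clause. The note_id equality is
# already guaranteed by the merge key.
is_metamap = c["system_a"] == "metamap"
other_system = c["system_b"] != c["system_a"]
same_system_other_row = (c["system_b"] == c["system_a"]) & (c["id__a"] != c["id__b"])

result = c.loc[overlaps & is_metamap & (other_system | same_system_other_row)]
print(result[["note_id", "id__a", "id__b", "system_a", "system_b"]])
```

Merging on note_id first keeps the intermediate frame much smaller than a full cross join, but it can still grow quadratically within each note, so memory use is worth checking on the real table.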

0 answers:

No answers