Question

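I have 2 data frames. Title and section are two of the columns. I need to check whether a specific combination of title and section exists in the second data frame.

For example, the data frame torange contains the columns title, s_low, s_high and others; usc contains the columns title and section.

If torange contains the following row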

title   s_low   s_high
  1        1      17

the code needs to check whether usc contains a row

title   section
 1        1

and a row

title   section
 1          17

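If both exist, it should create a new table containing the title, the section and the remaining columns in usc, by expanding the range between s_low and s_high in torange.

I have written the code below, but somehow it only works for a few iterations and then stops. I suspect something is wrong with the counter 'i', or possibly a syntax error. It may also be related to the syntax of any().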

import MySQLdb as db
from pandas import DataFrame
from pandas.io.sql import frame_query
cnxn = db.connect('127.0.0.1','xxxxx','xxxxx','xxxxxxx', charset='utf8', use_unicode=True )
torange = frame_query("SELECT title, s_low, s_high, post, pre, noy, rnum from torange", cnxn)
usc = frame_query("SELECT title, section from usc", cnxn)

i=0
for row in torange:
    t =  torange.title[i]
    s_low = torange.s_low[i]
    s_high = torange.s_high[i]
    for row in usc:
        if (any(usc.title == t) & any(usc.section == s_low)):
            print 't', t, 's_low' , s_low,'i', i
            if (any(usc.title == t) & any(usc.section == s_high)):
                print 't', t, 's_high', s_high, 'i',i
                print i, '*******************match************************'
    i=i+1

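(Please ignore the print statements. This is part of a larger task I am working on, and the prints are there as checks to see what is happening.)

Any help with this is much appreciated.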

Answer 1

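Your whole checking and iteration logic is mixed up. You iterate over usc with row, but your any() conditions test the whole of usc, not row. On top of that, row is the loop variable of both loops. Note also that iterating over a DataFrame directly (for row in torange) yields its column labels rather than its rows, so the outer loop runs once per column, which is likely why it stops after a few iterations. Here is a cleaner starting point: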

for index, row in torange.iterrows(): 
    t = row['title']
    s_low = row['s_low']
    s_high = row['s_high']
    uscrow = usc[(usc.title == t) & (usc.section == s_low)]
    # uscrow now contains all the rows in usc that fulfill your condition.
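
Building on that, here is a minimal sketch of the full check plus the range expansion. It is only one possible approach, under a few assumptions that are not in the question: the two small frames below are made-up stand-ins for the database tables, section values are plain integers, and the output keeps only title and section (any other columns you need would have to be carried along yourself). The names usc_pairs, expanded and result are just illustrative.

```python
import pandas as pd

# Made-up stand-ins for the two tables (assumption: sections are integers).
torange = pd.DataFrame({
    "title": [1, 2],
    "s_low": [1, 5],
    "s_high": [3, 9],
})
usc = pd.DataFrame({
    "title": [1, 1, 1, 2],
    "section": [1, 2, 3, 5],
})

# Every (title, section) pair present in usc, for fast membership tests.
usc_pairs = set(zip(usc["title"], usc["section"]))

expanded = []
for _, row in torange.iterrows():
    t = int(row["title"])
    s_low = int(row["s_low"])
    s_high = int(row["s_high"])
    # Both endpoints must exist in usc under the same title.
    if (t, s_low) in usc_pairs and (t, s_high) in usc_pairs:
        # Expand the inclusive range s_low..s_high into one row per section.
        for section in range(s_low, s_high + 1):
            expanded.append({"title": t, "section": section})

result = pd.DataFrame(expanded, columns=["title", "section"])
print(result)
```

With this toy data, title 1 is expanded because both (1, 1) and (1, 3) are in usc, while title 2 is skipped because (2, 9) is missing. Checking the pair as a tuple matters: any(usc.title == t) & any(usc.section == s_low) is true even when the title and the section match on different rows.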

How to check whether a combination of values in 2 columns exists in a pandas DataFrame
