Question

I'm new to both Pandas and Python. I created two DataFrames by importing two tables from a MySQL database:

ranges
titles

The ranges DataFrame:

title s_low s_high post pre noy
1       104  106b   0   2   0
1       1     5     1   0   0

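Here, each row represents a range of sections, and the data in the last three columns applies to every section in that range. s_low is the section at the low end of the range and s_high is the section at the high end. There are dozens of titles, and each title has many sections.

The titles DataFrame (it contains the data for every section under every title):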

title   section
1   1
1   101
1   102
1   103
1   104
1   105
1   106
1   106a
1   106b
1   107
1   108
1   109
1   110
1   111
1   112
1   112a
1   112b
1   113
1   114
1   2
1   201
1   202
1   203
1   204
1   205
1   206
1   207
1   208
1   209
1   210
1   211
1   212
1   213
1   3
1   4
1   5
1   6
1   7
1   8
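
For anyone who wants to reproduce this without the MySQL database, the two sample tables above can be built in memory. This is only a sketch of the data as pasted: it assumes the section codes are stored as strings (codes like 106a are not numeric) and that every sample row belongs to title 1.

```python
import pandas as pd

# Sample rows copied from the ranges table in the question.
# Sections are kept as strings because codes such as "106a" are not numeric.
ranges = pd.DataFrame(
    {
        "title": [1, 1],
        "s_low": ["104", "1"],
        "s_high": ["106b", "5"],
        "post": [0, 1],
        "pre": [2, 0],
        "noy": [0, 0],
    }
)

# Section codes in the same row order as the titles table in the question.
sections = (
    "1 101 102 103 104 105 106 106a 106b 107 108 109 110 111 112 112a 112b "
    "113 114 2 201 202 203 204 205 206 207 208 209 210 211 212 213 "
    "3 4 5 6 7 8"
).split()

# The scalar title is broadcast across all section rows.
titles = pd.DataFrame({"title": 1, "section": sections})
```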

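I have to expand the ranges in ranges and write every section that falls within each range to a new DataFrame, together with the values of the last three columns: post, pre and noy.

This is the code I have produced so far.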

import MySQLdb as db
import pandas as pd
from pandas.io.sql import frame_query  # older pandas API; later versions use pd.read_sql

# Load both tables from MySQL (connection details redacted).
cnxn = db.connect('xxxx','xxxx','xxxx','xxxx', charset='utf8', use_unicode=True )
ranges = frame_query("SELECT * from ranges", cnxn)
titles = frame_query("SELECT title, section from titles", cnxn)

# Target frame: one row per section inside each range.
exp = pd.DataFrame(columns = ['title', 'section', 'post', 'pre', 'noy'])

for index, row in ranges.iterrows():
    t = row['title']
    s_low = row['s_low']
    s_low1 = str(t)+'$'+s_low      # composite key "title$section" for the low end
    s_high = row['s_high']
    s_high1 = str(t)+'$'+s_high    # composite key for the high end
    post = row['post']
    pre = row['pre']
    noy = row['noy']
    x=0
    for i, r in titles.iterrows():
        title = r['title']
        sec = r['section']
        if s_low1 == (str(title)+'$'+sec):
            x = i  # index of the titles row where the range starts
            break
    # Still missing: walk titles from x until s_high1 and append the rows to exp.

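I am using string concatenation because there are multiple titles and multiple sections; the same section code can appear under different titles. I know that once s_low has been found I need to walk through the titles index until I reach s_high, writing the values to the new DataFrame exp. I cannot work out how to get the index of s_low and continue from there.

For the first row of the ranges sample pasted above, the sample output (the exp DataFrame) would be: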

title section post pre noy
1      104     0    2   0
1      105     0    2   0
1      106     0    2   0
1      106a    0    2   0
1      106b    0    2   0
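
The approach described above (find s_low, then walk until s_high) can be sketched with positional slicing inside each title. This is a minimal sketch, not a definitive answer: the helper name expand_ranges is hypothetical, and it assumes that the row order of titles defines what "between" means and that section codes are strings. Filtering titles by title first also avoids the need for a concatenated "title$section" key.

```python
import pandas as pd

COLS = ["title", "section", "post", "pre", "noy"]

def expand_ranges(ranges, titles):
    """Expand each (title, s_low, s_high) row into one row per section.

    Hypothetical helper: relies on the row order of `titles` to decide
    which sections lie between s_low and s_high.
    """
    pieces = []
    for row in ranges.itertuples(index=False):
        # Sections of this title only, in their original row order.
        secs = titles.loc[titles["title"] == row.title, "section"].tolist()
        try:
            lo = secs.index(row.s_low)
            hi = secs.index(row.s_high)
        except ValueError:
            continue  # an endpoint is not present in titles; skip this range
        if lo > hi:
            continue  # endpoints appear in reverse order; nothing to expand
        # Scalars (title, post, pre, noy) are broadcast across the slice.
        pieces.append(
            pd.DataFrame(
                {
                    "title": row.title,
                    "section": secs[lo:hi + 1],
                    "post": row.post,
                    "pre": row.pre,
                    "noy": row.noy,
                }
            )
        )
    if not pieces:
        return pd.DataFrame(columns=COLS)
    return pd.concat(pieces, ignore_index=True)[COLS]

# Small sample taken from the question (a subset of title 1).
titles = pd.DataFrame(
    {"title": 1, "section": ["103", "104", "105", "106", "106a", "106b", "107"]}
)
ranges = pd.DataFrame(
    {"title": [1], "s_low": ["104"], "s_high": ["106b"],
     "post": [0], "pre": [2], "noy": [0]}
)
exp = expand_ranges(ranges, titles)
print(exp)
```

Because the slice follows row order, the second sample range (1 to 5) would pick up every row of titles between section 1 and section 5, including 101, 102 and so on. Whether that is the intended meaning of that range is not stated in the question.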

Any help is greatly appreciated.

Search a pandas DataFrame for both ends of a range from another DataFrame and expand it into a new DataFrame

0 answers: