Concatenating strings from two different for loops on each iteration

Date: 2016-09-01 21:50:28

Tags: python string python-2.7 loops concatenation

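I have two different for loops that run the same number of times, and each produces a string on every iteration. (I am scraping an HTML file.) I want the string from the first loop to be merged/concatenated/appended with the string from the second loop FOR EACH ITERATION (this is the tricky part). Here is my code: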

from bs4 import BeautifulSoup

bsObj = BeautifulSoup(open("samfull.html"), "html.parser")
tableList = bsObj.find_all("table", {"class":"width100 menu_header_top_emr"}) 
tdList = bsObj.find_all("td", {"class":"menu_header width100"})

for table in tableList:
    first_part_of_row_string = ''
    item = table.find_all("span", {"class":"results_body_text"})
    for i in range(len(item)):
        first_part_of_row_string += (item[i].get_text().strip() + ", ")

for td in tdList:
    second_part_of_row_string = ''
    items = td.find_all("span", {"class":"results_body_text"})
    for i in range(len(items)):
        second_part_of_row_string += (items[i].get_text().strip() + ", ")

For example:

Sample output of the for table in tableList loop:

a,b,
1,2,
father, mother,

And of the for td in tdList loop:

c, d, e,
3, 4, 5,
son, daughter, twin,

I want to combine each iteration's first_part_of_row_string with the same iteration's second_part_of_row_string.

So I want to print:

a, b, c, d, e,
1, 2, 3, 4, 5
father, mother, son, daughter, twin,

In effect, each iteration of the two loops should give first_part_of_row_string + second_part_of_row_string.

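tableList and tdList have the same length, so both loops will always produce the same number of rows. If the td were inside the same table referenced in tableList, I could do this in a single loop; unfortunately it is not. In the HTML, each table with the class specified in the tableList definition is followed by another table that has no class but always contains a td with the class specified in tdList. A sample of this HTML is included below. The whole page is several thousand lines long, so I posted it separately: link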

<table cellspacing="0" cellpadding="0"
        style="margin-left: auto; margin-right: auto;" class="width100 menu_header_top_emr">
        <tbody>
            <tr>
                <td style="width:80px;">
                    <div style="width:70px;background-color:#B2EE98; border:1px solid grey; padding:2px 5px 2px 5px; text-align:center;">Entity</div>
                </td>
                <td style="padding-left:5px;">
                    <span class="results_body_text"><h5 style="vertical-align: middle;">Rascal X-Press, Inc.</h5></span>
                </td>
                <td style="width:130px;">
                    <div class="right">
                    <span class="results_title_text">Status:</span> 
                    <span class="results_body_text">
                        Submitted
                    </span>
                    </div>
                </td>
                <td style="width:22px;">
                    <a href="" class="more_duns_link_emr right" style="display: inline;"><img
                        id="more_duns_link_emr"
                        src="/SAMSearch/styles/img/expand-small-blue.png" style="padding:8px 8px 8px 2px;" 
                        alt="Expand Search Result for Rascal X-Press, Inc."></a>
                    <a href="" class="hide_duns_link_emr off right" style="display: none;"><img
                        id="hide_duns_link_emr"
                        src="/SAMSearch/styles/img/collapse-small-blue.png" style="padding:8px 8px 8px 2px;" 
                        alt="Collapse Search Result for Rascal X-Press, Inc."></a>
                </td>
            </tr>
        </tbody>
    </table>    
    <table>
        <tbody>
            <tr>
                <td class="menu_header width100">
                    <table>
                        <tr>
                            <td style="width:25%;">
                                <span class="results_title_text">DUNS:</span> <span class="results_body_text"> 012361296</span>
                            </td>                                                       
                            <td style="width:25%;">
                            </td>

                            <!-- label as CAGE when US Territory is listed as Country -->
                            <td style="width:27%;">
                                    <span class="results_title_text">CAGE Code:</span> <span class="results_body_text"></span>

                            </td>
                            <td style="width:15%" rowspan="2">
                                <input type="button" value="View Details" title="View Details for Rascal X-Press, Inc." class="center" style="height:25px; width:90px; vertical-align:middle; margin:7px 3px 7px 3px;" onClick="viewEntry('4420848', '1472652382619')" />
                            </td>
                        </tr>
                        <tr>
                            <td colspan="2">
                                <span class="results_title_text">Has Active Exclusion?: </span>
                                <span class="results_body_text">
                                    No
                                </span>
                            </td>
                            <td>
                                <span class="results_title_text">DoDAAC:</span> <span class="results_body_text"></span>
                            </td>
                        </tr>
                        <tr>
                            <td colspan="2">
                                <span class="results_title_text">Expiration Date:</span>
                                <span class="results_body_text">
                                </span>
                            </td>
                            <td colspan="2"><span class="results_title_text">Delinquent Federal Debt?</span>
                                <span class="results_body_text">
                                        No 
                                </span>
                            </td>
                        </tr>
                        <tr>
                            <td colspan="2"><span class="results_title_text">Purpose of Registration:</span>
                                <span class="results_body_text">
                                    Federal Assistance Awards Only
                                </span>
                            </td>
                        </tr>
                    </table>
                    <div class="off_duns_emr" style="display: none;">
                        <table class="resultbox1 menu_header width100"
                            style="margin-left: auto; margin-right: auto;" cellpadding="2">
                            <tbody>
                                <tr>
                                    <td colspan="3"><span class="results_title_text">Address:</span>
                                    <span class="results_body_text">1372 State Hwy 37</span></td>
                                </tr>

                                <tr>
                                    <td style="width:212px;"><span class="results_title_text">City:</span>
                                    <span class="results_body_text">West Frankfort</span></td>

                                    <td style="width:200px;"><span class="results_title_text">State/Province:</span>
                                    <span class="results_body_text">IL</span></td>
                                </tr>
                                <tr>
                                    <td style="width:130px;"><span class="results_title_text">ZIP Code:</span>
                                    <span class="results_body_text">62896-5007</span></td>

                                    <td style="width:200px;"><span class="results_title_text">Country:</span>
                                    <span class="results_body_text">UNITED STATES</span></td>
                                </tr>
                            </tbody>
                        </table>
                    </div>
                </td>
            </tr>
        </tbody>
    </table></td>
            </tr>
    </tbody>
</table>                                </li>
                            </td>
                        </tr>
                    </table>

2 Answers:

Answer 0: (score: 0)

There are many ways to do what you are asking; here is a very simple one:

tableList = [
    ["a", "b"],
    ["1", "2"],
    ["father", "mother"]
]

tdList = [
    ["c", "d", "e"],
    ["3", "4", "5"],
    ["son", "daughter", "twin"]
]

# Use min, not max: indexing past the end of the shorter list raises IndexError
len_list = min(len(tableList), len(tdList))

for i in range(len_list):
    print(", ".join(tableList[i] + tdList[i]))
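
The index-based loop above can also be written with zip, which pairs the rows by position and stops at the shorter list. A minimal sketch using the same sample data; merge_rows is a hypothetical helper name, not something from the question:

```python
def merge_rows(first_rows, second_rows):
    """Join each row of first_rows with the row at the same position in second_rows."""
    # zip pairs items by position and stops at the end of the shorter input
    return [", ".join(first + second) for first, second in zip(first_rows, second_rows)]


# Sample data from the answer above
tableList = [["a", "b"], ["1", "2"], ["father", "mother"]]
tdList = [["c", "d", "e"], ["3", "4", "5"], ["son", "daughter", "twin"]]

merged = merge_rows(tableList, tdList)
for line in merged:
    print(line)
```

Returning the joined lines as a list, rather than printing inside the loop, keeps them available for writing to a file later.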

Answer 1: (score: 0)

Use zip, and use join rather than appending commas by hand:

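for table, td in zip(tableList, tdList):
    # str.join needs strings, so take the text out of each Tag first
    a = ', '.join(s.get_text().strip() for s in table.find_all("span", {"class": "results_body_text"}))
    b = ', '.join(s.get_text().strip() for s in td.find_all("span", {"class": "results_body_text"}))
    print(a, b, sep=', ')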

If you are using Python 3.5 or later, you can use the more powerful unpacking syntax:

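for table, td in zip(tableList, tdList):
    # Extract the text so print shows the values, not the span markup
    a = [s.get_text().strip() for s in table.find_all("span", {"class": "results_body_text"})]
    b = [s.get_text().strip() for s in td.find_all("span", {"class": "results_body_text"})]
    print(*a, *b, sep=', ')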

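If you are using Python 2, put the line from __future__ import print_function at the top of your code and use the Python 3 print function syntax (the double-unpacking form still requires Python 3.5 or later), or just join everything manually: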

for table, td in zip(tableList, tdList):
    a = table.find_all("span", {"class": "results_body_text"})
    b = td.find_all("span", {"class": "results_body_text"})
    # a + b is a list of Tags; join needs their text
    print(', '.join(s.get_text().strip() for s in a + b))
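
To check that the three variants above print the same line, here is a small sketch that uses plain strings in place of the extracted span text. The values in a and b are made up for illustration, and render is a hypothetical helper. It assumes Python 3.5 or later because of the double unpacking:

```python
import io
from contextlib import redirect_stdout


def render(print_call):
    """Run print_call and return what it wrote to stdout, without the trailing newline."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        print_call()
    return buffer.getvalue().rstrip("\n")


# Stand-ins for the text extracted from one table and one td
a = ["father", "mother"]
b = ["son", "daughter", "twin"]

with_sep = render(lambda: print(", ".join(a), ", ".join(b), sep=", "))
with_unpacking = render(lambda: print(*a, *b, sep=", "))
with_join = render(lambda: print(", ".join(a + b)))

print(with_join)
```

All three build the same comma-separated line; the join form is the only one that also runs unchanged on Python 2.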