Question

My question is very simple, but as a Python beginner I still can't find the answer.

I'm using the following code to extract some data from the web:

from bs4 import BeautifulSoup
import urllib2

teams = ("http://walterfootball.com/fantasycheatsheet/2015/traditional")
page = urllib2.urlopen(teams)
soup = BeautifulSoup(page, "html.parser")

f = open('output.txt', 'w')

nfl = soup.findAll('li', "player")
lines = [span.get_text(strip=True) for span in nfl]

lines = str(lines)
f.write(lines)
f.close()

But the output is quite messy.

Is there an elegant way to get a result like this?

1. Eddie Lacy, RB, Green Bay Packers. Bye: 7 $60
2. LeVeon Bell, RB, Pittsburgh Steelers. Bye: 11 $60
3. Marshawn Lynch, RB, Seattle Seahawks. Bye: 9 $59
...

Answer 1

Just use str.join on the list, and .rstrip("+") to strip off the trailing +:

nfl = soup.findAll('li', "player")
# Build a list rather than a generator so lines can be reused after printing.
lines = ["{}. {}\n".format(ind, span.get_text(strip=True).rstrip("+"))
         for ind, span in enumerate(nfl, 1)]
print("".join(lines))

Which will give you:

1. Eddie Lacy, RB, Green Bay Packers. Bye: 7$60
2. LeVeon Bell, RB, Pittsburgh Steelers. Bye: 11$60
3. Marshawn Lynch, RB, Seattle Seahawks. Bye: 9$59
4. Adrian Peterson, RB, Minnesota Vikings. Bye: 5$59
5. Jamaal Charles, RB, Kansas City Chiefs. Bye: 9$54
..................

To separate the price we can split, or use re.sub to add a space before the dollar sign, and write each line:

import re
with open('output.txt', 'w') as f:
    for line in lines:
        # Raw string keeps \$ and \d as regex escapes; add a space before the trailing price.
        line = re.sub(r"(\$\d+)$", r" \1", line, count=1)
        f.write(line)

Now the output is:

1. Eddie Lacy, RB, Green Bay Packers. Bye: 7 $60
2. LeVeon Bell, RB, Pittsburgh Steelers. Bye: 11 $60
3. Marshawn Lynch, RB, Seattle Seahawks. Bye: 9 $59
4. Adrian Peterson, RB, Minnesota Vikings. Bye: 5 $59
5. Jamaal Charles, RB, Kansas City Chiefs. Bye: 9 $54
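
The re.sub step can be tried on its own, without fetching the page. This is only a minimal sketch, assuming each scraped line ends in a price such as $60; the helper name add_price_space and the sample strings are made up for illustration:

```python
import re

def add_price_space(line):
    # Insert one space before a trailing price like "$60".
    # "$" in the pattern also matches just before a final newline,
    # so lines that already end in "\n" are handled too.
    return re.sub(r"(\$\d+)$", r" \1", line, count=1)

samples = [
    "1. Eddie Lacy, RB, Green Bay Packers. Bye: 7$60\n",
    "2. LeVeon Bell, RB, Pittsburgh Steelers. Bye: 11$60\n",
]
for s in samples:
    print(add_price_space(s), end="")
```

Lines that do not end in a price are returned unchanged, because the pattern simply does not match.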

You can also use str.rsplit to split once on the $ and rejoin with a space:

with open('output.txt', 'w') as f:
    for line in lines:
        line,p = line.rsplit("$",1)
        f.write("{} ${}".format(line,p))
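
One caveat with unpacking the result of rsplit: a line with no $ in it yields a single-element list, and the two-name assignment raises ValueError. A possible variation, swapping in str.rpartition instead of rsplit, is sketched below; the helper name split_price is hypothetical:

```python
def split_price(line):
    # rpartition always returns a 3-tuple; if "$" is absent,
    # the separator slot is an empty string.
    head, sep, price = line.rpartition("$")
    if not sep:
        return line  # no price found, leave the line untouched
    return "{} ${}".format(head, price)

print(split_price("1. Eddie Lacy, RB, Green Bay Packers. Bye: 7$60"))
print(split_price("A line without a price"))
```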

Answer 2

Iterate over the list lines and write each line:

for num, line in enumerate(lines, 1):
    f.write('{}. {}\n'.format(num, line))

enumerate is used to get (num, line) pairs.

By the way, you are better off using a with statement than closing the file object manually:

with open('output.txt', 'w') as f:
    for num, line in enumerate(lines, 1):
        f.write('{}. {}\n'.format(num, line))
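
The numbering, the "+" stripping, and the price spacing can be combined in one pass. The following is a rough sketch, not a definitive implementation: write_numbered is a hypothetical helper, the sample strings stand in for what get_text(strip=True) might return, and io.StringIO replaces the real file so the snippet runs without touching disk or the network:

```python
import io
import re

def write_numbered(texts, out):
    # texts: raw strings for each player entry (assumed shape: "...Bye: 7$60+")
    # out: any object with a write() method, e.g. an open file
    for num, text in enumerate(texts, 1):
        text = text.rstrip("+")                            # drop the trailing "+"
        text = re.sub(r"(\$\d+)$", r" \1", text, count=1)  # space before the price
        out.write("{}. {}\n".format(num, text))

buf = io.StringIO()
write_numbered(
    ["Eddie Lacy, RB, Green Bay Packers. Bye: 7$60+",
     "LeVeon Bell, RB, Pittsburgh Steelers. Bye: 11$60"],
    buf,
)
print(buf.getvalue(), end="")
```

With a real file, the same function could be called inside a with open('output.txt', 'w') as f: block, passing f as out.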
