Formatting text in a table in Python

Time: 2013-04-19 17:23:32

Tags: python python-3.x web-scraping beautifulsoup tabular

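I'm having trouble creating a dynamic table that adjusts to results of varying length.

I've written a screen scraper that pulls stocks from http://finance.yahoo.com and prints each company's name, its symbol, and its current stock price.

However, the output looks like this: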

 Microsoft Corporation MSFT 29.76

 Apple Inc. AAPL 396.77

 SPDR S&P 500 SPY 155.25

 Google Inc. GOOG 787.76

I would like it to look like this:

Microsoft Corporation        MSFT      29.76

Apple Inc.                   AAPL      396.77

SPDR S&P 500                 SPY       155.25

Google Inc.                  GOOG      787.76

I started working with Python yesterday and am using 3.3.1.

My current code is as follows:

import re
import urllib.request
import cgi
from bs4 import BeautifulSoup

price = [0,0,0,0]
namesList = ["string1", "string2", "string3", "string4"]
stocksList = ["msft","aapl","spy","goog"]

def HTML():
    i = 0
    while i < len(stocksList):
        htmlPull = urllib.request.urlopen("http://finance.yahoo.com/q?s="+stocksList[i]+"&ql=1")
        htmlPull = htmlPull.read().decode('utf-8')
        regex = '<span id="yfs_l84_'+stocksList[i]+'">(.+?)</span>'
        pattern = re.compile(regex)
        price[i] = re.findall(pattern,htmlPull)
        htmlParse = BeautifulSoup(htmlPull)
        title = htmlParse.title.contents
        namesList[i] = title        
        i+=1

    formatPrice(price)
    formatStock(namesList)
    formatOutput(namesList, stocksList, price)

def formatPrice(price):
    k=0
    while k < len(price):
        cleaner = str(price[k])
        cleaner = cleaner.replace("[","")
        cleaner = cleaner.replace("]","")
        cleaner = cleaner.replace("'","")
        price[k] = float(cleaner)
        k+=1

def formatStock(namesList):
    k = 0
    while k <len(namesList):
        capital = stocksList[k]
        capital = capital.upper()
        cleaner = str(namesList[k])
        cleaner = cleaner.replace("Summary for ", "")
        cleaner = cleaner.replace(":"," ")
        cleaner = cleaner.replace("- Yahoo! Finance'","")
        cleaner = cleaner.replace("['","")
        cleaner = cleaner.replace("]","")
        cleaner = cleaner.replace(";","")
        cleaner = cleaner.replace(capital, "")
        namesList[k] = cleaner;
        k+=1

def formatOutput(namesList, stocksList, price):
    i = 0
    while i < len(price):
        capital = stocksList[i]
        capital = capital.upper()
        print(namesList[i],capital, price[i])
        print("")
        i+=1
HTML()

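I have tried print("{0},{1},{2}".format(namesList, capital, price[i])), various {:<16} variations, and so on. These only seem to affect a single line, whereas I want the formatting to take the whole column into account, as in a table, or perhaps to reserve a fixed amount of space that the text and whitespace together should fill. I'm not sure exactly what the solution is here, so I'm asking you all :)

As you can tell from my code, I'm pretty new to programming, so if there is a better way to do anything in this code, I'd be happy to hear corrections, advice, and suggestions.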

1 answer:

Answer 0: (score: 1)

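You want to set each column's width based on the longest item in that column.

In Python, you use max to find the largest of some group of things. So, outside the loop, you can do: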

names_width = max(len(name) for name in namesList)
stock_width = max(len(stock) for stock in stocksList)

Then, format each line the way you said you tried:

print("{0:{3}}  {1:{4}}  {2}".format(namesList[i],
                                     capital,
                                     price[i],
                                     names_width,
                                     stock_width))
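
To see the two steps working together without hitting the network, here is a minimal self-contained sketch. It assumes the scraped data has already been cleaned into (name, symbol, price) tuples; the `rows` list and the `format_table` helper are hypothetical names introduced only for this illustration, and the sample values are copied from the question's output.

```python
# Sample rows taken from the question's output (assumed already cleaned).
rows = [
    ("Microsoft Corporation", "MSFT", 29.76),
    ("Apple Inc.", "AAPL", 396.77),
    ("SPDR S&P 500", "SPY", 155.25),
    ("Google Inc.", "GOOG", 787.76),
]


def format_table(rows):
    """Return one padded line per (name, symbol, price) row.

    Hypothetical helper: it is not part of the question's code.
    """
    # Column widths come from the longest entry in each column.
    names_width = max(len(name) for name, _, _ in rows)
    stock_width = max(len(symbol) for _, symbol, _ in rows)

    lines = []
    for name, symbol, price in rows:
        # {0:{3}} pads field 0 to the width passed as argument 3;
        # strings are left-aligned by default.
        lines.append("{0:{3}}  {1:{4}}  {2}".format(
            name, symbol, price, names_width, stock_width))
    return lines


for line in format_table(rows):
    print(line)
```

Because the widths are computed from the data, the columns stay aligned if a longer company name or symbol is added later. If you would rather have the prices right-aligned, a format spec such as {2:>8.2f} should do that, with the width of 8 being an arbitrary choice.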