Converting data into a pandas DataFrame

Time: 2016-12-19 18:40:34

Tags: python pandas dataframe

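Is there a way to convert data I scraped from the web into a pandas DataFrame?

The scraped data is stock fundamentals, e.g. abt: 2.71 6.00, where abt = ticker symbol, 2.71 = price-to-book ratio, and 6.00 = PEG ratio.

I tried declaring a variable holding an empty DataFrame and using the .append() function, but had no luck.

I'm guessing the data has to be transformed somehow before it can be passed to the DataFrame, but I don't know how to do that.

I reworked the code using the suggestions from the comments, and now the DataFrame comes out empty: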

import time
import urllib.request
import urllib.parse
import pandas as pd

sp500short = ['a', 'aa', 'aapl', 'abbv', 'abc', 'abt', 'ace', 'aci', 'acn', 'act', 'adbe', 'adi', 'adm', 'adp']
#stock = 'a'

data = []

color_list = ['<span style="color:#aa0000;">', '<span style="color:#008800;">']
color_close = '</span>'


def finvizPBStats(stock):

    try:

        sourceCode = urllib.request.urlopen('http://finviz.com/quote.ashx?t='+stock).read()
        sourceCodeString = sourceCode.decode()
        pbr = sourceCodeString.split('P/B</td><td width="8%" class="snapshot-td2" align="left"><b>')[1].split('</b></td>')[0]

        for color in color_list:
            if color in pbr:
                pbr = pbr.split(color)[1].split(color_close)[0]
                pbr = float(pbr)

    except Exception as e:
        if Exception:
            pass 

    return        


def finvizPEGStats(stock):

    try: 

        sourceCode = urllib.request.urlopen('http://finviz.com/quote.ashx?t='+stock).read()
        sourceCodeString = sourceCode.decode()  
        PEG = sourceCodeString.split('PEG</td><td width="8%" class="snapshot-td2" align="left"><b>')[1].split('</b></td>')[0]
        for color in color_list:
            if color in PEG:
                PEG = PEG.split(color)[1].split(color_close)[0]
                PEG = float(PEG)

    except Exception as e:
        if Exception:
            pass

    return

for stock in sp500short:
    pbr = finvizPBStats(stock)
    PEG = finvizPEGStats(stock)
    data.append([pbr, PEG])

df = pd.DataFrame(index=sp500short, columns=['pbr', 'PEG'])

print(df)     

2 answers:

Answer 0 (score: 1)

First, I would have your function return its output data: pbr, PEG. Then you can do something like this:

data = []
for stock in sp500short:
    pbr, PEG = finvizKeyStats(stock)
    data.append([pbr, PEG])
    time.sleep(1)

pd.DataFrame(data, index=sp500short, columns=['pbr', 'PEG'])
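To make the "return the values, then pass the rows to DataFrame" idea concrete without hitting the network, here is a minimal sketch. It assumes the same HTML markers the question splits on; parse_ratio and fake_page are hypothetical helpers, and the sample pages and their numbers are made up for illustration. Unlike the question's code, it converts to float whether or not the value is wrapped in a colour span, and it returns None when the value cannot be parsed.

```python
import pandas as pd

# Markers copied from the question's split() calls.
COLOR_SPANS = ['<span style="color:#aa0000;">', '<span style="color:#008800;">']
COLOR_CLOSE = '</span>'
CELL_OPEN = '</td><td width="8%" class="snapshot-td2" align="left"><b>'
CELL_CLOSE = '</b></td>'


def parse_ratio(html, label):
    """Return the number that follows `label` in the page text, or None."""
    try:
        raw = html.split(label + CELL_OPEN)[1].split(CELL_CLOSE)[0]
        # Strip the optional colour span around the value.
        for span in COLOR_SPANS:
            if span in raw:
                raw = raw.split(span)[1].split(COLOR_CLOSE)[0]
        return float(raw)
    except (IndexError, ValueError):
        # Label not found, or the value is not numeric (e.g. '-').
        return None


def fake_page(pb, peg):
    """Build a tiny stand-in for a quote page (hypothetical, made-up data)."""
    return ('P/B' + CELL_OPEN + pb + CELL_CLOSE +
            'PEG' + CELL_OPEN + peg + CELL_CLOSE)


pages = {
    'abt': fake_page('2.71', '<span style="color:#aa0000;">6.00</span>'),
    'adp': fake_page('8.10', '-'),
}

# One [pbr, PEG] row per ticker; the rows are what fills the DataFrame.
rows = [[parse_ratio(html, 'P/B'), parse_ratio(html, 'PEG')]
        for html in pages.values()]
df = pd.DataFrame(rows, index=list(pages), columns=['pbr', 'PEG'])
print(df)
```

The key difference from the code in the question is that the collected rows are passed as the first argument to pd.DataFrame; building the frame from index and columns alone leaves every cell empty.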

Answer 1 (score: 0)

I used BeautifulSoup and got the entire table of data.


Focus on specific ratios

With a list of ratios, you can easily access the relevant data.

import urllib.request
from bs4 import BeautifulSoup
from io import StringIO
import pandas as pd

sp500short = ['a', 'aa', 'aapl', 'abbv', 'abc', 'abt', 'ace', 'aci', 'acn', 'act', 'adbe', 'adi', 'adm', 'adp']

def get_fin(sym):
    try:
        sourceCode = urllib.request.urlopen('http://finviz.com/quote.ashx?t='+sym).read()
        soup = BeautifulSoup(sourceCode, 'lxml')
        table = soup.find("table", attrs={"class":"snapshot-table2"})
        tdf = pd.read_html(StringIO(table.__repr__()))
        vals = tdf[0].values.reshape(-1, 2)
        return pd.Series(vals[:, 1], vals[:, 0]).rename(sym)
    except:
        pass

df = pd.concat([get_fin(sym) for sym in sp500short], axis=1)

df.head()

ratios = ['P/E', 'PEG']
df.loc[ratios]

Note:
I question my use of table.__repr__() to get the HTML string.
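The reshape step in get_fin is easier to follow on a small example. The sketch below assumes the parsed table alternates label and value cells across each row, which is what reshape(-1, 2) relies on; the wide frame is a hand-built stand-in for tdf[0] with made-up values, and to_series is a hypothetical name for the last two lines of get_fin.

```python
import pandas as pd

# Stand-in for tdf[0]: each row holds label/value pairs side by side
# (made-up numbers, kept as strings the way scraped cells usually arrive).
wide = pd.DataFrame([
    ['P/E', '25.10', 'PEG', '6.00'],
    ['P/B', '2.71', 'P/S', '3.20'],
])


def to_series(table, sym):
    """Flatten label/value pairs into a Series indexed by label."""
    vals = table.values.reshape(-1, 2)   # one (label, value) pair per row
    return pd.Series(vals[:, 1], index=vals[:, 0]).rename(sym)


# One column per ticker, one row per ratio label.
df = pd.concat([to_series(wide, 'abt'), to_series(wide, 'adp')], axis=1)

ratios = ['P/E', 'PEG']
print(df.loc[ratios])
```

Because the values are still strings here, something like pd.to_numeric would be needed before doing arithmetic on them.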