Decode a column in a DataFrame and remove "b" and "\xc2\xa0"

Date: 2019-06-13 04:18:56

Tags: python python-3.x pandas

I have two questions.

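  1. Every value in all of my columns starts with the letter "b". I want to get rid of this character and convert all values to float. (I have attached an image of the entire DataFrame.)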


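  2. The "Price" column also contains the extra encoded bytes "\xc2\xa0". I want to remove them and keep the decimal value. (I have attached an image of this column.)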


By converting the column to strings and then using the following code, I was able to remove the "b" character from that column:

price.replace('b','')

But when I try the same code with "\xc2\xa0", it does not work. I also think that converting every column to strings is inefficient, so is there a better option?

Here is my full code (in case it helps):

import requests
import pandas as pd
from bs4 import BeautifulSoup

Base_url = ("https://www.nseindia.com/live_market/dynaContent/live_watch/fxTracker/optChainDataByExpDates.jsp")

page = requests.get(Base_url)

soup = BeautifulSoup(page.content, 'html.parser')
table_it = soup.find_all(class_="opttbldata")

spot = soup.select_one("div:contains('REFERENCE RATE') > strong").text
ATM = (round(float(spot)*4))/4
OTMCE = ATM + 0.50
OTMPE = ATM - 0.50

table_cls_1 = soup.find_all(id = "octable")
col_list = []

for mytable in table_cls_1:
    table_head = mytable.find('thead')

    try:
        rows = table_head.find_all('tr')
        for tr in rows:
            cols = tr.find_all('th')
            for th in cols:
                er = th.text
                ee = er.encode('utf-8')
                col_list.append(ee)
    except:
        print('no thread')

col_list_fnl = [e for e in col_list if e not in ('CALLS', 'PUTS', 'Chart', '\xc2\xa0')]

table_cls_2 = soup.find(id = "octable")
all_trs = table_cls_2.find_all('tr')
req_row = table_cls_2.find_all('tr')

df = pd.DataFrame(index=range(0,len(req_row)-3),columns = col_list_fnl)

row_marker = 0

for row_number, tr_nos in enumerate(req_row):
    if row_number <= 1 or row_number == len(req_row)-1:
        continue  # To ensure we only choose non-empty rows

    td_columns = tr_nos.find_all('td')

    # Removing the graph column
    select_cols = td_columns[1:22]
    cols_horizontal = range(0,len(select_cols))

    for nu, column in enumerate(select_cols):

        utf_string = column.get_text()
        utf_string = utf_string.strip('\n\r\t": ')
        tr = utf_string.encode('utf-8')

        df.iloc[row_marker,[nu]] = tr

    row_marker += 1

print(df)

1 answer:

Answer 0 (score: 1)

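I changed your code based on the comments from @cs95 and @eyllanesc. It runs without errors and produces a DataFrame without byte-encoded values. The changes are: `page.text` instead of `page.content`, and no `.encode('utf-8')` on the header or cell text.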

import requests
import pandas as pd
from bs4 import BeautifulSoup

Base_url = ("https://www.nseindia.com/live_market/dynaContent/live_watch/fxTracker/optChainDataByExpDates.jsp")

page = requests.get(Base_url)

soup = BeautifulSoup(page.text, 'html.parser')
table_it = soup.find_all(class_="opttbldata")

spot = soup.select_one("div:contains('REFERENCE RATE') > strong").text
ATM = (round(float(spot)*4))/4
OTMCE = ATM + 0.50
OTMPE = ATM - 0.50

table_cls_1 = soup.find_all(id = "octable")
col_list = []

for mytable in table_cls_1:
    table_head = mytable.find('thead')

    try:
        rows = table_head.find_all('tr')
        for tr in rows:
            cols = tr.find_all('th')
            for th in cols:
                er = th.text
                col_list.append(er)
    except:
        print('no thread')

col_list_fnl = [e for e in col_list if e not in ('CALLS', 'PUTS', 'Chart', '\xc2\xa0')]

table_cls_2 = soup.find(id = "octable")
all_trs = table_cls_2.find_all('tr')
req_row = table_cls_2.find_all('tr')

df = pd.DataFrame(index=range(0,len(req_row)-3),columns = col_list_fnl)

row_marker = 0

for row_number, tr_nos in enumerate(req_row):
    if row_number <= 1 or row_number == len(req_row)-1:
        continue  # To ensure we only choose non-empty rows

    td_columns = tr_nos.find_all('td')

    # Removing the graph column
    select_cols = td_columns[1:22]
    cols_horizontal = range(0,len(select_cols))

    for nu, column in enumerate(select_cols):

        utf_string = column.get_text()
        utf_string = utf_string.strip('\n\r\t": ')
        tr = utf_string

        df.iloc[row_marker,[nu]] = tr

    row_marker += 1

display(df)  # available in IPython/Jupyter; use print(df) in a plain script

This displays the DataFrame with plain string values, without the `b'...'` prefix.
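As background on where the `b'...'` prefix and the `\xc2\xa0` bytes come from, here is a minimal sketch. It assumes the cell text contains a non-breaking space (U+00A0), which is what the bytes `\xc2\xa0` are in UTF-8. The sample value and the helper name `clean_cell` are made up for illustration and are not part of the original code.

```python
# A cell as BeautifulSoup might return it: a str ending in a
# non-breaking space (U+00A0). The value itself is invented.
cell = "71.2500\u00a0"

# .encode('utf-8') turns the str into a bytes object. Its repr starts with
# b'...' and U+00A0 shows up as the two bytes \xc2\xa0.
raw = cell.encode("utf-8")
print(raw)

# Calling str() on bytes does not decode them; it returns the repr,
# which is why a literal "b" appears in front of every value.
print(str(raw))


def clean_cell(value):
    """Hypothetical helper: return clean text from bytes or str input."""
    if isinstance(value, bytes):
        value = value.decode("utf-8")  # bytes -> str, undoing .encode()
    # Replace the non-breaking space with a normal one, then trim.
    return value.replace("\u00a0", " ").strip()


price = float(clean_cell(raw))
print(price)
```

If a DataFrame has already been filled with bytes, applying a helper like this to each cell is one way to recover the text, but not encoding in the first place (as in the code above) avoids the problem entirely.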

Addendum

To give the two ask-price columns unique names and convert their values to floats, do the following:

cols = ['_first_col', 'Chart ', 'OI', 'Change in OI', 'Volume', 'IV', 'LTP', 'BidQty',
       'BidPrice', 'AskPrice_01', 'AskQty', 'Strike Price', 'BidQty', 'BidPrice',
       'AskPrice_02', 'AskQty', 'LTP', 'IV', 'Volume', 'Change in OI', 'OI',
       'Chart']
df.columns = cols

df.AskPrice_01 = df.AskPrice_01.apply(lambda x: float(x) if x != "-" else None)

df.AskPrice_02 = df.AskPrice_02.apply(lambda x: float(x) if x != "-" else None)
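Since the question asks about converting all values to float, one possible alternative to the per-column lambda is `pd.to_numeric` with `errors="coerce"`, which turns anything unparseable (such as "-") into NaN. The following is only a sketch on made-up sample data. The helper name `to_float` is hypothetical, and the assumption that values may contain thousands separators is mine, not something shown in the question.

```python
import pandas as pd

# Invented sample data standing in for the scraped table.
df = pd.DataFrame({
    "AskPrice_01": ["65.2500", "-", "66.00\u00a0"],
    "AskQty": ["1,250", "10", "-"],
})


def to_float(column):
    """Hypothetical helper: clean a text column and convert it to float."""
    cleaned = (
        column.astype(str)
        .str.replace("\u00a0", "", regex=False)  # drop non-breaking spaces
        .str.replace(",", "", regex=False)       # assumption: "1,250" means 1250
        .str.strip()
    )
    # errors="coerce" maps values that are not numbers (e.g. "-") to NaN.
    return pd.to_numeric(cleaned, errors="coerce")


# DataFrame.apply calls to_float once per column.
numeric_df = df.apply(to_float)
print(numeric_df.dtypes)
```

This keeps the cleaning in one place and works on whole columns at once, though it assumes every column is meant to be numeric.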

To filter on a specific column, you can use the following:

df[df.AskPrice_01 > 65.25].AskPrice_01
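The same kind of filter can also be written with an explicit boolean mask and `.loc`, which makes it easy to pick several columns at once. This is a small sketch on invented numbers. Note that rows where the price is missing (NaN) compare as False and are dropped by the filter.

```python
import pandas as pd

# Invented values; the None becomes NaN in a float column.
df = pd.DataFrame({
    "Strike Price": [69.0, 69.25, 69.5],
    "AskPrice_01": [65.10, None, 65.40],
})

# Boolean mask: True where the ask price is above the threshold.
mask = df["AskPrice_01"] > 65.25

# Select the matching rows and only the columns of interest.
selected = df.loc[mask, ["Strike Price", "AskPrice_01"]]
print(selected)
```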

I hope this helps. Good luck with your project!