How to extract a table in Python using requests/pandas

Asked: 2019-04-02 11:55:35

Tags: python pandas beautifulsoup request

I am trying to extract the product names, the years, and the values from a table, but something is going wrong. My code:

import requests
import pandas as pd

url = 'http://cpmaindia.com/fiscal_custom_duty.php'
html = requests.get(url).content

# read_html returns a list with one DataFrame per <table> found on the page
tab_list = pd.read_html(html)
tab = tab_list[0]

# Convert the DataFrame to a list of rows and print it
tab = tab.values.tolist()
print(tab)

I tried this, but I did not get the desired output. I only want to parse the table. Thanks.

1 Answer:

Answer 0 (score: 1)

tab_list[0] yields the following:

print (tab)
                                                   0
0  <!-- function MM_swapImgRestore() { //v3.0  va...
1  Custom Duty  Import Duty on Petrochemicals (%)...
2  <!-- body { \tmargin-left: 0px; \tmargin-top: ...

Did you mean to grab tab_list[8]?
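One way to find the right index without guessing is to list what each parsed table looks like, or to let `read_html` filter tables with its `match` parameter. The sketch below is only an illustration: it assumes a small inline HTML sample (`SAMPLE_HTML`, hypothetical) rather than the real page, and `summarize_tables` is a made-up helper name, not part of pandas.

```python
import io

import pandas as pd

# Hypothetical stand-in for a page with several <table> elements;
# the real page at cpmaindia.com is not fetched here.
SAMPLE_HTML = """
<html><body>
<table><tr><td>Home</td><td>About</td></tr></table>
<table>
  <tr><td>Product / Year -</td><td>16/17</td></tr>
  <tr><td>Naphtha</td><td>5</td></tr>
  <tr><td>Ethylene</td><td>2.5</td></tr>
</table>
</body></html>
"""


def summarize_tables(html):
    """Return (index, shape, first cell) for every table pandas can parse."""
    tables = pd.read_html(io.StringIO(html))
    return [(i, t.shape, str(t.iat[0, 0])) for i, t in enumerate(tables)]


# Print one line per table so the index of the wanted table is easy to spot
for index, shape, first_cell in summarize_tables(SAMPLE_HTML):
    print(index, shape, first_cell)

# `match` keeps only tables whose text matches the given string or regex
matching = pd.read_html(io.StringIO(SAMPLE_HTML), match="Naphtha")
print(len(matching))
```

On a page that nests its data table inside layout tables, as the tab_list[0] output above suggests, the outer tables may match as well, so the index still needs checking.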

Also, if you use pandas to read the table from HTML, you do not need requests:

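import pandas as pd

url = 'http://cpmaindia.com/fiscal_custom_duty.php'

# read_html can fetch the URL itself and returns one DataFrame per <table>
tab_list = pd.read_html(url)

table = tab_list[8]

# Use the first row as the column labels
table.columns = table.iloc[0,:]
# Drop that first row, the first two columns, and the last column
table = table.iloc[1:,2:-1]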

Output:

print (table)
0  Import Duty on Petrochemicals (%)  ... Import Duty on Petrochemicals (%)
1                   Product / Year -  ...                             16/17
2                            Naphtha  ...                                 5
3                           Ethylene  ...                               2.5
4                          Propylene  ...                               2.5
5                          Butadiene  ...                               2.5
6                            Benzene  ...                               2.5
7                            Toluene  ...                               2.5
8                       Mixed Xylene  ...                               2.5
9                        Para Xylene  ...                                 0
10                      Ortho Xylene  ...                                 0
11                              LDPE  ...                               7.5
12                             LLDPE  ...                               7.5
13                              HDPE  ...                               7.5
14                                PP  ...                               7.5
15                               PVC  ...                               7.5
16                                PS  ...                               7.5
17                               EDC  ...                                 2
18                               VCM  ...                                 2
19                           Styrene  ...                                 2
20                               SBR  ...                                10
21                               PBR  ...                                10
22                               MEG  ...                                 5
23                               DMT  ...                                 5
24                               PTA  ...                                 5
25                               ACN  ...                                 5

[25 rows x 7 columns]
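
In the output above the table title is still the column header, and the real labels ("Product / Year -", "16/17", ...) sit in row 1. If the goal is one row per product, year, and value, a possible next step is to promote that row to the header and reshape with `melt`. This is only a sketch on a toy frame: `raw`, `to_long_format`, and the column names `product`, `year`, and `duty_pct` are hypothetical, and the "15/16" column with its values is a made-up placeholder, not data from the page. The toy frame also has no empty side columns, so the `2:-1` slice is not needed here.

```python
import pandas as pd

# Toy frame shaped like the parsed table: the first row repeats the table
# title and the second row carries the real column labels.
# The "15/16" column and its values are made-up placeholders.
raw = pd.DataFrame(
    [
        ["Import Duty on Petrochemicals (%)"] * 3,
        ["Product / Year -", "15/16", "16/17"],
        ["Naphtha", "4", "5"],
        ["Ethylene", "3", "2.5"],
    ]
)


def to_long_format(frame):
    """Promote the second row to the header and return product/year/value rows."""
    # Keep only the data rows (everything below the two header-like rows)
    body = frame.iloc[2:].reset_index(drop=True)
    # First column holds product names; the rest are labelled by year
    body.columns = ["product"] + frame.iloc[1, 1:].tolist()
    # One row per (product, year) pair
    long = body.melt(id_vars="product", var_name="year", value_name="duty_pct")
    # Cells arrive as strings; convert them to numbers where possible
    long["duty_pct"] = pd.to_numeric(long["duty_pct"], errors="coerce")
    return long


long_table = to_long_format(raw)
print(long_table)
```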