Question

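I am importing tables from several URLs and want to build a single dataframe from them, then store it as a CSV file. I am struggling to remove the repeated description headers from the tables, and I cannot manipulate the dataframe dfmaster once it has been created.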

Perhaps pd.read_html imports the data as a list rather than as a dataframe?

I tried iterating over the incoming table using:

for item in df:  
        if item not in dfmaster:            
            dfmaster.append(item)   
            print(dfmaster)

But this just seems to list the offending duplicate rows.

I have also tried df.drop[0] and drop_duplicates after appending to dfmaster.

producturls = ['https://www.interactivebrokers.com/en/index.php?f=2222&exch=ecbot&showcategories=FUTGRP',
               'https://www.interactivebrokers.com/en/index.php?f=2222&exch=cfe&showcategories=FUTGRP',
               'https://www.interactivebrokers.com/en/index.php?f=2222&exch=dtb&showcategories=FUTGRP&p=&cc=&limit=100&page=2'
               ]
dfmaster =[]

for url in producturls: 
    table = pd.read_html(url, index_col=None, header=None,)
    df = table[2]

    for item in df:  
        if item not in dfmaster:            
            dfmaster.append(item)   
            print(dfmaster)

    dfmaster.to_csv('IB_tickers.csv')

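The output should stitch all the table data from the website into one dataframe, without the repeated description headers, and then store it as a readable CSV file.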

Many thanks for your time.

Answer 1

This should work for you:

import pandas as pd
from tabulate import tabulate

producturls = ['https://www.interactivebrokers.com/en/index.php?f=2222&exch=ecbot&showcategories=FUTGRP',
               'https://www.interactivebrokers.com/en/index.php?f=2222&exch=cfe&showcategories=FUTGRP',
               'https://www.interactivebrokers.com/en/index.php?f=2222&exch=dtb&showcategories=FUTGRP&p=&cc=&limit=100&page=2'
               ]

# Collect one DataFrame per URL, then combine them in a single step.
df_list = []

for url in producturls:
    # pd.read_html returns a list of DataFrames, one per table found on the page.
    table = pd.read_html(url, index_col=None, header=None)
    df = table[2]  # same table index as in the question
    df_list.append(df)

# Stack the tables, then drop rows that appear more than once.
dfmaster = pd.concat(df_list, sort=False)
dfmaster = dfmaster.drop_duplicates().reset_index(drop=True)
print(tabulate(dfmaster.head(), headers='keys'))
dfmaster.to_csv('IB_tickers.csv')

Result:

    IB Symbol    Product Description                                      Symbol    Currency
                                         (click link for more details)
--  -----------  -------------------------------------------------------  --------  ----------
 0  AC           Ethanol -CME                                             EH        USD
 1  AIGCI        Bloomberg Commodity Index                                AW        USD
 2  B1U          30-Year Deliverable Interest Rate Swap Futures           B1U       USD
 3  DJUSRE       Dow Jones US Real Estate Index                           RX        USD
 4  F1U          5-Year Deliverable Interest Rate Swap Futures            F1U       USD
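
To see why the loop in the question did not behave as expected, here is a minimal offline sketch. It assumes two small hand-built frames (df_a and df_b are hypothetical stand-ins for the scraped tables, filled with a few rows from the result above) so that it runs without network access. It shows that iterating over a DataFrame yields its column labels rather than its rows, and that pd.concat followed by drop_duplicates gives one frame with a single header.

```python
import pandas as pd

# Hypothetical stand-ins for the tables returned by pd.read_html;
# the rows are copied from the result shown above for illustration only.
cols = ["IB Symbol", "Product Description", "Symbol", "Currency"]
df_a = pd.DataFrame(
    [["AC", "Ethanol -CME", "EH", "USD"],
     ["AIGCI", "Bloomberg Commodity Index", "AW", "USD"]],
    columns=cols,
)
df_b = pd.DataFrame(
    [["AIGCI", "Bloomberg Commodity Index", "AW", "USD"],
     ["B1U", "30-Year Deliverable Interest Rate Swap Futures", "B1U", "USD"]],
    columns=cols,
)

# Iterating over a DataFrame yields its column labels, not its rows,
# so the loop in the question only collected header strings in a list.
labels = [item for item in df_a]

# Collect the frames in a plain list, then combine them in one step.
# concat aligns on column names, so the header appears only once.
df_list = [df_a, df_b]
dfmaster = pd.concat(df_list, sort=False)

# Remove rows that occur in more than one table and renumber the index.
dfmaster = dfmaster.drop_duplicates().reset_index(drop=True)

print(labels)
print(dfmaster)
```

Because dfmaster in the question was a plain Python list, it had no to_csv method; after pd.concat it is a DataFrame and can be written out. If the row index is not wanted in the file, passing index=False to to_csv leaves it out.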
