Problem handling ValueError: No tables found with pandas?

Time: 2016-11-04 04:22:26

Tags: python python-3.x pandas append

I want to read every table from a series of links and build a single dataframe from them. Suppose I have:

list_links = ['url1.com', 'url2.com', 'url3.com',...,'urln.com']

Then:

for url in lis:
    try:
        df = pd.read_html(url,index_col=None, header=0)
        lis.append(df)
        frame = pd.concat(url, ignore_index=True)
    except:
        pass

However, I can't get the dataframe; nothing happens:

In: frame

Out:

In: print(frame)

Out: 

How can I append all the tables from every link into a single table? Note that some links have no tables, which is why I tried pass. I also tried this:

import multiprocessing
def process_url(url):
    df_url = pd.read_html(url)
    df = pd.concat(df_url, ignore_index=True) 
    return df_url

pool = multiprocessing.Pool(processes=4)
pool.map(process_url, lis)

Then:

ValueError                                Traceback (most recent call last)
<ipython-input-3-46e04cfd0bfe> in <module>()
      7 
      8 pool = multiprocessing.Pool(processes=4)
----> 9 pool.map(process_url, lis)

/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    258         in a list that is returned.
    259         '''
--> 260         return self._map_async(func, iterable, mapstar, chunksize).get()
    261 
    262     def starmap(self, func, iterable, chunksize=None):

/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/pool.py in get(self, timeout)
    606             return self._value
    607         else:
--> 608             raise self._value
    609 
    610     def _set(self, i, obj):

ValueError: No tables found

I also tried this:

import multiprocessing
def process_url(url):
    df_url = pd.read_html(url)
    df = pd.concat(df_url, ignore_index=True) 
    return df_url

pool = multiprocessing.Pool(processes=4)
try:
    dfs_ = pool.map(process_url, lis)
except: 
    pass

Nothing happens.

1 answer:

Answer 0: (score: 0)

You aren't actually joining the dataframes. What if you try this:

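import pandas as pd

df_list = []
for url in list_links:
    try:
        # read_html returns a list of DataFrames, one per table on the page
        df_list.extend(pd.read_html(url, index_col=None, header=0))
    except ValueError:
        # raised with "No tables found" when a page has no tables
        pass

df = pd.concat(df_list, ignore_index=True)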
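
For the parallel attempt in the question, one option is to catch the ValueError inside the worker, so that a single page without tables does not abort the whole pool.map call. The following is only a sketch under some assumptions: the names read_tables and collect_tables and the reader parameter are made up for illustration, and it swaps multiprocessing.Pool for multiprocessing.pool.ThreadPool, which offers the same map interface but uses threads. Threads are usually sufficient for downloads and avoid pickling the worker function. Network errors are deliberately not caught here.

```python
from multiprocessing.pool import ThreadPool

import pandas as pd


def read_tables(url, reader=pd.read_html):
    """Return the list of tables found at url, or [] if the page has none."""
    try:
        return reader(url)
    except ValueError:
        # pandas raises ValueError("No tables found") for table-less pages
        return []


def collect_tables(urls, reader=pd.read_html, workers=4):
    """Read all URLs concurrently and stack every table into one DataFrame."""
    with ThreadPool(processes=workers) as pool:
        # map preserves the order of urls; each item is a list of DataFrames
        per_url = pool.map(lambda u: read_tables(u, reader), urls)
    frames = [df for tables in per_url for df in tables]
    # pd.concat raises ValueError on an empty list, so guard that case
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

With the list from the question, this would be called as frame = collect_tables(list_links). The reader argument exists only so that the function can be tried with a stand-in for pd.read_html, without touching the network.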