pandas read_html does not store the complete data

Asked: 2016-12-29 06:18:55

Tags: python pandas web-scraping

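I am using the read_html function in pandas to extract data from some HTML tables. For some reason, the output is cut off once a value exceeds a certain length:

Example: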

0     RECKITT BENCKISER INDIA PRIVATE LIMITED  Vs.ST...
1     SMT. SONY AND ANOTHER  Vs.  STATE OF UTTARAKHA...
2     BHATIA BHAWAN DHARAMSHALA  Vs.  STATE OF UTTAR...
3     MOHD. YASEEN AND OTHERS  Vs.  STATE OF UTTARAK...
4     DR. ADITYA PRAKASH SINGH  Vs.  STATE OF UTTARA...
5     DR. MANOJ KUMAR UNIYAL  Vs.  STATE OF UTTARAKH...
6     DR. LALIT MOHAN PANDEY  Vs.  STATE OF UTTARAKH...
7     SUBHAM SAINI AND ANOTHER  Vs.  STATE OF UTTARA...

In this case, the cells should contain STATE OF UTTARAKHAND (plus more data after it).

HTML source:

<span class="style2">RECKITT BENCKISER INDIA PRIVATE LIMITED
</span><br><span class="style4"> Vs.</span><br><span  
class="style2">STATE OF UTTARAKHAND AND ANOTHER
</span></td><td width="20%"

How can I fix this?

All I am doing is:

df = pd.read_html(test, flavor='html5lib', header=0)
print(df)

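1 answer:

Answer 0: (score: 0)

The full text is being read. It is only the printed output that is shortened, because the display column width is limited; the DataFrame itself holds the complete strings.

Try this: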

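import pandas as pd

# Raise the display limit so long cell values are printed in full
pd.set_option('display.max_colwidth', 400)
# read_html returns a list of DataFrames; take the first table
df = pd.read_html('http://pastebin.com/raw/p7vfb2JG')[0]
print(df.head())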
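
To check that the truncation is only in the display and not in the data, a minimal sketch like the one below may help. It does not call read_html. The one-row frame, the column name `title`, and the variable names are assumptions made for illustration, with the cell text taken from the HTML in the question.

```python
import pandas as pd

# Hypothetical one-row frame standing in for the table returned by read_html.
full_title = (
    "RECKITT BENCKISER INDIA PRIVATE LIMITED  Vs.  "
    "STATE OF UTTARAKHAND AND ANOTHER"
)
df = pd.DataFrame({"title": [full_title]})

# The stored value is complete, whatever the printed frame looks like.
stored = df.loc[0, "title"]

# With a narrow display limit, the printed cell is shortened with "...".
with pd.option_context("display.max_colwidth", 50):
    truncated_view = repr(df)

# With a wider limit, the same frame prints the whole string.
with pd.option_context("display.max_colwidth", 400):
    full_view = repr(df)

print(truncated_view)
print(full_view)
```

Using `pd.option_context` instead of `pd.set_option` keeps the change local to the `with` block, which can be convenient when the wider setting is only needed for one printout.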
