Converting the list returned by pandas.read_html back into a DataFrame

Time: 2015-11-09 21:14:48

Tags: python pandas

I am processing HTML pages and converting their data to RDF. I loaded the HTML table data into a DataFrame and am now trying to replace spaces with underscores. I get this error:

    'list' object has no attribute 'replace'

At some point the DataFrame is being converted to a list. Why does that happen, and what can I do to get the DataFrame back?

Here is the code:

    import pandas as pd

    # Parsing data from website
    Immunization_Coverage_Data_url = 'http://apps.who.int/immunization_monitoring/globalsummary/timeseries/tswucoveragebcg.html'

    # Adding website data to dataframe. Matching by year identifies the specific table of interest
    df1 = pd.DataFrame()
    df1 = pd.read_html(Immunization_Coverage_Data_url, match='2014', flavor='bs4', header=0, index_col=None, skiprows=0, attrs=None, parse_dates=False, tupleize_cols=False, thousands=', ', encoding=None)
    #print df1

    # Preparing URIs
    url = 'http://apps.who.int/immunization_monitoring/globalsummary/timeseries/tswucoveragebcg/'
    url_start = '<'
    url_end = '>'
    DBpedia_resource_url = 'http://dbpedia.org/page/'

    # Cleaning data in the dataframe
    df2 = pd.DataFrame()
    #df1.replace(to_replace=' ', value='_', inplace=True, limit=None, regex=False, method='pad', axis=None)
    df1[df1.replace({' ': '_'}, regex=True)]  # the error above is raised on this line
    print(df1)

Update 1

Does it make sense to convert the resulting list back into a DataFrame? If so, how?

Update 2

I also get an error if I add the following:

    df2 = pd.concat(df1)
    df2.replace(r' ', '_', regex=True)
    print(df2)
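
For what it's worth, `pd.read_html` returns a list of DataFrames (one per matching table), not a single DataFrame, which is where the list comes from. Below is a minimal sketch of one way to get back to a DataFrame and do the replacement. To keep it runnable offline, `tables` is a hand-built stand-in for the list that `pd.read_html(...)` would return; the column names and cell values are made up for illustration and are not taken from the WHO page.

```python
import pandas as pd

# Stand-in for the return value of pd.read_html(...): a list of DataFrames.
# The column names and values are hypothetical placeholders.
tables = [
    pd.DataFrame({
        "Country Name": ["Burkina Faso", "Costa Rica"],
        "2014": [98, 82],
    })
]

# Pick the DataFrame out of the list instead of calling .replace on the list.
df = tables[0]

# If several tables matched, pd.concat would stack them into one DataFrame.
combined = pd.concat(tables, ignore_index=True)

# Replace spaces with underscores inside string cells. replace() returns a
# new DataFrame, so the result has to be assigned to a name.
cleaned = df.replace(" ", "_", regex=True)

# DataFrame.replace does not touch column labels, so handle them separately.
cleaned.columns = cleaned.columns.str.replace(" ", "_", regex=False)

print(cleaned)
```

With the real page, the first line would presumably be something like `tables = pd.read_html(Immunization_Coverage_Data_url, match='2014', header=0)`, followed by the same `tables[0]` indexing.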

0 answers:

No answers