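I am processing HTML pages and converting their data to RDF. I loaded the data from an HTML table into a DataFrame, and now I am trying to replace spaces with underscores. I get this error:
'list' object has no attribute 'replace'
At some point the DataFrame is being converted to a list. Why does that happen, and what can I do to get the DataFrame back?
The code is listed further down.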
Update 1
Does it make sense to convert the resulting list back into a DataFrame? If so, how?
Update 2
import pandas as pd

# Parse data from the website
Immunization_Coverage_Data_url = 'http://apps.who.int/immunization_monitoring/globalsummary/timeseries/tswucoveragebcg.html'

# Add the website data to a DataFrame. Matching on the year identifies the specific table of interest.
df1 = pd.DataFrame()
df1 = pd.read_html(Immunization_Coverage_Data_url, match='2014', flavor='bs4', header=0, index_col=None, skiprows=0, attrs=None, parse_dates=False, tupleize_cols=False, thousands=', ', encoding=None)
# print(df1)

# Prepare URIs
url = 'http://apps.who.int/immunization_monitoring/globalsummary/timeseries/tswucoveragebcg/'
url_start = '<'
url_end = '>'
DBpedia_resource_url = 'http://dbpedia.org/page/'

# Clean the data in the DataFrame
df2 = pd.DataFrame()
# df1.replace(to_replace=' ', value='_', inplace=True, limit=None, regex=False, method='pad', axis=None)
df1[df1.replace({' ': '_'}, regex=True)]  # raises: 'list' object has no attribute 'replace'
print(df1)
If I add the following, I get an error:
df2 = pd.concat(df1)
df2.replace(r' ', '_', regex=True)
print(df2)
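For context on the "list" in the error message: pd.read_html returns a list of DataFrames (one per matching table), not a single DataFrame. The following is only a minimal sketch of how that list could be handled. It uses a hand-built list as a stand-in for the read_html result so that it runs without network access; the names tables, df, cleaned and combined, as well as the sample values, are made up for illustration and are not taken from the WHO page.

```python
import pandas as pd

# Stand-in for the return value of pd.read_html(...), which is a list of
# DataFrames. The column names and values below are invented for illustration.
tables = [
    pd.DataFrame({"Country": ["Costa Rica", "Sri Lanka"], "2014": [79, 99]})
]

# Select the table of interest from the list to get a DataFrame back.
df = tables[0]

# replace() returns a new DataFrame unless inplace=True is passed,
# so the result has to be assigned to a name.
cleaned = df.replace(r" ", "_", regex=True)
print(cleaned)

# If several tables matched, they could instead be stacked into one DataFrame.
combined = pd.concat(tables, ignore_index=True)
```

With a single matching table, indexing with [0] is probably the simpler route; pd.concat is mainly useful when several tables share the same columns.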