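I have a DataFrame called `combined`, and I call a subset of it `A`. When I run ProfileReport on `combined`, there is no problem. When I generate the report for `A`, I get the error shown below. Here is the code: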
A = combined.loc[combined.xy== False]
pandas_profiling.ProfileReport(A) #this gives me the error
pandas_profiling.ProfileReport(combined.loc[combined.xy== False]) #same error
pandas_profiling.ProfileReport(combined) # no error
This is the error:
C:\Users\xy\AppData\Local\Continuum\Anaconda2\envs\py36\lib\site-packages\pandas_profiling\report.py:60: RuntimeWarning: invalid value encountered in longlong_scalars
width = int(freq / max_freq * 99) + 1
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-74-bf3aa50b97ad> in <module>()
----> 1 pandas_profiling.ProfileReport(A)
~\AppData\Local\Continuum\Anaconda2\envs\py36\lib\site-packages\pandas_profiling\__init__.py in __init__(self, df, **kwargs)
67
68 self.html = to_html(sample,
---> 69 description_set)
70
71 self.description_set = description_set
~\AppData\Local\Continuum\Anaconda2\envs\py36\lib\site-packages\pandas_profiling\report.py in to_html(sample, stats_object)
172 templates.template('freq_table'), templates.template('freq_table_row'), 10)
173 formatted_values['firstn_expanded'] = extreme_obs_table(stats_object['freq'][idx], templates.template('freq_table'), templates.template('freq_table_row'), 5, n_obs, ascending = True)
--> 174 formatted_values['lastn_expanded'] = extreme_obs_table(stats_object['freq'][idx], templates.template('freq_table'), templates.template('freq_table_row'), 5, n_obs, ascending = False)
175
176 rows_html += templates.row_templates_dict[row['type']].render(values=formatted_values, row_classes=row_classes)
~\AppData\Local\Continuum\Anaconda2\envs\py36\lib\site-packages\pandas_profiling\report.py in extreme_obs_table(freqtable, table_template, row_template, number_to_print, n, ascending)
123
124 for label, freq in six.iteritems(obs_to_print):
--> 125 freq_rows_html += _format_row(freq, label, max_freq, row_template, n)
126
127 return table_template.render(rows=freq_rows_html)
~\AppData\Local\Continuum\Anaconda2\envs\py36\lib\site-packages\pandas_profiling\report.py in _format_row(freq, label, max_freq, row_template, n, extra_class)
58
59 def _format_row(freq, label, max_freq, row_template, n, extra_class=''):
---> 60 width = int(freq / max_freq * 99) + 1
61 if width > 20:
62 label_in_bar = freq
ValueError: cannot convert float NaN to integer
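The last two messages can be reproduced in isolation, which helps narrow down what went wrong. The `RuntimeWarning` about an invalid value in integer scalars, followed by the NaN conversion failure, is consistent with both `freq` and `max_freq` being zero for some value. That both are zero is an assumption read off the warning, not something the traceback states directly. A minimal sketch with stand-in values:

```python
import warnings

import numpy as np

# Stand-in values (assumption): a frequency table whose counts are all zero.
freq = np.int64(0)
max_freq = np.int64(0)

with warnings.catch_warnings():
    # Silence the "invalid value encountered" RuntimeWarning for this demo.
    warnings.simplefilter("ignore", RuntimeWarning)
    ratio = freq / max_freq * 99  # integer 0 / 0 yields NaN

try:
    width = int(ratio) + 1  # same expression as report.py line 60
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(ratio, error_message)
```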
I hope you can help me.
Answer 0 (score: 0)
I solved the problem with the following loop:
import numpy as np

# Cast each column to str and then to category, unless its dtype is listed in the condition.
for c in DB:
    print(c)
    print(DB[c].dtypes)
    if (DB[c].dtypes != bool and DB[c].dtypes != np.float64
            and DB[c].dtypes != np.uint64 and DB[c].dtypes != np.uint8
            and DB[c].dtypes != np.datetime64 and DB[c].dtypes != np.timedelta64
            and DB[c].dtypes != np.dtype('<m8[ns]')):
        DB[c] = DB[c].astype("str")
        DB[c] = DB[c].astype("category")
    elif DB[c].dtypes == bool:
        # Booleans are converted to integers.
        DB[c] = DB[c].astype("int")
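This converts every data type other than booleans, dates, and a few numeric types to string and then to category. It has to be string first and then category; otherwise the error persists.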
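The same idea can be written as a reusable function with the dtype helpers in `pandas.api.types`. This is only a sketch and a variation on the loop above, not the exact same behavior: the name `prepare_for_profiling` is hypothetical, it returns a copy instead of modifying the frame in place, and it leaves all numeric dtypes untouched (an assumption), whereas the loop above only skips the dtypes it lists.

```python
import pandas as pd
from pandas.api import types as ptypes


def prepare_for_profiling(df):
    """Return a copy with bools as ints and other non-numeric,
    non-date columns cast to str and then to category."""
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        if ptypes.is_bool_dtype(dtype):
            # Check bool first: pandas also treats bool as numeric.
            out[col] = out[col].astype("int")
        elif (ptypes.is_numeric_dtype(dtype)
              or ptypes.is_datetime64_any_dtype(dtype)
              or ptypes.is_timedelta64_dtype(dtype)):
            continue  # leave numbers, dates and timedeltas as they are
        else:
            # String first, then category, as described above.
            out[col] = out[col].astype("str").astype("category")
    return out


# Small made-up frame, used only to show the resulting dtypes.
sample = pd.DataFrame({
    "flag": [True, False],
    "name": ["a", "b"],
    "value": [1.5, 2.5],
    "when": pd.to_datetime(["2020-01-01", "2020-01-02"]),
})
prepared = prepare_for_profiling(sample)
print(prepared.dtypes)
```

The prepared frame would then be passed to `pandas_profiling.ProfileReport` in place of the original subset.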