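I'm computing some article metrics for many different Wikipedia pages, such as article length and references per section. These metrics are either int or float. I store them in a dictionary and am now trying to get them into pandas so I can create some histograms and statistics. When I populate the DataFrame, the column's dtype is still object rather than a numeric type, even though I call float() on every metric value. Because the column isn't numeric, I can't run numeric operations on it. How do I get pandas to recognize this column as numeric?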
import pandas as pd

arts = {"Q774":
    {"metrics":
        {"fr": {"informativeness": 1.3500775193798449, "referencerate": 0.0026265931794695143, "completeness": 202.4, "numheadings": 19, "articlelength": 23224.0},
         "en": {"informativeness": 7.602386920360031, "referencerate": 0.003673816096835846, "completeness": 308.8, "numheadings": 36, "articlelength": 47090.0},
         "sw": {"informativeness": 0.0650467289719626, "referencerate": 0.0, "completeness": 18.400000000000002, "numheadings": 1, "articlelength": 232.0}}}}

df = pd.DataFrame(columns=['qid', 'lang', 'metric', 'val'])
for qid, attribdict in arts.items():
    for attrib, langdict in attribdict.items():
        if attrib == 'metrics':
            for lang, metrics in langdict.items():
                for metric_name, metric_val in metrics.items():
                    # DataFrame.append only exists in pandas < 2.0 (removed in 2.0)
                    df = df.append({'qid': qid, 'lang': lang, 'metric': metric_name,
                                    'val': float(metric_val)}, ignore_index=True)
In [258]: df['val']
Out [258]:
0 1.350078
1 0.002626593
2 202.4
3 19
4 23224
5 7.602387
6 0.003673816
7 308.8
8 36
9 47090
10 0.06504673
11 0
12 18.4
13 1
14 232
Name: val, dtype: object
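The most likely cause is that pd.DataFrame(columns=[...]) creates empty columns of object dtype, and appending rows one at a time keeps that object dtype. A minimal sketch of a different approach follows (an alternative to the loop above, not the original code). It collects the rows as plain dicts first and builds the DataFrame in a single call, so pandas infers each column's dtype from the values themselves. The names rows and the reuse of the question's data are illustrative.

```python
import pandas as pd

# Same nested structure and values as `arts` in the question.
arts = {"Q774": {"metrics": {
    "fr": {"informativeness": 1.3500775193798449, "referencerate": 0.0026265931794695143,
           "completeness": 202.4, "numheadings": 19, "articlelength": 23224.0},
    "en": {"informativeness": 7.602386920360031, "referencerate": 0.003673816096835846,
           "completeness": 308.8, "numheadings": 36, "articlelength": 47090.0},
    "sw": {"informativeness": 0.0650467289719626, "referencerate": 0.0,
           "completeness": 18.400000000000002, "numheadings": 1, "articlelength": 232.0},
}}}

# Collect one plain dict per row instead of appending to a DataFrame inside the loop.
rows = []
for qid, attribdict in arts.items():
    for attrib, langdict in attribdict.items():
        if attrib == "metrics":
            for lang, metrics in langdict.items():
                for metric_name, metric_val in metrics.items():
                    rows.append({"qid": qid, "lang": lang,
                                 "metric": metric_name, "val": float(metric_val)})

# Build the DataFrame once; pandas infers float64 for "val" from the float values.
df = pd.DataFrame(rows, columns=["qid", "lang", "metric", "val"])
print(df.dtypes)
```

Building the frame once is also much faster than repeated appends, because each append copies the whole DataFrame.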
Answer 0 (score: 2)
Sure, you can use convert_objects:
>>> df = df.convert_objects(convert_numeric=True)
>>> df[:2]
qid lang metric val
0 Q774 fr informativeness 1.350078
1 Q774 fr referencerate 0.002627
>>> df.dtypes
qid object
lang object
metric object
val float64
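convert_objects was deprecated in pandas 0.17 and has since been removed, so the call above only works on old pandas versions. The following is a minimal sketch of the same fix on current pandas. It assumes that only the val column needs converting and uses a small two-row frame that mimics the question's object-dtype column. It shows two options: DataFrame.infer_objects for a soft conversion of all object columns, and pd.to_numeric for one column.

```python
import pandas as pd

# Reproduce the question's situation: numeric values stored in an object-dtype column.
df = pd.DataFrame({
    "qid": ["Q774", "Q774"],
    "lang": ["fr", "fr"],
    "metric": ["informativeness", "referencerate"],
    "val": pd.Series([1.3500775193798449, 0.0026265931794695143], dtype=object),
})
print(df["val"].dtype)  # object

# Soft conversion: infer_objects() re-infers dtypes of object columns and returns a new DataFrame.
inferred = df.infer_objects()
print(inferred["val"].dtype)  # float64

# Explicit conversion of a single column; errors="coerce" would turn unparseable values into NaN.
df["val"] = pd.to_numeric(df["val"])
print(df["val"].dtype)  # float64
```

pd.to_numeric is the closer match to convert_numeric=True, because it also parses numeric strings such as "1.5". infer_objects only converts objects that already hold numbers.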