Can someone explain why, when I create a simple heterogeneous DataFrame with pandas, the data types change when I access each row individually?
For example:
import numpy as np
import pandas as pd

scene_df = pd.DataFrame({
    'magnitude': np.random.uniform(0.1, 0.3, (10,)),
    'x-center': np.random.uniform(-1, 1, (10,)),
    'y-center': np.random.uniform(-1, 1, (10,)),
    'label': np.random.randint(2, size=(10,), dtype='u1')})
scene_df.dtypes
prints:
label uint8
magnitude float64
x-center float64
y-center float64
dtype: object
But when I iterate over the rows:
[r['label'].dtype for i, r in scene_df.iterrows()]
I get float64 for label:
[dtype('float64'),
dtype('float64'),
dtype('float64'),
dtype('float64'),
dtype('float64'),
...
Edit:
To answer what I intend to do with this:
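import matplotlib.pyplot as plt

def square(mag, x, y):
    # Side length is mag; shift so (x, y) is the centre, not the corner.
    wh = np.array([mag, mag])
    pos = np.array((x, y)) - wh/2
    return plt.Rectangle(pos, *wh)

def circle(mag, x, y):
    # Circle centred at (x, y) with radius mag.
    return plt.Circle((x, y), mag)

# The integer label selects the shape: 0 -> square, 1 -> circle.
shape_fn_lookup = [square, circle]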
which ends up as this ugly piece of code:
[shape_fn_lookup[int(s['label'])](
*s[['magnitude', 'x-center', 'y-center']])
for i, s in scene_df.iterrows()]
This gives me a bunch of circles and squares that I can then plot:
[<matplotlib.patches.Circle at 0x7fcf3ea00d30>,
<matplotlib.patches.Circle at 0x7fcf3ea00f60>,
<matplotlib.patches.Rectangle at 0x7fcf3eb4da90>,
<matplotlib.patches.Circle at 0x7fcf3eb4d908>,
...
]
Even DataFrame.to_dict('records') performs this dtype conversion:
type(scene_df.to_dict('records')[0]['label'])
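Answer 0 (score: 1)
Because iterrows() returns a Series for each row, with the column names as its index. A pandas Series has only one dtype, so the uint8 label is upcast to float64 along with the other values in the row: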
In [163]: first_row = list(scene_df.iterrows())[0][1]
In [164]: first_row
Out[164]:
label 0.000000
magnitude 0.293681
x-center -0.628142
y-center -0.218315
Name: 0, dtype: float64 # <--------- NOTE
In [165]: type(first_row)
Out[165]: pandas.core.series.Series
In [158]: [(type(r), r.dtype) for i, r in scene_df.iterrows()]
Out[158]:
[(pandas.core.series.Series, dtype('float64')),
(pandas.core.series.Series, dtype('float64')),
(pandas.core.series.Series, dtype('float64')),
(pandas.core.series.Series, dtype('float64')),
(pandas.core.series.Series, dtype('float64')),
(pandas.core.series.Series, dtype('float64')),
(pandas.core.series.Series, dtype('float64')),
(pandas.core.series.Series, dtype('float64')),
(pandas.core.series.Series, dtype('float64')),
(pandas.core.series.Series, dtype('float64'))]
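The upcast can be checked directly. The following is only a minimal sketch on a small made-up frame (the three-row `df` here is an assumption for illustration, not the asker's data): it compares the dtype of each row Series with the common dtype NumPy picks for the column dtypes, and shows that a frame containing only the uint8 column has nothing to be upcast to.

```python
import numpy as np
import pandas as pd

# Small illustrative frame: one float64 column and one uint8 column.
df = pd.DataFrame({
    'magnitude': np.array([0.1, 0.2, 0.3]),
    'label': np.array([0, 1, 1], dtype='u1'),
})

# Common dtype NumPy chooses when uint8 and float64 must share one array.
common = np.result_type(*df.dtypes)

# Each row from iterrows() is a single-dtype Series, so it takes that common dtype.
row_dtypes = {row.dtype for _, row in df.iterrows()}

# With only the uint8 column selected, the rows keep uint8.
int_only_dtypes = {row.dtype for _, row in df[['label']].iterrows()}

print(common, row_dtypes, int_only_dtypes)
```

So the conversion is not specific to `label`; it is a consequence of packing one row of mixed columns into a single Series.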
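Answer 1 (score: 1)
I suggest using itertuples instead of iterrows, because iterrows returns a Series for each row, which does not preserve dtypes across the row (in a DataFrame, dtypes are preserved per column).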
[type(r.label) for r in scene_df.itertuples()]
Output:
[numpy.uint8,
numpy.uint8,
numpy.uint8,
numpy.uint8,
numpy.uint8,
numpy.uint8,
numpy.uint8,
numpy.uint8,
numpy.uint8,
numpy.uint8]
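Applied to the shape lookup from the question, this might look roughly like the sketch below. Two caveats: itertuples renames columns that are not valid Python identifiers (such as 'x-center') to positional names, so the sketch passes name=None to get plain tuples and unpacks them by position; and depending on the pandas version the label may come back as numpy.uint8 or as a plain Python int, both of which work as a list index. The `square` and `circle` functions here are hypothetical stand-ins that return tuples, so the example runs without matplotlib.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
scene_df = pd.DataFrame({
    'magnitude': rng.uniform(0.1, 0.3, 10),
    'x-center': rng.uniform(-1, 1, 10),
    'y-center': rng.uniform(-1, 1, 10),
    'label': rng.integers(0, 2, size=10, dtype=np.uint8),
})

# Hypothetical stand-ins for the matplotlib-based factories in the question.
def square(mag, x, y):
    return ('square', mag, x, y)

def circle(mag, x, y):
    return ('circle', mag, x, y)

shape_fn_lookup = [square, circle]

# Select the columns explicitly so the unpacking order is guaranteed.
cols = ['magnitude', 'x-center', 'y-center', 'label']

# name=None yields plain tuples; the label stays an integer, so no int() cast is needed.
shapes = [
    shape_fn_lookup[label](mag, x, y)
    for mag, x, y, label in scene_df[cols].itertuples(index=False, name=None)
]
```

Swapping the stand-ins for the original `square` and `circle` should give the same list of patches as the iterrows version, without the float round trip.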