Question

pandas 0.19.2.

Here is an example:

import pandas as pd

testdf = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [1.0, 2.0, 3.0, 4.0]})
testdf.dtypes

Output:

A      int64
B    float64
dtype: object

Everything looks fine so far, but here is what I don't like (note that the first call is pd.Series.iloc and the second is pd.DataFrame.iloc):

print(type(testdf.A.iloc[0]))
print(type(testdf.iloc[0].A))

Output:

<class 'numpy.int64'>
<class 'numpy.float64'>

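I found this while trying to understand why a pd.DataFrame.join() operation returned almost no matches between two int64 columns when there should have been many. My guess is that this type inconsistency is related to that behavior, but I'm not sure. My brief investigation turned up the result above, and now I'm a bit confused.

If anyone knows how to solve this, I would be very grateful for any hints!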

UPD

Thanks to @EdChum for the comment. Here is an example with generated data that shows the join/merge behavior:

testdf.join(testdf, on='A', rsuffix='3')

    A   B   A3  B3 
0   1   1.0 2.0 2.0
1   2   2.0 3.0 3.0
2   3   3.0 4.0 4.0
3   4   4.0 NaN NaN

whereas pd.merge(left=testdf, right=testdf, on='A'), which I assumed to be exactly equivalent, returns

    A   B_x B_y
0   1   1.0 1.0
1   2   2.0 2.0
2   3   3.0 3.0
3   4   4.0 4.0

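UPD2: Recording @EdChum's comment on the join and merge behavior. The problem is that A.join(B, on='C') takes column A['C'] and matches it against the index of B, because join uses the other frame's index by default. That is why A=1 was paired with the row at index 1 (A3=2.0) in the output above. In my case I simply used merge to get the desired result.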
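As a minimal sketch of the difference described above (the names right, joined and merged are introduced here for illustration only), making A the index of the right-hand frame lets join match values against values, which should agree with what merge returns:

```python
import pandas as pd

testdf = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [1.0, 2.0, 3.0, 4.0]})

# join(on='A') looks up each value of the caller's column A
# in the *index* of the other frame, so make A that index first.
right = testdf.set_index('A')
joined = testdf.join(right, on='A', rsuffix='3')

# merge(on='A') matches column A against column A directly.
merged = pd.merge(left=testdf, right=testdf, on='A')

print(joined)
print(merged)
```

With the index set this way, every row finds its partner and no NaN values appear in the joined frame.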

Answer 1

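This is expected. pandas tracks dtypes per column. When you call testdf.iloc[0], you are asking pandas for a row. It has to convert the entire row into a Series, and a Series has a single dtype. That row contains a float, so the row as a Series must be float.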
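The upcast can be seen directly by comparing the dtype of a row with the dtype of a column. This is only an illustrative sketch; the variable names row and col are not from the original post:

```python
import pandas as pd

testdf = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [1.0, 2.0, 3.0, 4.0]})

row = testdf.iloc[0]  # one row spanning both columns, returned as a Series
col = testdf['A']     # one column, which keeps its own dtype

print(row.dtype)  # common dtype chosen for the int64 and float64 values
print(col.dtype)  # dtype of column A on its own
```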

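However, it seems that pandas also makes this conversion under loc or iloc when you use a single __getitem__ call.

Here are some interesting test cases, starting with a testdf that has a single int column: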

testdf = pd.DataFrame({'A': [1, 2, 3, 4]})

print(type(testdf.iloc[0].A))
print(type(testdf.A.iloc[0]))

<class 'numpy.int64'>
<class 'numpy.int64'>

Now change it to the OP's test case:

testdf = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [1.0, 2.0, 3.0, 4.0]})

print(type(testdf.iloc[0].A))
print(type(testdf.A.iloc[0]))

<class 'numpy.float64'>
<class 'numpy.int64'>

print(type(testdf.loc[0, 'A']))
print(type(testdf.iloc[0, 0]))
print(type(testdf.at[0, 'A']))
print(type(testdf.iat[0, 0]))
# get_value is part of the 0.19.x API; it was deprecated in pandas 0.21 and removed in 1.0
print(type(testdf.get_value(0, 'A')))

<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>

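So when pandas uses loc or iloc, it performs some conversion across the row that I still don't fully understand. I'm sure it has to do with the fact that loc and iloc are different in nature from at, iat, and get_value. iloc and loc let you access the dataframe with index arrays and boolean arrays, whereas at, iat, and get_value only ever access a single cell at a time.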
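If the goal is simply to read one cell with its column's dtype, a sketch like the following avoids building a mixed-dtype row in the first place. It assumes a pandas version where at and iat are available; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

testdf = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [1.0, 2.0, 3.0, 4.0]})

by_column = testdf['A'].iloc[0]  # select the column first, then the position
by_at = testdf.at[0, 'A']        # label-based scalar accessor
by_iat = testdf.iat[0, 0]        # position-based scalar accessor
by_row = testdf.iloc[0]['A']     # select the row first: goes through a float Series

for name, value in [('by_column', by_column), ('by_at', by_at),
                    ('by_iat', by_iat), ('by_row', by_row)]:
    print(name, type(value))
```

Only the row-first access goes through the intermediate Series, so it is the only one of the four that hands back a float.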

Nevertheless:

testdf.loc[0, 'A'] = 10

print(type(testdf.at[0, 'A']))

When we assign to that location via loc, pandas ensures that the dtype stays consistent.
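A quick way to check this, sketched here with the same example frame, is to look at the column dtypes after the assignment:

```python
import pandas as pd

testdf = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [1.0, 2.0, 3.0, 4.0]})

# Assign an integer into the integer column through loc.
testdf.loc[0, 'A'] = 10

print(testdf.dtypes)               # per-column dtypes after the assignment
print(type(testdf.at[0, 'A']))     # scalar type read back from column A
```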

Pandas DataFrame iloc spoils the data type