Question

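I'm new to Kaggle and Python and can't figure out how to convert this data set. For anyone who's familiar with it, I'm trying to reproduce the gender-based solution for the Titanic tutorial.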

I have:

   PassengerId  Survived
0          892  0.184130
1          893  0.761143
2          894  0.184130
3          895  0.184130
4          896  0.761143

I need to convert it to:

   PassengerId  Survived
0          892         0
1          893         1
2          894         0
3          895         0
4          896         1

Again, I don't really know Python. I've tried a few solutions, such as:

for x in np.nditer(final_prediction, op_flags=['readwrite']):
    x[...] = (1 if x[...] >= 0.50 else 0)

which gives me floats :( (and they still show up in the CSV file as 0.0, 1.0):

   PassengerId  Survived
0          892  0.
1          893  1.

and

rounded_prediction = np.rint(final_prediction)

gives me the same (i.e. 0., 1.).

The following:

int_prediction = final_prediction.astype(int)

gives me all 0s.

Any ideas? Thanks!

Answer 1

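First of all, keep in mind that you want to use vectorized operations as much as possible, because they speed up your code! That is always important. So instead of a loop, pandas has a great way to do this: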

submission['Survived'] = submission['Survived'].astype(int)

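Note that this truncates the values, so in your case you may want to run:

submission['Survived'] += 0.5

before doing the above. Then, when you convert to int, values of 0.5 and above become 1 and values below 0.5 are truncated to 0.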
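The truncation behaviour described here can be checked on the five sample values from the question. This is only a minimal sketch: the DataFrame is rebuilt by hand, and the comparison-based variant is an alternative that is not part of the original answer. It matches the add-0.5 trick only for values between 0 and 1, such as these predictions.

```python
import pandas as pd

# Sample values copied from the question's output.
submission = pd.DataFrame({
    "PassengerId": [892, 893, 894, 895, 896],
    "Survived": [0.184130, 0.761143, 0.184130, 0.184130, 0.761143],
})

# Plain truncation: every value below 1.0 becomes 0.
truncated = submission["Survived"].astype(int)

# Add 0.5 first, then truncate: values of 0.5 and above become 1.
shifted = (submission["Survived"] + 0.5).astype(int)

# Alternative (an assumption, not from the answer): express the
# ">= 0.50" rule from the question as a boolean comparison.
thresholded = (submission["Survived"] >= 0.5).astype(int)

print(truncated.tolist())    # [0, 0, 0, 0, 0]
print(shifted.tolist())      # [0, 1, 0, 0, 1]
print(thresholded.tolist())  # [0, 1, 0, 0, 1]
```

The first result reproduces the "all 0s" the question reports for astype(int) on its own.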

So use the astype() method (Series.astype or DataFrame.astype) to change the dtype; you can check the column types with df.dtypes.
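As a rough illustration of checking and changing column types, the sketch below uses a hypothetical two-row frame named df. The explicit "int64" target is an assumption, chosen here so that the resulting dtype is the same on every platform.

```python
import pandas as pd

# Hypothetical two-row frame with already-rounded float predictions.
df = pd.DataFrame({"PassengerId": [892, 893], "Survived": [0.0, 1.0]})

# Survived is listed as float64 here.
print(df.dtypes)

# Convert only the Survived column; PassengerId is left untouched.
df = df.astype({"Survived": "int64"})

# Survived is now listed as int64.
print(df.dtypes)
```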

There may be another way to say that it should round up or down, but for data manipulation this simple, it should work ;)

Answer 2

You need to apply round and then convert the result to int to drop the decimal point. This should work: np.rint(final_prediction).astype(int)
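A minimal sketch of that round-then-cast step, assuming final_prediction is a one-dimensional NumPy array of probabilities (the question does not show its exact type). The CSV text is printed to confirm that the column is written as 0 and 1 rather than 0.0 and 1.0. Note that np.rint rounds exact halves to the nearest even number, so a value of exactly 0.5 becomes 0.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's final_prediction.
final_prediction = np.array([0.184130, 0.761143, 0.184130, 0.184130, 0.761143])

# np.rint rounds to the nearest integer but keeps the float dtype.
rounded = np.rint(final_prediction)

# The cast is what removes the decimal point.
int_prediction = rounded.astype(int)

submission = pd.DataFrame({
    "PassengerId": [892, 893, 894, 895, 896],
    "Survived": int_prediction,
})

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = submission.to_csv(index=False)
print(csv_text)
```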

Can't convert float to int in python DataFrame / Array