Pandas: reindex always gives the same result

Posted: 2018-08-13 19:06:01

Tags: pandas tensorflow-datasets

My code:

from IPython import display
import numpy as np
import pandas as pd

california_housing_dataframe = pd.read_csv("https://dl.google.com/mlcc/mledu-datasets/california_housing_train.csv", sep=",")

california_housing_dataframe = california_housing_dataframe.reindex(
    np.random.permutation(california_housing_dataframe.index))


training_examples = california_housing_dataframe.head(12000)
validation_examples = california_housing_dataframe.tail(5000)

print("Training examples summary:")
display.display(training_examples.describe())
print("Validation examples summary:")
display.display(validation_examples.describe())
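To check whether the `reindex(np.random.permutation(...))` idiom actually reorders rows, here is a minimal self-contained sketch that needs no network access. The 1,000-row frame and its `longitude` values are made up to mimic a file that is sorted on disk, and a seeded `RandomState` is used only so the sketch is repeatable.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the housing data: rows sorted by a made-up
# "longitude" column, mimicking a file that is ordered on disk.
df = pd.DataFrame({"longitude": np.linspace(-114.0, -124.0, 1000)})

# Same shuffling idiom as the question: reindex with a permuted copy of the index.
# A seeded RandomState is used here only so the sketch is repeatable.
rs = np.random.RandomState(0)
shuffled = df.reindex(rs.permutation(df.index))

# Same rows, different order.
print(set(shuffled.index) == set(df.index))
print(shuffled.index.equals(df.index))  # False unless the permutation is the identity

# If the shuffle worked, the column is no longer sorted.
print(shuffled["longitude"].is_monotonic_decreasing)
```

If the same monotonicity check still returns `True` on the real frame after the reindex, the rows were never reordered, so `head()` and `tail()` simply take the two ends of the file.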

The result:

Training examples summary:
          longitude         ...          median_house_value
count  12000.000000         ...                12000.000000
mean    -118.470274         ...               198037.593083
std        1.243589         ...               111857.499335
min     -121.390000         ...                14999.000000
25%     -118.940000         ...               117100.000000
50%     -118.210000         ...               170500.000000
75%     -117.790000         ...               244400.000000
max     -114.310000         ...               500001.000000

[8 rows x 9 columns]
Validation examples summary:
         longitude         ...          median_house_value
count  5000.000000         ...                 5000.000000
mean   -122.182510         ...               229532.878600
std       0.480337         ...               122520.063454
min    -124.350000         ...                14999.000000
25%    -122.400000         ...               130400.000000
50%    -122.140000         ...               213000.000000
75%    -121.910000         ...               303150.000000
max    -121.390000         ...               500001.000000
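In the two summaries above, the training longitudes span -121.39 to -114.31 and the validation longitudes span -124.35 to -121.39: the ranges meet at a single value instead of overlapping. That is what a head/tail split of rows ordered by longitude would produce, which suggests the shuffle may not have taken effect (the file's on-disk ordering is an assumption here). A hypothetical helper, `ranges_overlap`, tests this on any pair of splits; the synthetic data is illustrative only.

```python
import numpy as np
import pandas as pd


def ranges_overlap(a: pd.Series, b: pd.Series) -> bool:
    """Return True if the open intervals (min, max) of a and b overlap.

    Touching at a single endpoint, as in the summaries above, counts as no overlap.
    """
    return bool(a.min() < b.max() and b.min() < a.max())


# Hypothetical sorted column: an unshuffled head/tail split leaves disjoint ranges.
col = pd.Series(np.linspace(-114.0, -124.0, 1000))
head, tail = col.head(700), col.tail(300)
print(ranges_overlap(head, tail))  # False: the split just cuts the sorted range

# After shuffling with the question's reindex idiom, both splits cover the
# whole range, so their ranges overlap.
shuffled = col.reindex(np.random.RandomState(0).permutation(col.index))
print(ranges_overlap(shuffled.head(700), shuffled.tail(300)))
```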

What confuses me is that I get the same result every time, whereas at https://colab.research.google.com/notebooks/mlcc/feature_sets.ipynb I get different results on each run.

Is something wrong with my code or my environment?

2 answers:

Answer 0 (score: 0)

You may be getting the same random seed every time you run the script. Try setting the numpy random seed to a different value at the start of your script:

np.random.seed(42)

Then change the seed value and see whether it changes the randomization. If it does, seeding with a value that differs on every run, rather than a fixed one, should give you random output each time.
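A minimal sketch of the seeding behaviour this answer relies on, using NumPy's legacy global `np.random.seed`: a fixed seed reproduces the same permutation, a different seed gives a different (but still reproducible) order, and calling `np.random.seed()` with no argument reseeds from operating-system entropy, so each run differs. The `index` array is a hypothetical stand-in for the DataFrame index.

```python
import numpy as np

index = np.arange(100)  # stand-in for california_housing_dataframe.index

# Fixed seed: the permutation is identical every time the seed is reset.
np.random.seed(42)
first = np.random.permutation(index)
np.random.seed(42)
second = np.random.permutation(index)
print(np.array_equal(first, second))  # True

# A different fixed seed: a different order, still reproducible run to run.
np.random.seed(7)
other = np.random.permutation(index)

# No argument: NumPy reseeds from OS entropy, so this order changes between runs.
np.random.seed()
fresh = np.random.permutation(index)
```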

Answer 1 (score: 0)

It worked when I changed the randomization code, but I don't know why.
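The code this answer refers to was not preserved. One common alternative way to shuffle rows in pandas is `DataFrame.sample(frac=1)`; the sketch below shows it on a hypothetical sorted frame, but it is an assumption that this is what the answerer used. The split sizes (700/300) are scaled-down stand-ins for the question's 12000/5000.

```python
import numpy as np
import pandas as pd

# Hypothetical sorted frame standing in for the housing data.
df = pd.DataFrame({"longitude": np.linspace(-114.0, -124.0, 1000)})

# sample(frac=1) returns every row exactly once, in random order.
# Omit random_state to get a different order on each run; it is fixed
# here only so the sketch is repeatable.
shuffled = df.sample(frac=1, random_state=0)

# Optional: replace the old labels with a fresh 0..n-1 index.
shuffled = shuffled.reset_index(drop=True)

training_examples = shuffled.head(700)
validation_examples = shuffled.tail(300)
```

Unlike `reindex`, `sample` shuffles in a single call and does not require building the permutation yourself; `head()` and `tail()` select by position either way.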