Question

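I have looked at the other answers to this question, but none of them helped me. I am trying to run a simple Random Cut Forest algorithm. I have a small dataset of IP addresses that have been reduced to plain numbers, and I still get this error. The numbers are in a single column. The CSV looks like this: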

176162144

176862141

176762141

176761141

176562141

Answer 1

Have you looked at this sample notebook and tried using it with your own data? https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/random_cut_forest/random_cut_forest.ipynb

In short, it reads the CSV file with Pandas and trains the model like this:

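from sagemaker import RandomCutForest

# Note: SageMaker Python SDK v2 renamed train_instance_count/train_instance_type
# to instance_count/instance_type.
rcf = RandomCutForest(role=execution_role,
                      train_instance_count=1,
                      train_instance_type='ml.m4.xlarge',
                      data_location='s3://{}/{}/'.format(bucket, prefix),
                      output_path='s3://{}/{}/output'.format(bucket, prefix),
                      num_samples_per_tree=512,
                      num_trees=50)

# automatically upload the training data to S3 and run the training job
# (to_numpy() replaces as_matrix(), which was removed in pandas 1.0)
rcf.fit(rcf.record_set(taxi_data.value.to_numpy().reshape(-1, 1)))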
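For a single-column, header-less CSV like the one in the question, the loading step might look roughly like the sketch below. The `csv_text` string and the column name `value` are placeholders for illustration only; in practice you would pass the path of your own file to `pd.read_csv`.

```python
import io

import pandas as pd

# Stand-in for the single-column, header-less CSV from the question
csv_text = "176162144\n176862141\n176762141\n176761141\n176562141\n"

# header=None keeps the first row as data instead of using it as a column name
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["value"])

# If any cell fails to parse as a number, pandas falls back to dtype "object",
# so printing the dtype is a quick sanity check before training
print(df["value"].dtype)

# record_set expects a 2-D array: one row per sample, one column per feature
train = df["value"].to_numpy().reshape(-1, 1)
print(train.shape)
```

The resulting `train` array is what would be passed to `rcf.record_set(...)` in place of the taxi data.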

You didn't say what your use case is, but since you are working with IP addresses, you may also find the built-in IP Insights algorithm useful: https://docs.aws.amazon.com/sagemaker/latest/dg/ip-insights.html

Answer 2

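I was using the sample notebook that Julien Simon mentioned above, but at some point the data ended up as strings! The catch with RCF is that it has to run on numeric data (integers, in my case). What I had to do was make sure the array was cast to an int array, and then double- and triple-check it. That worked. I'm still puzzled about how the data ended up in string format, but that was the problem. Simple solution.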
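A minimal sketch of that kind of cast, assuming the values arrive as a NumPy array of numeric strings. The helper name `to_int_column` is made up for this example and is not part of the SageMaker SDK.

```python
import numpy as np


def to_int_column(values):
    """Cast a 1-D sequence of numbers or numeric strings to an (n, 1) int64 array."""
    arr = np.asarray(values)
    if arr.dtype.kind in ("U", "O"):
        # Strings slipped in: strip whitespace and convert each value.
        # int() raises ValueError on anything that is not a whole number,
        # which surfaces bad rows before they reach the training job.
        arr = np.array([int(str(v).strip()) for v in arr], dtype=np.int64)
    else:
        arr = arr.astype(np.int64)
    return arr.reshape(-1, 1)


raw = np.array(["176162144", "176862141", " 176762141"])
data = to_int_column(raw)
print(data.dtype, data.shape)
```

Checking `data.dtype` right before the call to `record_set` is a cheap way to confirm that no strings remain.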

ClientError: Unable to parse csv: rows 1-1000, file
