I am trying to access the data used to reproduce the Redshift benchmark on this page. If you scroll down to the "Run this benchmark yourself" section, the author says the data can be accessed in the following S3 bucket, replacing the items in [] with the format and data size you are interested in:
s3n://big-data-benchmark/pavlo/[text|text-deflate|sequence|sequence-snappy]/[suffix]
Based on that, I tried to download the data using a link of this form:
http://s3.amazonaws.com/big-data-benchmark/pavlo/text/tiny/
but it does not work. Can someone provide guidance on how to obtain these datasets?
Answer 0 (score: 2)
如果我删除" n"从print("Please enter a number")
Numbered_Being_Entered_Input = input()
Numbered_Being_Entered = int(Numbered_Being_Entered_Input)
Accumulated_Number = 0
while Numbered_Being_Entered > 0:
print("The number you have entered is ", Numbered_Being_Entered)
Accumulated_Number = Numbered_Being_Entered + Accumulated_Number
print("The accumulated sum of all the numbers you have entered is ", Accumulated_Number)
Numbered_Being_Entered_Input = input()
Numbered_Being_Entered = int(Numbered_Being_Entered_Input)
if Numbered_Being_Entered < 0:
print("you have chosen the number 0 or a negative number")
print("Please enter a number higher than 0")
print("The accumulated sum of all the numbers you have entered before this error is ", Accumulated_Number)
Numbered_Being_Entered_Input = input()
Numbered_Being_Entered = int(Numbered_Being_Entered_Input)
我可以列出你的目录:
s3n://
From there I can get the individual paths, for example:
$ aws s3 ls s3://big-data-benchmark/pavlo/text/tiny/
PRE crawl/
PRE rankings/
PRE uservisits/
2013-05-03 10:13:42 0 crawl_$folder$
2013-05-09 07:23:17 0 rankings_$folder$
2013-05-09 07:22:36 0 uservisits_$folder$
Its HTTPS URL is:
https://s3.amazonaws.com/big-data-benchmark/pavlo/text/tiny/crawl/part-00000
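If you want more than one file, the AWS CLI can mirror an entire prefix. This is just a sketch: the --no-sign-request flag assumes the bucket allows anonymous reads (drop it if you have credentials configured), and the local directory name is only an example:
$ # copy everything under pavlo/text/tiny/ into a local directory
$ aws s3 sync s3://big-data-benchmark/pavlo/text/tiny/ ./pavlo-text-tiny/ --no-sign-request
A single file can likewise be fetched over HTTPS, following the URL pattern above:
$ # -O saves the file under its remote name (part-00000)
$ curl -O https://s3.amazonaws.com/big-data-benchmark/pavlo/text/tiny/crawl/part-00000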
Good luck!