Accessing publicly available data from an S3 bucket link

Posted: 2015-09-29 00:23:05

Tags: amazon-web-services amazon-s3

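I'm trying to access the data used to reproduce the Redshift benchmark on this page. If you scroll down to the "Run This Benchmark Yourself" section, the author says the data can be accessed in the following S3 bucket, replacing the items in [] with the format and data size you're interested in: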

s3n://big-data-benchmark/pavlo/[text|text-deflate|sequence|sequence-snappy]/[suffix]
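The template has two placeholders: the first picks one of the four storage formats listed, the second is a data-size suffix (tiny is the one used later in this thread). As a minimal sketch, assuming bash, the template can be expanded into one concrete URI per format for a given suffix. The helper name benchmark_uris is hypothetical, chosen only for illustration.

```shell
# Expand the benchmark path template
#   s3n://big-data-benchmark/pavlo/[text|text-deflate|sequence|sequence-snappy]/[suffix]
# into one concrete URI per storage format.
# benchmark_uris is a hypothetical helper, not part of any tool.
benchmark_uris() {
  local suffix="$1"   # size suffix, e.g. "tiny"
  local fmt
  for fmt in text text-deflate sequence sequence-snappy; do
    printf 's3n://big-data-benchmark/pavlo/%s/%s\n' "$fmt" "$suffix"
  done
}
```

Running benchmark_uris tiny prints four URIs, starting with s3n://big-data-benchmark/pavlo/text/tiny.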

Based on the above, I tried to download the data using a link like this:

http://s3.amazonaws.com/big-data-benchmark/pavlo/text/tiny/

But it doesn't work. Can anyone give some guidance on how to get these datasets?

1 answer:

Answer 0 (score: 2)

If I remove the "n" from s3n:// (that is, use s3:// instead), I can list your directory with the AWS CLI.
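s3n:// is the URI scheme of Hadoop's native S3 connector, while the AWS CLI expects s3://. A small sketch of that rewrite, assuming bash; the helper name to_cli_uri is hypothetical.

```shell
# Rewrite a Hadoop-style s3n:// URI into the s3:// form the AWS CLI accepts.
# to_cli_uri is a hypothetical helper, not part of the AWS CLI.
to_cli_uri() {
  local uri="$1"
  case "$uri" in
    s3n://*) printf 's3://%s\n' "${uri#*://}" ;;   # drop "s3n://", prepend "s3://"
    s3://*)  printf '%s\n' "$uri" ;;               # already in CLI form
    *)       echo "not an S3 URI: $uri" >&2; return 1 ;;
  esac
}
```

For example, aws s3 ls "$(to_cli_uri s3n://big-data-benchmark/pavlo/text/tiny/)" runs the same listing as the command in this answer.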

From there I can get the individual paths, for example:

    $ aws s3 ls s3://big-data-benchmark/pavlo/text/tiny/
    PRE crawl/
    PRE rankings/
    PRE uservisits/
    2013-05-03 10:13:42          0 crawl_$folder$
    2013-05-09 07:23:17          0 rankings_$folder$
    2013-05-09 07:22:36          0 uservisits_$folder$
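In this listing, the PRE lines are the sub-prefixes that hold the data, while the zero-byte *_$folder$ keys appear to be the directory markers that Hadoop's s3n connector writes to emulate folders. A minimal sketch, assuming bash and awk, that keeps only the prefixes; the helper name list_prefixes is hypothetical, and it assumes prefix names contain no spaces.

```shell
# Read `aws s3 ls` output on stdin and print only the sub-prefixes (PRE lines).
# Zero-byte *_$folder$ marker objects are plain object lines, so they are skipped.
# Assumes prefix names contain no whitespace (awk splits on it).
# list_prefixes is a hypothetical helper, not part of the AWS CLI.
list_prefixes() {
  awk '$1 == "PRE" { print $2 }'
}
```

It can be piped after the listing command. Without configured AWS credentials, a public bucket can usually be listed with the CLI's --no-sign-request option, e.g. aws s3 ls --no-sign-request s3://big-data-benchmark/pavlo/text/tiny/ | list_prefixes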

The HTTPS URL for one of the files under these prefixes is:

https://s3.amazonaws.com/big-data-benchmark/pavlo/text/tiny/crawl/part-00000
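An HTTPS URL like this has to name an object key such as crawl/part-00000. A prefix such as .../text/tiny/ is not itself an object, which is likely why the URL in the question does not return the data. A sketch of the s3:// to path-style HTTPS mapping used above, assuming bash; the helper name to_https_url is hypothetical.

```shell
# Map an s3:// or s3n:// object URI to a path-style HTTPS URL:
#   https://s3.amazonaws.com/<bucket>/<key>
# to_https_url is a hypothetical helper, not part of any tool.
to_https_url() {
  local uri="$1"
  case "$uri" in
    s3://*|s3n://*) printf 'https://s3.amazonaws.com/%s\n' "${uri#*://}" ;;
    *) echo "not an S3 URI: $uri" >&2; return 1 ;;
  esac
}
```

A single file could then be fetched with curl, where -f makes curl fail on an HTTP error and -O saves under the remote file name: curl -fO "$(to_https_url s3://big-data-benchmark/pavlo/text/tiny/crawl/part-00000)"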

Good luck!