Is one day of data (20140319.export.CSV.zip) missing from the GDELT event files?

Posted: 2015-02-11 03:19:10

Tags: r statistics data-analysis

I am using R and the {GDELTtools} package to work with GDELT data.

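When I download the GDELT database, either with GetAllOfGDELT() or through a web browser, one file (20140319.export.CSV.zip) appears to be missing. The server returns HTTP 404 for it, which makes GetAllOfGDELT() stop with an error and leaves a gap for any later analysis.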

Question: Is this a temporary problem? Has anyone else run into the same issue?

Here is the relevant code and its output:

> # Download the entire GDELT database
> GetAllOfGDELT(local.folder = "./Data",
+               data.url.root = "http://data.gdeltproject.org/events/", 
+               force = FALSE)
The compressed GDELT data set is currently 12.3GB. It will take a long time to download and
requires a lot of room (12.3GB) where you store it. Please verify that you have sufficient free
space on the drive where you intend to store it.
Are you ready to proceed? (y/n) y
Downloading or verifying 1979.zip succeeded.
Downloading or verifying 1980.zip succeeded.
...
Downloading or verifying 20140317.export.CSV.zip succeeded.
Downloading or verifying 20140318.export.CSV.zip succeeded.
trying URL 'http://data.gdeltproject.org/events/20140319.export.CSV.zip'
Error in download.file(url = paste(data.url.root, f, sep = ""), destfile = paste(local.folder,  : 
  cannot open URL 'http://data.gdeltproject.org/events/20140319.export.CSV.zip'
In addition: Warning message:
In download.file(url = paste(data.url.root, f, sep = ""), destfile = paste(local.folder,  :
  cannot open: HTTP status was '404 Not Found'
>

This is how the online "All GDELT Event Archives" directory listing looks (there is no entry for 20140319):

20140321.export.CSV.zip (9.9MB) (MD5: d492ca38db3c8f40b657b0eb2415f950)
20140320.export.CSV.zip (10.6MB) (MD5: 8602497fdc0f54861c056d33fb64f3b8)
20140318.export.CSV.zip (10.7MB) (MD5: cf0c2a30b09cdbc28204eb0eca53db1e)
20140317.export.CSV.zip (9.8MB) (MD5: 61e70e4ff79e590abddd6f26f8dfa552)

Source: http://data.gdeltproject.org/events/index.html
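To check which daily files are absent without scanning the listing by eye, one can compare the file names a day-by-day sequence would produce against the names that actually appear in index.html. The following is a minimal base-R sketch, not part of {GDELTtools}; the helper names `expected_daily_files`, `parse_index_filenames` and `find_missing` are hypothetical, and it assumes the listing keeps the `YYYYMMDD.export.CSV.zip` naming shown above and that a regular expression over the raw HTML is enough to pick up every file name.

```r
# Hypothetical helper: file names the daily naming scheme would produce
# for every day in a date range (YYYYMMDD.export.CSV.zip).
expected_daily_files <- function(start, end) {
  days <- seq(as.Date(start), as.Date(end), by = "day")
  sprintf("%s.export.CSV.zip", format(days, "%Y%m%d"))
}

# Hypothetical helper: extract every daily export file name from the raw
# lines of index.html (assumes the names appear literally in the HTML).
parse_index_filenames <- function(html_lines) {
  hits <- regmatches(html_lines,
                     gregexpr("[0-9]{8}\\.export\\.CSV\\.zip", html_lines))
  unique(as.character(unlist(hits)))
}

# Expected names that never appear in the listing.
find_missing <- function(expected, listed) {
  setdiff(expected, listed)
}

# Example with the four entries quoted from the listing above:
listed <- c("20140321.export.CSV.zip", "20140320.export.CSV.zip",
            "20140318.export.CSV.zip", "20140317.export.CSV.zip")
find_missing(expected_daily_files("2014-03-17", "2014-03-21"), listed)
# [1] "20140319.export.CSV.zip"

# Against the live index (requires network access):
# html <- readLines("http://data.gdeltproject.org/events/index.html", warn = FALSE)
# find_missing(expected_daily_files("2014-03-17", "2014-03-21"),
#              parse_index_filenames(html))
```

Widening the date range in the commented-out call would show whether 20140319 is the only gap or one of several.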

Below is a partial workaround, but it only handles downloading the remaining event files dated after 2014/03/19.

# Download the entire post-20140319 GDELT database
GetGDELT(start.date = "2014/03/20", 
         end.date = "2015/01/01", 
         local.folder = "./Data", 
         data.url.root = "http://data.gdeltproject.org/events/",
         verbose = TRUE)
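A more general workaround is to fetch the daily files one at a time and record failures instead of stopping at the first 404. The sketch below uses only base R (`utils::download.file` inside `tryCatch`); it is not how {GDELTtools} works internally, the function names `try_download` and `download_daily_range` are hypothetical, and it assumes each daily file lives at `url.root` followed by `YYYYMMDD.export.CSV.zip`, as in the log above. It covers only the daily files, not the yearly and monthly archives (such as 1979.zip) that GetAllOfGDELT() also downloads.

```r
# Hypothetical helper: try to download one file. Returns TRUE on success and
# FALSE if download.file() signals an error or warning (e.g. HTTP 404).
try_download <- function(url, destfile) {
  ok <- tryCatch({
    utils::download.file(url, destfile, mode = "wb", quiet = TRUE)
    TRUE
  }, error = function(e) FALSE, warning = function(w) FALSE)
  # Remove any empty or partial file left behind by a failed attempt.
  if (!ok && file.exists(destfile)) file.remove(destfile)
  ok
}

# Hypothetical helper: download every daily export file in a date range,
# skipping files already in local.folder and continuing past failures.
# Returns the names of the files that could not be fetched.
download_daily_range <- function(start, end, local.folder,
                                 url.root = "http://data.gdeltproject.org/events/") {
  dir.create(local.folder, showWarnings = FALSE, recursive = TRUE)
  days <- seq(as.Date(start), as.Date(end), by = "day")
  files <- sprintf("%s.export.CSV.zip", format(days, "%Y%m%d"))
  failed <- character(0)
  for (f in files) {
    dest <- file.path(local.folder, f)
    if (file.exists(dest)) next  # already downloaded on an earlier run
    if (!try_download(paste0(url.root, f), dest)) failed <- c(failed, f)
  }
  failed
}
```

For example, `download_daily_range("2014-03-01", "2014-03-31", "./Data")` would download March 2014 and, if the file is still absent on the server, return "20140319.export.CSV.zip" rather than aborting. Because files already present are skipped, the call can simply be rerun later to pick up any file that reappears.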

Note: Google returns 0 results for "20140319.export.CSV.zip", while searches for the other file names return useful results.

0 answers