Unable to read usa.gov data in Python or R

Time: 2017-10-03 14:25:43

Tags: python json r

Please take a look at the data file: USA GOV Sample Data

I want to read this file in R, but I get the error shown below:

result = fromJSON(textFileName)
Error in fromJSON(textFileName) : unexpected character 'u'

When I try to read it in Python, I get the error shown below:

import json 
records = [json.loads(line) for line in open(path)]



---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
codecs.charmap_decode(input,self.errors,decoding_table)[0]
     24 
     25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4088: character maps to <undefined>

Can someone help me work out how to read this kind of file?

2 answers:

Answer 0: (score: 0)

I could not reproduce the OP's problem on my system (Windows / RStudio / Jupyter). I searched around for R and found this, which I adapted to this case:

library(jsonlite)
# Parse each line as its own JSON document, then stack the results row-wise.
out <- lapply(readLines("usagov_bitly_data2013-05-17-1368817803"), fromJSON)
df <- data.frame(Reduce(rbind, out))

That said, the error I got in R was quite different from yours:

result = fromJSON("usagov_bitly_data2013-05-17-1368817803")
#Error in parse_con(txt, bigint_as_char) : parse error: trailing garbage
#           [ 34.730400, -86.586098 ] } { "a": "Mozilla\/5.0 (Windows N
#                     (right here) ------^
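The "trailing garbage" message suggests that the file holds one JSON object per line rather than a single JSON document. Below is a minimal Python sketch of the same situation, using made-up sample records (the names `ndjson_text`, `whole_file_error` and `records` are illustrative and the values are not taken from the real file): parsing the whole text at once fails, while parsing it line by line succeeds.

```python
import json

# Two JSON objects, one per line (hypothetical sample records, not the real data).
ndjson_text = (
    '{"a": "Mozilla/5.0", "tz": "America/Chicago"}\n'
    '{"a": "Opera/9.80", "tz": ""}\n'
)

# Parsing the whole text as one document fails, because a second object
# follows the first one.
try:
    json.loads(ndjson_text)
    whole_file_error = None
except json.JSONDecodeError as exc:
    whole_file_error = exc.msg

# Parsing each non-empty line separately works.
records = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]

print(whole_file_error)
print(len(records))
```

This is the same line-by-line idea as the `lapply(readLines(...), fromJSON)` call above.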

For Python, as juanpa mentioned, it looks like an encoding problem. The following code works for me.

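import json
import os

# Resolve the data file relative to the current working directory.
path = os.path.abspath("usagov_bitly_data2013-05-17-1368817803")
print(path)

# Open explicitly as UTF-8 instead of relying on the platform default encoding.
with open(path, encoding="utf8") as file:
    records = [json.loads(line) for line in file]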
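To illustrate why the encoding matters, here is a small sketch that assumes the 'charmap' codec in the traceback was Windows code page 1252 (the question does not say which code page was in use). The sample line and the helper `try_decode` are hypothetical. A right curly quote (U+201D) is stored in UTF-8 as the bytes E2 80 9D, and byte 0x9d has no mapping in Python's cp1252 codec, so decoding the same bytes fails there but works as UTF-8.

```python
import json

# A hypothetical JSON line containing curly quotes; U+201D encodes to E2 80 9D in UTF-8.
sample_line = '{"t": "\u201cUSA.gov\u201d"}'
raw_bytes = sample_line.encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or a short failure message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "failed: " + exc.reason


# Decoding with cp1252 (assumed Windows default) trips over byte 0x9d.
cp1252_result = try_decode(raw_bytes, "cp1252")

# Decoding with UTF-8 recovers the text, which then parses as JSON.
utf8_record = json.loads(try_decode(raw_bytes, "utf-8"))

print(cp1252_result)
print(utf8_record)
```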

Answer 1: (score: 0)

A solution in R:

library(jsonlite)

# if you have a local file
conn <- gzcon(file("usagov_bitly_data2013-05-17-1368817803.gz", "rb"))
# if you read it from URL
conn <- gzcon(url("http://1usagov.measuredvoice.com/bitly_archive/usagov_bitly_data2013-05-17-1368817803.gz"))

data <- stream_in(conn)
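For readers working in Python, a rough analogue of reading the gzipped file line by line could look like the sketch below. It is not part of the R solution above; the helper `read_gzipped_json_lines` and the throwaway `demo.gz` file are hypothetical, and the sketch assumes the archive is UTF-8 text with one JSON object per line.

```python
import gzip
import json
import os
import tempfile


def read_gzipped_json_lines(gz_path):
    """Parse a gzip-compressed file holding one JSON object per line."""
    with gzip.open(gz_path, "rt", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Build a small throwaway .gz file so the sketch runs without the real archive.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as handle:
    handle.write('{"tz": "America/New_York"}\n{"tz": "Europe/London"}\n')

demo_records = read_gzipped_json_lines(demo_path)
print(len(demo_records))
```

With the real archive, the local `.gz` path would be passed to the helper in place of `demo_path`.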