Please take a look at the data file: USA GOV Sample Data
I want to read this file in R, but I get the error shown below:
result = fromJSON(textFileName)
Error in fromJSON(textFileName) : unexpected character 'u'
When I try to read it in Python, I get the error shown below:
import json
records = [json.loads(line) for line in open(path)]
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
codecs.charmap_decode(input,self.errors,decoding_table)[0]
24
25 class StreamWriter(Codec,codecs.StreamWriter):
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4088: character maps to <undefined>
Can someone help me? How can I read this kind of file?
Answer 0 (score: 0)
I could not reproduce the OP's problem on my system (Windows / RStudio / Jupyter). I looked around for an R approach, found this, and adapted it to this case:
library(jsonlite)
out <- lapply(readLines("usagov_bitly_data2013-05-17-1368817803"), fromJSON)
df<-data.frame(Reduce(rbind, out))
Note that the error I get in R is quite different from yours:
result = fromJSON("usagov_bitly_data2013-05-17-1368817803")
#Error in parse_con(txt, bigint_as_char) : parse error: trailing garbage
# [ 34.730400, -86.586098 ] } { "a": "Mozilla\/5.0 (Windows N
# (right here) ------^
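The "trailing garbage" message suggests that the file holds one JSON object per line instead of a single JSON document, which is why parsing line by line works. A minimal Python sketch of the same behaviour, using two made-up records in place of the real file (the sample text is an assumption, not data from the archive):

```python
import json

# Two hypothetical records, one JSON object per line, mimicking the file layout.
text = (
    '{"a": "Mozilla/5.0", "tz": "America/New_York"}\n'
    '{"a": "Opera/9.80", "tz": ""}\n'
)

try:
    json.loads(text)  # the whole text is not a single JSON document
    whole_ok = True
except json.JSONDecodeError as exc:
    whole_ok = False
    print("whole-file parse failed:", exc.msg)

# Parsing each non-empty line on its own succeeds.
records = [json.loads(line) for line in text.splitlines() if line.strip()]
print(len(records))
```

Skipping blank lines is a defensive choice here; a trailing empty line would otherwise make json.loads fail.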
For Python, as juanpa mentioned, it looks like an encoding problem. The following code works for me:
import json
import os
path=os.path.abspath("usagov_bitly_data2013-05-17-1368817803")
print(path)
# Open explicitly as UTF-8 instead of relying on the platform default codec
with open(path, encoding="utf8") as file:
    records = [json.loads(line) for line in file]
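To illustrate why the explicit encoding matters, here is a small sketch. It assumes the file contains a character such as the right double quotation mark (U+201D), whose UTF-8 encoding ends in byte 0x9d; that is a guess about the data, chosen because it would produce the error in the question. The byte 0x9d has no mapping in cp1252, the usual default on Windows.

```python
import json

# A hypothetical record containing U+201D; its UTF-8 bytes are E2 80 9D.
raw = json.dumps({"t": "quote\u201d"}, ensure_ascii=False).encode("utf-8")

try:
    raw.decode("cp1252")  # what open(path) does implicitly on many Windows setups
    cp1252_ok = True
except UnicodeDecodeError as exc:
    cp1252_ok = False
    print("cp1252 failed:", exc)

# Decoding as UTF-8 recovers the original text.
record = json.loads(raw.decode("utf-8"))
print(ascii(record["t"]))
```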
Answer 1 (score: 0)
A solution in R:
library(jsonlite)
# if you have a local file
conn <- gzcon(file("usagov_bitly_data2013-05-17-1368817803.gz", "rb"))
# if you read it from URL
conn <- gzcon(url("http://1usagov.measuredvoice.com/bitly_archive/usagov_bitly_data2013-05-17-1368817803.gz"))
data <- stream_in(conn)
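For readers working in Python, a rough equivalent of streaming the gzipped file could look like the sketch below. The helper name read_gz_json_lines and the tiny stand-in archive are made up for illustration, and it assumes the archive is UTF-8 text with one JSON object per line.

```python
import gzip
import json
import os
import tempfile


def read_gz_json_lines(gz_path):
    """Parse a gzip-compressed file holding one JSON object per line."""
    with gzip.open(gz_path, "rt", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Build a tiny stand-in archive so the sketch runs without the real download.
sample = [{"tz": "America/Chicago"}, {"tz": "Europe/Madrid"}]
gz_path = os.path.join(tempfile.mkdtemp(), "sample.json.gz")
with gzip.open(gz_path, "wt", encoding="utf-8") as handle:
    for row in sample:
        handle.write(json.dumps(row) + "\n")

records = read_gz_json_lines(gz_path)
print(len(records))
```

With the real data, the path of the downloaded .gz file would be passed to read_gz_json_lines instead of the temporary one.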