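I have a huge json.gz file that has already been converted to a .json file. How can I read only the first 100 records of the .json file in R? Any help is much appreciated. Here is my sample code: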
library(jsonlite)
library(R.utils)
r=stream_in(file("yelp_academic_dataset_business.json"))
The file "yelp_academic_dataset_business.json" can be found at this link: https://www.dropbox.com/s/gd1k41y9gbpfwq3/yelp_academic_dataset_business.json
Answer 0 (score: 1)
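Using the data from the original link, @Shree's suggestion is spot on. First, use readLines to download only the number of lines you need (the file has one JSON record per line, so n lines gives n records):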
dat <- readLines("https://uc385e5985dd32823a7dc6ba9b5e.dl.dropboxusercontent.com/cd/0/get/AhyCjVEm8yKnLz4w0-hZaW-titb8fOhQdMcwhTMF1_3i_iJ7DOqOU_KQRTtcvaFBaSTpAznh_6eq-vKAEiDkeVygMnRjThrnz0V5fyC4AURAcg/file?_download_id=9916801659220323334123287637995650900165723151388885263767035946&_notify_domain=www.dropbox.com&dl=1", n =4 )
# dat <- readLines("yelp_academic_dataset_business.json", n = 4)
Now create a "fake" text connection from those lines and pass it to the JSON parser:
jsonlite::stream_in(textConnection(dat))
# Imported 4 records. Simplifying...
# business_id full_address hours.Tuesday.close hours.Tuesday.open hours.Friday.close hours.Friday.open hours.Monday.close hours.Monday.open
# 1 vcNAWiLM4dR7D2nwwJ7nCA 4840 E Indian School Rd\nSte 101\nPhoenix, AZ 85018 17:00 08:00 17:00 08:00 17:00 08:00
# 2 UsFtqoBl7naz8AVUBZMjQQ 202 McClure St\nDravosburg, PA 15034 <NA> <NA> <NA> <NA> <NA> <NA>
# 3 cE27W9VPgO88Qxe4ol6y_g 1530 Hamilton Rd\nBethel Park, PA 15234 <NA> <NA> <NA> <NA> <NA> <NA>
# 4 HZdLhv6COCleJMo7nPl-RA 301 S Hills Vlg\nPittsburgh, PA 15241 21:00 10:00 21:00 10:00 21:00 10:00
# hours.Wednesday.close hours.Wednesday.open hours.Thursday.close hours.Thursday.open hours.Sunday.close hours.Sunday.open hours.Saturday.close hours.Saturday.open open
# 1 17:00 08:00 17:00 08:00 <NA> <NA> <NA> <NA> TRUE
# 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> TRUE
# 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> FALSE
# 4 21:00 10:00 21:00 10:00 18:00 11:00 21:00 10:00 TRUE
# categories city review_count name neighborhoods longitude state stars
# 1 Doctors, Health & Medical Phoenix 9 Eric Goldberg, MD NULL -111.98376 AZ 3.5
# 2 Nightlife Dravosburg 4 Clancy's Pub NULL -79.88693 PA 3.5
# 3 Active Life, Mini Golf, Golf Bethel Park 5 Cool Springs Golf Center NULL -80.01591 PA 2.5
# 4 Shopping, Home Services, Internet Service Providers, Mobile Phones, Professional Services, Electronics Pittsburgh 3 Verizon Wireless NULL -80.05998 PA 3.5
# latitude attributes.By Appointment Only attributes.Happy Hour attributes.Accepts Credit Cards attributes.Good For Groups attributes.Outdoor Seating attributes.Price Range
# 1 33.49931 TRUE NA NA NA NA NA
# 2 40.35052 NA TRUE TRUE TRUE FALSE 1
# 3 40.35690 NA NA NA NA NA NA
# 4 40.35762 NA NA NA NA NA NA
# attributes.Good for Kids type
# 1 NA business
# 2 NA business
# 3 TRUE business
# 4 NA business
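To get the 100 records asked for in the question, the same two steps can be wrapped in a small function. This is only a sketch: read_first_records is a hypothetical helper name (not part of jsonlite), and the demo writes a made-up temporary file with one JSON object per line so that it runs without the Yelp data.

```r
library(jsonlite)

# Hypothetical helper (not part of jsonlite): parse only the first n records
# of a line-delimited JSON file, i.e. one JSON object per line.
read_first_records <- function(path, n = 100) {
  # readLines stops after n lines, so the rest of the file is never read
  lines <- readLines(path, n = n, warn = FALSE)
  lines <- lines[nzchar(lines)]   # drop any empty lines
  con <- textConnection(lines)    # in-memory connection over those lines
  on.exit(close(con))
  stream_in(con, verbose = FALSE)
}

# Demo on made-up data: 250 small records written to a temporary file
tmp <- tempfile(fileext = ".json")
writeLines(sprintf('{"id": %d, "name": "business_%d"}', 1:250, 1:250), tmp)

first100 <- read_first_records(tmp, n = 100)
nrow(first100)
```

With the real file, the call would be read_first_records("yelp_academic_dataset_business.json", n = 100). Because only the first n lines are read, memory use should stay small even when the full file is huge.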