Joining multiple large JSON files with jq or any JSON parser

Time: 2014-06-05 18:41:27

Tags: json jq

This page, https://openlibrary.org/developers/dumps, offers JSON data dumps of 'editions' and 'authors'. Together they come to about 7 GB compressed (about 28 GB uncompressed).

The editions file is structured as follows (the information in each line varies):

/type/edition   /books/OL24712550M  2   2011-08-12T15:48:15.081632  {"subtitle": "finding solace and strength from friends and strangers", "series": ["Thorndike Press large print biography", "Thorndike large print biography series"], "covers": [6783622], "lc_classifications": ["E840.8.E29 E24 2007"], "latest_revision": 2, "ocaid": "savinggracesfind00edwa", "source_records": ["ia:savinggracesfind00edwa"], "title": "Saving graces", "languages": [{"key": "/languages/eng"}], "subjects": ["Cancer", "Family", "Legislators' spouses", "Philosophy", "Patients", "Large type books", "Lawyers' spouses", "Biography", "Protected DAISY"], "subject_people": ["Elizabeth Edwards (1949-)", "John Edwards (1953 June 10-)"], "publish_country": "meu", "by_statement": "Elizabeth Edwards", "oclc_numbers": ["71809986"], "type": {"key": "/type/edition"}, "revision": 2, "publishers": ["Thorndike Press"], "ia_box_id": ["IA133215"], "full_title": "Saving graces finding solace and strength from friends and strangers", "last_modified": {"type": "/type/datetime", "value": "2011-08-12T15:48:15.081632"}, "key": "/books/OL24712550M", "authors": [{"key": "/authors/OL6606949A"}], "publish_places": ["Waterville, Me"], "pagination": "613 p. (large print) ;", "created": {"type": "/type/datetime", "value": "2011-06-29T22:47:47.350358"}, "dewey_decimal_class": ["973.931092", "B"], "number_of_pages": 613, "isbn_13": ["9780786291670"], "lccn": ["2006031151"], "subject_places": ["United States", "North Carolina"], "isbn_10": ["0786291672"], "publish_date": "2007", "copyright_date": "2006", "works": [{"key": "/works/OL15801457W"}]}
/type/edition   /books/OL11119269M  5   2010-04-24T18:14:28.389476  {"number_of_pages": 362, "subtitle": "Godparenthood and Adoption in the Early Middle Ages (The University of Delaware Press Series, the Family in Interdisciplinary Perspective)", "weight": "1.6 pounds", "covers": [2673249], "latest_revision": 5, "edition_name": "Rev Exp edition", "title": "Spiritual Kinship As Social Practice", "languages": [{"key": "/languages/eng"}], "subjects": ["Family & Relationships", "Genealogy, heraldry, names and honours", "c 500 CE to c 1000 CE", "Ancient Rome - History", "Social Institutions", "Sociology", "Ancient Rome", "Sociology - Marriage & Family", "Alternative Family", "Ancient - Rome", "Spirituality - General", "Adoption", "Europe", "History", "Medieval, 500-1500", "Social history", "Sponsors", "To 1500"], "type": {"key": "/type/edition"}, "physical_dimensions": "9.8 x 6.2 x 1 inches", "revision": 5, "publishers": ["University of Delaware Press"], "physical_format": "Hardcover", "last_modified": {"type": "/type/datetime", "value": "2010-04-24T18:14:28.389476"}, "key": "/books/OL11119269M", "authors": [{"key": "/authors/OL797447A"}], "identifiers": {"goodreads": ["2994735"]}, "isbn_13": ["9780874136326"], "isbn_10": ["0874136326"], "publish_date": "June 2000", "works": [{"key": "/works/OL4195029W"}]}
/type/edition   /books/OL25407707M  1   2012-08-08T08:36:18.306844  {"series": ["Then & now"], "lc_classifications": ["F459.E43 C375 2012"], "latest_revision": 1, "source_records": ["marc:marc_loc_updates/v40.i32.records.utf8:13804252:745"], "title": "Elizabethtown", "languages": [{"key": "/languages/eng"}], "subjects": ["Buildings, structures", "Pictorial works", "Historic buildings"], "publish_country": "scu", "by_statement": "Meranda L. Caswell", "type": {"key": "/type/edition"}, "revision": 1, "publishers": ["Arcadia Pub."], "full_title": "Elizabethtown", "last_modified": {"type": "/type/datetime", "value": "2012-08-08T08:36:18.306844"}, "key": "/books/OL25407707M", "authors": [{"key": "/authors/OL1397347A"}], "publish_places": ["Charleston, S.C"], "pagination": "x, 95 p. :", "created": {"type": "/type/datetime", "value": "2012-08-08T08:36:18.306844"}, "lccn": ["2012933881"], "number_of_pages": 95, "isbn_13": ["9780738591667"], "subject_places": ["Elizabethtown (Ky.)", "Elizabethtown", "Kentucky"], "isbn_10": ["0738591661"], "publish_date": "2012", "works": [{"key": "/works/OL16772737W"}]}

The authors file is structured as follows:

/type/author    /authors/OL100223A  2   2008-09-08T16:20:28.105165  {"name": "Umu Hilmy", "personal_name": "Umu Hilmy", "last_modified": {"type": "/type/datetime", "value": "2008-09-08T16:20:28.105165"}, "key": "/authors/OL100223A", "type": {"key": "/type/author"}, "revision": 2}
/type/author    /authors/OL6606949A 1   2009-05-14T08:13:43.294872  {"name": "Elizabeth Edwards", "created": {"type": "/type/datetime", "value": "2009-05-14T08:13:43.294872"}, "personal_name": "Elizabeth Edwards", "last_modified": {"type": "/type/datetime", "value": "2009-05-14T08:13:43.294872"}, "latest_revision": 1, "key": "/authors/OL6606949A", "birth_date": "1949", "type": {"key": "/type/author"}, "revision": 1}
/type/author    /authors/OL1003081A 5   2012-06-06T22:11:38.525232  {"name": "William Pinder Eversley", "created": {"type": "/type/datetime", "value": "2008-04-01T03:28:50.625462"}, "death_date": "1918", "photos": [6897255, 6897254], "last_modified": {"type": "/type/datetime", "value": "2012-06-06T22:11:38.525232"}, "latest_revision": 5, "key": "/authors/OL1003081A", "birth_date": "1850", "personal_name": "William Pinder Eversley", "type": {"key": "/type/author"}, "revision": 5}
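For reference, here is a minimal sketch of how one line of either dump could be parsed in Python. It assumes the five columns are separated by tab characters in the actual dump files (the samples above may have lost their tabs when pasted). The function name `parse_dump_line` and the abridged sample record are purely illustrative.

```python
import json

def parse_dump_line(line):
    """Split one dump line into (key, record); assumes tab-separated columns."""
    # maxsplit=4 leaves the fifth column (the JSON text) in one piece
    type_, key, revision, last_modified, raw = line.rstrip("\n").split("\t", 4)
    return key, json.loads(raw)

# Abridged from the first author sample above
sample = (
    "/type/author\t/authors/OL100223A\t2\t2008-09-08T16:20:28.105165\t"
    '{"name": "Umu Hilmy", "key": "/authors/OL100223A", "revision": 2}'
)
key, record = parse_dump_line(sample)
print(key, record["name"])
```

Only the last column is JSON, so the whole file is not a single JSON document and each line has to be handled separately.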

What I ultimately want is a tab-delimited file containing only the following information:

  

OL reference    title    name    isbn_10    isbn_13    subject    subject_places    subject_people

For example:

  

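/books/OL24712550M    Saving graces    Elizabeth Edwards    0786291672    9780786291670    "Cancer", "Family", "Legislators' spouses", "Philosophy", "Patients", "Large type books", "Lawyers' spouses", "Biography", "Protected DAISY"    "United States", "North Carolina"    "Elizabeth Edwards (1949-)", "John Edwards (1953 June 10-)"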

(In some cases, some of these fields will be empty.)

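So everything comes from the editions dump except the 'name' field, which comes from the authors dump and is looked up via the reference in the editions dump, e.g. /authors/OL6606949A.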
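The lookup described here amounts to a join on the author key. One possible approach, sketched in Python rather than jq, is to read the authors dump once into a dictionary and then stream the editions dump line by line. This assumes tab-separated columns and that a key-to-name dictionary for all authors fits in memory. The function names and the in-memory sample files are hypothetical.

```python
import io
import json

def load_author_names(authors_file):
    """Map author key (e.g. /authors/OL6606949A) to the author's name."""
    names = {}
    for line in authors_file:
        columns = line.rstrip("\n").split("\t", 4)
        if len(columns) != 5:
            continue  # skip blank or malformed lines
        names[columns[1]] = json.loads(columns[4]).get("name", "")
    return names

def join_editions(editions_file, names):
    """Yield (edition key, title, author names) for each edition line."""
    for line in editions_file:
        columns = line.rstrip("\n").split("\t", 4)
        if len(columns) != 5:
            continue
        edition = json.loads(columns[4])
        # Each entry in "authors" is a reference such as {"key": "/authors/..."}
        authors = [names.get(ref.get("key"), "") for ref in edition.get("authors", [])]
        yield columns[1], edition.get("title", ""), authors

# Tiny in-memory stand-ins for the real dump files (abridged from the samples)
authors_dump = io.StringIO(
    "/type/author\t/authors/OL6606949A\t1\t2009-05-14T08:13:43.294872\t"
    '{"name": "Elizabeth Edwards", "key": "/authors/OL6606949A"}\n'
)
editions_dump = io.StringIO(
    "/type/edition\t/books/OL24712550M\t2\t2011-08-12T15:48:15.081632\t"
    '{"title": "Saving graces", "authors": [{"key": "/authors/OL6606949A"}]}\n'
)

names = load_author_names(authors_dump)
rows = list(join_editions(editions_dump, names))
print(rows)
```

For the real dumps, the file objects could come from `gzip.open(path, "rt", encoding="utf-8")`, so the editions file is never held in memory as a whole.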

So I tried to use jq with the following query (just to test a few columns):

  

.personal_name as $names | .authors | {title, name, author: $names[.key]}

But it does not even run, because I am also having trouble with the notation for looking up the author key.

1 Answer:

Answer 0: (score: 1)

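Since subjects and similar fields can have multiple values, how do you want them separated in the output so that it stays unambiguous?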

jq '.personal_name as $names | .authors as $authors| {title, name, author: $names[.key]}'

is a fixed version of the jq command you used in your question, but it does not use $authors.

Anyway, if you clarify what you mean, we can surely get this done!
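On the separator question, one unambiguous option is to write each multi-valued field as a JSON array inside its tab-delimited column. `json.dumps` escapes any tab or newline inside a value, so the column boundaries stay intact. This is only a sketch in Python under that assumed output convention, not something the question specifies. The helper names are made up for illustration.

```python
import json

def encode_list(values):
    """Encode a multi-valued field as a JSON array; a missing field becomes []."""
    return json.dumps(values or [], ensure_ascii=False)

def clean(text):
    """Collapse tabs and newlines in a plain-text field so the row stays on one line."""
    return " ".join(str(text).split())

def edition_row(key, edition, author_names):
    """Build one tab-delimited output row from an edition record."""
    fields = [
        clean(key),
        clean(edition.get("title", "")),
        encode_list(author_names),
        encode_list(edition.get("isbn_10")),
        encode_list(edition.get("isbn_13")),
        encode_list(edition.get("subjects")),
        encode_list(edition.get("subject_places")),
        encode_list(edition.get("subject_people")),
    ]
    return "\t".join(fields)

# Abridged edition record; isbn_13 and the subject_* fields are deliberately missing
edition = {
    "title": "Saving graces",
    "isbn_10": ["0786291672"],
    "subjects": ["Cancer", "Legislators' spouses"],
}
row = edition_row("/books/OL24712550M", edition, ["Elizabeth Edwards"])
print(row)
```

A consumer can split each row on tabs and pass the list columns through a JSON parser, so commas or quotes inside a subject never collide with the separator.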