I realize my title is not very clear, but I don't know how to word it better, so feel free to edit it!
Data
I have the following (simplified) JSON:
[
{
"genes_id": "eco:b0002",
"entry_id": "b0002",
"division": "CDS",
"organism": "Escherichia coli K-12 MG1655",
"organism_code": "eco",
"organism_id": "T00007",
"name": "thrA",
"names": [
"thrA"
],
"definition": "(RefSeq) Bifunctional aspartokinase/homoserine dehydrogenase 1",
"eclinks": [
],
"orthologs": {
"K12524": "bifunctional aspartokinase / homoserine dehydrogenase 1 [EC:2.7.2.4 1.1.1.3]"
},
"pathways": {
"eco00260": "Glycine, serine and threonine metabolism",
"eco00261": "Monobactam biosynthesis",
"eco00270": "Cysteine and methionine metabolism",
"eco00300": "Lysine biosynthesis",
"eco01100": "Metabolic pathways",
"eco01110": "Biosynthesis of secondary metabolites",
"eco01120": "Microbial metabolism in diverse environments",
"eco01130": "Biosynthesis of antibiotics",
"eco01230": "Biosynthesis of amino acids"
},
"modules": {
"eco_M00016": "Lysine biosynthesis, succinyl-DAP pathway, aspartate => lysine",
"eco_M00017": "Methionine biosynthesis, apartate => homoserine => methionine",
"eco_M00018": "Threonine biosynthesis, aspartate => homoserine => threonine"
},
"classes": [
],
"position": "337..2799",
"chromosome": null,
"gbposition": "337..2799",
"motifs": {
"Pfam": [
"Homoserine_dh",
"AA_kinase",
"NAD_binding_3",
"ACT_7",
"ACT",
"Sacchrp_dh_NADP"
]
},
"dblinks": {
"NCBI-GeneID": [
"945803"
],
"NCBI-ProteinID": [
"NP_414543"
],
"Pasteur": [
"thrA"
],
"RegulonDB": [
"ECK120000987"
],
"ECOCYC": [
"EG10998"
],
"ASAP": [
"ABE-0000008"
],
"UniProt": [
"P00561"
]
}
},
{
"genes_id": "eco:b0003",
"entry_id": "b0003",
"division": "CDS",
"organism": "Escherichia coli K-12 MG1655",
"organism_code": "eco",
"organism_id": "T00007",
"name": "thrB",
"names": [
"thrB"
],
"definition": "(RefSeq) homoserine kinase",
"eclinks": [
],
"orthologs": {
"K00872": "homoserine kinase [EC:2.7.1.39]"
},
"pathways": {
"eco00260": "Glycine, serine and threonine metabolism",
"eco01100": "Metabolic pathways",
"eco01110": "Biosynthesis of secondary metabolites",
"eco01120": "Microbial metabolism in diverse environments",
"eco01230": "Biosynthesis of amino acids"
},
"modules": {
"eco_M00018": "Threonine biosynthesis, aspartate => homoserine => threonine"
},
"classes": [
],
"position": "2801..3733",
"chromosome": null,
"gbposition": "2801..3733",
"motifs": {
"Pfam": [
"GHMP_kinases_N",
"GHMP_kinases_C"
]
},
"dblinks": {
"NCBI-GeneID": [
"947498"
],
"NCBI-ProteinID": [
"NP_414544"
],
"Pasteur": [
"thrB"
],
"RegulonDB": [
"ECK120000988"
],
"ECOCYC": [
"EG10999"
],
"ASAP": [
"ABE-0000010"
],
"UniProt": [
"P00547"
]
}
}
]
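Desired output
This is an array of two objects. I am interested in the genes_id and pathways of each object, and would like to get a tab-separated file with one row per pathway, like this: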
eco:b0002 eco00260 Glycine, serine and threonine metabolism
eco:b0002 eco00261 Monobactam biosynthesis
eco:b0002 eco00270 Cysteine and methionine metabolism
eco:b0002 eco00300 Lysine biosynthesis
eco:b0002 eco01100 Metabolic pathways
eco:b0002 eco01110 Biosynthesis of secondary metabolites
eco:b0002 eco01120 Microbial metabolism in diverse environments
eco:b0002 eco01130 Biosynthesis of antibiotics
eco:b0002 eco01230 Biosynthesis of amino acids
eco:b0003 eco00260 Glycine, serine and threonine metabolism
eco:b0003 eco01100 Metabolic pathways
eco:b0003 eco01110 Biosynthesis of secondary metabolites
eco:b0003 eco01120 Microbial metabolism in diverse environments
eco:b0003 eco01230 Biosynthesis of amino acids
What I have found
I know the data can be extracted in the following format:
eco:b0002: list of pathways and ids
eco:b0003: list of pathways and ids
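However, I want the pathways spread across rows, one row per pathway, as in the example above. I could not find any information on how to do this with jq, so I am not sure whether it is actually possible. If it is, how can I achieve this with jq?
Answer 0 (score: 1)
Invocation: jq -rf totsv.jq input.json
Program (totsv.jq):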
.[]
| .genes_id as $id
| .pathways
| to_entries[]
| [$id, .key, .value]
| @tsv
TSV is a good choice (and so is jq)!
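For anyone who wants to cross-check the jq output without jq, here is a rough Python sketch of the same steps: iterate over the array, remember genes_id, expand pathways into key/value pairs, and emit one tab-separated row per pair. It is not part of the answer above. The function name pathways_to_tsv and the trimmed-down sample data are made up for illustration, and unlike jq's @tsv it does not escape tabs or newlines inside values.

```python
import json


def pathways_to_tsv(records):
    """Build one tab-separated line per (genes_id, pathway id, pathway name)."""
    lines = []
    for record in records:                      # jq: .[]
        gene_id = record["genes_id"]            # jq: .genes_id as $id
        # Choice made for this sketch only: treat a missing or null
        # "pathways" as empty instead of raising an error.
        pathways = record.get("pathways") or {}
        for key, value in pathways.items():     # jq: .pathways | to_entries[]
            # jq: [$id, .key, .value] | @tsv (no escaping is done here)
            lines.append("\t".join([gene_id, key, value]))
    return "".join(line + "\n" for line in lines)


# Hypothetical, shortened version of the input from the question.
sample = json.loads("""
[
  {"genes_id": "eco:b0002",
   "pathways": {"eco00260": "Glycine, serine and threonine metabolism",
                "eco00261": "Monobactam biosynthesis"}},
  {"genes_id": "eco:b0003",
   "pathways": {"eco00260": "Glycine, serine and threonine metabolism"}}
]
""")

print(pathways_to_tsv(sample), end="")
```

With the real file, you would replace the sample with json.load on the opened input.json and write the returned string to a .tsv file.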