Python - Parsing a Maven dependency tree

Time: 2019-07-19 18:02:49

Tags: python maven parsing neo4j dependency-tree

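I would like to take a Maven dependency tree as input and parse it to determine the groupId, artifactId and version of each dependency, as well as the groupId, artifactId and version of its children, if any (and of any further descendants, and so on). I am not sure whether it makes the most sense to parse the mvn dependency tree and store the information as a nested dictionary before preparing the data for neo4j.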

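I am also unsure of the best way to parse the whole mvn dependency tree. The code below is the furthest I have got in my attempts to parse it, strip the unnecessary leading information, and label each entry as a child or a parent.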

tree= 
[INFO] +- org.antlr:antlr4:jar:4.7.1:compile
[INFO] |  +- org.antlr:antlr4-runtime:jar:4.7.1:compile
[INFO] |  +- org.antlr:antlr-runtime:jar:3.5.2:compile
[INFO] |  \- com.ibm.icu:icu4j:jar:58.2:compile
[INFO] +- commons-io:commons-io:jar:1.3.2:compile
[INFO] +- brs:dxprog-lang:jar:3.3-SNAPSHOT:compile
[INFO] |  +- brs:libutil:jar:2.51:compile
[INFO] |  |  +- commons-collections:commons-collections:jar:3.2.2:compile
[INFO] |  |  +- org.apache.commons:commons-collections4:jar:4.1:compile
[INFO] |  |  |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.9.0:compile
[INFO] |  |  |  \- com.fasterxml.jackson.core:jackson-core:jar:2.9.5:compile
.
.
.


fileObj = open("tree", "r")

for line in fileObj.readlines():
    for word in line.split():
        if "[INFO]" in line.split():
            line = line.replace(line.split().__getitem__(0), "")
            print(line)

            if "|" in line.split():
                line = line.replace(line.split().__getitem__(0), "child")
                print(line)

                if "+-" in line.split() and "|" not in line.split():
                    line = line.replace(line.split().__getitem__(0), "")
                    line = line.replace(line.split().__getitem__(0), "parent")
                    print(line, '\n\n')

Output:

 |  |  \- com.google.protobuf:protobuf-java:jar:3.5.1:compile

 child  child  \- com.google.protobuf:protobuf-java:jar:3.5.1:compile

 |  +- com.h2database:h2:jar:1.4.195:compile

 child  +- com.h2database:h2:jar:1.4.195:compile

   parent com.h2database:h2:jar:1.4.195:compile

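Since I am relatively unfamiliar with what Python can do, I would appreciate any insight into the best way to parse the data and return it in an organized form. Thank you in advance!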

1 answer:

Answer 0 (score: 1)

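I don't know how much programming experience you have, but this is not a trivial task.

First, you can see that the depth of a dependency is indicated by the number of | symbols in front of it. The simplest thing you can do is build a stack that stores the dependency path from the root to the child, grandchild, ...: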

def build_stack(text):
    stack = []
    for line in text.split("\n"):
        line = line.strip() # tolerate stray leading whitespace
        if not line:
            continue

        line = line[7:] # remove "[INFO] "
        level = line.count("|") # one | per ancestor
        name = line.split("-", 1)[1].strip() # the part after the first -
        stack = stack[:level] + [name] # update the stack: everything up to level-1 and name
        yield stack[:level], name # this is a generator

# DATA holds the dependency tree text from the question
for bottom_stack, name in build_stack(DATA):
    print(bottom_stack + [name])

Output:

['org.antlr:antlr4:jar:4.7.1:compile']
['org.antlr:antlr4:jar:4.7.1:compile', 'org.antlr:antlr4-runtime:jar:4.7.1:compile']
['org.antlr:antlr4:jar:4.7.1:compile', 'org.antlr:antlr-runtime:jar:3.5.2:compile']
['org.antlr:antlr4:jar:4.7.1:compile', 'com.ibm.icu:icu4j:jar:58.2:compile']
['commons-io:commons-io:jar:1.3.2:compile']
['brs:dxprog-lang:jar:3.3-SNAPSHOT:compile']
['brs:dxprog-lang:jar:3.3-SNAPSHOT:compile', 'brs:libutil:jar:2.51:compile']
['brs:dxprog-lang:jar:3.3-SNAPSHOT:compile', 'brs:libutil:jar:2.51:compile', 'commons-collections:commons-collections:jar:3.2.2:compile']
['brs:dxprog-lang:jar:3.3-SNAPSHOT:compile', 'brs:libutil:jar:2.51:compile', 'org.apache.commons:commons-collections4:jar:4.1:compile']
['brs:dxprog-lang:jar:3.3-SNAPSHOT:compile', 'brs:libutil:jar:2.51:compile', 'org.apache.commons:commons-collections4:jar:4.1:compile', 'com.fasterxml.jackson.core:jackson-annotations:jar:2.9.0:compile']
['brs:dxprog-lang:jar:3.3-SNAPSHOT:compile', 'brs:libutil:jar:2.51:compile', 'org.apache.commons:commons-collections4:jar:4.1:compile', 'com.fasterxml.jackson.core:jackson-core:jar:2.9.5:compile']
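
One caveat with counting | characters: Maven draws plain spaces instead of | underneath an entry that is the last child of its parent (the one marked \-), so the children of such an entry would be assigned too shallow a level. A minimal sketch of an alternative, assuming every nesting level is exactly three characters wide as in the sample above, derives the depth from the position of the +- or \- marker. The names MARKER and depth_and_name are hypothetical and not part of the original answer.

```python
import re

# Matches the "+- " or "\- " marker that precedes every dependency entry.
MARKER = re.compile(r"[+\\]- ")

def depth_and_name(line):
    """Return (depth, coordinates) for one tree line, or None if it is not an entry."""
    line = line.strip()
    if not line.startswith("[INFO] "):
        return None
    body = line[len("[INFO] "):]
    match = MARKER.search(body)
    if match is None:
        return None
    # Assumption: each level of nesting is drawn as 3 characters ("|  " or "   ").
    return match.start() // 3, body[match.end():].strip()

print(depth_and_name("[INFO] |  \\- com.ibm.icu:icu4j:jar:58.2:compile"))
print(depth_and_name("[INFO]    \\- com.ibm.icu:icu4j:jar:58.2:compile"))
```

Both calls report depth 1, whereas counting | would give 0 for the second line. Lines without a marker, such as build log messages, return None and can be skipped.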

Second, you can use this stack to build a tree based on nested dictionaries:

def create_tree(text):
    tree = {}
    for stack, name in build_stack(text):
        temp = tree
        for n in stack: # find or create...
            temp = temp.setdefault(n, {}) # ...the most inner dict
        temp[name] = {}
    return tree

from pprint import pprint
pprint(create_tree(DATA))

Output:

{'brs:dxprog-lang:jar:3.3-SNAPSHOT:compile': {'brs:libutil:jar:2.51:compile': {'commons-collections:commons-collections:jar:3.2.2:compile': {},
                                                                               'org.apache.commons:commons-collections4:jar:4.1:compile': {'com.fasterxml.jackson.core:jackson-annotations:jar:2.9.0:compile': {},
                                                                                                                                           'com.fasterxml.jackson.core:jackson-core:jar:2.9.5:compile': {}}}},
 'commons-io:commons-io:jar:1.3.2:compile': {},
 'org.antlr:antlr4:jar:4.7.1:compile': {'com.ibm.icu:icu4j:jar:58.2:compile': {},
                                        'org.antlr:antlr-runtime:jar:3.5.2:compile': {},
                                        'org.antlr:antlr4-runtime:jar:4.7.1:compile': {}}}

An empty dict represents a leaf of the tree.
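
Since the question mentions preparing the data for neo4j, a flat list of parent-child pairs may be easier to load into a graph than nested dicts. The following is only a sketch of that idea: iter_edges is a hypothetical helper, the small tree is made-up sample data, and no neo4j API is used.

```python
def iter_edges(tree, parent=None):
    """Yield (parent, child) pairs from a nested-dict tree, depth first."""
    for name, subtree in tree.items():
        if parent is not None:
            yield parent, name
        # Recurse with the current entry as the parent of its subtree.
        yield from iter_edges(subtree, name)

# Made-up sample in the same shape as the nested dicts above.
tree = {
    "a:a:jar:1:compile": {"b:b:jar:1:compile": {"c:c:jar:1:compile": {}}},
    "d:d:jar:1:compile": {},
}
print(list(iter_edges(tree)))
```

Top-level entries have no parent in this structure, so a top-level dependency without children (d in the sample) produces no pair and would have to be added as a standalone node.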

Third, you need to format the tree, i.e. 1. extract the data and 2. group the children in a list. This is a simple tree traversal (a DFS here):

def format(tree):
    L = []
    for name, subtree in tree.items():
        group, artifact, packaging, version, scope = name.split(":")
        d = {"artifact":artifact} # you can add group, ...
        if subtree: # children are present
            d["children"] = format(subtree)
        L.append(d)
    return L

pprint(format(create_tree(DATA)))

Output:

[{'artifact': 'antlr4',
  'children': [{'artifact': 'antlr4-runtime'},
               {'artifact': 'antlr-runtime'},
               {'artifact': 'icu4j'}]},
 {'artifact': 'commons-io'},
 {'artifact': 'dxprog-lang',
  'children': [{'artifact': 'libutil',
                'children': [{'artifact': 'commons-collections'},
                             {'artifact': 'commons-collections4',
                              'children': [{'artifact': 'jackson-annotations'},
                                           {'artifact': 'jackson-core'}]}]}]}]

You can combine these steps.
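
As an illustration of combining them, here is a minimal single-pass sketch. It keeps the same depth rule as the answer (counting |) and assumes five-field coordinates; parse_tree and SAMPLE are hypothetical names, and unlike the output above every node carries a children list, which is empty for leaves.

```python
def parse_tree(text):
    """Build the nested list-of-dicts structure in a single pass over the text."""
    roots = []
    # path[i] is the children list that entries at level i are appended to.
    path = [roots]
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("[INFO] "):
            continue
        body = line[len("[INFO] "):]
        if "+- " not in body and "\\- " not in body:
            continue  # not a dependency entry
        level = body.count("|")  # same depth rule as in the answer
        name = body.split("- ", 1)[1].strip()
        group, artifact, packaging, version, scope = name.split(":")
        node = {"group": group, "artifact": artifact, "version": version, "children": []}
        del path[level + 1:]  # drop lists that belong to deeper, finished branches
        path[level].append(node)
        path.append(node["children"])
    return roots

SAMPLE = r"""
[INFO] +- org.antlr:antlr4:jar:4.7.1:compile
[INFO] |  +- org.antlr:antlr4-runtime:jar:4.7.1:compile
[INFO] |  \- com.ibm.icu:icu4j:jar:58.2:compile
[INFO] +- commons-io:commons-io:jar:1.3.2:compile
"""

for node in parse_tree(SAMPLE):
    print(node["artifact"], [child["artifact"] for child in node["children"]])
```

The path list plays the role of the stack from the first step, except that it stores the children lists themselves, so each new node can be appended directly to its parent.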