Parsing a list of tuples in Python and eliminating duplicates

Time: 2016-03-23 19:03:41

Tags: python list python-3.x

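I have the following problem:

I have a list of tuples representing packages and their versions (some packages have no version specified, which is not a problem), like this: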

    ('lib32c-dev', '', '', '')
    ('libc6-i386', '2.4', '', '')
    ('lib32c-dev', '', '', '')
    ('libc6-i386', '1.06', '', '')
    ('libc6-i386', '2.4', '', '')
    ('lib32c-dev', '', '', '')
    ('libc6-i386', '2.16', '', '')
    ('libc6-dev', '', '', '')
    ('', '', 'libc-dev', '')
    ('libc6-dev', '', '', '')
    ('', '', 'libc-dev', '')
    ('libncurses5-dev', '5.9+20150516-2ubuntu1', '', '')
    ('libc6-dev-x32', '', '', '')
    ('libc6-x32', '2.16', '', '')
    ('libncursesw5-dev', '5.9+20150516-2ubuntu1', '', '')
    ('libc6-dev-x32', '', '', '')
    ('libc6-x32', '2.16', '', '')
    ('libc6-dev-x32', '', '', '')
    ('libc6-x32', '2.16', '', '')
    ('libncurses5-dev', '5.9+20150516-2ubuntu1', '', '')
    ('libncursesw5-dev', '5.9+20150516-2ubuntu1', '', '')

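As you can see, some packages appear more than once in the list, with different versions.

I need to parse the list of tuples so that I get the latest version of each package before converting the list to a dictionary.

PS: The positions of the package name and its version are not fixed. However, the version always comes right after the package name, so can we say the version is always at position 1 or 3?

Thanks for your help.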

6 answers:

Answer 0 (score: 0)

You should actually turn it into a dictionary first:

data = {}
for value in my_list:
    data2 = iter(value)
    #find the first non-empty entry in our subtuple, that is our package name
    name = next(d for d in data2 if d)
    version = next(data2, '')  # the version is whatever immediately follows the package name
    data.setdefault(name,[]).append(version)

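This gets you 90% of the way there, although it depends on the package name being the first element... which is apparently not always the case...

Here is one way to get a version number from a string:

import re

def float_version_from_string(version_string):
    try:
        return float(re.findall(r"\d+\.?\d*", version_string)[0])
    except (IndexError, ValueError):
        return -1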
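
The two pieces above could be combined roughly as follows. This is only a sketch: the `data` dictionary is a hypothetical sample of what the first loop would produce, and `latest` is a name introduced here for illustration. Note that a float-based key ranks 2.4 above 2.16, so it is only adequate when minor versions have the same number of digits.

```python
import re

def float_version_from_string(version_string):
    """Return the leading number of a version string as a float, or -1."""
    try:
        return float(re.findall(r"\d+\.?\d*", version_string)[0])
    except (IndexError, ValueError):
        return -1

# Hypothetical sample: package name -> list of versions collected so far.
data = {
    "libc6-x32": ["2.16", "2.16"],
    "libc6-dev": ["", ""],
    "libncurses5-dev": ["5.9+20150516-2ubuntu1"],
}

# For each package, keep the version whose float value is largest.
latest = {name: max(versions, key=float_version_from_string)
          for name, versions in data.items()}
print(latest)
```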

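Answer 1 (score: 0)

The tricky part is finding a comparison function that reliably determines which version is newer. For example, we want 2.16 to be treated as newer than 2.4, but a naive string comparison is not enough. What's more, a float comparison is not only insufficient, it raises a ValueError when the version cannot be converted to a float.

The kind of ordering we want is often called "natural sort" or "human sort", and there are some solutions in this question.

An implementation that can be used to compare two values (rather than sort a list) might look like: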
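
To make the difference concrete, here is a small sketch contrasting plain string order with a natural-sort key. The helper name `natural_key` and the sample list are assumptions made for illustration, not taken from the linked question.

```python
import re

def natural_key(text):
    """Split text into alternating string and integer chunks."""
    parts = re.split(r"(\d+)", text)
    # With one capturing group, odd positions hold the digit runs.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

versions = ["2.4", "2.16", "1.06", "2.4.1"]
plain_order = sorted(versions)                     # character by character
natural_order = sorted(versions, key=natural_key)  # digit runs compared as numbers
print(plain_order)
print(natural_order)
```

In plain string order "2.16" sorts before "2.4" because '1' < '4'; with the key, 16 is compared with 4 as an integer.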

import re

def tryint(s):
    try:
        return int(s)
    except ValueError:
        return s

def getchunks(s):
    return [tryint(c) for c in re.split('([0-9]+)', s)]

def compare_strings(s1, s2):
    return getchunks(s1) > getchunks(s2)

# 2.4 < 2.16
# 2.4 < 2.4.1
# a_1 < a_2
# and so on...

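This can be used in a relatively simple algorithm that uses a defaultdict to keep track of the libraries already seen. It assumes the list of tuples is stored in lib_tuples: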

from collections import defaultdict

lib_ver_dict = defaultdict(str)

for lib_tuple in lib_tuples:
    generator = (string for string in lib_tuple if string)
    lib, ver = next(generator), next(generator, '')

    if compare_strings(ver, lib_ver_dict[lib]):
        lib_ver_dict[lib] = ver

The final result is:

'lib32c-dev': ''
'libc6-x32': '2.16'
'libc6-i386': '2.16'
'libncurses5-dev': '5.9+20150516-2ubuntu1'
'libc6-dev': ''
'libc-dev': ''
'libncursesw5-dev': '5.9+20150516-2ubuntu1'
'libc6-dev-x32': ''

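Note that compare_strings does not respect decimal ordering (for example, 2.001 == 2.1); implementing that detail would make the code messier (and it probably doesn't matter). Also, if you want a case-insensitive comparison, you can update the tryint function to use s.lower() in its last line.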
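
Both caveats can be checked with a short sketch. The `ignore_case` parameter is a hypothetical addition for illustration; the original functions take a single argument.

```python
import re

def tryint(s, ignore_case=False):
    try:
        return int(s)
    except ValueError:
        # Lower-casing the text chunks makes the comparison case-insensitive.
        return s.lower() if ignore_case else s

def getchunks(s, ignore_case=False):
    return [tryint(c, ignore_case) for c in re.split('([0-9]+)', s)]

# Leading zeros disappear when a digit run becomes an int: '001' and '1' are both 1.
same_decimal = getchunks('2.001') == getchunks('2.1')
# By default the text chunks are compared case-sensitively.
same_case_default = getchunks('Lib2') == getchunks('lib2')
same_case_ignored = getchunks('Lib2', True) == getchunks('lib2', True)
print(same_decimal, same_case_default, same_case_ignored)
```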

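Edit: Your solution should work, but I would generally advise against modifying a dictionary while iterating over it. Also, zipping keys and values appears to be reliable, but it is easier to call items. Finally, the line del list_versions[:] is absurd; it creates a whole new list just to delete it. You can rewrite your function more concisely:

import apt_pkg  # from python-apt
from functools import cmp_to_key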

def compare_packages(package_dictionary):
    new_dictionary = {}
    for package, versions in package_dictionary.items():
        version = max(versions, key=cmp_to_key(apt_pkg.version_compare))
        new_dictionary[package] = version or 'Not Specified'
    return new_dictionary
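
For readers without python-apt installed, the same pattern can be tried with a stand-in comparator. In the sketch below, `compare_dotted` is a hypothetical replacement that only understands dotted numbers; it does not implement Debian version ordering and is not part of apt_pkg. The `packages` sample is also made up for illustration.

```python
from functools import cmp_to_key

def compare_dotted(a, b):
    """Hypothetical stand-in comparator: negative, zero or positive, like cmp()."""
    def parts(v):
        # Non-numeric components (including the empty string) are skipped.
        return [int(p) for p in v.split('.') if p.isdecimal()]
    pa, pb = parts(a), parts(b)
    return (pa > pb) - (pa < pb)

def compare_packages(package_dictionary):
    new_dictionary = {}
    for package, versions in package_dictionary.items():
        # cmp_to_key adapts a two-argument comparator into a key function for max().
        version = max(versions, key=cmp_to_key(compare_dotted))
        new_dictionary[package] = version or 'Not Specified'
    return new_dictionary

packages = {'libc6-i386': ['2.4', '1.06', '2.4', '2.16'],
            'lib32c-dev': ['', '', '']}
result = compare_packages(packages)
print(result)
```

Building a new dictionary, as here, avoids changing the input while it is being iterated.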

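Answer 2 (score: -1)

This is just a dummy implementation written on the fly. It is untested and only works if the first element of each tuple is the package name and the second is its version. It may not give you the exact solution, but it should help you get there.

my_list_of_tuples = [...]  # some list
latest = {}  # package name -> (numeric version, original tuple)
for entry in my_list_of_tuples:
    package_name, version = entry[0], entry[1]
    # float() is a crude comparison: it fails on versions such as '2.4.1'
    value = float(version) if version else -1.0
    if package_name not in latest or value > latest[package_name][0]:
        latest[package_name] = (value, entry)
my_new_list = [entry for _, entry in latest.values()]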
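Answer 3 (score: -1)

You can iterate over the list and put a package in the dict if and only if a newer version of it is not already there:

def version_as_list(s):
    """Convert a string representing a version to a list of integers
    for comparison purposes. Non-numeric parts are skipped."""
    return [int(i) for i in s.split('.') if i.isdecimal()]

data = {}
for name, version, _, _ in my_list:  # my_list: your list of tuples
    if name not in data or version_as_list(data[name]) < version_as_list(version):
        data[name] = version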

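Answer 4 (score: -1)

This uses a lot of Python built-in and library code. It looks like a long solution, but it really isn't; that is because of the documentation I have interspersed. The code is only 7 lines.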

import re, itertools

pkgs = [('libc', '', '', ''), ... ]  # your list of tuples

# a function to extract a version number from a string
rxVSN = re.compile(r'^(?P<vsn>\d+(\.\d+)?)')
def version(s):
    mo = rxVSN.match(s)
    return float(mo.group('vsn')) if mo is not None else 0.0

# step one: sort the list of tuples by package name and reverse version
# uses built-in sorted() function
#     https://docs.python.org/2/library/functions.html#sorted
pkgs = sorted( pkgs, key = lambda tup: (tup[0], -version(tup[1])) )

# Now we can use the itertools.groupby() function to group the 
# tuples by package name. Then we take the first element of each
# group because that is the one with the highest version number
# (because that's how we sorted them ...)
#    https://docs.python.org/2/library/itertools.html#itertools.groupby
for (pkg, versions) in itertools.groupby( pkgs, key=lambda tup: tup[0]):
    print(pkg, ": ", next(versions))

Output:

 :  ('', '', 'libc-dev', '')
lib32c-dev :  ('lib32c-dev', '', '', '')
libc6-dev :  ('libc6-dev', '', '', '')
libc6-dev-x32 :  ('libc6-dev-x32', '', '', '')
libc6-i386 :  ('libc6-i386', '2.4', '', '')
libc6-x32 :  ('libc6-x32', '2.16', '', '')
libncurses5-dev :  ('libncurses5-dev', '5.9+20150516-2ubuntu1', '', '')
libncursesw5-dev :  ('libncursesw5-dev', '5.9+20150516-2ubuntu1', '', '')
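
In the output above, libc6-i386 resolves to 2.4 rather than 2.16, because 2.4 > 2.16 when both are treated as floats. One possible variation, sketched here under the assumption that only the leading dotted number matters, compares tuples of integers instead. The names `version_key` and `latest` and the shortened sample list are introduced for illustration.

```python
import itertools
import re

def version_key(s):
    """Leading dotted number as a tuple of ints, e.g. '2.16' -> (2, 16)."""
    mo = re.match(r'\d+(\.\d+)*', s)
    return tuple(int(p) for p in mo.group(0).split('.')) if mo else ()

pkgs = [('libc6-i386', '2.4', '', ''),
        ('libc6-i386', '1.06', '', ''),
        ('libc6-i386', '2.16', '', ''),
        ('lib32c-dev', '', '', '')]

# Tuples cannot be negated, so sort twice: first by version (descending),
# then by name. sorted() is stable, so the version order survives within a name.
pkgs = sorted(pkgs, key=lambda tup: version_key(tup[1]), reverse=True)
pkgs = sorted(pkgs, key=lambda tup: tup[0])

# The first tuple of each group now carries the highest version.
latest = {name: next(group)[1]
          for name, group in itertools.groupby(pkgs, key=lambda tup: tup[0])}
print(latest)
```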

Answer 5 (score: -3)

I found the ideal solution. I used:

    apt_pkg.version_compare(a,b).

Thanks, everyone.

The function:

    def comparePackages(package_dictionary):
        # loop over the keys and values of package_dictionary
        for package_name, list_versions in zip(package_dictionary.keys(), package_dictionary.values()):
            # start from the first version and keep the highest one seen so far
            max_version = str(list_versions[0])
            for version in list_versions[1:]:
                a = str(version)
                # the only way it worked was by passing plain strings
                if apt_pkg.version_compare(a, max_version) > 0:
                    # a > max_version
                    max_version = a

            del list_versions[:]
            if max_version == '':
                max_version = 'Not Specified'

            package_dictionary[package_name] = max_version

Output:

    lib32c-dev : Not Specified
    libc6-x32 : 2.16
    libc6-i386 : 2.16
    libncurses5-dev : 5.9+20150516-2ubuntu1
    libc6-dev : Not Specified
    libc-dev : Not Specified
    libncursesw5-dev : 5.9+20150516-2ubuntu1
    libc6-dev-x32 : Not Specified