How to find, in pure Python, the rows with the maximum length in one column when grouping by another column

Time: 2018-03-28 17:46:12

Tags: python-3.x

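I am new to Python. I need some SQL-like functionality, preferably in pure Python rather than pandas. I need to group by the second column and get the row whose first column has the maximum length. The requirement has changed slightly: I now need the tags whose length is less than the maximum length. Step 1: get the maximum count of / in the second column. Step 2: return the tags for which the count of / in the second column is less than the count from step 1. My list contains: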

['MYDATA_FILE_XT', '/MYDATA/FILE/XT/ROW/STATUS', 'string']
['MYDATA_FILE_XT_ROW', '/MYDATA/FILE/XT/ROW/STATUS', 'string']
['MYDATA_FILE_XT_ROW_STATUS', '/MYDATA/FILE/XT/ROW/STATUS', 'string']
['XX', '/MYDATA/FILE/XV/ROW/CURRENCY_CODE', 'string']
['MYDATA', '/MYDATA/FILE/XV/ROW/CURRENCY_CODE', 'string']
['MYDATA_FILE', '/MYDATA/FILE/XV/ROW/CURRENCY_CODE', 'string']
['MYDATA_FILE_XV', '/MYDATA/FILE/XV/ROW/CURRENCY_CODE', 'string']
['MYDATA_FILE_XV_ROW', '/MYDATA/FILE/XV/ROW/CURRENCY_CODE', 'string']
['MYDATA_FILE_XV_ROW_CURRENCY_CODE', '/MYDATA/FILE/XV/ROW/CURRENCY_CODE', 'string']
['YY', '/MYDATA/FILE/XV/ROW/EXCESS_AMOUNT', 'string']
['MYDATA', '/MYDATA/FILE/XV/ROW/EXCESS_AMOUNT', 'string']
['MYDATA_FILE', '/MYDATA/FILE/XV/ROW/EXCESS_AMOUNT', 'string']
['MYDATA_FILE_XV', '/MYDATA/FILE/XV/ROW/EXCESS_AMOUNT', 'string']
['MYDATA_FILE_XV_ROW', '/MYDATA/FILE/XV/ROW/EXCESS_AMOUNT', 'string']
['MYDATA_FILE_XV_ROW_EXCESS_AMOUNT', '/MYDATA/FILE/XV/ROW/EXCESS_AMOUNT', 'string']
['LM', '/MYDATA/FILE/XV/ROW/USD_EQUIVALENT', 'string']
['MYDATA', '/MYDATA/FILE/XV/ROW/USD_EQUIVALENT', 'string']
['MYDATA_FILE', '/MYDATA/FILE/XV/ROW/USD_EQUIVALENT', 'string']
['MYDATA_FILE_XV', '/MYDATA/FILE/XV/ROW/USD_EQUIVALENT', 'string']
['MYDATA_FILE_XV_ROW', '/MYDATA/FILE/XV/ROW/USD_EQUIVALENT', 'string']
['MYDATA_FILE_XV_ROW_USD_EQUIVALENT', '/MYDATA/FILE/XV/ROW/USD_EQUIVALENT', 'string']

New expected output: ['MYDATA_FILE_XT_ROW'] ['MYDATA_FILE_XV_ROW']

2 Answers:

Answer 0: (score: 0)

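You can create a dictionary that stores, for each path in the second column, the longest tag seen so far in the first column, and then convert it to a list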

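result = {}  # maps each path (second column) to its longest tag (first column)
for row in rows:
    tag, path = row[0], row[1]
    # Store the first tag seen for a path, then replace it only with a longer one.
    if path not in result or len(tag) > len(result[path]):
        result[path] = tag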
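
The "convert it to a list" step is not shown above. A minimal self-contained sketch of the whole idea could look like the following; the function name `longest_tag_rows` and the shortened `rows` sample are only illustrative and are not part of the original answer.

```python
def longest_tag_rows(rows):
    """Return [tag, path] pairs, one per path, keeping the longest tag."""
    longest = {}
    for tag, path, *_ in rows:
        # Keep the first tag seen for a path, then only strictly longer ones.
        if path not in longest or len(tag) > len(longest[path]):
            longest[path] = tag
    # Convert the dictionary back into a list of rows.
    return [[tag, path] for path, tag in longest.items()]


# Hypothetical subset of the question's data, used only for illustration.
rows = [
    ['MYDATA_FILE_XT', '/MYDATA/FILE/XT/ROW/STATUS', 'string'],
    ['MYDATA_FILE_XT_ROW_STATUS', '/MYDATA/FILE/XT/ROW/STATUS', 'string'],
    ['XX', '/MYDATA/FILE/XV/ROW/CURRENCY_CODE', 'string'],
    ['MYDATA_FILE_XV_ROW_CURRENCY_CODE', '/MYDATA/FILE/XV/ROW/CURRENCY_CODE', 'string'],
]

longest_rows = longest_tag_rows(rows)
```

Because dictionaries preserve insertion order in Python 3.7+, the resulting list follows the order in which each path first appears in `rows`.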

Answer 1: (score: 0)

Assuming your list is in a variable named data, this should fill the variable cleaned with your expected result

from functools import reduce
from itertools import groupby
from operator import itemgetter

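cleaned = []
# groupby only groups consecutive rows, so data must already be ordered by the second column.
for key, values in groupby(data, itemgetter(1)):
    # Keep the row whose first column is longest within each group.
    cleaned += [reduce(lambda x, y: x if len(x[0]) > len(y[0]) else y, values)]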
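
Since `groupby` only merges adjacent rows with equal keys, a variant that sorts first may be safer when the input order is not guaranteed. The sketch below does that and ranks the tags in each group by length. Reading the question's revised expectation as "the second-longest tag per path, without duplicates" is an assumption that happens to reproduce the expected output; the helper name `tags_by_path` and the shuffled sample data are likewise hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def tags_by_path(data):
    """Map each path (column 2) to its tags (column 1), longest tag first."""
    # Sort by path so that groupby sees each path as one consecutive run.
    ordered = sorted(data, key=itemgetter(1))
    return {
        path: sorted((row[0] for row in group), key=len, reverse=True)
        for path, group in groupby(ordered, key=itemgetter(1))
    }


# Hypothetical, deliberately unordered subset of the question's data.
data = [
    ['MYDATA_FILE_XT', '/MYDATA/FILE/XT/ROW/STATUS', 'string'],
    ['XX', '/MYDATA/FILE/XV/ROW/CURRENCY_CODE', 'string'],
    ['MYDATA_FILE_XT_ROW_STATUS', '/MYDATA/FILE/XT/ROW/STATUS', 'string'],
    ['MYDATA_FILE_XV_ROW', '/MYDATA/FILE/XV/ROW/CURRENCY_CODE', 'string'],
    ['MYDATA_FILE_XT_ROW', '/MYDATA/FILE/XT/ROW/STATUS', 'string'],
    ['MYDATA_FILE_XV_ROW_CURRENCY_CODE', '/MYDATA/FILE/XV/ROW/CURRENCY_CODE', 'string'],
]

ranked = tags_by_path(data)

# Longest tag per path (the original requirement).
longest = [tags[0] for tags in ranked.values()]

# Second-longest tag per path, de-duplicated in first-seen order
# (assumed interpretation of the revised requirement).
runner_up = list(dict.fromkeys(tags[1] for tags in ranked.values() if len(tags) > 1))
```

With this sample, `runner_up` contains `'MYDATA_FILE_XT_ROW'` and `'MYDATA_FILE_XV_ROW'`. Sorting costs O(n log n), which should be fine for lists of this size.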