Compare values in a dictionary and process each entry based on its value

Time: 2018-07-24 17:53:28

Tags: python dictionary

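I have a CSV with many columns, and I only want three of them. I import it into my Python script and turn the three columns into three lists.

I then add the lists to a dictionary. List 1 supplies the keys, and the other two lists supply the two values for each key. (Maybe there is a better way?)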

key is a transaction id
value1 is a filename
value2 is a date
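
One possible alternative to reading the file once per column is to build the dictionary in a single pass. The sketch below is only an illustration: the name `read_transactions` is hypothetical, and the default column positions (date in column 0, file name in column 1, id in column 2) are assumptions taken from the `readcsv` calls later in this post. It accepts any iterable of lines, so it can be tried without a file on disk.

```python
import csv
import io

def read_transactions(lines, id_col=2, name_col=1, date_col=0):
    """Map transaction id -> (filename, date) in one pass over the CSV rows."""
    records = {}
    for row in csv.reader(lines, delimiter=","):
        if not row:
            continue  # skip blank lines
        # A later row with the same id replaces the earlier one
        records[row[id_col]] = (row[name_col], row[date_col])
    return records

# Two rows taken from the sample data, written as comma-separated text
sample = io.StringIO(
    "7/24/2018 16:04,https://localhost/file1.docx,2599302\n"
    "7/24/2018 16:03,https://localhost/file3.docx,2349302\n"
)
records = read_transactions(sample)
print(records["2349302"])
```

Because the id is the dictionary key, two rows that share an id cannot both be kept; if that can happen in the real data, a list of rows may be a safer container than a dictionary.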

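What is the end goal?

  1. Iterate over the dictionary and find all duplicate file names (there will be multiple sets of duplicates)
  2. For each set of duplicate file names, find the one id (key) with the latest (most recent) date value (if the date and time are identical, take the highest id (key))
  3. Print the key with the latest date (I only need the id)
  4. For every other duplicate, print "this is a duplicate" + (key) (again, only the id is needed)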
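
The four steps above can be sketched by sorting the records and walking each file-name group with `itertools.groupby`. This is a minimal sketch under stated assumptions, not the only way to do it: the helper names `parse_date` and `split_latest_and_duplicates` are hypothetical, dates are assumed to look like `7/24/2018 16:04`, ids are assumed to be numeric strings, and the records are loosely based on the sample data (the tie on `file3.docx` is invented to exercise the tie-break).

```python
from datetime import datetime
from itertools import groupby

def parse_date(text):
    # Assumed format, e.g. "7/24/2018 16:04"
    return datetime.strptime(text, "%m/%d/%Y %H:%M")

def split_latest_and_duplicates(records):
    """records maps id -> (filename, date string).
    Returns (latest_ids, duplicate_ids)."""
    # Sort by filename, then date, then numeric id, so the last entry
    # of each filename group is the newest (highest id on a tie).
    ordered = sorted(
        records.items(),
        key=lambda item: (item[1][0], parse_date(item[1][1]), int(item[0])),
    )
    latest_ids, duplicate_ids = [], []
    for _name, group in groupby(ordered, key=lambda item: item[1][0]):
        ids = [key for key, _value in group]
        latest_ids.append(ids[-1])       # newest entry for this filename
        duplicate_ids.extend(ids[:-1])   # everything older
    return latest_ids, duplicate_ids

# Illustrative records; the identical dates for file3.docx are made up
records = {
    "2599302": ("https://localhost/file1.docx", "7/24/2018 16:04"),
    "2349333": ("https://localhost/file1.docx", "7/24/2018 16:03"),
    "2349302": ("https://localhost/file3.docx", "7/24/2018 16:03"),
    "2529374": ("https://localhost/file3.docx", "7/24/2018 16:03"),
}
latest, dupes = split_latest_and_duplicates(records)
for key in latest:
    print(key)
for key in dupes:
    print("this is a duplicate " + key)
```

A file name that occurs only once ends up in the "latest" list with no duplicates, which seems to match the intent of keeping one id per file.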

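I want to repeat this for all keys until I am essentially left with only the ids (keys) of the latest items. File name x could have 5 duplicates, file name y could have 100, file name t could have 30, and so on.

I am using an API to actually move the data, which is why I need the latest id: that id gets moved to "x" in the external system, and all of its duplicates get moved to "y".

Here is what I have so far to build the dictionary (assuming it is built in the right order), but I really don't know where to go from here.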

import csv

def readcsv(filename, column):
    # Return one column of the CSV as a list of strings.
    # The old "rU" mode was removed in Python 3.11; newline="" is what
    # the csv module expects for files it reads.
    with open(filename, newline="") as file:
        reader = csv.reader(file, delimiter=",")
        return [row[column] for row in reader]

def makeDict(id, fileName, detDate):
    # Map each id to [fileName, detDate]
    return {z[0]: list(z[1:]) for z in zip(id, fileName, detDate)}

id = readcsv("jul.csv", 2)
fileName = readcsv("jul.csv", 1)
detDate = readcsv("jul.csv", 0)

mainDict = makeDict(id, fileName, detDate)

Sample data (the columns extracted into a simpler table for testing)

Date    fileURL ID
7/24/2018 16:04 https://localhost/file1.docx    2599302
7/24/2018 16:03 https://localhost/file3.docx    2349302
7/24/2018 16:01 https://localhost/file1.docx    2599302
7/24/2018 16:04 https://localhost/fil232.xml    2599303
7/24/2018 16:03 https://localhost/file1.docx    2349333
7/24/2018 16:01 https://localhost/file3.docx    2529374
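
Dates in this format do not order correctly when compared as plain strings, so it may help to parse them before comparing. A minimal sketch, assuming every value follows the month/day/year hour:minute pattern seen in the sample; `parse_date` is a hypothetical helper name and the October value is made up for illustration.

```python
from datetime import datetime

def parse_date(text):
    # Assumed format, e.g. "7/24/2018 16:04"
    return datetime.strptime(text, "%m/%d/%Y %H:%M")

earlier = "7/24/2018 16:04"
later = "10/2/2018 09:30"   # made-up value for illustration

# As strings, "1..." sorts before "7...", so the later date looks smaller
print(later > earlier)
# As datetime objects, the comparison follows the calendar
print(parse_date(later) > parse_date(earlier))
```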

Update: Using the answer below, this is what I ended up with to get it working:

import csv

def readcsv(filename, column):
    # Return one column of the CSV as a list of strings.
    # The old "rU" mode was removed in Python 3.11; newline="" is what
    # the csv module expects for files it reads.
    with open(filename, newline="") as file:
        reader = csv.reader(file, delimiter=",")
        return [row[column] for row in reader]

def makeDict(id, fileName, detDate):
    # Map each id to [fileName, detDate]
    return {z[0]: list(z[1:]) for z in zip(id, fileName, detDate)}


## Group Keys by like file names ##
def groupKeys(mainDict):
    same_filename = {}
    for key, line in mainDict.items():
        name, date = line
        if name not in same_filename:
            same_filename[name] = [key]
        else:
            same_filename[name].append(key)
    return same_filename



from datetime import datetime

moveDupeList = []    # ids of the older duplicates
moveProperList = []  # latest id for each file name

def parseDate(text):
    # Assumes dates look like "7/24/2018 16:04", as in the sample data,
    # and that any header row was skipped when the CSV was read
    return datetime.strptime(text, "%m/%d/%Y %H:%M")

########################################### Get latest ID ##################
def getLatestID(same_filename, mainDict):
    ## for each file name and the ids that share it
    for name, ids in same_filename.items():
        curDate = None
        curId = None
        for v in ids:
            moveDupeList.append(v)   ## add to a list of dupes
            date = parseDate(mainDict[v][1])
            ## a newer date wins; on a tie the numerically highest id wins
            if (curDate is None or date > curDate
                    or (date == curDate and int(v) > int(curId))):
                curDate = date
                curId = v
        moveDupeList.remove(curId)    # remove latest from dupe list
        moveProperList.append(curId)  # add latest to proper list
########################################### Get latest ID ##################


id = readcsv("jul.csv", 2)
fileName = readcsv("jul.csv", 1)
detDate = readcsv("jul.csv", 0)

mainDict = makeDict(id, fileName, detDate)
same_filename = groupKeys(mainDict)
getLatestID(same_filename, mainDict)

1 Answer:

Answer 0 (score: 0)

A starting point might be to build another dictionary that maps each file name to the list of all corresponding keys (ids).
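
A minimal sketch of such a grouping dictionary, which is not necessarily the code this answer originally contained. It assumes `main_dict` maps each id to `[filename, date]`, as `makeDict` in the question does; the function name `group_ids_by_filename` is hypothetical.

```python
from collections import defaultdict

def group_ids_by_filename(main_dict):
    """main_dict maps id -> [filename, date]; returns filename -> [ids]."""
    same_filename = defaultdict(list)
    for key, (name, _date) in main_dict.items():
        same_filename[name].append(key)
    return dict(same_filename)

# A few rows from the sample data, in the shape makeDict produces
main_dict = {
    "2599302": ["https://localhost/file1.docx", "7/24/2018 16:04"],
    "2349302": ["https://localhost/file3.docx", "7/24/2018 16:03"],
    "2349333": ["https://localhost/file1.docx", "7/24/2018 16:03"],
}
groups = group_ids_by_filename(main_dict)
print(groups["https://localhost/file1.docx"])
```

Any file name whose list holds more than one id is a set of duplicates.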

That covers your first point.