Question

I have some text in a .txt file, made up of several paragraphs with the following structure:

name:zzzz,surnames:zzzz,id:zzzz,country:zzzz ...
name:zzzz,surnames:zzzz,id:zzzz,country:zzzz ...
name:zzzz,surnames:zzzz,id:zzzz,country:zzzz ...
name:zzzz,surnames:zzzz,id:zzzz,country:zzzz ...

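I would like to know how to compare all the 'id' values and, if two paragraphs have the same id, remove one of them. Any ideas? Thanks.

I have already managed to get the first ID :/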

Answer 1

First, I assume your data looks like this:

name:z,surnames:zz,id:zzz,country:zzzz
name:y,surnames:yy,id:yyy,country:yyyy
name:x,surnames:xx,id:xxx,country:xxxx
name:z,surnames:zz,id:zzz,country:zzzz

I suggest you use the pandas package and its read_csv function. It gives you a DataFrame object, which makes tabular data easy to work with.
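A minimal sketch of how the pandas suggestion might look in practice. It assumes every line has exactly the four fields shown in the sample (the question's lines are truncated with "...", so the real column list is unknown), and the helper name drop_duplicate_ids is hypothetical:

```python
import io

import pandas as pd

# Sample data in the assumed four-field layout.
SAMPLE = """\
name:z,surnames:zz,id:zzz,country:zzzz
name:y,surnames:yy,id:yyy,country:yyyy
name:x,surnames:xx,id:xxx,country:xxxx
name:z,surnames:zz,id:zzz,country:zzzz
"""


def drop_duplicate_ids(source):
    """Read 'key:value' CSV lines and keep the first row seen for each id."""
    # Assumption: the column names and their order match the sample above.
    columns = ["name", "surnames", "id", "country"]
    df = pd.read_csv(source, header=None, names=columns, dtype=str)
    # Each cell looks like "id:zzz"; keep only the part after the first colon.
    for col in columns:
        df[col] = df[col].str.split(":", n=1).str[1]
    return df.drop_duplicates(subset="id", keep="first").reset_index(drop=True)


deduped = drop_duplicate_ids(io.StringIO(SAMPLE))
print(deduped)
```

To read from a real file, pass its path instead of the StringIO object. Passing keep="last" to drop_duplicates would retain the last row for each id rather than the first.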

Answer 2

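Taking the file path as a command-line argument, you can extract the ID from each line and store the line in a dictionary keyed by that ID.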

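import re
import sys

# Map each id to its line; a later line with the same id overwrites the earlier one.
ref = {}
with open(sys.argv[1], 'r') as f:
    for line in f:
        # Raw string so that \w reaches the regex engine unchanged.
        m = re.search(r".*id:(\w*),", line)
        if m is not None:
            ref[m.group(1)] = line.strip()

# Print one line per unique id.
for line in ref.values():
    print(line)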
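One thing to be aware of: because each assignment overwrites the previous entry, the dictionary approach keeps the last line seen for each id. If the first occurrence should win instead, a variant along these lines might work. This is only a sketch; the function name unique_by_id and the sample lines are made up for illustration:

```python
import re

# Match "id:" only as a whole key, so a key such as "paid:" is not picked up.
ID_PATTERN = re.compile(r"\bid:(\w*)")


def unique_by_id(lines):
    """Return the stripped lines, keeping the first one seen for each id."""
    seen = {}
    for line in lines:
        m = ID_PATTERN.search(line)
        if m is None:
            continue  # lines without an id field are skipped
        # setdefault stores the line only if this id has not been seen yet.
        seen.setdefault(m.group(1), line.strip())
    return list(seen.values())


sample = [
    "name:a,surnames:aa,id:1,country:x\n",
    "name:b,surnames:bb,id:2,country:y\n",
    "name:c,surnames:cc,id:1,country:z\n",
]
result = unique_by_id(sample)
for line in result:
    print(line)
```

Since dictionaries preserve insertion order in Python 3.7 and later, the output follows the order in which each id first appeared.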

Answer 3

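Try this: while parsing the text file, keep track of the ids that have already been included, and write a new text file that contains only one line per unique id.

# Track ids already written; a set is enough since only membership matters.
seen_ids = set()

# The with statement closes both files even if an error occurs.
with open("file.txt", "r") as infile, open("file_new.txt", "w") as outfile:
    for line in infile:
        # The third comma-separated field of the line is the id (e.g. "id:zzzz")
        record_id = line.split(",")[2]

        # If the id is new, record it and copy its line to the new file
        if record_id not in seen_ids:
            seen_ids.add(record_id)
            outfile.write(line)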
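The code above relies on the id always being the third field. If the field order can vary, one option is to look the id up by key instead of by position. The following is a rough sketch under that assumption; parse_record and dedupe_file are hypothetical helper names, and the demo writes to a temporary directory rather than a real file.txt:

```python
import os
import tempfile


def parse_record(line):
    """Split 'key:value,key:value' text into a dict."""
    fields = {}
    for part in line.strip().split(","):
        key, sep, value = part.partition(":")
        if sep:  # ignore fragments that contain no colon
            fields[key.strip()] = value.strip()
    return fields


def dedupe_file(src_path, dst_path):
    """Copy lines to dst_path, keeping the first line for each id."""
    seen = set()
    kept = 0
    with open(src_path, "r", encoding="utf-8") as src_file, \
            open(dst_path, "w", encoding="utf-8") as dst_file:
        for line in src_file:
            record_id = parse_record(line).get("id")
            # Assumption: lines without an id field are dropped.
            if record_id is None or record_id in seen:
                continue
            seen.add(record_id)
            dst_file.write(line if line.endswith("\n") else line + "\n")
            kept += 1
    return kept


# Small demonstration using throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "file.txt")
    dst = os.path.join(tmp, "file_new.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write("name:a,surnames:aa,id:1,country:x\n"
                "country:z,id:1,name:c,surnames:cc\n"
                "name:b,surnames:bb,id:2,country:y\n")
    kept = dedupe_file(src, dst)
    with open(dst, encoding="utf-8") as f:
        output_lines = f.read().splitlines()
```

The second sample line has its fields in a different order but is still recognised as a duplicate of id 1, which a purely positional split would miss.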
