Question

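I have a configuration file in this format:

IP,username,logfile

IP,username,logfile1

IP,username,logfile2

Below is the code I use to read the lines of the text file into a list, but I need help with code that can determine whether the name of logfile is the same as logfile1. Please help.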

import csv

config_file_path = "config15.txt"  # read config file and assign IP,username,logfile,serverpath,localpath
file  = open(config_file_path, 'r')
reader = csv.reader(file)
all_rows = [row for row in reader] # appending config file contents in a list

The code above outputs:

[['127.0.0.1', 'new34', 'logfile'], ['127.0.0.1', 'new34', 'logfile1']]

I want code that compares the names logfile and logfile1 and returns True if they are the same.

Answer 1

Use a simple loop with a set that keeps track of the log file names already seen.

For example:

all_rows = [['127.0.0.1', 'new34', 'logfile1'], ['127.0.0.1', 'new34', 'logfile1']]
def check_row(data):
    seen = set()
    for i in data:
        if i[-1] in seen:
            return True
        else:
            seen.add(i[-1])
    return False


print(check_row(all_rows))  #True
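
If you also want to know which names repeat, not just whether any do, here is a minimal sketch along the same lines. It assumes the log file name is always the last field of each row; the function name duplicated_logfiles and the sample rows are made up for illustration.

```python
from collections import Counter

def duplicated_logfiles(rows):
    """Return the log file names that appear in more than one row."""
    # Skip empty rows, which csv.reader yields for blank lines in the file.
    counts = Counter(row[-1] for row in rows if row)
    return [name for name, n in counts.items() if n > 1]

# Hypothetical sample data in the same shape as all_rows.
rows = [
    ['127.0.0.1', 'new34', 'logfile'],
    ['127.0.0.1', 'new34', 'logfile1'],
    ['127.0.0.1', 'new34', 'logfile1'],
]
print(duplicated_logfiles(rows))        # ['logfile1']
print(bool(duplicated_logfiles(rows)))  # True
```

An empty result means every log file name is unique, so wrapping the call in bool() gives the True/False answer asked for in the question.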

Answer 2

If that really is your file format, it is easier to read it into a DataFrame:

import pandas as pd
df = pd.read_csv('config15.txt', sep=',', header=None, names=['ip', 'un', 'lf'])  # or just change the extension to *.csv
dupldf = df[df.duplicated(['lf'])]  # rows whose 'lf' value already appeared in an earlier row

If dupldf is empty, there are no duplicate values.
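
To turn that emptiness check into a True/False result, something like the sketch below should work. It reads from an in-memory string instead of config15.txt so it can run on its own; the sample contents and the variable name has_duplicates are assumptions made for illustration.

```python
import io
import pandas as pd

# Stand-in for config15.txt so the example runs without a file on disk.
sample = io.StringIO(
    "127.0.0.1,new34,logfile\n"
    "127.0.0.1,new34,logfile1\n"
    "127.0.0.1,new34,logfile1\n"
)
df = pd.read_csv(sample, header=None, names=['ip', 'un', 'lf'])

# duplicated() flags every occurrence after the first one (keep='first' is the default).
dupldf = df[df.duplicated(['lf'])]
has_duplicates = not dupldf.empty

print(has_duplicates)         # True
print(dupldf['lf'].tolist())  # ['logfile1']
```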

Answer 3

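So, as I understand it, you are looking for duplicate log files. First you need a list (or vector) of the log file names, for example:

logfiles = [row[-1] for row in all_rows]  # use all_rows from the question; reader is already exhausted once all_rows is built

This list contains the log file names. Next, I suggest using numpy, a large Python library with many useful methods (well worth knowing if you write Python), so: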

import numpy as np
logfiles = np.array(logfiles)  # convert the list into a numpy array
i, j = np.where(logfiles[:, np.newaxis]==logfiles[np.newaxis, :])

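i and j are index pairs with logfiles[i] == logfiles[j], so they point at the repeated elements. Obviously every element is also equal to itself, so you have to drop the pairs where i == j:

index2save = np.where(i != j)[0]  # keep only the pairs that match a different position
i = i[index2save]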

Now i holds the indices of the duplicated elements, and logfiles[i] are the names that match. Hope this helps!
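
For reference, here is a small self-contained sketch of the same pairwise-comparison idea. It is a variation rather than the exact code above: it clears the diagonal with np.fill_diagonal instead of filtering the index pairs, and the sample names and the variable names equal and dup_idx are assumptions for illustration.

```python
import numpy as np

# Hypothetical log file names, e.g. the last column of all_rows.
logfiles = np.array(['logfile', 'logfile1', 'logfile1'])

# equal[a, b] is True when logfiles[a] == logfiles[b].
equal = logfiles[:, np.newaxis] == logfiles[np.newaxis, :]

# Every element matches itself, so clear the diagonal before looking for matches.
np.fill_diagonal(equal, False)

# Indices of the elements that match at least one other element.
dup_idx = np.where(equal.any(axis=1))[0]

print(dup_idx)            # [1 2]
print(logfiles[dup_idx])  # ['logfile1' 'logfile1']
```

Note that the comparison matrix grows with the square of the number of rows, so for a large config file the set-based loop from Answer 1 is likely the cheaper option.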

Comparing items in the same list
