Python unique results

Date: 2015-11-26 08:56:06

Tags: python unique

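I have two files containing IP addresses, and I'm trying to find which addresses in file 1 are not in file 2. I just can't get it to work. What am I doing wrong? This is my code: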

access = []   
with open("C:\\users\\joey\\desktop\\access.log",'r') as bestand:
    for line in bestand:
        try:
            splittedline = line.split('sftp-session')[1].split("[")[1].split("]")[0]
        except Exception:
            continue
        access.append(splittedline)


nodes = []
with open("C:\\users\\joey\\desktop\\exit_nodes.csv",'r') as bestand1:
    for line in bestand1:
        nodes.append(line)


setA = set(access)
setB = set(nodes)
listC = list(setB - setA)

print(listC)

Output (only a small part):

59.231\n', '78.41.115.145\n', '62.210.76.96\n', '84.53.203.38\n', '185.82.216.119\n', '176.10.99.205\n', '107.150.53.178\n', '37.157.192.208\n', '91.238.60.100\n', '110.93.23.170\n', '162.247.72.213\n', '18.239.0.140\n', '84.115.35.248\n', '106.187.37.158\n', '213.61.149.125\n', '86.178.119.84\n', '50.76.159.218\n', '46.72.101.220\n', '78.46.51.124\n', '178.162.193.213\n', '207.201.223.196\n', '101.99.64.150\n', '5.199.142.93\n', '5.165.42.171\n', '185.17.144.138\n', '81.219.51.206\n', '65.181.113.136\n', '185.13.37.158\n', '104.232.3.33\n', '77.109.141.140\n', '77.170.1.2\n', '93.126.101.223\n', '188.246.75.178\n', '193.107.85.61\n', '188.138.1.229\n', '108.26.225.148\n', '108.61.212.102\n', '128.79.53.244\n', '81.89.0.195\n', '94.23.30.53\n', '104.237.156.214\n', '68.233.235.217\n', '188.166.49.82\n', '192.3.177.167\n', '173.208.196.215\n', '77.109.138.44\n', '106.187.45.156\n', '78.142.175.70\n', '71.230.253.68\n', '66.146.193.31\n', '90.231.152.159\n', '122.19.43.24\n', '79.98.107.90\n', '178.9.251.184\n', '176.108.160.253\n', '93.95.228.116\n', '106.185.29.93\n', '109.169.23.202\n', '94.242.57.26\n', '79.165.223.209\n', '192.241.199.208\n', '162.220.56.186\n', '212.71.238.203\n', '178.79.161.152\n', '78.21.6.161\n', '85.159.113.228\n', '37.139.3.171\n', '104.167.102.244\n', '62.49.92.150\n', '66.220.3.179\n', '185.61.148.183\n', '104.167.113.138\n', '66.85.131.72\n', '37.59.123.142\n', '121.54.175.50\n', '94.242.251.112\n', '185.13.38.185\n', '24.175.166.20\n', '54.65.198.84\n', '176.123.6.101\n', '176.10.99.202\n', '176.106.54.54\n

3 answers:

Answer 0 (score: 2)

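Try stripping the newline from each line before you add it to the list. I think the trailing newlines in your second list are breaking the comparison: "1.1.1.2\n" and "1.1.1.2" are different strings, so they never match in the set difference.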

>>> a = "one two three\n"
>>> a
'one two three\n'
>>> a.rstrip("\n")
'one two three'
>>> a
'one two three\n'
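To see why the newline matters for the set difference, here is a minimal sketch comparing a set built from raw lines with one built from stripped lines. The addresses are made-up sample values, not data from the question.

```python
# Lines read from a file keep their trailing "\n" (hypothetical sample data)
raw_nodes = ["1.1.1.1\n", "1.1.1.2\n"]
access = ["1.1.1.2"]  # already extracted, so it has no newline

# Without stripping, no string matches, so every node looks "missing"
print(set(raw_nodes) - set(access))

# After rstrip("\n") the strings compare equal and the difference shrinks
clean_nodes = [line.rstrip("\n") for line in raw_nodes]
print(set(clean_nodes) - set(access))  # {'1.1.1.1'}
```

Using `strip()` instead of `rstrip("\n")` is a bit more forgiving, since it also removes `\r` and stray spaces at either end of the line.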

Answer 1 (score: 1)

I think the line-splitting part is the problem:

splittedline = line.split('sftp-session')[1].split("[")[1].split("]")[0]
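One way to check what that split chain actually extracts is to wrap it in a small function and feed it a few lines. This is a sketch only: `extract_ip` is a hypothetical helper, and the log lines below are invented, since the question does not show the real access.log format.

```python
def extract_ip(line):
    """Return the text between the brackets after 'sftp-session', or None."""
    try:
        return line.split("sftp-session")[1].split("[")[1].split("]")[0]
    except IndexError:
        # Marker or brackets missing: the original loop skips such lines
        return None

# Hypothetical log lines; the real access.log format is not shown
print(extract_ip("Nov 26 08:56:06 host sftp-session[1.1.1.2]: opened"))  # 1.1.1.2
print(extract_ip("Nov 26 08:56:07 host sshd: unrelated line"))  # None
print(repr(extract_ip("x sftp-session[ 1.1.1.3 ]")))  # ' 1.1.1.3 '
```

The last case shows a subtle failure mode: any spaces inside the brackets are kept, so `' 1.1.1.3 '` would never match `'1.1.1.3'` from the other file.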

I tried it with some sample input. access.log:

1.1.1.2
1.1.1.3

exit_nodes.csv:

1.1.1.1
1.1.1.2
1.1.1.3
1.1.1.4

Then I ran your (modified) script, using .strip() to remove the newlines:

access = []   
with open('access.log', 'r') as a:
    for line in a:
        access.append(line.strip())

nodes = []
with open('exit_nodes.csv', 'r') as b:
    for line in b:
        nodes.append(line.strip())

setA = set(access)
setB = set(nodes)
listC = list(setB - setA)

print(listC)

It produces the correct output, i.e. everything that is in exit_nodes.csv but not in access.log:

>>> 
['1.1.1.4', '1.1.1.1']
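The same idea can be written more compactly in Python 3. The sketch below is an illustration, not the answerer's exact script: `read_stripped` is a hypothetical helper, `io.StringIO` stands in for the real files so it runs anywhere (with real files you would pass `open("access.log")` instead), and skipping blank lines is an extra assumption so that an empty trailing line in the CSV does not add `''` to the set.

```python
import io

def read_stripped(f):
    """Collect non-empty, whitespace-stripped lines from a file-like object."""
    return {line.strip() for line in f if line.strip()}

# In-memory stand-ins for access.log and exit_nodes.csv (sample data from above)
access_file = io.StringIO("1.1.1.2\n1.1.1.3\n")
nodes_file = io.StringIO("1.1.1.1\n1.1.1.2\n1.1.1.3\n1.1.1.4\n")

set_a = read_stripped(access_file)
set_b = read_stripped(nodes_file)

# Sets have no defined order, so sort for stable, readable output
list_c = sorted(set_b - set_a)
print(list_c)  # ['1.1.1.1', '1.1.1.4']
```

Sorting explains why this prints `['1.1.1.1', '1.1.1.4']` while the unsorted set above came out as `['1.1.1.4', '1.1.1.1']`. Note that `sorted()` compares strings, so on larger lists `'10.0.0.1'` would sort before `'9.0.0.1'`; passing `key=ipaddress.ip_address` gives numeric order instead.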

Answer 2 (score: 0)

My guess is that your input looks like sftp-session[127.0.0.1]. You should use a regular expression to parse the data:

import re

access = []
with open("C:\\users\\joey\\desktop\\access.log", 'r') as bestand:
    for line in bestand:
        # Raw string, so \[ and \d are passed to the regex engine unchanged
        re_match = re.search(r'sftp-session\[\s*(\d+\.\d+\.\d+\.\d+)\s*\]', line)
        if re_match:
            # group(1) is only the dotted IP, without brackets or spaces
            access.append(re_match.group(1))

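This way only well-formed IP addresses end up in the list. As obscurite said, though, your actual problem is probably the newline characters (\n) that show up in your output.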
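Putting the two fixes together, a sketch might extract IPs from the log with the regex and strip the exit-node lines before taking the set difference. The function names `access_ips` and `node_ips` are hypothetical, the log lines are invented (the real format is only guessed at above), and `io.StringIO` replaces the real files so the example is self-contained.

```python
import io
import re

# Raw string so "\[" and "\d" reach the regex engine unchanged
IP_IN_SESSION = re.compile(r"sftp-session\[\s*(\d+\.\d+\.\d+\.\d+)\s*\]")

def access_ips(f):
    """IPs found inside sftp-session[...] markers, one per matching line."""
    ips = set()
    for line in f:
        match = IP_IN_SESSION.search(line)
        if match:
            ips.add(match.group(1))
    return ips

def node_ips(f):
    """Exit-node IPs, one per line, with newlines and blank lines removed."""
    return {line.strip() for line in f if line.strip()}

# Hypothetical log lines; the real access.log format is not shown
log = io.StringIO(
    "Nov 26 host sftp-session[1.1.1.2]: opened\n"
    "Nov 26 host sftp-session[ 1.1.1.3 ]: opened\n"
    "Nov 26 host sshd: unrelated line\n"
)
nodes = io.StringIO("1.1.1.1\n1.1.1.2\n1.1.1.3\n1.1.1.4\n")

print(sorted(node_ips(nodes) - access_ips(log)))  # ['1.1.1.1', '1.1.1.4']
```

Because the pattern allows optional whitespace inside the brackets, the `[ 1.1.1.3 ]` line still yields a clean `'1.1.1.3'`, which the plain split chain would not. The `\d+` groups do not check that each octet is at most 255; if that matters, `ipaddress.ip_address` can validate the captured string.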