My problem is removing lines that appear twice. I have a text file:
192.168.1.18 --- B8:27:EB:48:C3:B6
192.168.1.12 --- 00:A0:57:2E:A6:12
192.168.1.11 --- 00:1D:A2:80:3C:CC
192.168.1.7 --- F0:9F:C2:0A:48:E7
192.168.1.6 --- 80:2A:A8:C9:85:1C
192.168.1.1 --- F0:9F:C2:05:B7:A6
192.168.1.9 --- DC:4A:3E:DF:22:06
192.168.1.8 --- 80:2A:A8:C9:8E:F6
192.168.1.1 --- F0:9F:C2:05:B7:A6
192.168.1.7 --- F0:9F:C2:0A:48:E7
192.168.1.12 --- 00:A0:57:2E:A6:12
192.168.1.11 --- 00:1D:A2:80:3C:CC
192.168.1.6 --- 80:2A:A8:C9:85:1C
192.168.1.8 --- 80:2A:A8:C9:8E:F6
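The text file looks exactly like the above. Please help me: I want to delete every line that appears twice (both copies), so that only these remain: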
192.168.1.18 --- B8:27:EB:48:C3:B6
192.168.1.9 --- DC:4A:3E:DF:22:06
Thanks for your help.
Answer 0 (score: 2)
Another short alternative, using a collections.Counter object:
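import collections

with open('lines.txt', 'r') as f:
    # Count how many times each line (without its trailing newline) occurs
    for k, c in collections.Counter(f.read().splitlines()).items():
        # Keep only the lines that appear exactly once
        if c == 1:
            print(k)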
Output:
192.168.1.18 --- B8:27:EB:48:C3:B6
192.168.1.9 --- DC:4A:3E:DF:22:06
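The same Counter idea can be wrapped in a small function that takes any iterable of lines, which makes it easy to try without a file. This is a minimal sketch; the function name lines_seen_once is hypothetical. Since Python 3.7 a Counter remembers insertion order, so the surviving lines come out in the order they first appear.

```python
from collections import Counter


def lines_seen_once(lines):
    """Return the lines that occur exactly once, in first-seen order."""
    # Strip trailing newlines so "x\n" and a final "x" count as the same line
    stripped = [line.rstrip("\n") for line in lines]
    counts = Counter(stripped)
    return [line for line in counts if counts[line] == 1]


sample = [
    "192.168.1.18 --- B8:27:EB:48:C3:B6\n",
    "192.168.1.1 --- F0:9F:C2:05:B7:A6\n",
    "192.168.1.9 --- DC:4A:3E:DF:22:06\n",
    "192.168.1.1 --- F0:9F:C2:05:B7:A6",  # last line, no trailing newline
]
print(lines_seen_once(sample))
# ['192.168.1.18 --- B8:27:EB:48:C3:B6', '192.168.1.9 --- DC:4A:3E:DF:22:06']
```

With a real file you would pass the open file object itself, e.g. lines_seen_once(f).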
Answer 1 (score: 1)
There isn't much detail in the question. You've tagged it numpy: is that a requirement, or just an interest?
If you have no particular requirement, just use the standard library:
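d = {}
with open('/file/path', 'r') as f:
    for line in f:
        # Strip the newline so the last line (which may lack one) still matches its duplicate
        line = line.rstrip('\n')
        if line not in d:
            d[line] = 1
        else:
            d[line] += 1

# Keep only the lines that occurred exactly once
no_dup = [line for line in d if d[line] < 2]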
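The answer above only builds the list no_dup; actually removing the lines from the file also means writing the survivors back. Below is a sketch of that step, assuming it is acceptable to overwrite the file in place. The helper name remove_repeated_lines is hypothetical, and the demo uses a temporary file so it runs anywhere. It also uses dict.get in place of the explicit if/else counting.

```python
import os
import tempfile


def remove_repeated_lines(path):
    """Rewrite `path` keeping only lines that occur exactly once (hypothetical helper)."""
    with open(path, "r") as f:
        lines = [line.rstrip("\n") for line in f]
    counts = {}
    for line in lines:
        # dict.get replaces the "if line not in d / else" branch
        counts[line] = counts.get(line, 0) + 1
    no_dup = [line for line in lines if counts[line] == 1]
    # Overwrite the original file with the surviving lines
    with open(path, "w") as f:
        f.writelines(line + "\n" for line in no_dup)
    return no_dup


# Demo on a throwaway file
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("192.168.1.18 --- B8:27:EB:48:C3:B6\n"
            "192.168.1.7 --- F0:9F:C2:0A:48:E7\n"
            "192.168.1.9 --- DC:4A:3E:DF:22:06\n"
            "192.168.1.7 --- F0:9F:C2:0A:48:E7\n")
remove_repeated_lines(demo_path)
with open(demo_path) as f:
    print(f.read(), end="")
os.remove(demo_path)
```

Reading all lines first, then writing, keeps the sketch simple; for very large files you would write to a separate output file instead.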
Answer 2 (score: 1)
Option 1
Using numpy
First, load the file with np.loadtxt.
import numpy as np

# Bogus delimiter (',' never occurs in the lines) so that a 1D array of whole lines is loaded
x = np.loadtxt('file.txt', dtype=str, delimiter=',')
Next, use np.unique with return_counts=True and keep only the unique entries that occur exactly once.
unique, counts = np.unique(x, return_counts=True)
out = unique[counts == 1]
out
array(['192.168.1.18 --- B8:27:EB:48:C3:B6',
'192.168.1.9 --- DC:4A:3E:DF:22:06'],
dtype='<U34')
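Note that np.unique returns its results sorted, not in file order. In this data the two survivors happen to sort in the order they appear, but that is a coincidence. A sketch, on a small in-memory array so no file is needed, that also requests return_index=True and reorders the survivors by their first occurrence:

```python
import numpy as np

x = np.array([
    "192.168.1.9 --- DC:4A:3E:DF:22:06",
    "192.168.1.1 --- F0:9F:C2:05:B7:A6",
    "192.168.1.18 --- B8:27:EB:48:C3:B6",
    "192.168.1.1 --- F0:9F:C2:05:B7:A6",
])

# index[i] is the position in x where unique[i] first occurs
unique, index, counts = np.unique(x, return_index=True, return_counts=True)
once = counts == 1

# unique[once] is in sorted order; sorting the first-occurrence positions restores file order
out = x[np.sort(index[once])]
print(out.tolist())
# ['192.168.1.9 --- DC:4A:3E:DF:22:06', '192.168.1.18 --- B8:27:EB:48:C3:B6']
```

Here unique[once] would instead give the .18 line first, because '192.168.1.18' sorts before '192.168.1.9' as a string.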
Option 2
Using pandas
Load your data with pd.read_csv, then call drop_duplicates.
import pandas as pd

df = pd.read_csv('file.txt', delimiter=',', header=None)
df
0
0 192.168.1.18 --- B8:27:EB:48:C3:B6
1 192.168.1.12 --- 00:A0:57:2E:A6:12
2 192.168.1.11 --- 00:1D:A2:80:3C:CC
3 192.168.1.7 --- F0:9F:C2:0A:48:E7
4 192.168.1.6 --- 80:2A:A8:C9:85:1C
5 192.168.1.1 --- F0:9F:C2:05:B7:A6
6 192.168.1.9 --- DC:4A:3E:DF:22:06
7 192.168.1.8 --- 80:2A:A8:C9:8E:F6
8 192.168.1.1 --- F0:9F:C2:05:B7:A6
9 192.168.1.7 --- F0:9F:C2:0A:48:E7
10 192.168.1.12 --- 00:A0:57:2E:A6:12
11 192.168.1.11 --- 00:1D:A2:80:3C:CC
12 192.168.1.6 --- 80:2A:A8:C9:85:1C
13 192.168.1.8 --- 80:2A:A8:C9:8E:F6
df.drop_duplicates(keep=False)
0
0 192.168.1.18 --- B8:27:EB:48:C3:B6
6 192.168.1.9 --- DC:4A:3E:DF:22:06
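keep=False is what makes this work: with the default, keep='first', drop_duplicates keeps one copy of each repeated row instead of dropping all copies. A small sketch on an in-memory Series (so no file is needed) showing the difference:

```python
import pandas as pd

s = pd.Series([
    "192.168.1.18 --- B8:27:EB:48:C3:B6",
    "192.168.1.7 --- F0:9F:C2:0A:48:E7",
    "192.168.1.9 --- DC:4A:3E:DF:22:06",
    "192.168.1.7 --- F0:9F:C2:0A:48:E7",
])

# Default keep='first': one copy of the repeated line survives
first_kept = s.drop_duplicates().tolist()
# keep=False: every row that has a duplicate is dropped
none_kept = s.drop_duplicates(keep=False).tolist()

print(first_kept)
print(none_kept)
```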
To save the result back to your text file, you can use df.to_csv:
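df = df.drop_duplicates(keep=False)
# to_csv has no delimiter argument (the separator is sep); header=False and
# index=False write just the bare lines, without pandas' header row and index
df.to_csv('file.txt', header=False, index=False)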
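Putting the pandas steps together, here is a sketch of the full read, deduplicate, write cycle. It uses a temporary file so it runs anywhere; with your data you would use 'file.txt' as the path. header=None when reading, and header=False, index=False when writing, keep pandas from treating the first line as column names or adding a header row and index column to the output.

```python
import os
import tempfile

import pandas as pd

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("192.168.1.18 --- B8:27:EB:48:C3:B6\n"
            "192.168.1.6 --- 80:2A:A8:C9:85:1C\n"
            "192.168.1.9 --- DC:4A:3E:DF:22:06\n"
            "192.168.1.6 --- 80:2A:A8:C9:85:1C\n")

# The lines contain no commas, so the default ',' separator yields a single column
df = pd.read_csv(path, header=None)
df = df.drop_duplicates(keep=False)
# Write back only the bare lines: no header row, no index column
df.to_csv(path, header=False, index=False)

with open(path) as f:
    result = f.read()
os.remove(path)
print(result, end="")
```

Because no field contains the separator or a quote character, to_csv writes the lines without adding any quoting.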