Question

I have a data file that lists dates (represented by lines containing a .) and names followed by numbers:

2015.05.22
nameA 15
nameB 32
2015.05.20
nameA 2
nameC 26

This list file is very long (about 97k lines and growing daily), and I want to (quickly) list all the unique names. In bash I can do this with a shell command.

But I'm using this data in Python, and I'd like to know whether there is a way to do the same thing in Python. Obviously I could simply call this shell command from a Python script, but I'd rather learn the "best practice" way of doing it.

Answer 1

This does the trick. It implements essentially the same steps as your shell script:

1. Read the lines of the given file.
2. Skip every line that contains a "." (the date lines).
3. Take the first field (the name) of each remaining line.
4. Build the unique set of names.
5. Print it.

Example:

from __future__ import print_function

with open("foo.txt", "r") as f:
    lines = (line.strip() for line in f)
    # Skip blank lines and date lines (which contain a "."); keep the first field
    all_names = (line.split(" ", 1)[0] for line in lines if line and "." not in line)
    unique_names = set(all_names)

print("\n".join(unique_names))

Output:

$ python foo.py 
nameC
nameB
nameA
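The same steps can also be packaged as a function that accepts any iterable of lines, which makes it easy to try on sample data without a file and still streams a large file one line at a time. This is a minimal sketch assuming Python 3 and the file format shown in the question; the function name unique_names_from_lines is hypothetical, and the result is sorted only to make the output order stable.

```python
import io


def unique_names_from_lines(lines):
    """Return the sorted unique names from an iterable of lines.

    Lines are consumed one at a time, so an open file is streamed
    instead of being read into memory all at once.
    """
    names = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and date lines such as "2015.05.22"
        if not line or "." in line:
            continue
        # The name is the first whitespace-separated field
        names.add(line.split(None, 1)[0])
    # Sorting is only for a stable, readable output order
    return sorted(names)


sample = io.StringIO(
    "2015.05.22\n"
    "nameA 15\n"
    "nameB 32\n"
    "2015.05.20\n"
    "nameA 2\n"
    "nameC 26\n"
)
print("\n".join(unique_names_from_lines(sample)))
```

With the real data you would pass the open file object (for example inside a with open("foo.txt") as f: block) instead of the io.StringIO sample.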

Answer 2

Just use the re module:

>>> input_str = """
2015.05.22
nameA 15
nameB 32
2015.05.20
nameA 2
nameC 26
"""
>>> import re
>>> set(re.findall('[a-zA-Z]+', input_str))
set(['nameB', 'nameC', 'nameA'])
>>>
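One caveat with [a-zA-Z]+ is that it collects every run of letters anywhere in the text, so a name containing digits or an underscore (say name_2) would come back truncated or split. A stricter sketch anchors the pattern to whole lines with re.MULTILINE. It assumes every name line is exactly a name followed by a number; the NAME_LINE pattern is an assumption about the format, not something stated in the question.

```python
import re

input_str = """
2015.05.22
nameA 15
nameB 32
2015.05.20
nameA 2
nameC 26
"""

# Assumed format: a first field with no whitespace or dot, then spaces/tabs
# and a number. re.MULTILINE makes ^ and $ match at every line boundary.
NAME_LINE = re.compile(r"^([^\s.]+)[ \t]+\d+[ \t]*$", re.MULTILINE)

unique_names = set(NAME_LINE.findall(input_str))
print(sorted(unique_names))  # ['nameA', 'nameB', 'nameC']
```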

Answer 3

You can do all of this with a single awk command:

$ awk 'NF && $1!~/\./ {a[$1]} END {for (i in a) print i}' file
nameC
nameA
nameB

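This checks for lines that contain some data and whose first field does not contain a dot. For those lines it stores the first field as a key of the array a[], and the END block prints all the keys.

In Python, you can use set() to store the data and prevent duplicates:

# Keep the first field of every non-blank line whose first field has no dot
for name in set(line.split()[0] for line in open('a') if line.split() and "." not in line.split()[0]):
    print(name)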
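To make the correspondence with the awk one-liner explicit, here is a rough line-by-line translation that uses a dict in place of awk's associative array. It is a sketch, not an exact emulation (Python's str.split() treats a few more characters as whitespace than awk's default field splitting does), and the function name awk_like_unique_names is hypothetical. One difference worth knowing: awk's for (i in a) visits keys in an unspecified order, whereas a Python 3.7+ dict returns them in first-seen order.

```python
import io


def awk_like_unique_names(lines):
    r"""Rough Python counterpart of the awk one-liner
    awk 'NF && $1!~/\./ {a[$1]} END {for (i in a) print i}'
    """
    a = {}  # stands in for awk's associative array a[]
    for line in lines:
        fields = line.split()  # like awk's default whitespace field splitting
        # NF        -> the line has at least one field
        # $1!~/\./  -> the first field does not contain a literal dot
        if fields and "." not in fields[0]:
            a[fields[0]] = None  # a bare a[$1] in awk just creates the key
    # END block: the keys are the unique names
    return list(a)


sample = io.StringIO("2015.05.22\nnameA 15\nnameB 32\n\n2015.05.20\nnameA 2\nnameC 26\n")
for name in awk_like_unique_names(sample):
    print(name)
```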

Answer 4

A more verbose way:

unique_results = set()

with open("my file.txt") as my_file:
    for line in my_file:
        fields = line.split()
        # Skip blank lines and date lines (which contain a ".")
        if fields and "." not in line:
            unique_results.add(fields[0])

Answer 5

You can do it in a single line of code (assuming Python 2.x):

unique_names = {}.fromkeys([line.split()[0] for line in open("file.txt", "r") if line.strip() and "." not in line]).keys()
print unique_names

Output:

['nameB', 'nameC', 'nameA']

If you want the output to look like the shell's:

print "\n".join(unique_names)

Output:

nameB
nameC
nameA

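As long as the order of the names doesn't matter (a Python 2 dict does not keep insertion order), Python handles this elegantly too.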
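If the order does matter, a small variation on the fromkeys idea keeps the names in the order they first appear. This sketch assumes Python 3.7 or later, where dicts preserve insertion order; in Python 3, .keys() returns a view rather than a list, so the result is wrapped in list(). The function name names_in_first_seen_order is hypothetical.

```python
import io


def names_in_first_seen_order(lines):
    # Assumes Python 3.7+: dict preserves insertion order, so
    # dict.fromkeys keeps each name at the position it first appeared.
    return list(dict.fromkeys(
        line.split()[0]
        for line in lines
        if line.strip() and "." not in line  # skip blank and date lines
    ))


sample = io.StringIO("2015.05.22\nnameA 15\nnameB 32\n2015.05.20\nnameA 2\nnameC 26\n")
print("\n".join(names_in_first_seen_order(sample)))
```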

Python command to find unique names in a long list
