Python command to find unique names in a long list

Time: 2015-05-22 14:30:32

Tags: python bash grep

I have a data file that lists dates (indicated by lines containing a .) and names followed by a number:

2015.05.22
nameA 15
nameB 32
2015.05.20
nameA 2
nameC 26

This list file is long (about 97k lines and growing daily), and I would like to list all the unique names, quickly. In bash I can already do this with a shell command.

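But I work with this data in Python, and I would like to know whether there is a way to do the same thing in Python. Obviously, I could simply call this shell command from the Python script, but I would rather learn the "best practice" way of doing it.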

5 answers:

Answer 0 (score: 1)

This does the trick; it implements the same steps as your shell script:

- Filter the lines of a given file
- Remove any line that contains a .
- Get the unique set of the remaining names
- Print it

Example:

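from __future__ import print_function

# Read the file, stripping surrounding whitespace from each line
with open("foo.txt", "r") as f:
    lines = [line.strip() for line in f]
# Keep the first field of each non-empty line that is not a date line
all_names = (line.split(" ", 1)[0] for line in lines if line and "." not in line)
unique_names = set(all_names)
print("\n".join(unique_names))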

Output:

$ python foo.py 
nameC
nameB
nameA
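
Because a set has no defined order, the names above can come out in a different order from one run to the next. The following is a minimal sketch for Python 3, assuming alphabetical output is acceptable. The function name unique_names and the in-memory sample are illustrative and not part of the original answer; like the awk answer, it tests only the first field for a dot.

```python
import io


def unique_names(lines):
    """Return the sorted unique names from an iterable of lines.

    Blank lines and date lines (first field contains a dot) are skipped.
    """
    names = set()
    for line in lines:
        fields = line.split()
        if fields and "." not in fields[0]:
            names.add(fields[0])
    return sorted(names)


# Sample data from the question; a real script would pass an open file instead.
sample = io.StringIO(
    "2015.05.22\nnameA 15\nnameB 32\n2015.05.20\nnameA 2\nnameC 26\n"
)
print("\n".join(unique_names(sample)))
```

Since the function only iterates over its argument, an open file object can be passed directly and the file is read one line at a time.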

Answer 1 (score: 1)

Just use re:

>>> input_str = """
2015.05.22
nameA 15
nameB 32
2015.05.20
nameA 2
nameC 26
"""
>>> import re
>>> set(re.findall('[a-zA-Z]+', input_str))
set(['nameB', 'nameC', 'nameA'])
>>> 
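
Note that the pattern [a-zA-Z]+ only works while names consist purely of letters; a name such as name_B2 would be split into name and B. The following is a hedged sketch for Python 3, assuming every name line has the form "name, whitespace, integer". The pattern NAME_LINE, the function names_from_text, and the sample name name_B2 are hypothetical and not taken from the question.

```python
import re

# Assumed line format: a name (no whitespace, no dot) at the start of a line,
# then spaces or tabs, then an integer. Date lines fail because of the dot.
NAME_LINE = re.compile(r"^([^\s.]+)[ \t]+\d+[ \t]*$", re.MULTILINE)


def names_from_text(text):
    """Return the set of names found on lines matching NAME_LINE."""
    return set(NAME_LINE.findall(text))


# name_B2 is a made-up name that a letters-only pattern would split apart.
input_str = """
2015.05.22
nameA 15
name_B2 32
2015.05.20
nameA 2
nameC 26
"""
print(sorted(names_from_text(input_str)))
```

Anchoring the pattern to whole lines with re.MULTILINE also keeps fragments of the date lines out of the result.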

Answer 2 (score: 0)

You can do all of this with a single awk command:

$ awk 'NF && $1!~/\./ {a[$1]} END {for (i in a) print i}' file
nameC
nameA
nameB

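This checks for lines that contain some data (NF) and whose first field does not contain a dot ($1!~/\./). For those lines, it uses the first field as a key in the array a[], and the keys are printed at the end.

In Python, you can use set() to store the data and prevent duplicates: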

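with open('a') as f:
    # Split each line once; skip blank lines (which would raise IndexError) and date lines
    names = set(fields[0] for fields in map(str.split, f) if fields and "." not in fields[0])
for name in names:
    print(name)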

Answer 3 (score: 0)

A more verbose way:

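unique_results = set()

with open("my file.txt") as my_file:
    for line in my_file:
        fields = line.split()
        # Skip blank lines and date lines (those containing a dot)
        if fields and "." not in line:
            # Add only the name; adding the whole list would raise TypeError
            unique_results.add(fields[0])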

Answer 4 (score: 0)

This can be done in a single line of code (assuming Python 2.x):

unique_names = {}.fromkeys([line.split()[0] for line in open("file.txt", "r") if "." not in line]).keys()
print unique_names

Output:

['nameB', 'nameC', 'nameA']

If you want the output formatted the way the shell prints it:

print "\n".join(unique_names)

Output:

nameB
nameC
nameA

If the order of the names does not matter, Python is quite elegant here too.
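
If the order does matter, one possible sketch for Python 3.7 or later, where dicts are guaranteed to keep insertion order, returns each name in order of first appearance. The function name unique_names_in_order is illustrative and not part of the original answer.

```python
def unique_names_in_order(lines):
    """Return unique names in order of first appearance.

    Relies on dict insertion order, which is guaranteed from Python 3.7.
    """
    names = (
        line.split()[0]
        for line in lines
        # Skip blank lines and date lines (those containing a dot)
        if line.strip() and "." not in line
    )
    # dict.fromkeys keeps only the first occurrence of each key
    return list(dict.fromkeys(names))


# Sample data from the question, given as a list of lines.
lines = ["2015.05.22", "nameA 15", "nameB 32", "2015.05.20", "nameA 2", "nameC 26"]
print(unique_names_in_order(lines))
```

This is the same fromkeys idea as the one-liner above, except that a modern dict preserves the order in which the names were first seen.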