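I want to print the count of each unique string value, its character length, and the string itself. Python is fine, but I'm open to suggestions using other tools. As for the output format, tab-separated or anything similar that is easy to parse would be fine. This is a follow-up to "Parsing URI parameter and keyword value pairs".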
Example source:
date=2012-11-20
test=
y=5
page=http%3A//domain.com/page.html&unique=123456
refer=http%3A//domain2.net/results.aspx%3Fq%3Dbob+test+1.21+some%26file%3Dname
test=
refer=http%3A//domain2.net/results.aspx%3Fq%3Dbob+test+1.21+some%26file%3Dname
refer=http%3A//domain2.net/results.aspx%3Fq%3Dbob+test+1.21+some%26file%3Dname
y=5
page=http%3A//support.domain.com/downloads/index.asp
page=http%3A//support.domain.com/downloads/index.asp
view=month
y=5
y=5
y=5
Example output:
5 3 y=5
3 78 refer=http%3A//domain2.net/results.aspx%3Fq%3Dbob+test+1.21+some%26file%3Dname
2 52 page=http%3A//support.domain.com/downloads/index.asp
2 5 test=
1 15 date=2012-11-20
1 10 view=month
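Here is an example using a shell one-liner, but I assume it would be easier to come up with something in Python that handles both this and the length count.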
$ sort test | uniq -c | sort -nr
5 y=5
3 refer=http%3A//domain2.net/results.aspx%3Fq%3Dbob+test+1.21+some%26file%3Dname
2 test=
2 page=http%3A//support.domain.com/downloads/index.asp
1 view=month
1 page=http%3A//domain.com/page.html&unique=123456
1 date=2012-11-20
Answer 0 (score: 1)
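Yes, you can do this easily in Python. People usually use a dictionary to keep track of duplicates. Here the dictionary is keyed on line length:
>>> from collections import defaultdict
>>> group = defaultdict(list)
>>> with open("test.txt") as fin:
...     for line in fin:
...         # Key each line by its length, ignoring trailing whitespace.
...         group[len(line.rstrip())].append(line)
...
>>> # Row order follows dict iteration order, so it can vary by Python version.
>>> for k, g in group.items():
...     # Print length, count, and the first line seen with that length.
...     print(k, len(g), g[0].strip())
...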
3 5 y=5
5 2 test=
10 1 view=month
78 3 refer=http%3A//domain2.net/results.aspx%3Fq%3Dbob+test+1.21+some%26file%3Dname
15 1 date=2012-11-20
48 1 page=http%3A//domain.com/page.html&unique=123456
52 2 page=http%3A//support.domain.com/downloads/index.asp
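The grouping above is keyed on line length, so two different strings of equal length would land in the same bucket. A minimal sketch, assuming you want one row per distinct string, keys on the stripped line itself with collections.Counter. The helper name count_lines, the tie-break on length, and the tab-separated layout are illustrative choices, not part of the answer above.

```python
from collections import Counter


def count_lines(lines):
    """Return (count, length, string) rows, most frequent first.

    count_lines is a hypothetical helper name used for illustration.
    """
    # Count each distinct line after dropping the line terminator.
    counts = Counter(line.rstrip("\r\n") for line in lines)
    # Sort by count descending, then by length descending to break ties.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], -len(kv[0])))
    return [(n, len(s), s) for s, n in ordered]


if __name__ == "__main__":
    # "y=5" and "a=1" share a length but remain separate rows.
    sample = ["y=5\n", "test=\n", "y=5\n", "a=1\n", "test=\n", "y=5\n"]
    for count, length, text in count_lines(sample):
        print(f"{count}\t{length}\t{text}")
```

Any iterable of lines works as input, so an open file handle can be passed directly in place of the sample list.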
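Alternatively, if you want to mimic the shell command, you can use itertools.groupby, which behaves much like uniq, to achieve something similar:
>>> from itertools import groupby
>>> with open("test.txt") as fin:
...     file_it = (e.rstrip() for e in fin)
...     # Sort by length first: groupby only merges adjacent items.
...     for k, g in groupby(sorted(file_it, key=len), len):
...         first_elem = next(g).strip()
...         # next() consumed one item, so add 1 to the remaining count.
...         print(k, sum(1 for _ in g) + 1, first_elem)
...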
3 5 y=5
5 2 test=
10 1 view=month
15 1 date=2012-11-20
48 1 page=http%3A//domain.com/page.html&unique=123456
52 2 page=http%3A//support.domain.com/downloads/index.asp
78 3 refer=http%3A//domain2.net/results.aspx%3Fq%3Dbob+test+1.21+some%26file%3Dname
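To mirror `sort test | uniq -c | sort -nr` more literally, the same groupby idea can group on the line itself rather than on its length. The following is only a sketch under that assumption. The name uniq_c is hypothetical, and the length column is added to match the output requested in the question.

```python
from itertools import groupby


def uniq_c(lines):
    """Yield (count, length, string) for each distinct line, like `uniq -c`.

    uniq_c is a hypothetical name chosen for this sketch.
    """
    stripped = sorted(line.rstrip("\r\n") for line in lines)
    # groupby only merges adjacent equal items, hence the sort above.
    for text, run in groupby(stripped):
        yield sum(1 for _ in run), len(text), text


if __name__ == "__main__":
    sample = ["y=5\n", "view=month\n", "y=5\n", "test=\n", "test=\n", "y=5\n"]
    # Equivalent of the final `sort -nr`: highest count first.
    for count, length, text in sorted(uniq_c(sample), reverse=True):
        print(count, length, text)
```

Sorting the resulting tuples in reverse orders rows by count, then by length, which matches the tie-breaking seen in the example output of the question.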