How do I return the count of the same elements in two lists?

Time: 2015-03-06 05:48:25

Tags: python python-2.7

I have two very large lists (that's why I used ...), a list of lists:

x = [['I like stackoverflow. Hi ok!'],['this is a great community'],['Ok, I didn\'t like this!.'],...,['how to match and return the frequency?']]

and a list of strings:

y = ['hi', 'nice', 'ok',..., 'frequency']

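I want to return, in a new list, the number of times (count) that any word in y occurs in each of the lists of x. For example, for the lists above, this should be the correct output:

[(1,2),(2,0),(3,1),...,(n,count)]

That is, the format is [(1,count),...,(n,count)], where n is the list number and count is the number of times any word from y appears in that list of x. Any idea how to approach this?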

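6 answers:

Answer 0 (score: 3)

First, you should preprocess x into a list of sets of lowercased words, which will speed up the lookups that follow enormously. E.g.: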

import re
ppx = []
for subx in x:
    # each subx is a one-element list, so tokenize its string subx[0]
    ppx.append(set(w.lower() for w in re.findall(r'\w+', subx[0])))

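(Yes, you could collapse that into a list comprehension, but I'm aiming for some legibility.)

Next, you loop over y, checking how many of the sets in ppx contain each item of y. That would be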

[sum(1 for s in ppx if w in s) for w in y]

This won't give you those redundant first items you crave, but enumerate to the rescue...:

list(enumerate((sum(1 for s in ppx if w in s) for w in y), 1))

should give you exactly what you need.
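For reference, here is the whole approach as one runnable sketch on the sample data from the question. It assumes each sub-list of x holds exactly one string, as in the question, and it uses print() with a single argument so it runs on both Python 2.7 and 3.

```python
import re

x = [['I like stackoverflow. Hi ok!'], ['this is a great community'],
     ["Ok, I didn't like this!."], ['how to match and return the frequency?']]
y = ['hi', 'nice', 'ok', 'frequency']

# One set of lowercased words per sub-list of x.
ppx = [set(w.lower() for w in re.findall(r'\w+', subx[0])) for subx in x]

# For each word in y: the number of sub-lists of x that contain it,
# numbered from 1 by enumerate.
per_word = list(enumerate((sum(1 for s in ppx if w in s) for w in y), 1))
print(per_word)  # [(1, 1), (2, 0), (3, 2), (4, 1)]
```

Note that the pairs here are indexed by the words of y (for example, 'ok' appears in two sub-lists), not by the sub-lists of x.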

Answer 1 (score: 2)

Here is a more readable solution. Check my comments in the code.

#!/usr/bin/python
# -*- coding: utf-8 -*-

import re

x = [['I like stackoverflow. Hi ok!'],['this is a great community'],['Ok, I didn\'t like this!.'],['how to match and return the frequency?']]
y = ['hi', 'nice', 'ok', 'frequency']


assert len(x)==len(y), "you have to make sure length of x equals y's"
num = []
for i in xrange(len(y)):
    # lower all the strings in x for comparison
    # find all matched patterns in x and count it, and store result in variable num
    num.append(len(re.findall(y[i], x[i][0].lower())))

res = []
# use enumerate to give output in format you want
for k, v in enumerate(num):
    res.append((k,v))
# here is what you want    
print res

Output:

[(0, 1), (1, 0), (2, 1), (3, 1)]

Answer 2 (score: 2)

INPUT:

x = [['I like stackoverflow. Hi ok!'],['this is a great community'],
['Ok, I didn\'t like this!.'],['how to match and return the frequency?']]
y = ['hi', 'nice', 'ok', 'frequency']

CODE:

import re
s1 = set(y)
index = 0
result = []
for itr in x:
    itr = re.sub('[!.?,]', '',itr[0].lower()).split(' ')
    # remove special chars (including the comma after "Ok") and convert to lower case
    s2 = set(itr)
    intersection = s1 & s2
    #find intersection of common strings
    num = len(intersection)
    result.append((index,num))
    index = index+1

Output:

result = [(0, 2), (1, 0), (2, 1), (3, 1)]
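Stripping a fixed set of punctuation characters only works for the punctuation you list. As a hedged variant of the same set-intersection idea, the sketch below tokenizes with re.findall(r'\w+', ...) instead, so any punctuation is ignored. The function name count_common_words is made up for illustration, and the sketch assumes each sub-list holds one string.

```python
import re

def count_common_words(rows, words):
    """Return (index, count) pairs: how many distinct words from `words`
    occur in each row. `count_common_words` is a hypothetical helper name."""
    wanted = set(w.lower() for w in words)
    result = []
    for index, row in enumerate(rows):
        # \w+ keeps runs of word characters, so punctuation never sticks to a word.
        tokens = set(re.findall(r'\w+', row[0].lower()))
        result.append((index, len(wanted & tokens)))
    return result

x = [['I like stackoverflow. Hi ok!'], ['this is a great community'],
     ["Ok, I didn't like this!."], ['how to match and return the frequency?']]
y = ['hi', 'nice', 'ok', 'frequency']

print(count_common_words(x, y))  # [(0, 2), (1, 0), (2, 1), (3, 1)]
```

Because both sides are sets, a word repeated within one row is still counted once.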

Answer 3 (score: 1)

You can also do it like this.

>>> import re
>>> x = [['I like stackoverflow. Hi ok!'],['this is a great community'],['Ok, I didn\'t like this!.'],['how to match and return the frequency?']]
>>> y = ['hi', 'nice', 'ok', 'frequency']
>>> l = []
>>> for i,j in enumerate(x):
        c = 0
        for word in y:  # named word so it does not rebind x
            if re.search(r'(?i)\b'+word+r'\b', j[0]):
                c += 1
        l.append((i+1,c))


>>> l
[(1, 2), (2, 0), (3, 1), (4, 1)]

(?i) performs a case-insensitive match. \b is called a word boundary; it matches between a word character and a non-word character.
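If y is large, running one re.search per word per row adds up. One possible alternative, sketched here as an assumption rather than as part of this answer, is to compile a single alternation pattern once and collect the distinct words it finds in each row. The name count_whole_words is hypothetical, and re.escape is added in case a word contains regex metacharacters.

```python
import re

def count_whole_words(rows, words):
    """Return (row_number, count) pairs, counting distinct whole-word,
    case-insensitive matches. Hypothetical helper; assumes `words` is non-empty."""
    pattern = re.compile(
        r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b',
        re.IGNORECASE)
    result = []
    for number, row in enumerate(rows, 1):
        # Lowercase the matches so "Ok" and "ok" count as the same word.
        found = set(m.group(0).lower() for m in pattern.finditer(row[0]))
        result.append((number, len(found)))
    return result

x = [['I like stackoverflow. Hi ok!'], ['this is a great community'],
     ["Ok, I didn't like this!."], ['how to match and return the frequency?']]
y = ['hi', 'nice', 'ok', 'frequency']

print(count_whole_words(x, y))  # [(1, 2), (2, 0), (3, 1), (4, 1)]
```

Thanks to the \b anchors, 'hi' does not match inside "this".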

Answer 4 (score: 1)

Maybe you could concatenate the strings in x to simplify the computation:

w = ' '.join(i[0] for i in x)

Now w is one long string, like this:

>>> w
"I like stackoverflow. Hi ok! this is a great community Ok, I didn't like this!. how to match and return the frequency?"

With this transformation, you can simply do this:

>>> l = []
>>> for i in range(len(y)):
    l.append((i+1, w.count(str(y[i]))))

Which gives you:

>>> l
[(1, 2), (2, 0), (3, 1), (4, 0), (5, 1)]
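Keep in mind that str.count counts case-sensitive substrings over the whole joined text, so 'hi' is found inside "this" while "Hi" is not matched. The sketch below contrasts that with a whole-word, case-insensitive count using collections.Counter. The Counter variant is an addition for comparison, not part of this answer, and it uses the four-word y from the other answers.

```python
import re
from collections import Counter

x = [['I like stackoverflow. Hi ok!'], ['this is a great community'],
     ["Ok, I didn't like this!."], ['how to match and return the frequency?']]
y = ['hi', 'nice', 'ok', 'frequency']

w = ' '.join(i[0] for i in x)

# Substring counting, case-sensitive: 'hi' matches twice, both times inside "this".
substring_counts = [(i, w.count(word)) for i, word in enumerate(y, 1)]
print(substring_counts)  # [(1, 2), (2, 0), (3, 1), (4, 1)]

# Whole-word counting, case-insensitive: tokenize once, then look words up.
tokens = Counter(re.findall(r'\w+', w.lower()))
word_counts = [(i, tokens[word]) for i, word in enumerate(y, 1)]
print(word_counts)  # [(1, 1), (2, 0), (3, 2), (4, 1)]
```

Either way, joining x into one string yields totals per word of y across all of x, not a count per sub-list.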

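Answer 5 (score: 0)

You can make a dictionary whose keys are the items of the "Y" list. Loop over the words in the X nested list and look each one up in the dictionary, updating its value every time the word is encountered.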
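A minimal sketch of that idea, assuming the words in y are already lowercase and each sub-list of x holds one string. The regex tokenization is my own choice for splitting the text, since the answer does not say how to split it.

```python
import re

x = [['I like stackoverflow. Hi ok!'], ['this is a great community'],
     ["Ok, I didn't like this!."], ['how to match and return the frequency?']]
y = ['hi', 'nice', 'ok', 'frequency']

# Keys are the words of y; every count starts at 0.
counts = dict.fromkeys(y, 0)

for row in x:
    for token in re.findall(r'\w+', row[0].lower()):
        if token in counts:  # dictionary lookup is O(1) on average
            counts[token] += 1

print(counts['ok'])  # 2
```

This gives a total per word of y. For a count per sub-list of x, you could reset a counter for each row instead of keeping one dictionary for everything.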