Question

I have a specific coding problem in Python.

Count = defaultdict(int)
for l in text:
   for m in l['reviews'].split():
      Count[m] += 1

print Count

text is a list like this:

[{'ideology': 3.4,
 'ID': '50555',
 'reviews': 'Politician from CA-21, very liberal and aggressive'},{'ideology': 1.5,
 'ID': '10223',
 'reviews': 'Retired politician'}, ...]

If I run this code, I get a result like this:

defaultdict(<type 'int'>, {'superficial,': 2, 'awesome': 1, 
'interesting': 3, 'A92': 2, ....

What I want is a bigram count rather than a unigram count. I tried the following code, but got the error TypeError: cannot concatenate 'str' and 'int' objects

Count = defaultdict(int)
for l in text:
    for m in l['reviews'].split():
       Count[m, m+1] += 1

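I would like to use code similar to this rather than other code that already exists on Stack Overflow. Most of the existing code works on a list of words, but I want to count bigrams directly from split() on the raw text.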

I would like to get a result like this:

defaultdict(<type 'int'>, {('superficial', 'awesome'): 1, ('awesome', 'interesting'): 1, 
('interesting','A92'): 2, ....}

Why am I getting this error, and how can I fix the code?

Answer 1

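The standard library has a ready-made solution for counting objects, called Counter. With the help of itertools as well, your bigram counter script could look like this: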

from collections import Counter
from itertools import islice

Count = Counter()
for l in text:
    words = l['reviews'].split()
    # Pair each word with its successor: (w0, w1), (w1, w2), ...
    Count.update(zip(words, islice(words, 1, None)))

print(Count)

Answer 2

If I understand your question correctly, the code below should solve your problem.

Count = dict()
for l in text:
    words = l['reviews'].split()
    for i in range(len(words) - 1):
        # Join each pair of adjacent words into one space-separated key
        bigram = " ".join(words[i:i+2])
        if bigram not in Count:
            Count[bigram] = 1
        else:
            Count[bigram] = Count[bigram] + 1

Count will then be:

> {'CA-21, very': 1, 'liberal and': 1, 'very liberal': 1, 'and
> aggressive': 1, 'Politician from': 1, 'aggressive Politician': 1,
> 'from CA-21,': 1}

Edit: if you want to use tuples as keys, just change the join line. Tuples are hashable, so a Python dict accepts them as keys too.
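As a rough sketch of that edit, the join line can be swapped for a tuple() call. The sample text list below is taken from the question (with the "..." dropped), and the use of dict.get in place of the if/else is a simplification of mine, not part of the answer above.

```python
# Sample data copied from the question; the real list is assumed to be longer.
text = [
    {'ideology': 3.4, 'ID': '50555',
     'reviews': 'Politician from CA-21, very liberal and aggressive'},
    {'ideology': 1.5, 'ID': '10223', 'reviews': 'Retired politician'},
]

Count = dict()
for l in text:
    words = l['reviews'].split()
    for i in range(len(words) - 1):
        # A two-word tuple replaces the " ".join(...) string key
        bigram = tuple(words[i:i + 2])
        # dict.get returns 0 the first time a bigram is seen
        Count[bigram] = Count.get(bigram, 0) + 1

print(Count)
```

The keys are now pairs such as ('very', 'liberal'), so the two words can be read back without splitting a string.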

Answer 3

Do you want to count each pair of adjacent words? Make them a tuple.

text = [{'ideology':3.4, 'ID':'50555', 'reviews':'Politician from CA-21, very liberal and aggressive'}]
Count = {}
for l in text:
    words = l['reviews'].split()
    for i in range(len(words) - 1):
        # Use the pair of adjacent words as a tuple key
        if (words[i], words[i+1]) not in Count:
            Count[(words[i], words[i+1])] = 0
        Count[(words[i], words[i+1])] += 1

print(Count)

Result:

{('and', 'aggressive'): 1, ('from', 'CA-21,'): 1, ('Politician', 'from'): 1, ('CA-21,', 'very'): 1, ('very', 'liberal'): 1, ('liberal', 'and'): 1}
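On the "why" part of the question: in the original loop m is a word, that is a str, so m+1 tries to add the integer 1 to a string and raises the TypeError. It does not mean "the next word". A minimal sketch that stays close to the asker's defaultdict loop is to pair each word with its successor using zip; the sample text list is copied from the question, and the variable name n is only illustrative.

```python
from collections import defaultdict

# Sample data copied from the question; the real list is assumed to be longer.
text = [
    {'ideology': 3.4, 'ID': '50555',
     'reviews': 'Politician from CA-21, very liberal and aggressive'},
    {'ideology': 1.5, 'ID': '10223', 'reviews': 'Retired politician'},
]

Count = defaultdict(int)
for l in text:
    words = l['reviews'].split()
    # zip(words, words[1:]) yields (word, next_word) pairs and stops
    # at the shorter sequence, so the last word is never paired with nothing
    for m, n in zip(words, words[1:]):
        Count[m, n] += 1

print(dict(Count))
```

Because the pairs are built per review, no bigram spans the boundary between two reviews.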

How to count bigrams using a loop in Python

3 answers: