Question

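Hi, so I'm trying to write a function, classify(csv_file), that creates a default dictionary of dictionaries from a CSV file. The first "column" (the first item in each row) is the key for each entry in the dictionary, and the second "column" (the second item in each row) contains the values.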

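However, I want to transform the values by calling two functions (in this order):

trigram_c(string): creates a default dictionary of the trigram counts in the string (i.e. the value)
normal(tri_counts): takes the output of trigram_c and normalises the counts (i.e. converts each trigram's count into a number).

So my final output would be a dictionary of dictionaries: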

{value: {trigram1 : normalised_count, trigram2: normalised_count}, value2: {trigram1: normalised_count...}...} and so on

My current code is as follows:

def classify(csv_file):
    l_rows = list(csv.reader(open(csv_file)))
    classified = dict((l_rows[0], l_rows[1]) for rows in l_rows)

For example, if the CSV file were:

Snippet1, "It was a dark stormy day"
Snippet2, "Hello world!"
Snippet3, "How are you?"

The final output would be something like:

{Snippet1: {'It ': 0.5352, 't w': 0.43232}, Snippet2: {'Hel' : 0.438724,...}...} and so on.

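(Of course, there would be more than two trigram counts, and the numbers are just random for the sake of the example.)

Any help would be much appreciated!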

Answer 1

First, please check your classify function, because I couldn't run it. Here is a corrected version:

import csv

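def classify(csv_file):
    # Read every row, then map the first column (key) to the second (raw string).
    with open(csv_file, newline="") as f:
        l_rows = list(csv.reader(f))
    classified = dict((row[0], row[1]) for row in l_rows)
    return classified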

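It returns a dictionary whose keys come from the first column and whose values are the strings in the second column. You should therefore iterate over each dictionary entry and pass its value to the trigram_c function. I don't understand how you compute the trigram counts, but if, for example, you simply count how many times each trigram appears in the string, you can use the function below. If you want a different kind of count, you only need to update the code inside the for loop.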

def trigram_c(string):
    trigram_dict = {}
    # Slide a three-character window across the string.
    for i in range(len(string) - 2):
        # you could implement your logic in this loop
        trigram = string[i:i + 3]
        trigram_dict[trigram] = trigram_dict.get(trigram, 0) + 1
    return trigram_dict
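
To tie the pieces together, here is a rough sketch of how the whole pipeline might look. The question doesn't say how normal should normalise the counts, so I'm assuming relative frequency (each count divided by the total number of trigrams); swap in your own formula if you meant something else. I've also used collections.Counter instead of a defaultdict for the counting, and skipinitialspace=True so that rows written like `Snippet1, "text"` (with a space after the comma) are parsed with the quotes stripped. Both are my choices, not something the question requires.

```python
import csv
from collections import Counter


def trigram_c(string):
    # Count every overlapping three-character window in the string.
    return Counter(string[i:i + 3] for i in range(len(string) - 2))


def normal(tri_counts):
    # ASSUMPTION: "normalise" means relative frequency (count / total).
    total = sum(tri_counts.values())
    if total == 0:
        return {}
    return {trigram: count / total for trigram, count in tri_counts.items()}


def classify(csv_file):
    classified = {}
    with open(csv_file, newline="") as handle:
        # skipinitialspace handles rows such as: Snippet1, "some text"
        for row in csv.reader(handle, skipinitialspace=True):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            classified[row[0]] = normal(trigram_c(row[1]))
    return classified
```

Calling classify("snippets.csv") (the file name is just a placeholder) should then give you one inner dictionary per row, keyed by the first column, with the normalised values in each inner dictionary summing to 1 under the assumption above.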

Creating a dictionary of dictionaries from a CSV file
