Count the words from a word array in a list of strings and build a dictionary from the counts

Time: 2019-07-08 06:48:21

Tags: arrays python-3.x dictionary

I have a list of strings like this:

string_list=['philadelphia court excessive disappointed court hope','hope jurisdiction obscures acquittal court','mention hope maryland signal held problem internal reform life bolster level grievance']

and a list of words, for example:

words=['hope','court','mention','maryland']

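Now I want to count how often each listed word occurs in each string and put the result into a dictionary. Its keys should be 'doc_(index)', and each value should be a nested dictionary whose keys are the words that occur in that string and whose values are their counts. The expected output is: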

words_dict={'doc_1':{'court':2,'hope':1},'doc_2':{'court':1,'hope':1},'doc_3':{'mention':1,'hope':1,'maryland':1}}

My first step was:

docs_dict = {}
# number the documents from 1 and use 'doc_<n>' as the key
for count, text in enumerate(string_list, start=1):
    docs_dict['doc_' + str(count)] = text
print(docs_dict)

{'doc_1': 'philadelphia court excessive disappointed court hope', 'doc_2': 'hope jurisdiction obscures acquittal court', 'doc_3': 'mention hope maryland signal held problem internal reform life bolster level grievance'}

After this, I don't know how to get the word counts. What I have so far:

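docs = {}
for k, v in docs_dict.items():
    split_words = v.split()
    docs[k] = {}  # create the nested dictionary for this document first
    for i in words:
        if i in split_words:
            docs[k][i] = split_words.count(i)  # store the count; absent words are skipped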
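
Writing `docs[k][i] += 1` raises KeyError when `docs[k]` or `docs[k][i]` has not been created yet. A minimal sketch that avoids this with `collections.defaultdict(int)`, which starts every missing word at 0, assuming the strings are plain whitespace-separated words as in the sample data:

```python
from collections import defaultdict

string_list = ['philadelphia court excessive disappointed court hope',
               'hope jurisdiction obscures acquittal court',
               'mention hope maryland signal held problem internal reform life bolster level grievance']
words = ['hope', 'court', 'mention', 'maryland']

docs = {}
for index, text in enumerate(string_list, start=1):
    # defaultdict(int) returns 0 for an unseen word, so "+= 1" never raises KeyError
    counts = defaultdict(int)
    for token in text.split():
        if token in words:
            counts[token] += 1
    docs['doc_' + str(index)] = dict(counts)  # convert back to a plain dict

print(docs)
```

Only words that actually occur get a key, so zero counts never appear in the result.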

4 answers:

Answer 0 (score: 1)

You can use Python's str.count to get the number of times a word occurs in a sentence.

Check this code:

words_dict = {}
string_list=['philadelphia court excessive disappointed court hope','hope jurisdiction obscures acquittal court','mention hope maryland signal held problem internal reform life bolster level grievance']
words_list=['hope','court','mention','maryland']
for i in range(len(string_list)): #iterate over string list
    helper = {} #temporary dictionary
    for word in words_list: #iterate over word list
        x = string_list[i].count(word) #count no. of occurrences of word in sentence
        if x > 0:
            helper[word]=x
    words_dict["doc_"+str(i+1)]=helper #add temporary dictionary into final dictionary

#Print dictionary contents
for i in words_dict:
    print(i + ": " + str(words_dict[i]))

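The output of the above code is (this key order comes from an older Python version; Python 3.7+ preserves insertion order, so doc_1 would print first):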

doc_3: {'maryland': 1, 'mention': 1, 'hope': 1}
doc_2: {'court': 1, 'hope': 1}
doc_1: {'court': 2, 'hope': 1}
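
One caveat with this answer: `str.count` counts non-overlapping substring occurrences, not whole words. It gives the right result for the sample data, but a word such as `hopeful` would also be counted as `hope`. The hypothetical sentence below shows the difference between counting substrings and counting split words:

```python
sentence = 'hopeful people hope'  # hypothetical example, not from the question

# str.count matches substrings, so the 'hope' inside 'hopeful' is counted too
substring_count = sentence.count('hope')

# splitting into words first counts only whole-word matches
word_count = sentence.split().count('hope')

print(substring_count, word_count)  # 2 1
```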

Answer 1 (score: 0)

Use Counter to get the word counts in each document.
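
A minimal sketch of the Counter approach (not the answerer's own code), assuming the words are whitespace-separated and already lower-cased as in the question. Only tokens that appear in `words` are fed to `Counter`, so words with zero occurrences never show up; `wanted` is a name introduced here for illustration:

```python
from collections import Counter

string_list = ['philadelphia court excessive disappointed court hope',
               'hope jurisdiction obscures acquittal court',
               'mention hope maryland signal held problem internal reform life bolster level grievance']
words = ['hope', 'court', 'mention', 'maryland']
wanted = set(words)  # a set makes each membership test constant-time

words_dict = {}
for index, text in enumerate(string_list, start=1):
    # count only the tokens that are in the word list
    counts = Counter(token for token in text.split() if token in wanted)
    words_dict['doc_' + str(index)] = dict(counts)

print(words_dict)
```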


Answer 2 (score: 0)

The question here seems to solve this.

Below is my attempt at the code you need.

from collections import Counter
string_list=['philadelphia court excessive disappointed court hope','hope jurisdiction obscures acquittal court','mention hope maryland signal held problem internal reform life bolster level grievance']
words=['hope','court','mention','maryland']

result_dict = {}

# number documents from 1 so the keys match 'doc_1', 'doc_2', ...
for index, value in enumerate(string_list, start=1):
    string_split = value.split()  # split on any whitespace
    cntr = Counter(string_split)
    # keep only the listed words that actually occur in this document
    result = {key: cntr[key] for key in words if cntr[key] > 0}
    result_dict['doc_' + str(index)] = result

print(result_dict)


I hope you find it useful.
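
This answer relies on a property of `Counter`: indexing it with a word it has never seen returns 0 instead of raising KeyError, and the lookup does not add that word to the Counter. It also means a comprehension over `words` produces an entry for every listed word, including zero counts, which the question's expected output leaves out. A small demonstration on the second sample string:

```python
from collections import Counter

cntr = Counter('hope jurisdiction obscures acquittal court'.split())

# a missing key returns 0 instead of raising KeyError...
print(cntr['maryland'])    # 0
# ...and the lookup does not insert the key
print('maryland' in cntr)  # False

words = ['hope', 'court', 'mention', 'maryland']
# without a filter, every listed word gets an entry, even with a count of 0
with_zeros = {key: cntr[key] for key in words}
# filtering on the count drops the words that do not occur
without_zeros = {key: cntr[key] for key in words if cntr[key] > 0}
print(with_zeros)     # {'hope': 1, 'court': 1, 'mention': 0, 'maryland': 0}
print(without_zeros)  # {'hope': 1, 'court': 1}
```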

Answer 3 (score: 0)

Try this:

from collections import Counter

string_list = ['philadelphia court excessive disappointed court hope',
               'hope jurisdiction obscures acquittal court',
               'mention hope maryland signal held problem internal reform life bolster level grievance']
words = ['hope', 'court', 'mention', 'maryland']

# for each document, keep only the Counter entries whose word is in the word list
result = {
    f'doc_{i}': {key: value for key, value in Counter(text.split()).items() if key in words}
    for i, text in enumerate(string_list, start=1)
}
print(result)

Output:

{'doc_1': {'court': 2, 'hope': 1}, 'doc_2': {'hope': 1, 'court': 1}, 'doc_3': {'mention': 1, 'hope': 1, 'maryland': 1}}