Debugging a Python program that generates associated pairs of words from a text file

Time: 2017-03-24 10:50:48

Tags: python dictionary python-3.5

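I have written this code to find the pairs of artists that occur together more than 50 times in my text file. The names of two artists are separated by a comma. I can't figure out why my program isn't working correctly: I am extracting the names correctly, but when I print them they come out distorted, and some characters in the names disappear.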

Here is my code:

#Program written in JetBrains PyCharm Community Edition with Anaconda (Python 3.5)
#Dhruv Marwha

filename="Artist_lists_small.txt" #Name of the file assigned to a variable

with open(filename,encoding='utf8')as infile:
    something=list(set(infile.read().split(","))) #Initialising Something with all unique artist names found in the Text File based on splitting all the words on a ','.

dictionary={}  # Maps each artist name to the list of line numbers it appears on

with open(filename,encoding='utf8') as f:  # If a name from our list appears in a line, add that line's number to the name's list of values in the dictionary.
    for line_num,xyz in enumerate(f):
        for i in range(len(something)):
            if something[i] in xyz:
                dictionary.setdefault(something[i], []).append(line_num)

for key,value in list(dictionary.items()):  # Remove every key whose list of values has fewer than 50 entries.
                                           # A name that occurs on fewer than 50 lines cannot be part of a pair that occurs together more than 50 times, so there is no point in making combinations with it.
    if len(value)<50:
        del dictionary[key]

pair_list=[]  # List of all pairs that occur together more than 50 times

for key,value in dictionary.items():
    for key1,value1 in dictionary.items():  # Compare each key with every other key in the same dictionary to find the intersection of their lists of values
        if key==key1:
            continue
        if len(set.intersection(set(value),set(value1)))>50:  # If the two lists share more than 50 values, the pair is valid.
            pair_list.append((key,key1))


for k in range(len(pair_list)):#Printing The List Of Pairs.
    print(pair_list[k])

#print(len(pair_list))  # Print the number of pairs that occur more than 50 times.

#############################################################################################################

#Time Complexity-O(n^2)
#Space Complexity-O(n)
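
The check `if something[i] in xyz` tests whether a name occurs anywhere inside the line as a substring, not whether it is one of the comma-separated names on that line. The following minimal sketch uses made-up artist names and assumes one comma-separated list of artists per line (the file's real layout is not shown here). It illustrates the difference:

```python
# Hypothetical line in the assumed "Name,Name,..." format.
line = "Jeff Beck,Adele\n"

# `in` on a string is a substring test, so a short name matches inside a longer one.
print("Beck" in line)  # True, matched inside "Jeff Beck"

# Splitting the line into exact names (trailing newline stripped) avoids partial matches.
names_on_line = {name.strip() for name in line.split(",")}
print("Beck" in names_on_line)       # False
print("Jeff Beck" in names_on_line)  # True
```

Whether this matters for the actual file depends on whether any artist name is a substring of another. Where one is, the substring version can record extra line numbers for the shorter name.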

Here's the link to the relevant text file.
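
Because the file is only linked, its layout is an assumption here: the sketch supposes one comma-separated list of artist names per line. Under that assumption, `infile.read().split(",")` splits on commas only and never treats the newline as a separator. As a result, the last name on one line and the first name on the next line are fused into a single token containing `\n`. Such a token can never be found inside a single line, so it silently drops out of the counts. The two-line `sample` string is made up, and this is one thing worth checking rather than a confirmed diagnosis of the distorted output:

```python
# Hypothetical two-line file contents in the assumed format.
sample = "Radiohead,Coldplay\nMuse,Radiohead\n"

# Splitting the whole text on commas only: the newline is never used as a separator.
whole_file_tokens = sample.split(",")
print(whole_file_tokens)  # ['Radiohead', 'Coldplay\nMuse', 'Radiohead\n']

# Splitting line by line and stripping whitespace keeps every name separate and clean.
per_line_tokens = [name.strip() for row in sample.splitlines() for name in row.split(",")]
print(per_line_tokens)  # ['Radiohead', 'Coldplay', 'Muse', 'Radiohead']
```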

Also, please tell me whether this program produces the correct output, since it does produce output.
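
One way to check the output is to compare it with an independent, deliberately simple count. The sketch below is not the asker's method. It replaces the line-number lists and set intersections with a single pass that counts every pair of exact names on each line, using `collections.Counter` and `itertools.combinations`. The function name `frequent_pairs`, the sample data, and the one-list-per-line format are assumptions for illustration:

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(lines, threshold=50):
    """Return unordered name pairs that co-occur on more than `threshold` lines.

    Assumes each line is a comma-separated list of names (hypothetical format).
    """
    pair_counts = Counter()
    for line in lines:
        # Exact names on this line; a set counts a repeated name only once per line.
        names = {name.strip() for name in line.split(",") if name.strip()}
        # Sorting fixes the order inside each pair, so (a, b) and (b, a) share one key.
        pair_counts.update(combinations(sorted(names), 2))
    return sorted(pair for pair, count in pair_counts.items() if count > threshold)


if __name__ == "__main__":
    sample = ["A,B,C\n", "A,B\n", "B,C\n"]
    print(frequent_pairs(sample, threshold=1))  # [('A', 'B'), ('B', 'C')]
```

To run it on the real file, `with open("Artist_lists_small.txt", encoding="utf8") as f: print(frequent_pairs(f))` should work under the same format assumption. Each pair is reported once, whereas the nested loop in the question appends both `(key, key1)` and `(key1, key)`. Because this version matches exact names instead of substrings, the two programs can legitimately disagree wherever one artist name is contained in another.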

0 answers