I don't know why the length of the string is '0'

Time: 2016-05-08 11:54:52

Tags: python file text-files

Here is my code. I did not find any comments that helped, so I am adding my code.

import math
import os

# Merge the five BROWN files into one training file
filenames2 = ['BROWN1_L1.txt', 'BROWN1_M1.txt', 'BROWN1_N1.txt', 'BROWN1_P1.txt', 'BROWN1_R1.txt']
with open("C:/Python27/L1_R1_TRAINING.txt", 'w') as outfile:
    for fname in filenames2:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)

b = open("C:/Python27/L1_R1_TRAINING.txt", 'rU')

# Collect every file under the Reutertest directory
filenames3 = []
for path, dirs, files in os.walk("C:/Python27/Reutertest"):
    for file in files:
        file = os.path.join(path, file)
        filenames3.append(file)

# Merge those files into one file as well
with open("C:/Python27/REUTER.txt", 'w') as outfile:
    for fname in filenames3:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)
c = open("C:/Python27/REUTER.txt", 'rU')

def Cross_Entropy(x, y):
    filecontents1 = x.read()
    filecontents2 = y.read()
    sentence1 = filecontents1.upper()
    sentence2 = filecontents2.upper()
    # Relative frequencies of 'A', 'B', 'C' in the first text
    count_A1 = sentence1.count('A')
    count_B1 = sentence1.count('B')
    count_C1 = sentence1.count('C')
    count_all1 = len(sentence1)
    prob_A1 = count_A1 / count_all1
    prob_B1 = count_B1 / count_all1
    prob_C1 = count_C1 / count_all1
    # Relative frequencies of 'A', 'B', 'C' in the second text
    count_A2 = sentence2.count('A')
    count_B2 = sentence2.count('B')
    count_C2 = sentence2.count('C')
    count_all2 = len(sentence2)
    prob_A2 = count_A2 / count_all2
    prob_B2 = count_B2 / count_all2
    prob_C2 = count_C2 / count_all2
    return -(prob_A1 * math.log(prob_A2, 2) + prob_B1 * math.log(prob_B2, 2) + prob_C1 * math.log(prob_C2, 2))

Cross_Entropy(b, c)

Now I get the error "prob_A1 = count_A1 / count_all1 ZeroDivisionError: division by zero". What is wrong with my code? Did I misspell something?

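1 answer:

Answer 0 (score: 0)

I'm not quite sure what is behind your reading of strings from files, but your cross entropy can be computed much more concisely: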

import math

def crossEntropy(s1,s2):
    s1 = s1.upper()
    s2 = s2.upper()
    probsOne = (s1.count(c)/float(len(s1)) for c in 'ABC')
    probsTwo = (s2.count(c)/float(len(s2)) for c in 'ABC')
    return -sum(p*math.log(q,2) for p,q in zip(probsOne,probsTwo))

For example,

>>> crossEntropy('abbcabcba','abbabaccc')
1.584962500721156
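The value above can be checked by hand. The symbols p and q below are notation introduced here for the letter frequencies of the first and second string; they do not appear in the original post. Because every letter has frequency 1/3 in the second string, the sum collapses to log2(3):

```latex
H(p, q) = -\sum_{c \in \{A, B, C\}} p(c)\,\log_2 q(c)

% 'abbcabcba' -> p = (3/9, 4/9, 2/9),  'abbabaccc' -> q = (1/3, 1/3, 1/3)
H(p, q) = -\left(\tfrac{3}{9} + \tfrac{4}{9} + \tfrac{2}{9}\right)\log_2\tfrac{1}{3}
        = \log_2 3 \approx 1.585
```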

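If this is what you want to compute, you can now concentrate on assembling the strings to pass to crossEntropy. I would recommend getting rid of the write-then-read logic (unless you need the two files you are trying to create) and reading the files in the two directories directly into two arrays, joining them into two big strings stripped of all whitespace, and then passing those to crossEntropy.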
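A minimal sketch of that suggestion might look like the following. The helper names read_directory_text and cross_entropy_of_directories are hypothetical, not from the question, and the emptiness check is an added assumption: os.walk yields nothing for a missing or empty directory, so the joined string can silently end up with length 0.

```python
import math
import os


def read_directory_text(directory):
    """Join every file under `directory` into one string with all whitespace removed."""
    pieces = []
    for path, _dirs, files in os.walk(directory):
        for name in sorted(files):
            with open(os.path.join(path, name)) as infile:
                pieces.append(infile.read())
    # str.split() with no arguments splits on any run of whitespace
    return "".join("".join(pieces).split())


def crossEntropy(s1, s2):
    # Same function as in the answer, repeated so the sketch runs on its own
    s1 = s1.upper()
    s2 = s2.upper()
    probsOne = (s1.count(c) / float(len(s1)) for c in 'ABC')
    probsTwo = (s2.count(c) / float(len(s2)) for c in 'ABC')
    return -sum(p * math.log(q, 2) for p, q in zip(probsOne, probsTwo))


def cross_entropy_of_directories(dir1, dir2):
    """Hypothetical wrapper: read both directories, then compare them."""
    text1 = read_directory_text(dir1)
    text2 = read_directory_text(dir2)
    # Fail with a clear message instead of dividing by len(...) == 0
    if not text1 or not text2:
        raise ValueError("one of the directories produced no text")
    return crossEntropy(text1, text2)
```

Calling cross_entropy_of_directories with the two folder paths then replaces both the intermediate files and the open file handles. Note that crossEntropy still fails if one of the three letters never occurs in the second text, because log(0) is undefined.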

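Another possible approach: if all you want is the counts of 'A', 'B' and 'C' in the two directories, just create two dictionaries, one per directory, both keyed by 'A', 'B' and 'C'. Iterate over the files in each directory, reading each file in turn and stepping through its contents without saving the resulting string, just collecting the counts of those three characters. Then create a version of crossEntropy that expects two dictionaries.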

Something like:

def crossEntropy(d1,d2):
    countOne = sum(d1[c] for c in 'ABC')
    countTwo = sum(d2[c] for c in 'ABC')
    probsOne = (d1[c]/float(countOne) for c in 'ABC')
    probsTwo = (d2[c]/float(countTwo) for c in 'ABC')
    return -sum(p*math.log(q,2) for p,q in zip(probsOne,probsTwo))
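The counting step described above could be sketched roughly as follows. The helper name count_abc and the two hand-made dictionaries are hypothetical illustrations, not part of the original answer; the dictionaries simply reuse the letter counts of the two strings from the earlier example.

```python
import math
import os


def count_abc(directory):
    """Count 'A', 'B', 'C' (case-insensitive) across all files under `directory`."""
    counts = {'A': 0, 'B': 0, 'C': 0}
    for path, _dirs, files in os.walk(directory):
        for name in files:
            with open(os.path.join(path, name)) as infile:
                for line in infile:  # one line at a time; the text is not kept
                    line = line.upper()
                    for c in counts:
                        counts[c] += line.count(c)
    return counts


def crossEntropy(d1, d2):
    # Dictionary version from the answer, repeated so the sketch runs on its own
    countOne = sum(d1[c] for c in 'ABC')
    countTwo = sum(d2[c] for c in 'ABC')
    probsOne = (d1[c] / float(countOne) for c in 'ABC')
    probsTwo = (d2[c] / float(countTwo) for c in 'ABC')
    return -sum(p * math.log(q, 2) for p, q in zip(probsOne, probsTwo))


# Hypothetical counts: the same letter totals as 'abbcabcba' and 'abbabaccc'
d1 = {'A': 3, 'B': 4, 'C': 2}
d2 = {'A': 3, 'B': 3, 'C': 3}
print(round(crossEntropy(d1, d2), 3))  # 1.585, matching the string version
```

With real data, d1 and d2 would come from count_abc called once per directory. If a directory contains none of the three letters, countOne or countTwo is 0 and the division fails just as in the question, so the totals are worth checking first.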