Question

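I'm trying to read a large set of numbers from a text file opened with UTF-8 encoding. The text file was copied and pasted from a PDF. The problem is with negative numbers (-1, -2, etc.): I stripped everything else away, so the individual string pieces look like this: -1, -2, etc.

Then I want to do calculations with them and convert them with float(), but I get an error: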

can't convert string to float: '-1'

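I concluded that the '-' was probably being read as a long '–', whatever that character is called, so I manually replaced it with '-' in the text file. That worked for that single string, and float() converted it. I then wrote a small script that looks through the text file and replaces every long '–' with '-', but it didn't work.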

with open('text.txt', encoding='utf8') as fobj:
    all = []
    for line in fobj:
        line = line.strip()
        if '-' in line:
            line.replace('-','-')
            print('replaced')
        all.append(line)
with open('text2.txt','w',encoding='utf8') as f:
    for i in all:
        print(i)
        f.write(i)
        f.write('\n')

Why can I replace the long '–' with '-' by hand, but not with this script? Thanks for your help.

Sample excerpt from the text file:

/ 11/3 / 2 / 0 / 0/–1 /
/ 11/5 / 0 / 2 / 0/0 / N
/ 12/3 / 1 / 0 / 0/0 /
/ 12/4 / 1 / 1 / 0/0 / NS

/ 12/4 / 4 / –1 / 0/–1 / H

/ 12/5 / 1 / 0 / 0/–1 / H

/ 12/5 / 2 / 0 / 0/-1 / H

/ 11/4 / 0 / 0 / 0/0 / H

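In this copy you can actually see the difference between the -1 in the third-to-last line and the one in the second-to-last line. I replaced the latter by hand.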

Answer 1

You forgot to assign the result back to line:

if '-' in line:
    line = line.replace('-','-')
    print('replaced')
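
The reason is that Python strings are immutable: str.replace returns a new string and leaves the original untouched. A minimal illustration (the variable names unchanged and fixed exist only for this example):

```python
line = "0/\u20131"                   # contains U+2013 EN DASH
line.replace("\u2013", "-")          # the returned string is discarded
unchanged = line                     # still contains the en dash

line = line.replace("\u2013", "-")   # assign the result back to the name
fixed = line                         # now contains an ASCII hyphen-minus

print(repr(unchanged), repr(fixed))
```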

Answer 2

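I just looked at your code: it really does call replace('-','-'), with the same character on both sides.

You should use replace('–','-') or, to make it clearer what you are doing, replace(u'\u2013', '-').

Also, the reassignment to line is missing.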
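
If the two characters are hard to tell apart by eye, the standard library's unicodedata module can show which one you actually have. A small sketch; the helper name describe is made up for illustration:

```python
import unicodedata

def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

print(describe("-"))        # the ASCII character float() understands as a sign
print(describe("\u2013"))   # the "long" dash that came out of the PDF

# The en dash is not accepted as a sign, so the conversion raises ValueError
try:
    float("\u20131")
except ValueError as exc:
    print("conversion failed:", exc)
```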

Answer 3

Combining both answers, your code should be:

with open('text.txt', encoding='utf8') as fobj:
    all_ = []
    for line in fobj:
        line = line.strip()
        # u'\u2013' is the en dash that came from the PDF
        if u'\u2013' in line:
            # str.replace returns a new string, so assign it back
            line = line.replace(u'\u2013', '-')
            print('replaced', line)
        all_.append(line)
with open('text2.txt','w',encoding='utf8') as f:
    for i in all_:
        print(i)
        f.write(i)
        f.write('\n')

The result is

replaced / 11/3 / 2 / 0 / 0/-1 /
replaced / 12/4 / 4 / -1 / 0/-1 / H
replaced / 12/5 / 1 / 0 / 0/-1 / H
/ 11/3 / 2 / 0 / 0/-1 /
/ 11/5 / 0 / 2 / 0/0 / N
/ 12/3 / 1 / 0 / 0/0 /
/ 12/4 / 1 / 1 / 0/0 / NS

/ 12/4 / 4 / -1 / 0/-1 / H

/ 12/5 / 1 / 0 / 0/-1 / H

/ 12/5 / 2 / 0 / 0/-1 / H

/ 11/4 / 0 / 0 / 0/0 / H
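
If the PDF might contain other dash-like characters as well, one option is to normalize several of them in one pass with str.translate before converting. This is only a sketch under assumptions: the set of characters in DASHES, the helper names, and the idea that fields are separated by '/' are my own guesses, not something stated in the question, so adjust them for your file.

```python
# Dash-like characters to normalize; this list is an assumption, extend as needed
DASHES = {
    "\u2010": "-",  # HYPHEN
    "\u2012": "-",  # FIGURE DASH
    "\u2013": "-",  # EN DASH
    "\u2014": "-",  # EM DASH
    "\u2212": "-",  # MINUS SIGN
}
DASH_TABLE = str.maketrans(DASHES)

def normalize_dashes(text):
    """Replace every character listed in DASHES with an ASCII hyphen-minus."""
    return text.translate(DASH_TABLE)

def parse_numbers(line):
    """Split a line on '/' and return the fields that parse as floats."""
    numbers = []
    for field in normalize_dashes(line).split("/"):
        field = field.strip()
        try:
            numbers.append(float(field))
        except ValueError:
            pass  # skip empty fields and labels such as 'H' or 'NS'
    return numbers

print(parse_numbers("/ 12/4 / 4 / \u20131 / 0/\u20131 / H"))
```

Mapping everything in one table means you don't need a separate replace call per character, and lines that contain no dashes pass through unchanged.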

The string '-1' cannot be converted to float
