Question

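I'm trying to write Python 3.0 code that reformats a data file line by line. The code reads each line, converts the line to a list, and then reads each element of the list. Each element is then modified and written to the output file.

The problem is that some elements contain a backslash character, which Python either interprets as a command or inexplicably ignores. Is there a way in Python to read and/or extract the elements of a list as raw or literal strings?

My code is as follows: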

import shlex
import sys
import fileinput
import string
inputFile = list(open("inputfile.txt","r"))
outputFile = open("outputFile.txt","a")

for i in range(1,len(inputFile)):
    print(inputFile[i])
    line = shlex.shlex(inputFile[i], posix = True)
    line.whitespace = "\t"
    line.whitespace_split = True
    line = list(line)
    for j in range(0,3):
        cell = line[j]
        cell_1 = cell.replace("\\","\\\\")
        outputFile.write("%s\t" % cell_1)
    for k in range(4,len(line)):
        cell = str(line[k])
        cell_1 = cell.replace(" | ","\t")
        if cell_1 == "-":
            outputFile.write("-\t-\t")
        if cell_1 == "unknown":
            outputFile.write("unknown\t-\t")
        else:
            outputFile.write("%s\t" % cell_1)

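An example input line: GA10034 7421353 7424287 FBgn0070093 Dpse\GA10034 proteolysis | inferred from electronic annotation with InterPro:IPR007484 - - - - unknown - - - peptidase activity | inferred from electronic annotation with InterPro:IPR007484 - - - - - -

The corresponding output line: GA10034 7421353 7424287 DpseGA10034 proteolysis inferred from electronic annotation with InterPro:IPR007484 - - - - - - - - - - - - unknown - - - - - - - - - - peptidase activity inferred from electronic annotation with InterPro:IPR007484 - - - - - - - - - -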

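In the output, the \ between Dpse and GA10034 has been removed.

(The script also adds a tab at the start of every new line in the output, beginning with the second line, and about three quarters of the way through the input file it complains of "No closing quotation". But I thought it best to tackle one problem at a time.)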

Answer 1

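Your problem seems to be that you specify posix = True when creating the parser, which makes it interpret backslashes and quotes. It sounds like you don't want that behavior, so you should use posix = False instead.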
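
As a minimal sketch of the difference, the snippet below runs the same lexer settings in both modes on a shortened, tab-separated line modeled on the example input. The helper name split_tabs and the sample string are illustrative assumptions, not part of the original script.

```python
import shlex


def split_tabs(line, posix):
    """Split a line on tabs with shlex, in POSIX or non-POSIX mode."""
    lexer = shlex.shlex(line, posix=posix)
    lexer.whitespace = "\t"        # only tabs separate fields
    lexer.whitespace_split = True  # split on whitespace only, not on punctuation
    return list(lexer)


# Hypothetical shortened sample; "\\" in the source is one literal backslash.
sample = "FBgn0070093\tDpse\\GA10034\t-"

posix_fields = split_tabs(sample, posix=True)   # backslash is consumed as an escape
raw_fields = split_tabs(sample, posix=False)    # backslash is kept in the field

print(posix_fields)
print(raw_fields)
```

Note that even in non-POSIX mode a field that starts with a quote character is still read as a quoted token, so an unbalanced quote there can still raise "No closing quotation".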
