Question

I apologize for my ignorance of how Python handles strings. I have a .txt file that is at least 1000 lines long. It looks like the following:

:dodge
1 6 some description string of unknown length
E7 8 another description string 
3445 0 oil temp something description voltage over limit etc

:ford
AF 4 description of stuff
0 8 string descritiopn

What I want to do is basically put a ";" before each description string, so that I end up with the following:

:dodge
1 6 ;some description string of unknown length
E7 8 ;another description string 
3445 0 ;oil temp something description voltage over limit etc

:ford
AF 4 ;description of stuff
0 8 ;string descritiopn

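My idea was to open the file, search for the ":" character, go to the next line, go to the " " character, go to the next " " character, and write a ";". Another idea was to go to each "\n" character in the text file and, if the next character != ":", look for the second space.

import sys
import fileinput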

with open("testDTC.txt", "r+") as f:
for line in f:
    if ' ' in line:     #read first space
        if ' ' in line:     #read second space
            line.append(';')

    f.write(line)

f.close()

I know it doesn't get me what I need, but it has been a long time since I did string manipulation in Python.

Answer 1

You just need to split on whitespace twice and join the strings; you don't need a regex for a simple repeating pattern:

with open("testDTC.txt") as f:
    for line in f:
        if line.strip() and not line.startswith(":"):
            spl = line.split(None,2)
            # spl[2] keeps the line's trailing newline, so suppress print's own
            print("{} ;{}".format(" ".join(spl[:2]),spl[2]),end="")

To write the changes back to the original file, you can use fileinput.input with inplace=True:

from fileinput import input
for line in input("testDTC.txt",inplace=True):
    if line.strip() and not line.startswith(":"):
        spl = line.split(None,2)
        print("{} ;{}".format(" ".join(spl[:2]),spl[2]),end="")
    else:
        print(line,end="")

We can also unpack instead of indexing:

        a, b, c = line.split(None,2)
        print("{} {} ;{}".format(a, b, c),end="")

Output:

:dodge
1 6 ;some description string of unknown length
E7 8 ;another description string 
3445 0 ;oil temp something description voltage over limit etc

:ford
AF 4 ;description of stuff
0 8 ;string descritiopn

For Python 2, you can remove end="" and use a trailing comma after the print statement instead, i.e. print(line),

We skip the section-header lines with line.startswith(":") and the empty lines with if line.strip().
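
The split-and-rejoin step can also be pulled into a small helper so it is easy to try on individual lines. This is only a minimal sketch: the name add_semicolon is made up for illustration, and it assumes that lines with fewer than three fields should pass through unchanged.

```python
def add_semicolon(line):
    """Insert ';' before the third whitespace-separated field of a line."""
    # Leave blank lines and ':' section headers untouched.
    if not line.strip() or line.startswith(":"):
        return line
    parts = line.split(None, 2)  # at most two splits -> up to three fields
    if len(parts) < 3:
        # Assumption: a line without a description is passed through as-is.
        return line
    code, number, description = parts
    # description keeps its trailing newline, so none is added here.
    return "{} {} ;{}".format(code, number, description)


sample = [":dodge\n", "1 6 some description\n", "\n", "AF 4\n"]
result = [add_semicolon(line) for line in sample]
```

Note that any run of whitespace between the first two fields is collapsed to a single space, the same as in the snippets above.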

Answer 2

You can do this with a very simple algorithm, without invoking regular expressions, so you can see what is going on.

with open('test.txt') as infile:
    with open('out.txt', 'w') as outfile:
        for line in infile:
            # A blank line read from a file is '\n', not '', so test the stripped text
            if not line.strip() or line.startswith(':'):
                outfile.write(line)                # pass it through
            else:
                line_parts = line.split(None, 2)   # split at most twice
                if len(line_parts) < 3:            # no description field
                    outfile.write(line)            # pass it through unchanged
                    continue
                # add the semicolon after the 2nd space
                line_parts[2] = ';' + line_parts[2]
                outfile.write(' '.join(line_parts))

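If you really wanted to read the file one character at a time, you would end up using the read method together with seek, but that is unnecessary in Python because you have high-level constructs such as file iteration and powerful string methods to help you.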
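
To see why file iteration plus split is enough here, it may help to look at what split(None, 2) returns for each kind of line in the file. The snippet below is only an illustration on in-memory strings taken from the question's sample; nothing is read from or written to disk.

```python
# Each kind of line found in the sample file, as file iteration would yield it.
header = ":dodge\n"
blank = "\n"
entry = "E7 8 another description string \n"

# A blank line is "\n", which is truthy; strip() is needed to detect it.
blank_is_truthy = bool(blank)
blank_is_empty_after_strip = blank.strip() == ""

# With maxsplit=2 the third piece keeps its inner spaces and trailing newline.
entry_parts = entry.split(None, 2)

# Lines with fewer than three fields yield shorter lists, so index 2 is missing.
header_parts = header.split(None, 2)
blank_parts = blank.split(None, 2)

print(entry_parts)
print(header_parts, blank_parts)
```

This is why a line with fewer than three fields needs some kind of guard before index 2 is touched.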

Answer 3

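From your example, it seems that your second column holds a number delimited by spaces (e.g. 8 or 6), followed by a description in the third column that does not seem to contain any numbers. If this holds in general, not only for this example, you can use that fact to search for the space-delimited number and add a ; after it, as follows: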

import re

rep = re.compile(r'(\s\d+\s)')    

out_lines = []

with open("file.txt", "r+") as f:
    for line in f:      
        re_match = rep.search(line)
        if re_match:
            # append ; after the first occurrence of the found expression only
            line = line.replace(re_match.group(1), re_match.group(1) + ';', 1)
        out_lines.append(line)



with open('file2.txt', 'w') as f:
    f.writelines(out_lines)

The resulting file2.txt is as follows:

:dodge
1 6 ;some description string of unknown length
E7 8 ;another description string
3445 0 ;oil temp something description voltage over limit etc

:ford
AF 4 ;description of stuff
0 8 ;string descritiopn
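
One caveat worth checking before relying on the pattern: it looks for the first run of digits surrounded by whitespace, so it depends on the second column always being decimal digits. The sketch below uses re.sub with count=1 on in-memory strings. The helper name mark_description and the "E7 A ..." line are made up purely to show what happens when that assumption fails.

```python
import re

# Same pattern as above: digits with whitespace on both sides.
NUMBER_FIELD = re.compile(r"(\s\d+\s)")


def mark_description(line):
    """Append ';' after the first whitespace-delimited number in the line."""
    # count=1 limits the change to the first match only.
    return NUMBER_FIELD.sub(r"\1;", line, count=1)


ok = mark_description("3445 0 oil temp something\n")
header = mark_description(":dodge\n")
# Hypothetical line whose second column is not decimal: nothing matches the
# second column, so the first number inside the description is marked instead.
wrong = mark_description("E7 A sensor 2 fault\n")
```

If the second column can contain non-decimal values, splitting on whitespace as in the other answers is the safer choice.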

Answer 4

This is what I would do:

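with open('testDTC.txt') as f:
    for line in f:
        if line.count(' ') >= 2:   # two spaces are needed for a description field
            sp = line.split(' ', 2)
            line = '%s %s ;%s' % (sp[0], sp[1], sp[2])
        print(line, end='')        # Python 3; the line keeps its own newline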

Answer 5

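Since you only have about 1000 lines, I think you can read the whole file at once with readlines() and use split on each line. If a line has only one element, print it, then run another loop to handle the following lines that have more than one element, replacing the third element ([2]) with the concatenation of a semicolon and that element. Then you have to do something to output the line nicely (join is used here, but there are plenty of other solutions), depending on how you want it.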

with open('testDTC.txt') as fp:
    lines = fp.readlines()

for i in xrange(len(lines)):
    if len(lines[i].split()) == 1:
        print lines[i][:-1]
        i += 1
        while len(lines[i].split()) > 0:
            spl = lines[i].split()
            spl[2] = ";"+spl[2]
            print " ".join(spl)
            i += 1
            if i == len(lines):
                break
        print
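
The code above is Python 2 (print statement, xrange). A rough Python 3 equivalent of the same read-everything-then-split idea might look like the sketch below. The function name process_lines is made up, and it assumes that lines with fewer than three fields (headers, blanks) are emitted unchanged rather than handled by a nested loop.

```python
def process_lines(lines):
    """Return new lines with ';' prefixed to the third field where present."""
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            # Header (':dodge'), blank, or short line: keep the text as-is.
            out.append(line.rstrip("\n"))
            continue
        fields[2] = ";" + fields[2]
        # Note: join collapses runs of whitespace into single spaces.
        out.append(" ".join(fields))
    return out


sample = [":dodge\n", "1 6 some description\n", "\n", ":ford\n", "0 8 text\n"]
for text in process_lines(sample):
    print(text)
```

With a real file, the list passed in would come from fp.readlines() as in the answer.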

Python, search .text file and inject character
