Question

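I have a very large JavaScript file that I am trying to analyze. The file contains a lot of code with the line breaks removed, which makes it hard to read, so I used a replace function to find every instance of ; and replace it with ;\u000A (\u000A is the Unicode code point for a line feed). That solved my problem and the program became much more readable. However, I now have another problem: every for loop has been changed.

For example: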

for(i=0; i<someValue; i++)

became

for(i=0;
i<someValue;
i++)

I want to write a Python program to fix this formatting error. My idea is something like this:

for line in open('index.html', 'r+'):
    if  line.startswith('for(') and line.endswith(';'):
        line.strip('\n')

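However, I don't know what code to use to strip the newline from the following lines, because the for loop only reads one line at a time. Can anyone suggest what I should do?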

Answer 1

Python file objects are iterable, so you can ask for the next line while looping:

with open(inputfilename) as ifh:
    for line in ifh:
        if line.startswith('for(') and line.endswith(';\n'):
            line = line.rstrip('\n') + next(ifh).rstrip('\n') + next(ifh)

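This uses the next() function to retrieve the next two items from the ifh file object and appends them to the current line. The outer loop then continues with the line after those two.

To illustrate, look at the output of this iterator loop: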

>>> lst = [1, 2, 3, 4]
>>> lst_iter = iter(lst)
>>> for i in lst_iter:
...     print(i)
...     if i == 2:
...         print('skipping ahead to', next(lst_iter))
...
1
2
skipping ahead to 3
4

Here next() advances the lst_iter iterator to the next item, and the outer for loop then continues with the value after that.
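
Applied to the split for headers from the question, the same idea can be sketched on an in-memory list of lines. The function name join_for_headers and the sample data are made up for illustration; passing '' as the default to next() is an added safeguard so the loop does not fail if the input ends in the middle of a header.

```python
def join_for_headers(lines):
    """Rejoin `for(` headers that were split across three lines."""
    it = iter(lines)
    out = []
    for line in it:
        if line.startswith('for(') and line.endswith(';\n'):
            # Pull the next two lines from the same iterator, so the outer
            # loop skips them; '' is returned if the input ends early.
            line = line.rstrip('\n') + next(it, '').rstrip('\n') + next(it, '')
        out.append(line)
    return out


# Hypothetical sample input, modelled on the example in the question.
sample = ['var a=1;\n', 'for(i=0;\n', 'i<someValue;\n', 'i++)\n', 'a++;\n']
print(''.join(join_for_headers(sample)), end='')
```

The three header lines come out as a single line, and the lines around them are left untouched.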

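Your next problem is rewriting the file in place; you cannot read from and write to the same file at the same time and expect to replace only the right parts. Buffering and differing line lengths get in the way.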
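
One common way around this, shown here only as a rough sketch and not as anything the answer prescribes, is to write the transformed lines to a temporary file in the same directory and then swap it over the original with os.replace(). The names rewrite_file and join_for_headers are hypothetical helpers introduced for this example.

```python
import os
import tempfile


def join_for_headers(lines):
    """Yield lines, rejoining `for(` headers split across three lines."""
    it = iter(lines)
    for line in it:
        if line.startswith('for(') and line.endswith(';\n'):
            line = line.rstrip('\n') + next(it, '').rstrip('\n') + next(it, '')
        yield line


def rewrite_file(path, transform):
    """Replace the file at `path` with the lines produced by transform()."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives next to the original so os.replace() stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, 'w') as dst, open(path) as src:
            dst.writelines(transform(src))
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original untouched and clean up the partial copy.
        os.remove(tmp_path)
        raise
```

With this in place, a call such as rewrite_file('index.html', join_for_headers) would rewrite the file from the question; the original is only replaced once the new contents have been written completely.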

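Use the fileinput module to handle replacing the file contents:

import sys
import fileinput

# inplace=True redirects sys.stdout into the file being read
finput = fileinput.input(inputfilename, inplace=True)
for line in finput:
    if line.startswith('for(') and line.endswith(';\n'):
        # take the next two lines from the same fileinput object
        line = line.rstrip('\n') + next(finput).rstrip('\n') + next(finput)
    sys.stdout.write(line)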

Or use my in-place file rewriting context manager:

from inplace import inplace

with inplace(inputfilename) as (ifh, ofh):
    for line in ifh:
        if line.startswith('for(') and line.endswith(';\n'):
            line = line.rstrip('\n') + next(ifh).rstrip('\n') + next(ifh)
        ofh.write(line)

Answer 2

You can use a counter, like this:

cnt = 2
for line in open('index.html'):
    if(line.startswith('for(') and line.endswith(';\n')):
        cnt = 0
    if cnt < 2:
        line = line.strip('\n')
        cnt += 1
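
The snippet above strips the newlines but does not store or write the result anywhere. A minimal sketch of the same counter idea that collects the output could look like this; the function name join_with_counter and the sample lines are assumptions made for the example.

```python
def join_with_counter(lines):
    """Join split `for(` headers by counting lines since the header start."""
    cnt = 2  # 2 means "not inside a split for header"
    out = []
    for line in lines:
        if line.startswith('for(') and line.endswith(';\n'):
            cnt = 0  # header found: strip the newline from this line and the next
        if cnt < 2:
            line = line.rstrip('\n')
            cnt += 1
        out.append(line)
    return ''.join(out)


# Hypothetical sample input, modelled on the example in the question.
sample = ['for(i=0;\n', 'i<someValue;\n', 'i++)\n', 'x();\n']
print(join_with_counter(sample), end='')
```

Only the first two lines of each header lose their newline, so the third line keeps the break that ends the joined header.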

How do I check the line after next while reading a file in Python and remove the newline at its end?
