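I want to edit a text file that has a page number at the end of every 10-12 lines (I converted a PDF to text, and each page ends with its page number). I want to delete these specific page-number integers, but not the same integers when they appear in the text: there can be a page number 50, but there can also be a line of text that contains 50 as an integer. So I only want to delete the lines that consist of nothing but a page number.
Example of the text file: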
1
militant Muslims use scriptures such as the
Genesis story describing the destruction of
Sodom and Gomorrah as justification (from Allah)
for the hatred they vent on all things non-
Muslim and especially on gay men.
2
A Word from the Author
Today, in the 21st Century the majority of Muslims
hold middle
3
Into The Darkness
the driver assured the exhausted travelers who
were dozing fitfully in the rear of the van, they
4
down. It blocked the narrow road.
Ali Azzizi was the other man accompanying
the women.
5
I want to remove the page numbers 1-5 here, but if the same numbers appear anywhere inside the lines of text, they must not be removed.
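A minimal sketch of the check, assuming a "page number line" means a line that contains only an integer plus optional surrounding whitespace. The names `strip_page_numbers` and `PAGE_NUMBER_LINE` are hypothetical; the key point is `re.fullmatch`, which requires the whole line to match, so a 50 inside a sentence is never touched.

```python
import re

# Hypothetical pattern: a line that is nothing but an integer,
# optionally surrounded by whitespace (e.g. "50" or " 50 \n").
PAGE_NUMBER_LINE = re.compile(r"\s*\d+\s*")

def strip_page_numbers(lines):
    """Return the lines that are not bare page numbers.

    Lines such as "Today, in the 21st Century ..." are kept, because
    fullmatch requires the entire line to match, not just part of it.
    """
    return [line for line in lines if not PAGE_NUMBER_LINE.fullmatch(line)]

sample = """1
militant Muslims use scriptures such as the
2
A Word from the Author
Today, in the 21st Century the majority of Muslims
3
"""

# keepends=True preserves the newlines so the result can be joined back
cleaned = strip_page_numbers(sample.splitlines(keepends=True))
print("".join(cleaned), end="")
```

Blank lines are kept as well, since `\d+` needs at least one digit.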
My code:

```python
filename = input('filename: ')

# Read all lines, keeping their line endings
with open(filename, 'r', encoding="utf8") as file:
    lines = file.readlines()

# Drop lines that consist of nothing but an integer (the page numbers);
# numbers that appear inside a line of text are left alone
kept = [line for line in lines if not line.strip().isdigit()]

# Overwrite the original file with the remaining lines
with open(filename, 'w', encoding="utf8") as file:
    file.writelines(kept)
```
Answer 0 (score: 1)
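If you are not forced to use Python, you can use grep -v with a pattern that matches only lines made up entirely of digits (optionally surrounded by whitespace). A pattern such as '^[0-9][\s]*' is too loose: it matches any line that merely starts with a digit, so it would also delete a text line like "50 people came".
cristian@nb:~/$ grep -vE '^[[:space:]]*[0-9]+[[:space:]]*$' test.txt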
militant Muslims use scriptures such as the
Genesis story describing the destruction of
Sodom and Gomorrah as justification (from Allah)
for the hatred they vent on all things non-
Muslim and especially on gay men.
A Word from the Author
Today, in the 21st Century the majority of Muslims
hold middle
Into The Darkness
the driver assured the exhausted travelers who
were dozing fitfully in the rear of the van, they
down. It blocked the narrow road.
Ali Azzizi was the other man accompanying
the women.
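Any whole-line check, including the grep command above, still removes a line that consists only of a number even when it is body text (for example a lone "50" in a list). If that matters, a stricter sketch, assuming the page numbers run consecutively from a known first page with no gaps, only drops a number-only line when it equals the next expected page number. `strip_sequential_page_numbers` and `first_page` are hypothetical names.

```python
def strip_sequential_page_numbers(lines, first_page=1):
    """Drop a number-only line only when it equals the next expected page number.

    Assumes pages are numbered consecutively from first_page with no gaps.
    A number-only line that breaks the sequence (e.g. a stray "50" on
    page 2) is kept as content.
    """
    expected = first_page
    kept = []
    for line in lines:
        stripped = line.strip()
        # isdecimal() guarantees int() below will succeed
        if stripped.isdecimal() and int(stripped) == expected:
            expected += 1  # consumed this page number; look for the next one
            continue
        kept.append(line)
    return kept

lines = ["1\n", "some text\n", "50\n", "2\n", "more text\n", "3\n"]
print("".join(strip_sequential_page_numbers(lines)), end="")
```

The trade-off: if one page number is missing from the converted text, every later page number will be kept too, so check the result on the real file.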