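How do I find a phrase in a text file and delete everything before/after it?

Time: 2017-07-24 08:15:15

Tags: python csv

I want to find a phrase: "delete this". I want to keep only the text between the two occurrences of that phrase and delete everything else.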

text.text.text.text
text.text.text.text
text.text.text.text
text.text.text.text
delete this
text.text.text.text
text.text.text.text
text.text.text.text
delete this
text.text.text.text
text.text.text.text

This is my code so far:

import urllib2
import unicodecsv as csv
import os
import sys
import io
import time
import datetime
import pandas as pd
from bs4 import BeautifulSoup
import sys
import re

def to_2d(l,n):
    return [l[i:i+n] for i in range(0, len(l), n)]

f = open('air.txt', 'r')
x = f.readlines()

filename=r'output.csv'

resultcsv = open(filename,"wb")
output = csv.writer(resultcsv, delimiter=';',quotechar = '"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1')

maindatatable = to_2d(x, 4)
    if 'delete this' in maindatatable.text:
                stop = 1
                break

print maindatatable
output.writerows(maindatatable)

resultcsv.close()

3 Answers:

Answer 0 (score: 1)

You can use str.split:

with open('air.txt', 'r') as f:
    x = f.read()

req_text = x.split('delete this')[1: -1]

data = []
for text in req_text:
    for line in text.strip().splitlines():
        data.append([line])

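To write to a CSV file, just open it and call writer.writerows:

# csv here is unicodecsv, imported as in the question (import unicodecsv as csv)
with open('output.csv', "wb") as f:
    output = csv.writer(f, delimiter=';', quotechar='"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1')
    output.writerows(data)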

This saves the following lines to the file:

text.text.text.text
text.text.text.text
text.text.text.text
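
The writing step above depends on the Python 2 unicodecsv package used in the question. As a rough sketch, assuming Python 3 and the standard-library csv module instead, the same write could look like the following. The helper name write_rows and the sample rows are hypothetical stand-ins for the data list built earlier.

```python
import csv
import os
import tempfile


def write_rows(path, rows, encoding="latin-1"):
    """Write rows (a list of lists) to a semicolon-delimited CSV file."""
    # In Python 3 the file is opened in text mode. newline="" lets the csv
    # module control line endings, and the encoding is passed to open(),
    # not to csv.writer().
    with open(path, "w", newline="", encoding=encoding) as f:
        writer = csv.writer(f, delimiter=";", quotechar='"',
                            quoting=csv.QUOTE_NONNUMERIC)
        writer.writerows(rows)


# Hypothetical sample standing in for the data list.
sample = [["text.text.text.text"], ["text.text.text.text"]]
out_path = os.path.join(tempfile.mkdtemp(), "output.csv")
write_rows(out_path, sample)

# Read the file back without newline translation to inspect what was written.
with open(out_path, encoding="latin-1", newline="") as f:
    written = f.read()
```

With csv.QUOTE_NONNUMERIC, every text field is written wrapped in double quotes, so each stored line appears as "text.text.text.text" including the quotes.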

Using delete instead of delete this as the separator (the rest of the separator line is then dropped):

req_text = x.split('delete')[1: -1]

data = []
for text in req_text:
    text = text.split('\n', 1)[1]
    for line in text.strip().splitlines():
        data.append([line])
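
The two split-based variants can be folded into one helper. This is a minimal sketch assuming Python 3. The function name lines_between and the flag drop_rest_of_marker_line are hypothetical, and str.partition is swapped in for split('\n', 1)[1] so that a chunk without a newline yields nothing instead of raising IndexError.

```python
def lines_between(text, marker, drop_rest_of_marker_line=False):
    """Return the lines found between occurrences of marker, one per row."""
    # Everything before the first marker and after the last one is discarded.
    chunks = text.split(marker)[1:-1]
    rows = []
    for chunk in chunks:
        if drop_rest_of_marker_line:
            # The marker is only a prefix of the separator line (for example
            # "delete" for "delete this"), so skip what remains of that line.
            chunk = chunk.partition("\n")[2]
        for line in chunk.strip().splitlines():
            rows.append([line])
    return rows


sample = "a\nb\ndelete this\nc\nd\ndelete this\ne\n"
full_marker = lines_between(sample, "delete this")
prefix_marker = lines_between(sample, "delete", drop_rest_of_marker_line=True)
```

Note that with more than two markers in the text, the [1:-1] slice keeps every chunk between the first and the last marker, including the chunk between the second and third.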

Answer 1 (score: 0)

Here is a basic structure with a toggle switch. It should work even when there are several pairs of delete this lines:

read = False
with open('data.txt') as txt:
    for line in txt:
        if line.strip() == 'delete this':
            read = not read
        elif read:
            print line,

With data.txt as:

text.text.text.text1
text.text.text.text2
text.text.text.text3
text.text.text.text4
delete this
text.text.text.text5
text.text.text.text6
text.text.text.text7
delete this
text.text.text.text8
text.text.text.text9

Output:

text.text.text.text5
text.text.text.text6
text.text.text.text7
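
The toggle idea can also be written as a generator that collects lines instead of printing them. This is only a sketch, assuming Python 3. The name toggled_lines and the shortened sample data are hypothetical, and the sample contains two marker pairs to show the multi-pair case.

```python
def toggled_lines(lines, marker="delete this"):
    """Yield the lines that lie between alternating marker lines."""
    reading = False
    for line in lines:
        if line.strip() == marker:
            # Each marker line flips the switch and is itself skipped.
            reading = not reading
        elif reading:
            yield line.rstrip("\n")


sample = """text1
text2
delete this
text3
text4
delete this
text5
delete this
text6
delete this
text7
""".splitlines(keepends=True)

kept = list(toggled_lines(sample))
```

Because a file object is itself an iterable of lines, the same function could be called as toggled_lines(open_file) inside a with block.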

Answer 2 (score: 0)

I'm going to assume for now that the delimiters are complete lines. Here is one way to achieve what you want:

import sys
delimiter = "delete this\n"
result = []
with open('air.txt', 'r') as inf:
    for line in inf:
        if line == delimiter:
            break
    else:
        sys.exit("opening delimiter missing")
    for line in inf:
        if line != delimiter:
            result.append(line)
        else:
            break
    else:
        sys.exit("closing delimiter missing")

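The else clause attached to a for statement is executed only if no break statement was executed in the loop. This ensures that the various odd end-of-file conditions don't mess up your logic.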
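
To see the for/else behaviour in isolation, here is a small sketch assuming Python 3. The function find_marker is a hypothetical example, not part of the answer's code.

```python
def find_marker(lines, marker):
    """Return the index of the first line equal to marker, or None."""
    for index, line in enumerate(lines):
        if line == marker:
            found = index
            break  # leaving the loop via break skips the else clause
    else:
        # Reached only when the loop ran to exhaustion without a break,
        # which includes the case of an empty input.
        found = None
    return found


present = find_marker(["a", "delete this", "b"], "delete this")
absent = find_marker(["a", "b"], "delete this")
empty = find_marker([], "delete this")
```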

The with statement is a convenient way to make the file available and to ensure that it is properly closed after use, whatever happens.
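
As a quick illustration of that guarantee, the sketch below (assuming Python 3, with a temporary file standing in for air.txt) raises an error inside the with block and then checks that the file was closed anyway.

```python
import os
import tempfile

# Create a throwaway file to stand in for air.txt.
path = os.path.join(tempfile.mkdtemp(), "air.txt")
with open(path, "w") as out:
    out.write("text.text.text.text\n")

closed_after_error = None
try:
    with open(path) as inf:
        handle = inf
        raise ValueError("simulated failure while reading")
except ValueError:
    # The with block has already closed the file by the time we get here.
    closed_after_error = handle.closed
```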

The result list can be converted to a string with a simple construct:

output = "".join(result)
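
Putting the pieces of this answer together, a reusable version might look like the sketch below. It assumes Python 3, uses the hypothetical name extract_between, and raises ValueError in place of calling sys.exit so that the caller decides how to handle a missing delimiter.

```python
import io


def extract_between(lines, delimiter="delete this\n"):
    """Return the text between the first two delimiter lines."""
    # One shared iterator, so the second loop resumes where the first stopped.
    it = iter(lines)
    for line in it:
        if line == delimiter:
            break
    else:
        raise ValueError("opening delimiter missing")
    result = []
    for line in it:
        if line == delimiter:
            break
        result.append(line)
    else:
        raise ValueError("closing delimiter missing")
    return "".join(result)


# io.StringIO stands in for an open text file in this example.
sample = io.StringIO("a\ndelete this\nb\nc\ndelete this\nd\n")
extracted = extract_between(sample)
```

As in the answer's code, a final delimiter line that lacks a trailing newline would not match "delete this\n"; comparing line.rstrip("\n") against the bare phrase is one way to relax that.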