Replacing words in a string with Python

Time: 2016-03-04 06:32:27

Tags: python

import re

fp1 = open('stopwords.txt','r')
stop = fp1.readline()
#print(stop)

def passstopwords(getstopwords):
    stopword = getstopwords
    #print(stopword)
    fp = open('read1.txt', 'r')
    line = fp.readline
    while line:
        line = fp.readline()
        print(getstopwords)
        line = re.sub(getstopwords, r'', line)
        print(line)
    fp.close()
    return;

passstopwords(stop)

The output I get is the same line, with no changes. However, if I write 'somestring' instead of 'getstopwords', it works fine.

1 answer:

Answer 0: (score: 0)

My input file is SAMPLE.TXT, with the following contents:

Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s,
when an unknown printer took a galley of type and scrambled it to make a type
specimen book. It has survived not only five centuries, but also the leap
into electronic typesetting, remaining essentially unchanged.
It was popularised in the 1960s with the release of Letraset sheets con

stopwords.txt

Lorem
simply
book
printing

The code is:

import re

# Read the stop words, one per line (each entry still ends with a newline).
with open('stopwords.txt', 'r') as fp1:
    lisOfStopWords = fp1.readlines()

def passstopwords(lisOfStopWords):
    # Strip the trailing newline from each word, then join the words
    # into a single alternation pattern such as "Lorem|simply|book|printing".
    stopWordPattern = "|".join([x.strip() for x in lisOfStopWords])
    print("Stopwords:" + stopWordPattern)
    with open('SAMPLE.TXT', 'r') as fp:
        for line in fp:
            print("ORIGINAL:" + line.strip())
            # Replace every occurrence of any stop word with an empty string.
            line = re.sub(stopWordPattern, '', line)
            print("REPLACED:" + line)

passstopwords(lisOfStopWords)

The output is:

Stopwords:Lorem|simply|book|printing
ORIGINAL:Lorem Ipsum is simply dummy text of the printing and typesetting industry.
REPLACED: Ipsum is  dummy text of the  and typesetting industry.

ORIGINAL:Lorem Ipsum has been the industry's standard dummy text ever since the 1500s,
REPLACED: Ipsum has been the industry's standard dummy text ever since the 1500s,

ORIGINAL:when an unknown printer took a galley of type and scrambled it to make a type
REPLACED:when an unknown printer took a galley of type and scrambled it to make a type

ORIGINAL:specimen book. It has survived not only five centuries, but also the leap
REPLACED:specimen . It has survived not only five centuries, but also the leap

ORIGINAL:into electronic typesetting, remaining essentially unchanged.
REPLACED:into electronic typesetting, remaining essentially unchanged.

ORIGINAL:It was popularised in the 1960s with the release of Letraset sheets con
REPLACED:It was popularised in the 1960s with the release of Letraset sheets con

As you can see, Lorem, simply, book, and printing are each replaced with an empty string.
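A likely reason the code in the question left every line unchanged: readline() returns the stop word together with its trailing newline, so the pattern is something like "simply\n", which never occurs in the middle of a line. (It also reads only the first line of stopwords.txt.) The sketch below illustrates that and shows a slightly stricter variant of the pattern; the helper name build_stopword_pattern and the use of re.escape and \b word boundaries are my own additions, not part of the code above, so treat them as assumptions about what you want.

```python
import re

def build_stopword_pattern(raw_lines):
    """Compile one regex that matches any stop word as a whole word."""
    # Drop the trailing newline that readline()/readlines() keep, skip blank
    # lines, and escape each word so regex metacharacters match literally.
    words = [re.escape(w.strip()) for w in raw_lines if w.strip()]
    # \b anchors the alternation at word boundaries, so "book" does not
    # match inside "booking".
    return re.compile(r"\b(?:" + "|".join(words) + r")\b")

# Stand-in for what readlines() would return for stopwords.txt.
raw_lines = ["Lorem\n", "simply\n", "book\n", "printing\n"]
text = "Lorem Ipsum is simply dummy text of the printing and typesetting industry."

# Unstripped pattern: "simply\n" is not found mid-line, so nothing changes.
unchanged = re.sub(raw_lines[1], "", text)

# Stripped, escaped, word-bounded pattern.
pattern = build_stopword_pattern(raw_lines)
cleaned = pattern.sub("", text)

print(unchanged == text)
print(cleaned)
```

Without the word boundaries, a stop word such as book would also be cut out of longer words like booking; whether that matters depends on your stop-word list.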