import re
fp1 = open('stopwords.txt','r')
stop = fp1.readline()
#print(stop)
def passstopwords(getstopwords):
    stopword = getstopwords
    #print(stopword)
    fp = open('read1.txt', 'r')
    line = fp.readline
    while line:
        line = fp.readline()
        print(getstopwords)
        line = re.sub(getstopwords, r'', line)
        print(line)
    fp.close()
    return;
passstopwords(stop)
The output I get is the same line, with no changes. However, if I write 'somestring' instead of getstopwords, it works fine.
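A likely explanation, assuming stopwords.txt holds one word per line: readline() returns only the first line and keeps its trailing newline, so the pattern handed to re.sub is 'Lorem\n' rather than 'Lorem', and it matches only where the word is immediately followed by a line break. The strings below are hypothetical stand-ins for the file contents, not the real files.

```python
import re

# Hypothetical stand-in for what fp1.readline() returns when the
# first line of stopwords.txt is "Lorem" (note the trailing newline).
stop = "Lorem\n"

# Hypothetical stand-in for one line read from read1.txt.
line = "Lorem Ipsum is simply dummy text\n"

# The pattern "Lorem\n" needs a newline right after "Lorem", so nothing matches.
unchanged = re.sub(stop, "", line)

# Stripping the newline first gives the pattern "Lorem", which does match.
fixed = re.sub(stop.strip(), "", line)

print(repr(unchanged))
print(repr(fixed))
```

This would also explain why a hard-coded string works: a literal such as 'somestring' carries no trailing newline.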
Answer 0 (score: 0)
My input file is SAMPLE.TXT, with the following content:
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s,
when an unknown printer took a galley of type and scrambled it to make a type
specimen book. It has survived not only five centuries, but also the leap
into electronic typesetting, remaining essentially unchanged.
It was popularised in the 1960s with the release of Letraset sheets con
stopwords.txt contains:
Lorem
simply
book
printing
The code is:
import re
# Read all stop words, one per line
fp1 = open('stopwords.txt','r')
lisOfStopWords = fp1.readlines()
fp1.close()
def passstopwords(lisOfStopWords):
    # Strip the trailing newline from each word, then join them into one alternation
    stopwords = "|".join([x.strip() for x in lisOfStopWords])
    print("Stopwords:" + stopwords)
    fp = open('SAMPLE.TXT', 'r')
    stopWordPattern = r"%(stopwords)s" % {'stopwords' : stopwords}
    for line in fp.readlines():
        print("ORIGINAL:" + line.strip())
        # Remove every occurrence of any stop word from the line
        line = re.sub(stopWordPattern, r'', line)
        print("REPLACED:"+ line)
    fp.close()
    return
passstopwords(lisOfStopWords)
The output is:
Stopwords:Lorem|simply|book|printing
ORIGINAL:Lorem Ipsum is simply dummy text of the printing and typesetting industry.
REPLACED: Ipsum is  dummy text of the  and typesetting industry.
ORIGINAL:Lorem Ipsum has been the industry's standard dummy text ever since the 1500s,
REPLACED: Ipsum has been the industry's standard dummy text ever since the 1500s,
ORIGINAL:when an unknown printer took a galley of type and scrambled it to make a type
REPLACED:when an unknown printer took a galley of type and scrambled it to make a type
ORIGINAL:specimen book. It has survived not only five centuries, but also the leap
REPLACED:specimen . It has survived not only five centuries, but also the leap
ORIGINAL:into electronic typesetting, remaining essentially unchanged.
REPLACED:into electronic typesetting, remaining essentially unchanged.
ORIGINAL:It was popularised in the 1960s with the release of Letraset sheets con
REPLACED:It was popularised in the 1960s with the release of Letraset sheets con
As you can see, every occurrence of Lorem, simply, book, or printing is replaced.
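The joined pattern above matches the stop words anywhere, including inside longer words, and treats each word as a regular expression. If that is not wanted, a slightly stricter variant can be sketched as follows. The helper names build_stopword_pattern and remove_stopwords are made up for this illustration, and collapsing the leftover whitespace is an extra step that the code above does not perform.

```python
import re

def build_stopword_pattern(words):
    """Compile one regex that matches any of the given words as a whole word."""
    # Drop surrounding whitespace (including newlines) and skip empty entries.
    cleaned = [w.strip() for w in words if w.strip()]
    # re.escape keeps characters such as "." or "+" from acting as regex syntax.
    alternation = "|".join(re.escape(w) for w in cleaned)
    # \b restricts matches to whole words, so "book" does not hit "notebook".
    return re.compile(r"\b(?:" + alternation + r")\b")

def remove_stopwords(text, pattern):
    """Remove the stop words, then collapse the whitespace they leave behind."""
    without = pattern.sub("", text)
    return re.sub(r"\s+", " ", without).strip()

# Same words as in stopwords.txt above, as readlines() would return them.
pattern = build_stopword_pattern(["Lorem\n", "simply\n", "book\n", "printing\n"])

sample = "Lorem Ipsum is simply dummy text of the printing and typesetting industry."
cleaned_sample = remove_stopwords(sample, pattern)
untouched = remove_stopwords("bookkeeping notebook", pattern)

print(cleaned_sample)
print(untouched)
```

Note that \b relies on word characters at both ends of each stop word, so entries that begin or end with punctuation would need a different boundary rule.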