How do I do this in Python?
badphrases.txt contains:
Go away
Don't do that
Stop it
allphrases.txt contains:
I don't know why you do that. Go away.
I was wondering what you were doing.
You seem nice
I want allphrases.txt to be purged of every line that contains a phrase from badphrases.txt.
It is trivial in bash:
cat badphrases.txt | while read b
do
    cat allphrases.txt | grep -v "$b" > tmp
    cat tmp > allphrases.txt
done
Oh, and in case you think I haven't looked or tried: I have been searching for an hour.
Here is my code:
# Files
ttv = "/tmp/tv.dat"
tmp = "/tmp/tempfile"
bad = "/tmp/badshows"
# The bad file already exists
# ... code here creates ttv

# Function grep_v
def grep_v(f,str):
    file = open(f, "r")
    for line in file:
        if line in str:
            return True
    return False

t = open(tmp, 'w')
tfile = open(ttv, "r")
for line in tfile:
    if not grep_v(bad,line):
        t.write(line)
tfile.close
t.close
os.rename(tmp, ttv)
Answer 0 (score: 0)
First, google how to read a file in Python.
You will probably find something like this: How do I read a file line-by-line into a list?
Use it to read both files into lists.
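A minimal sketch of that step, assuming the two files from the question sit in the current directory. The helper name `read_lines` is made up for illustration, and skipping blank lines is my own choice, not something the question asks for:

```python
def read_lines(path):
    """Return the lines of a text file as a list, without surrounding whitespace."""
    with open(path, encoding="utf-8") as f:
        # Skip blank lines (an assumption): an empty phrase would match every line.
        return [line.strip() for line in f if line.strip()]


# Example usage (file names taken from the question):
# badphrases = read_lines("badphrases.txt")
# allphrases = read_lines("allphrases.txt")
```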
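Now you have the contents of both files in lists.
Iterate over all the phrases and check whether any of the bad phrases occurs in each one.
At this point you may want to google how to do each of those steps.
Take the code from those places and build a brute-force algorithm like this:
for line in allphrases:
    flag = True
    for badphrase in badphrases:
        if badphrase in line:
            flag = False
            break
    if flag:
        print(line)
If you understand this code, you will notice that you need to replace print with output to a file.
Then think about how to improve the algorithm. All the best.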
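One way to swap the print call for a file write is sketched below, assuming both files have already been read into lists of stripped lines. The names `filter_lines` and `write_lines` are illustrative, and `any()` simply stands in for the flag-and-break loop:

```python
def filter_lines(lines, badphrases):
    """Keep only the lines that contain none of the bad phrases."""
    return [line for line in lines
            if not any(bad in line for bad in badphrases)]


def write_lines(path, lines):
    """Write each entry of lines to path, one per line."""
    with open(path, "w", encoding="utf-8") as out:
        for line in lines:
            out.write(line + "\n")
```

With the sample data from the question, `filter_lines` would keep "I was wondering what you were doing." and "You seem nice", and `write_lines("allphrases.txt", kept)` would then overwrite the original file.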
Update:
@COLDSPEED suggests that you can simply google how to replace lines in a file in Python.
You will probably find something like this: Search and replace a line in a file in Python
That works too.
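For in-place editing, one standard-library option is `fileinput` with `inplace=True`. This is a hedged sketch rather than the exact code from the linked page, and `remove_bad_lines` is a hypothetical name:

```python
import fileinput


def remove_bad_lines(path, badphrases):
    """Rewrite the file at path in place, dropping lines that contain a bad phrase."""
    with fileinput.input(files=[path], inplace=True) as f:
        for line in f:
            if not any(bad in line for bad in badphrases):
                # With inplace=True, standard output is redirected into the file.
                print(line, end="")
```

Because the file is rewritten as it is read, no separate temporary file has to be managed by hand.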
Answer 1 (score: 0)
The solution turned out to be not too bad.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import feedparser, os, re

# Files
home = os.environ['HOME']
ttv = home + "/WEB/Shows/tv.dat"
old = home + "/WEB/Shows/old.dat"
moo = home + "/WEB/Shows/moo.dat"
tmp = home + "/WEB/Shows/tempfile"
bad = home + "/WEB/Shows/badshows"

# Function not_present: True if no line of file f contains text
def not_present(f, text):
    with open(f, "r") as file:
        for line in file:
            if text in line:
                return False
    return True

# Sources (shortened)
sources = ['http://predb.me/?cats=tv&rss=1']

# Grab all the feeds and put them into ttv and old
with open(old, 'a') as k, open(ttv, 'a') as f:
    for url in sources:
        d = feedparser.parse(url)
        for post in d.entries:
            if not_present(old, post.link):
                f.write(post.title + "|" + post.link + "\n")
                k.write(post.title + "|" + post.link + "\n")

# Remove shows without [Ss][0-9] and put them in moo
with open(moo, 'a') as m, open(tmp, 'w') as t, open(ttv, "r") as file:
    for line in file:
        if re.search(r's[0-9]', line, re.I) is None:
            m.write(line)
            # print("moo", line)
        else:
            t.write(line)
            # print("tmp", line)
# Files are closed (and flushed) here, so the rename sees the complete data
os.rename(tmp, ttv)

# Remove badshows
with open(bad) as f:
    bap = [x.strip() for x in f]
with open(ttv) as f:
    shows = [x.strip() for x in f]
with open(tmp, 'w') as t:
    for line in shows:
        flag = True
        for b in bap:
            if b in line:
                flag = False
                break
        if flag:
            t.write(line + "\n")
os.rename(tmp, ttv)
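The write-to-a-temp-file-then-rename pattern used above could also be wrapped in a small helper. The sketch below is only one way to do it: `purge_file` is a hypothetical name, and `tempfile.NamedTemporaryFile` plus `os.replace` are swapped in for the fixed temp path and `os.rename`:

```python
import os
import tempfile


def purge_file(path, badphrases):
    """Drop lines containing any bad phrase, then swap the result into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the final rename
    # stays on one filesystem.
    with open(path) as src, tempfile.NamedTemporaryFile(
            "w", dir=directory, delete=False) as dst:
        for line in src:
            if not any(bad in line for bad in badphrases):
                dst.write(line)
    # Both files are closed at this point; replace the original in one step.
    os.replace(dst.name, path)
```

Note that the replacement file gets the temp file's permissions, which may be stricter than those of the original.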