Remove phrases from a file based on another file (Python)

Time: 2017-08-08 04:47:39

Tags: python

How can I do this in Python?

badphrases.txt contains

Go away
Don't do that
Stop it

allphrases.txt contains

I don't know why you do that. Go away.
I was wondering what you were doing.
You seem nice

I want allphrases.txt cleaned of every line that contains a phrase from badphrases.txt.

It is trivial in bash

cat badphrases.txt | while read b
do
cat allphrases.txt | grep -v "$b" > tmp
cat tmp > allphrases.txt
done
Oh, and in case you think I haven't looked or tried: I searched for an hour.

Here is my code:

# Files  
ttv = "/tmp/tv.dat"  
tmp = "/tmp/tempfile"  
bad = "/tmp/badshows"  

The bad file already exists.
... code here creates ttv

# Function grep_v  
def grep_v(f,str):  
     file = open(f, "r")   
     for line in file:  
          if line in str:  
               return True  
     return False  

t = open(tmp, 'w')  
tfile = open(ttv, "r")   
for line in tfile:  
     if not grep_v(bad,line):  
          t.write(line)  
tfile.close  
t.close  
os.rename(tmp, ttv)  

2 answers:

Answer 0: (score: 0)

First, google how to read a file in Python:

You will probably find something like this: How do I read a file line-by-line into a list?

Use it to read both files into lists.
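
As a minimal sketch of that step, assuming plain UTF-8 text files with one phrase per line: the helper name `read_lines` is hypothetical and not part of the original answer.

```python
def read_lines(path):
    """Return the lines of a text file as a list, without trailing newlines.

    read_lines is a hypothetical helper name used only for illustration.
    """
    with open(path, encoding="utf-8") as handle:
        # rstrip("\n") drops only the line ending and keeps other whitespace
        return [line.rstrip("\n") for line in handle]
```

With this in place, `allphrases = read_lines("allphrases.txt")` and `badphrases = read_lines("badphrases.txt")` would give the two lists used in the rest of this answer.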

Now you have the contents of both files in two lists.

Iterate over all the phrases and check whether any phrase from the bad list occurs in each one.

At this point you might want to google:

  • how to iterate over a list in Python
  • how to check whether a string occurs in another string in Python

Take the code from those places and build a brute-force algorithm like this:

for line in allphrases:
    flag = True
    for badphrase in badphrases:
        if badphrase in line:
            flag = False
            break
    if flag:
        print(line)

If you understand this code, you will notice that you need to replace the print with output to a file:

  • Now google how to print to a file in Python.
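
One way that replacement might look, as a sketch only: the function name `write_kept_lines` and its `out_path` parameter are assumptions made for illustration, and the two lists are expected to hold phrases without trailing newlines.

```python
def write_kept_lines(allphrases, badphrases, out_path):
    """Write every phrase that contains no bad phrase to out_path.

    write_kept_lines and out_path are hypothetical names for this sketch.
    """
    with open(out_path, "w", encoding="utf-8") as out:
        for line in allphrases:
            # any() stops at the first bad phrase found in the line
            if not any(bad in line for bad in badphrases):
                out.write(line + "\n")
```

The `with` block closes the output file on exit, so the data is on disk before anything renames or reads the file.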

Then think about how to improve the algorithm. Good luck.
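
One possible improvement, which may not be what this answer had in mind, is to compile all bad phrases into a single regular expression so that each line is scanned once. The names `build_matcher` and `keep_clean` are hypothetical, and for files as small as the ones in the question the difference is unlikely to matter.

```python
import re


def build_matcher(badphrases):
    """Compile one pattern that matches any of the literal bad phrases."""
    # re.escape keeps characters such as "." or "?" from acting as regex syntax
    return re.compile("|".join(re.escape(phrase) for phrase in badphrases))


def keep_clean(allphrases, badphrases):
    """Return the phrases that contain none of the bad phrases."""
    if not badphrases:
        # An empty pattern would match every line, so return everything as-is
        return list(allphrases)
    matcher = build_matcher(badphrases)
    return [line for line in allphrases if not matcher.search(line)]
```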

Update:

@COLDSPEED suggested that you could simply google how to replace a line in a file in Python:

You will probably find something like this: Search and replace a line in a file in Python

Which also works.
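
A common standard-library way to edit a file in place is the `fileinput` module. The sketch below assumes that approach and uses a hypothetical function name, `remove_bad_lines_in_place`; it is not taken from the linked question.

```python
import fileinput


def remove_bad_lines_in_place(path, badphrases):
    """Rewrite the file at path, dropping lines that contain a bad phrase.

    remove_bad_lines_in_place is a hypothetical name for this sketch.
    """
    # With inplace=True, fileinput redirects standard output to the file
    # being processed, so print() writes to the file, not the terminal.
    with fileinput.input(files=[path], inplace=True) as lines:
        for line in lines:
            if not any(bad in line for bad in badphrases):
                # line still ends with its newline, so suppress print's own
                print(line, end="")
```

Calling `remove_bad_lines_in_place("allphrases.txt", badphrases)` would then overwrite allphrases.txt with only the lines that survive the filter.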

Answer 1: (score: 0)

This solution works well too.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import feedparser, os, re

# Files
h = os.environ['HOME']
ttv = h + "/WEB/Shows/tv.dat"
old = h + "/WEB/Shows/old.dat"
moo = h + "/WEB/Shows/moo.dat"
tmp = h + "/WEB/Shows/tempfile"
bad = h + "/WEB/Shows/badshows"

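# Function not_present: True if no line of file f contains text
def not_present(f, text):
     # the with block closes the file even on an early return
     with open(f, "r") as file:
          for line in file:
               if text in line:
                    return False
     return True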

# Sources (shortened)
sources = ['http://predb.me/?cats=tv&rss=1']

# Grab all the feeds and put them into ttv and old
k = open(old, 'a')
f = open(ttv, 'a')
for url in sources:
     d = feedparser.parse(url)
     for post in d.entries:
          if not_present(old,post.link):
               f.write(post.title + "|" +  post.link + "\n")
               k.write(post.title + "|" +  post.link + "\n")
# close() needs the parentheses; a bare f.close does nothing
f.close()
k.close()

# Remove shows without [Ss][0-9] and put them in moo
m = open(moo, 'a')
t = open(tmp, 'w')
file = open(ttv, "r") 
for line in file:
     if re.search(r's[0-9]', line, re.I) is None:
          m.write(line)
#          print("moo", line)
     else:
          t.write(line)
#          print("tmp", line)
file.close()
t.close()  # flush tmp to disk before it replaces ttv
m.close()
os.rename(tmp, ttv)

# Remove badshows
with open(bad) as f:
    # skip blank lines: an empty string is "in" every line and would drop them all
    bap = [x.strip() for x in f if x.strip()]

with open(ttv) as f:
    all_lines = [x.strip() for x in f]

# the with block closes tmp, so it is fully written before the rename
with open(tmp, 'w') as t:
    for line in all_lines:
        # keep the line only if no bad phrase occurs in it
        if not any(b in line for b in bap):
            t.write(line + "\n")
os.rename(tmp, ttv)