Question

I am working on a text search project. I have two lists.

import itertools

a = ['ibm', 'dell']
b = ['strength', 'keyword']  # this is a list of keywords given by the user

Now I create the combinations to search Google with.

lst = list(itertools.product(a, b))
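For reference, here is what the product() call produces for these two lists (written for Python 3; the question's own code is Python 2). Joining each pair into a query string is only one possible way to build the Google searches and is an assumption, not the asker's code.

```python
import itertools

a = ['ibm', 'dell']
b = ['strength', 'keyword']

# product() pairs every company with every keyword; the company varies slowest
lst = list(itertools.product(a, b))
print(lst)
# [('ibm', 'strength'), ('ibm', 'keyword'), ('dell', 'strength'), ('dell', 'keyword')]

# assumption: one query string per pair, e.g. "ibm strength"
queries = [' '.join(pair) for pair in lst]
print(queries)
# ['ibm strength', 'ibm keyword', 'dell strength', 'dell keyword']
```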

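Here is what I need help with: my code searches the text for the different keywords and their lemmas. After that, I need to write the retrieved text to an Excel file. I need to create one worksheet for each name in list a and write only the retrieved text into the corresponding sheet. I can't figure out how to do this. Below is part of my code.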
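A minimal sketch of the grouping step (Python 3), assuming a hypothetical search(company, keyword) callable that stands in for the Google search plus getarticle() and returns the matched text, or an empty string when nothing matched. Keying the dict by the names in list a gives one bucket per future worksheet, and skipping empty strings keeps only the retrieved text.

```python
import itertools


def collect_by_company(companies, keywords, search):
    """Group search results into {company: [text, ...]}.

    `search` is a hypothetical callable (company, keyword) -> str that stands
    in for the Google search + getarticle() pipeline from the question.
    """
    # one (initially empty) list per company, in the order of `companies`
    results = {company: [] for company in companies}
    for company, keyword in itertools.product(companies, keywords):
        text = search(company, keyword)
        if text:  # keep only non-empty matches
            results[company].append(text)
    return results
```

For example, with a fake search that only finds "strength", collect_by_company(['ibm', 'dell'], ['strength', 'keyword'], fake_search) would return one text per company under the keys 'ibm' and 'dell'.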

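import random
import re

import bs4
import mechanize
import nltk
from django.utils.encoding import smart_str  # assumed source of smart_str
from nltk.stem import WordNetLemmatizer
from readability import Document
from textblob import TextBlob

# pdf_to_text() and words() are the asker's own helpers (not shown)

def getarticle(url, n):
    final = []
    # the old regex '(.*).pdf' had an unescaped dot and matched e.g. "xpdf"
    if url.lower().endswith('.pdf'):
        text = pdf_to_text(url)
        final.append('')
        final.append(url)
        final.append(text)
        # save a copy of the PDF text under a random file name
        New_file = open('text' + str(round(random.random(), 2)) + '.txt', 'w+')
        New_file.write(smart_str(unicode(text, 'utf-8')))
        New_file.close()
        return text  # previously this branch returned None
    else:
        br = mechanize.Browser()
        br.set_handle_robots(False)
        br.addheaders = [('User-agent', 'Chrome')]
        html = br.open(url).read()
        titles = br.title()
        readable_article = Document(html).summary()
        readable_title = Document(html).short_title()
        soup = bs4.BeautifulSoup(readable_article)
        Final_Article = soup.text
        final.append(titles)
        final.append(url)
        final.append(Final_Article)
        # nltk.clean_html exists only in NLTK 2.x; NLTK 3 removed it
        raw = nltk.clean_html(html)
        cleaned = re.sub(r'& ?(ld|rd)quo ?[;\]]', '"', raw)
        # tokenize `cleaned`, not `raw`, so the quote cleanup is not thrown away
        tokens = nltk.wordpunct_tokenize(cleaned)
        lmtzr = WordNetLemmatizer()
        lemmas = [lmtzr.lemmatize(tok) for tok in tokens]
        text = nltk.Text(lemmas)
        word = words(n)
        find = ' '.join(str(e) for e in word)
        search_words = set(find.split(' '))
        sents = ' '.join([s.lower() for s in text])
        blob = TextBlob(sents.decode('ascii', 'ignore'))
        # for each sentence containing a search word, keep the previous, current
        # and next sentence; max() stops i-1 from becoming -1 when i == 0
        matches = [' '.join(map(str, blob.sentences[max(i - 1, 0):i + 2]))
                   for i, s in enumerate(blob.sentences)  # i is the index, s the sentence
                   if search_words & set(s.words)]
        return ' '.join(m.replace('& rdquo', '').replace('& rsquo', '') for m in matches)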

This returns the text I need to write to the Excel file, and that part is what I can't figure out how to code.
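One stdlib-only way to get a single file with one worksheet per company is the older XML Spreadsheet 2003 format (SpreadsheetML), which Excel can open. This is a swapped-in technique, not something from the question or the answer; third-party libraries such as openpyxl or XlsxWriter write genuine .xlsx files instead. This sketch (Python 3) assumes the texts are already grouped as {company: [text, ...]}, and the function names are hypothetical.

```python
import re
from xml.sax.saxutils import escape, quoteattr

SS_NS = "urn:schemas-microsoft-com:office:spreadsheet"

# control characters that are not allowed in XML 1.0 documents
_XML_INVALID = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def sheet_name(name):
    """Make a name usable as an Excel sheet name (no []:*?/ or backslash, max 31 chars)."""
    return re.sub(r"[\[\]:*?/\\]", "_", name)[:31]


def write_spreadsheetml(path, sheets):
    """Write {sheet_name: [text, ...]} as an XML Spreadsheet 2003 file.

    Each key becomes one worksheet and each text one row in column A.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<?mso-application progid="Excel.Sheet"?>',
        '<Workbook xmlns="%s" xmlns:ss="%s">' % (SS_NS, SS_NS),
    ]
    for name, texts in sheets.items():
        # quoteattr() adds the surrounding quotes and escapes the value
        lines.append("<Worksheet ss:Name=%s><Table>" % quoteattr(sheet_name(name)))
        for text in texts:
            # drop XML-invalid control chars, then escape &, < and >
            clean = escape(_XML_INVALID.sub("", text))
            lines.append('<Row><Cell><Data ss:Type="String">%s</Data></Cell></Row>' % clean)
        lines.append("</Table></Worksheet>")
    lines.append("</Workbook>")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))
```

Save the result with an .xml extension. Note that an Excel cell holds at most 32,767 characters, so very long articles may need to be split across several rows.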

Answer 1

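As far as writing the text to a file that Excel can read goes, you may want to look at Python's csv module, which provides many useful tools for working with .csv files.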
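A minimal sketch of the csv approach (Python 3), assuming the results are already grouped as {company: [text, ...]}. CSV has no concept of worksheets, so this writes one file per company (ibm.csv, dell.csv). write_csv_per_company is a hypothetical name, and the company names are assumed to be safe to use as file names.

```python
import csv
import os


def write_csv_per_company(results, out_dir):
    """Write {company: [text, ...]} as one CSV file per company.

    Returns the list of paths written, in the order of `results`.
    """
    paths = []
    for company, texts in results.items():
        path = os.path.join(out_dir, company + ".csv")
        # newline='' is what the csv docs recommend for files passed to csv.writer;
        # utf-8-sig writes a BOM, which generally helps Excel detect UTF-8
        with open(path, "w", newline="", encoding="utf-8-sig") as f:
            writer = csv.writer(f)
            writer.writerow(["text"])  # header row
            for text in texts:
                # the writer quotes fields containing commas, quotes or newlines
                writer.writerow([text])
        paths.append(path)
    return paths
```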
