Question

I have a text file, and I want to display all the words that contain both the z and x characters.

How can I do that?

Answer 1

If you don't want to have 2 problems:

for word in file('myfile.txt').read().split():
    if 'x' in word and 'z' in word:
        print word
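
The snippet above is Python 2 (`file()` and the `print` statement). For Python 3, a minimal sketch of the same idea might look like the following; the helper name `words_with_all` and the `chars` parameter are made up for illustration, and "word" still just means a whitespace-separated token, so trailing punctuation stays attached.

```python
def words_with_all(text, chars="xz"):
    """Return the whitespace-separated words that contain every character in chars."""
    return [word for word in text.split() if all(c in word for c in chars)]


sample = "Blah xz zx blah asdxskz zlkxlk foo"
for word in words_with_all(sample):
    print(word)
```

To run it on a file, read the text first, e.g. `with open('myfile.txt') as f: text = f.read()`.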

Answer 2

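Assuming you have the entire file as one large string in memory, and that the definition of a word is "a contiguous sequence of letters", then you could do something like this: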

import re
for word in re.findall(r"\w+", mystring):
    if 'x' in word and 'z' in word:
        print word
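
One thing to keep in mind: `\w` matches digits and the underscore as well as letters, so `\w+` is slightly broader than "a contiguous sequence of letters". If that matters for your data, a letters-only class is one option. A small Python 3 sketch (the function names and the sample string are invented for illustration):

```python
import re


def word_tokens(text):
    # \w+ treats letters, digits and underscores as word characters.
    return re.findall(r"\w+", text)


def letter_tokens(text):
    # [^\W\d_] means "a word character that is neither a digit nor an underscore".
    return re.findall(r"[^\W\d_]+", text)


text = "snake_case x2z zebra-xylophone zax"
print(word_tokens(text))    # ['snake_case', 'x2z', 'zebra', 'xylophone', 'zax']
print(letter_tokens(text))  # ['snake', 'case', 'x', 'z', 'zebra', 'xylophone', 'zax']
```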

Answer 3

>>> import re
>>> pattern = re.compile(r'\b(\w*z\w*x\w*|\w*x\w*z\w*)\b')
>>> document = '''Here is some data that needs
... to be searched for words that contain both z
... and x.  Blah xz zx blah jal akle asdke asdxskz
... zlkxlk blah bleh foo bar'''
>>> print pattern.findall(document)
['xz', 'zx', 'asdxskz', 'zlkxlk']
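
A detail worth double-checking with a pattern like this: `\b` only reaches the regex engine as a word boundary when the pattern is a raw string. In an ordinary string literal, `'\b'` is a backspace character, and the pattern silently matches nothing. A short Python 3 sketch of the difference (the variable names are just for illustration):

```python
import re

document = "Blah xz zx blah asdxskz zlkxlk foo"
alternation = r"(\w*z\w*x\w*|\w*x\w*z\w*)"

# Without the r prefix, "\b" is a single backspace character (ASCII 8),
# so the pattern looks for literal backspaces around the word.
non_raw = re.compile("\b" + alternation + "\b")

# With the r prefix, the regex engine sees \b and treats it as a word boundary.
raw = re.compile(r"\b" + alternation + r"\b")

print(non_raw.findall(document))  # []
print(raw.findall(document))      # ['xz', 'zx', 'asdxskz', 'zlkxlk']
```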

Answer 4

I just want to point out how heavy-handed some of these regexes can be, compared to the simple string methods-based solution provided by Wooble.

Let's do some timing, shall we?

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import timeit
import re
import sys

WORD_RE_COMPILED = re.compile(r'\w+')
Z_RE_COMPILED = re.compile(r'(\b\w*z\w*\b)')
XZ_RE_COMPILED = re.compile(r'\b(\w*z\w*x\w*|\w*x\w*z\w*)\b')

##########################
# Tim Pietzcker's solution
# https://stackoverflow.com/questions/3962846/how-to-display-all-words-that-contain-these-characters/3962876#3962876
#
def xz_re_word_find(text):
    for word in re.findall(r'\w+', text):
        if 'x' in word and 'z' in word:
            print word


# Tim's solution, compiled
def xz_re_word_compiled_find(text):
    pattern = re.compile(r'\w+')
    for word in pattern.findall(text):
        if 'x' in word and 'z' in word:
            print word


# Tim's solution, with the RE pre-compiled so compilation doesn't get
# included in the search time
def xz_re_word_precompiled_find(text):
    for word in WORD_RE_COMPILED.findall(text):
        if 'x' in word and 'z' in word:
            print word


################################
# Steven Rumbalski's solution #1
# (provided in the comment)
# https://stackoverflow.com/questions/3962846/how-to-display-all-words-that-contain-these-characters/3963285#3963285
def xz_re_z_find(text):
    for word in re.findall(r'(\b\w*z\w*\b)', text):
        if 'x' in word:
            print word


# Steven's solution #1 compiled
def xz_re_z_compiled_find(text):
    pattern = re.compile(r'(\b\w*z\w*\b)')
    for word in pattern.findall(text):
        if 'x' in word:
            print word


# Steven's solution #1 with the RE pre-compiled
def xz_re_z_precompiled_find(text):
    for word in Z_RE_COMPILED.findall(text):
        if 'x' in word:
            print word


################################
# Steven Rumbalski's solution #2
# https://stackoverflow.com/questions/3962846/how-to-display-all-words-that-contain-these-characters/3962934#3962934
def xz_re_xz_find(text):
    for word in re.findall(r'\b(\w*z\w*x\w*|\w*x\w*z\w*)\b', text):
        print word


# Steven's solution #2 compiled
def xz_re_xz_compiled_find(text):
    pattern = re.compile(r'\b(\w*z\w*x\w*|\w*x\w*z\w*)\b')
    for word in pattern.findall(text):
        print word


# Steven's solution #2 pre-compiled
def xz_re_xz_precompiled_find(text):
    for word in XZ_RE_COMPILED.findall(text):
        print word


#################################
# Wooble's simple string solution
def xz_str_find(text):
    for word in text.split():
        if 'x' in word and 'z' in word:
            print word


functions = [
        'xz_re_word_find',
        'xz_re_word_compiled_find',
        'xz_re_word_precompiled_find',
        'xz_re_z_find',
        'xz_re_z_compiled_find',
        'xz_re_z_precompiled_find',
        'xz_re_xz_find',
        'xz_re_xz_compiled_find',
        'xz_re_xz_precompiled_find',
        'xz_str_find'
]

import_stuff = functions + [
        'text',
        'WORD_RE_COMPILED',
        'Z_RE_COMPILED',
        'XZ_RE_COMPILED'
]


if __name__ == '__main__':

    text = open(sys.argv[1]).read()
    timings = {}
    setup = 'from __main__ import ' + ','.join(import_stuff)
    for func in functions:
        statement = func + '(text)'
        timer = timeit.Timer(statement, setup)
        min_time = min(timer.repeat(3, 10))
        timings[func] = min_time


    for func in functions:
        print func + ":", timings[func], "seconds"

Running this script on a plaintext copy of Moby Dick obtained from Project Gutenberg, on Python 2.6, I get the following timings:

xz_re_word_find: 1.21829485893 seconds
xz_re_word_compiled_find: 1.42398715019 seconds
xz_re_word_precompiled_find: 1.40110301971 seconds
xz_re_z_find: 0.680151939392 seconds
xz_re_z_compiled_find: 0.673038005829 seconds
xz_re_z_precompiled_find: 0.673489093781 seconds
xz_re_xz_find: 1.11700701714 seconds
xz_re_xz_compiled_find: 1.12773990631 seconds
xz_re_xz_precompiled_find: 1.13285303116 seconds
xz_str_find: 0.590088844299 seconds

On Python 3.1 (after using 2to3 to fix the print statements), I get the following timings:

xz_re_word_find: 2.36110496521 seconds
xz_re_word_compiled_find: 2.34727501869 seconds
xz_re_word_precompiled_find: 2.32607793808 seconds
xz_re_z_find: 1.32204890251 seconds
xz_re_z_compiled_find: 1.34104800224 seconds
xz_re_z_precompiled_find: 1.34424304962 seconds
xz_re_xz_find: 2.33851099014 seconds
xz_re_xz_compiled_find: 2.29653286934 seconds
xz_re_xz_precompiled_find: 2.32416701317 seconds
xz_str_find: 0.656699895859 seconds

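We can see that the regex-based functions tend to take about twice as long to run as the string methods-based function on Python 2.6 (from roughly 1.15 times for the z-only pattern up to roughly 2.4 times), and between 2 and 3.6 times as long on Python 3.1. The time difference is trivial for one-off parsing (nobody's going to miss those milliseconds), but for cases where the function must be called many times, the string methods-based approach is both simpler and faster.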
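
To make the comparison concrete, here is a small Python 3 sketch that divides each timing listed above by the `xz_str_find` timing from the same run. The numbers are copied from the two listings; the helper name `slowdown` is made up for illustration.

```python
# Timings in seconds, copied from the Python 2.6 and Python 3.1 runs above.
py26 = {
    "xz_re_word_find": 1.21829485893,
    "xz_re_word_compiled_find": 1.42398715019,
    "xz_re_word_precompiled_find": 1.40110301971,
    "xz_re_z_find": 0.680151939392,
    "xz_re_z_compiled_find": 0.673038005829,
    "xz_re_z_precompiled_find": 0.673489093781,
    "xz_re_xz_find": 1.11700701714,
    "xz_re_xz_compiled_find": 1.12773990631,
    "xz_re_xz_precompiled_find": 1.13285303116,
    "xz_str_find": 0.590088844299,
}
py31 = {
    "xz_re_word_find": 2.36110496521,
    "xz_re_word_compiled_find": 2.34727501869,
    "xz_re_word_precompiled_find": 2.32607793808,
    "xz_re_z_find": 1.32204890251,
    "xz_re_z_compiled_find": 1.34104800224,
    "xz_re_z_precompiled_find": 1.34424304962,
    "xz_re_xz_find": 2.33851099014,
    "xz_re_xz_compiled_find": 2.29653286934,
    "xz_re_xz_precompiled_find": 2.32416701317,
    "xz_str_find": 0.656699895859,
}


def slowdown(timings, baseline="xz_str_find"):
    """Return each timing as a multiple of the baseline function's timing."""
    base = timings[baseline]
    return {name: seconds / base for name, seconds in timings.items()}


for label, timings in (("Python 2.6", py26), ("Python 3.1", py31)):
    print(label)
    for name, ratio in slowdown(timings).items():
        print("  %-28s %.2fx" % (name, ratio))
```

Note that every function in the benchmark also prints its matches, so these ratios include the (identical) cost of that output.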

Answer 5

I don't know how well this generator performs, but for me this is the way:

from __future__ import print_function
import string

bookfile = '11.txt' # Alice in Wonderland
hunted = 'az' # in your case xz but there is none of those in this book

with open(bookfile) as thebook:
    # read text of book and split from white space
    print('\n'.join(set(word.lower().strip(string.punctuation)
                    for word in thebook.read().split()
                    if all(c in word.lower() for c in hunted))))
""" Output:
zealand
crazy
grazed
lizard's
organized
lazy
zigzag
lizard
lazily
gazing
"""

Answer 6

Sounds like a job for Regular Expressions. Read up on them and try it out. If you run into problems, update your question and we can help you with the specifics.

Answer 7

>>> import re
>>> print re.findall(r'(\w*x\w*z\w*|\w*z\w*x\w*)', 'axbzc azb axb abc axzb')
['axbzc', 'axzb']
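
The alternation has to spell out every ordering of the letters, which gets unwieldy beyond two characters. A different technique, lookahead assertions rather than the alternation used above, scales to any set of required characters. This is only a sketch for Python 3, and the function name `words_containing` is invented for illustration:

```python
import re


def words_containing(text, chars):
    """Return the words in text that contain every character in chars."""
    # One lookahead per required character, each checked from the start of the word.
    lookaheads = "".join(r"(?=\w*%s)" % re.escape(c) for c in chars)
    return re.findall(r"\b" + lookaheads + r"\w+\b", text)


print(words_containing('axbzc azb axb abc axzb', 'xz'))  # ['axbzc', 'axzb']
```

Because each lookahead only scans `\w*`, it cannot reach past the end of the current word, so a character found in a neighbouring word does not count.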
