Question

I am trying to create a word generator using yield and write each item to a file, but when I write to the file, I get the following in the file output:

         C   sh   t  d t d t d d � d d � �< } x2 t | j �  � |  k r] | j t t �  � � q, WWd  QXd  S(   Ns   bfDict-t
   use_stringt   lengthi
   s   .txts   a+(   t   openR   t   Truet   lent      readlinest   writet   nextR   (   t     max_wordst   lib(    (    s[   C:\Users\z-perkins-thomas\Documents\bin\python\HashKing\lib\attacks\bruteforce\bf_attack.pyt   create_wordlist   s    )(
   t   ost   stringt   randomR   t   lib.algorithms.hashing_algst   lib.settingsR   t   FalseR   R   (    (    (    s[   C:\Users\z-perkins-thomas\Documents\bin\python\HashKing\lib\attacks\bruteforce\bf_attack.pyt   <module>   s   
l2\colorlog\colorlog\logging.pyt   wrapper   s    
(   t      functoolst   wraps(   R   R   (    (   R   sT   c:\users\z-perk~1\appdata\local\temp\1\pip-build-rtaul2\colorlog\colorlog\logging.pyt   ensure_configured   s    (   t   __doc__t
   __future__R    R   R   t   colorlog.colorlogR   R   R   R   R   t       getLoggert   debugt   infot   warningt   errort   criticalt   logt      exceptiont
   StreamHandler(    (    (    sT   c:\users\z-perk~1\appdata\local\temp\1\pip-build-rtaul2\colorlog\colorlog\logging.pyt   <module>   s"           
            s"   C:\Python27\lib\ctypes\wintypes.pyR   g   s            t   _COORDc           B   s    e  Z d  e f d e f g Z RS(   t   Xt   Y(   R   R     R   R   (    (    (    s"   C:\Python27\lib\ctypes\wintypes.pyR   n   s      t   POINTc           B   s    e  Z d  e f d e f g Z RS(   t   xt   y(   R   R      R   R   (    (    (    s"   C:\Python27\lib\ctypes\wintypes.pyR   r   s      t   SIZEc           B   s    e  Z d  e f d e f g Z RS(   t   cxt   cy(   R   R     R   R   (    (    (    s"   C:\Python27\lib\ctypes\wintypes.pyR   w   s      c         C   s   |  | d >| d >S(   Ni   i   (    (   t   redt   greent   blue(    (    s"   C:\Python27\lib\ctypes\wintypes.pyt   RGB|   s    t   FILETIMEc           B   s    e  Z d  e f d e f g Z RS(   t
   dwLowDateTimet   dwHighDateTime(   R   R    t   DWORDR   (    (    (    s"   C:\Python27\lib\ctypes\wintypes.pyR%      s     t   MSGc           B   sD   e  Z d  e f d e f d e f d e f d e f d e f g Z RS(   t   hWndt   messaget   wParamt   lParamt   timet   pt(     R   R       t   HWNDt   c_uintt   WPARAMt   LPARAMR(   R   R   (    (    (    s"   C:\Python27\lib\ctypes\wintypes.pyR)   �   s                      i  t   WIN32_FIND_DATAAc           B   sp   e  Z d  e f d e f d e f d e f d e f d e f d e f d e f d e e f d  e d
 f g
 Z RS(   t   dwFileAttributest   ftCreationTimet   ftLastAccessTimet   ftLastWriteTimet
   nFileSizeHight   nFileSizeLowt   dwReserved0t   dwReserved1t    cFileNamet   cAlternateFileNamei   (   R   R    R(   R%   t   c_chart   MAX_PATHR   (    (    (    s"   C:\Python27\lib\ctypes\wintypes.pyR4   �   s                                 
t   WIN32_FIND_DATAWc           B   sp   e  Z d  e f d e f d e f d e f d e f d e f d e f d e f d e e f d     e d
 f g
 Z RS(   R5   R6   R7   R8   R9   R:   R;   R<   R=   R>   i   (   R   R       R(   R%   t   c_wcharR@   R   (    (    (    s"   C:\Python27\lib\ctypes\wintypes.pyRA   �   s                                   
t   ATOMt   BOOLt   BOOLEANt   BYTEt   CO

My generator looks like this:

import itertools

def word_generator(length_min=6, length_max=12, perms=False):
    chrs = 'abc'
    for n in range(length_min, length_max + 1):
        for xs in itertools.product(chrs, repeat=n):
            yield ''.join(xs)


def create_wordlist(max_words=100000):
    with open("words.txt", "a+") as lib:
        while len(lib.readlines()) <= max_words:
                lib.write(next(word_generator()))

What is causing the strange output in this file?

Answer 1

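I can only guess at the problem, but here are some things I can see from your code:

Your text editor or shell may be set to an encoding that is not compatible with ASCII.

If you are opening the file in a text editor, check the editor's encoding. If you are reading the file in a shell, check the encoding of the shell you are using.

If you are using Python 2.X and have not changed the system default encoding, your strings will be written to the file as ASCII. In 3.X this is slightly different: open lets you specify the encoding explicitly, as in open('...', 'a+', encoding='utf-8'). So if you are on 3.X, try specifying the file's encoding in open and see what happens.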
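
As a rough illustration of the suggestion above, here is a minimal sketch that writes and reads a file with an explicit encoding. The temporary path is a hypothetical stand-in for the real word list, and io.open is used only because it accepts an encoding argument on both Python 2 and Python 3.

```python
import io
import os
import tempfile

# Hypothetical location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "words.txt")

# io.open takes an explicit encoding on Python 2 and Python 3
# (on Python 3 it is the same function as the built-in open).
with io.open(path, "a+", encoding="utf-8") as lib:
    lib.write(u"aaaaaa\n")

# Read the file back with the same encoding to check what was stored.
with io.open(path, "r", encoding="utf-8") as lib:
    content = lib.read()

print(repr(content))
```

If the content read back this way looks correct but the editor still shows odd characters, the editor's encoding setting is the more likely culprit.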

Answer 2

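First of all, when I ran your code I did not get what you posted. The program went into an infinite loop, writing 'a' characters into the 'words.txt' file. I do not know what caused the strange string you posted, but I can see 3 problems with your code.

Your word_generator seems fine. The problems are in create_wordlist.

Problem 1: The expression next(word_generator()) does not get the next element of an existing sequence; it creates a new sequence and then gets its next element. Since it is a brand-new sequence, its next element is its first element, which in this case is 'aaaaaa'. Instead of creating a new sequence on every iteration, you should create it only once and then call next on it repeatedly, as in the example below: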

wgen = word_generator()
while some_condition:
    lib.write(next(wgen))
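
To make the difference concrete, the following sketch reuses the word_generator from the question and compares the two calling styles. The names fresh and reused are introduced here only for illustration.

```python
import itertools

def word_generator(length_min=6, length_max=12, perms=False):
    chrs = 'abc'
    for n in range(length_min, length_max + 1):
        for xs in itertools.product(chrs, repeat=n):
            yield ''.join(xs)

# A new generator on every call: each one starts over at the first word.
fresh = [next(word_generator()) for _ in range(3)]

# One generator created once: each call to next() advances it.
wgen = word_generator()
reused = [next(wgen) for _ in range(3)]

print(fresh)   # ['aaaaaa', 'aaaaaa', 'aaaaaa']
print(reused)  # ['aaaaaa', 'aaaaab', 'aaaaac']
```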

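Problem 2: Since you are trying to count words by the size of lib.readlines(), I believe you want the file to have exactly one word per line, but that is not what the line lib.write(next(word_generator())) does, because no '\n' character is written. You should either add the line lib.write('\n') to your code or, if you want one word per line, append a '\n' character to the word: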

wgen = word_generator()
while some_condition:
    lib.write(next(wgen) + '\n')
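
The effect of the missing newline on a line count can be checked without touching a file. This is only a sketch: first_words is a hypothetical helper, and itertools.islice stands in for the unspecified some_condition by taking a fixed number of words.

```python
import itertools

def first_words(n, chrs='abc', length=6):
    """Return the first n words of the given length as a list (illustrative helper)."""
    words = (''.join(xs) for xs in itertools.product(chrs, repeat=length))
    return list(itertools.islice(words, n))

# Without a newline, all the words run together into a single line.
joined = ''.join(first_words(3))

# With a trailing '\n' per word, each word ends up on its own line.
separated = ''.join(word + '\n' for word in first_words(3))

print(len(joined.splitlines()))     # 1
print(len(separated.splitlines()))  # 3
```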

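Problem 3: When you open "words.txt" in "a+" mode, the stream position is set to the end of the file, and subsequent calls to lib.write() keep it there. A call to lib.readlines() therefore reads lines starting from the end of the file, so it always returns an empty list of size zero. This turns your while len(lib.readlines()) <= max_words: into an infinite loop.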

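To fix this, you should either find another way to count the words in the file, or call lib.seek(0, 0) to go to the beginning of the file before calling lib.readlines() (see the documentation on seek).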
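
The behaviour described in problem 3 can be observed with a small experiment. This sketch assumes Python 3, where a file opened in "a+" mode starts positioned at its end; on Python 2 the initial read position in this mode can depend on the platform's C library. The temporary path is hypothetical.

```python
import os
import tempfile

# Hypothetical location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "words.txt")

# Prepare a file that already contains two words, one per line.
with open(path, "w") as f:
    f.write("aaaaaa\naaaaab\n")

with open(path, "a+") as lib:
    at_end = lib.readlines()      # read from the current position (the end)
    lib.seek(0, 0)                # go back to the beginning of the file
    from_start = lib.readlines()  # now every existing line is returned

print(at_end)      # [] on Python 3
print(from_start)  # ['aaaaaa\n', 'aaaaab\n']
```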

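Since reading all the lines of the file on every iteration is very inefficient, I took a different approach in the solution below: I count the initial number of lines only once.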

def create_wordlist(max_words=100000):
    with open("words.txt", "a+") as lib:
        wgen = word_generator() # Creates the sequence of words

        lib.seek(0, 0)  # Goes to the beginning of the file
        line_count = len(lib.readlines())   # Counts how many lines the file has

        # lib.readlines() set the stream position to the end,
        #   so now following 'lib.write()' calls will write to the end as expected.

        # For each missing line before reaching 'max_words' lines
        for i in range(line_count, max_words):
            lib.write(next(wgen) + '\n')    # Writes the next word in the sequence
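
As a quick check of the approach above, here is a self-contained variant that can be run against a temporary file. The path parameter is an assumption added for this illustration (the original function hard-codes "words.txt"), and max_words is kept small so the result is easy to inspect.

```python
import itertools
import os
import tempfile

def word_generator(length_min=6, length_max=12, perms=False):
    chrs = 'abc'
    for n in range(length_min, length_max + 1):
        for xs in itertools.product(chrs, repeat=n):
            yield ''.join(xs)

def create_wordlist(path, max_words=100000):
    # 'path' is a hypothetical parameter added so the sketch can be tested.
    with open(path, "a+") as lib:
        wgen = word_generator()
        lib.seek(0, 0)                       # read from the beginning
        line_count = len(lib.readlines())    # lines already in the file
        for _ in range(line_count, max_words):
            lib.write(next(wgen) + '\n')     # only the missing lines are added

path = os.path.join(tempfile.mkdtemp(), "words.txt")
create_wordlist(path, max_words=5)
create_wordlist(path, max_words=5)   # file already has 5 lines, nothing is added

with open(path) as f:
    words = f.read().splitlines()

print(words)  # ['aaaaaa', 'aaaaab', 'aaaaac', 'aaaaba', 'aaaabb']
```

Note that the generator restarts from 'aaaaaa' on every call, so calling the function again with a larger max_words on a partly filled file would append words that are already present.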

Creating a word generator with yield