Changing two characters into one symbol (Python)

Time: 2018-06-19 16:22:06

Tags: python python-3.x file compression

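I'm currently working on a file compression assignment for school, and I can't understand what is happening in this code (more specifically, what ISN'T happening and why not).

In this part of the code, my goal is to replace two identical adjacent letters with a single symbol, without any encoding, so that the text takes up less memory: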

    for i, word in enumerate(file_contents):
        # file_contents = LIST of words in any given text file

        word_contents = (file_contents[i]).split()
        for ind, letter in enumerate(word_contents[:-1]):
            if word_contents[ind] == word_contents[ind+1]:
                word_contents[ind] = ''
                word_contents[ind+1] = '★'

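However, when I run the full code on a sample text file, it doesn't seem to do what I told it to. For example, the word "Sally" should become "Sa★y", but it stays unchanged instead. Can anyone help me get on the right track?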

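EDIT: I left out a pretty key detail. I want the compressed string to somehow end up back in the original file_contents list, in place of the word with the double letters, because the purpose of the full compression algorithm is to return a compressed version of the text in the input file.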

2 answers:

Answer 0: (score: 1)

I'd suggest using a regex to match identical adjacent characters.

Example

import re

txt = 'sally and bobby'
print(re.sub(r"(.)\1", '*', txt))

# sa*y and bo*y

The loops and conditional checks in your code aren't needed. Use the line below instead:

word_contents = re.sub(r"(.)\1", '*', word_contents)
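One caveat: `re.sub` expects a string, so if `word_contents` is a list (as it is after `.split()` in the question) the call has to be applied to each word separately. The sketch below is one possible way to do that and to put the results back into a list, as the question's edit asks. The sample `file_contents` list and the name `DOUBLE_LETTER` are made up for illustration, and narrowing the pattern to `[A-Za-z]` (so doubled spaces or digits are left alone) is my assumption, not part of the answer.

```python
import re

# Hypothetical sample data standing in for the asker's list of words.
file_contents = ["Sally", "and", "Bobby", "see", "a", "book"]

# ([A-Za-z])\1 matches one ASCII letter followed by the same letter again.
DOUBLE_LETTER = re.compile(r"([A-Za-z])\1")

# Build a new list where each doubled letter is replaced by a single symbol.
compressed = [DOUBLE_LETTER.sub("★", word) for word in file_contents]

print(compressed)
# ['Sa★y', 'and', 'Bo★y', 's★', 'a', 'b★k']
```

Assigning `compressed` back to `file_contents` (or writing `" ".join(compressed)` to the output file) would then give the compressed text.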

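Answer 1: (score: 1)

There are a few things wrong with your code (I think).

1) split produces a list, not a str, so when you write enumerate(word_contents[:-1]) you seem to be assuming you're getting a string?! Anyway... I'm not sure that's the case.

But then!

2) These lines: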

if word_contents[ind] == word_contents[ind+1]:
    word_contents[ind] = ''
    word_contents[ind+1] = '★'

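Here you are operating on the list again. Clearly what you want is to operate on the string, or on a list of the characters in the word you're processing. At best this code does nothing; at worst you're corrupting the word_contents list.

So when you make the modification, you're actually looking at the word_contents list, not at the list item [:-1]. There are more problems, but I think this answers your question (hopefully).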
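Point 1 can be checked directly in an interpreter. The snippet below is a small sketch that assumes each element of `file_contents` is a single word such as "Sally" (as the comment in the question says); the variable `iterations` is only there for illustration.

```python
word = "Sally"

# split() on a single word returns a one-element list, not its letters.
word_contents = word.split()
print(word_contents)       # ['Sally']

# Slicing off the last element of a one-element list leaves nothing.
print(word_contents[:-1])  # []

# So the inner loop body never runs and nothing is ever replaced.
iterations = [ind for ind, _ in enumerate(word_contents[:-1])]
print(iterations)          # []

# list(word) is what yields the individual characters.
print(list(word))          # ['S', 'a', 'l', 'l', 'y']
```

This is why "Sally" comes out unchanged: the comparison between neighbouring items is never executed.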

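If you really want to understand what you're doing wrong, I suggest adding print statements to your code. If you're looking for someone to do your homework for you, I guess the other answer has already given you that.

Here is an example of how to add logging to your function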

for i, word in enumerate(file_contents):
    # file_contents = LIST of words in any given text file

    word_contents = (file_contents[i]).split()
    # See what the word content list actually is
    print(word_contents)
    # See what your slice is actually returning
    print(word_contents[:-1])
    # Unless you have something modifying your list elsewhere you probably want to iterate over the words list generally and not just the slice of it as well.
    for ind, letter in enumerate(word_contents[:-1]):
        # See what your other test is testing
        print(word_contents[ind], word_contents[ind+1])
        # Here you probably actually want
        # word_contents[:-1][ind]
        # which is the list item you iterate over and then the actual string I suspect you get back
        if word_contents[ind] == word_contents[ind+1]:
            word_contents[ind] = ''
            word_contents[ind+1] = '★'

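Update: Based on the OP's follow-up question, I've put together a sample program with explanations. Note that this is not the optimal solution; it is mainly an exercise in teaching flow control and the use of basic data structures.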

# define the initial data...
file = "sally was a quick brown fox and jumped over the lazy dog which we'll call billy"
file_contents = file.split()

# Enumerate isn't needed in your example unless you intend to use the index later (example below)
for list_index, word in enumerate(file_contents):

    # Changing something you iterate over is dangerous and sometimes confusing. In your case you
    # iterated over word_contents and then modified it. If you collapse two characters into one,
    # you change the indexes and size of the structure, potentially making later changes invalid.
    # So we'll create a new data structure to dump the results in.
    compressed_word = []

    # since we have a list of strings we'll just iterate over each string (or word) individually
    for character in word:
        # Check to see if there is any data in the intermediate structure yet if not there are no duplicate chars yet
        if compressed_word:
            # if there are chars in new structure, test to see if we hit same character twice 
            if character == compressed_word[-1]:
                # looks like we did, replace it with your star
                compressed_word[-1] = "*"
                # continue skips the rest of this iteration of the loop
                continue
        # if we haven't seen the character before or it is the first character just add it to the list
        compressed_word.append(character)

    # I guess this is one reason why you may want enumerate, to update the list with the new item?
    # join() is just converting the list back to a string
    file_contents[list_index] = "".join(compressed_word)

# prints the new version of the original "file" string
print(" ".join(file_contents))

Output: "sa*y was a quick brown fox and jumped over the lazy dog which we'* ca* bi*y"
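If the same logic is needed in more than one place, it could be wrapped in small functions. The following is only a sketch of that idea: the names `compress_word` and `compress_text` and the `symbol` parameter are hypothetical, and the loop body is the one from the program above. It also shows what happens with runs of three or four identical letters, which the original example does not cover.

```python
def compress_word(word, symbol="*"):
    """Replace each pair of identical adjacent characters with one symbol."""
    compressed = []
    for character in word:
        # Same test as above: did we just see this character?
        if compressed and character == compressed[-1]:
            compressed[-1] = symbol
            continue
        compressed.append(character)
    return "".join(compressed)


def compress_text(text, symbol="*"):
    """Compress every whitespace-separated word and rejoin with single spaces."""
    return " ".join(compress_word(word, symbol) for word in text.split())


print(compress_text("sally and bobby"))  # sa*y and bo*y
print(compress_word("aaa"))              # *a
print(compress_word("aaaa"))             # **
```

Note that `text.split()` followed by `" ".join(...)` collapses any run of whitespace into a single space, so this sketch does not preserve the original spacing or line breaks.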