Question

In this part of my code, I want to remove any non-alphabetic characters from the words I get from reading a file.

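I know an empty string is probably being tested, which is where the error happens, but I can't figure out why, even though I have tried many different versions of the code.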

This is what I have now:

for i in given_file:

    cut_it_out = True

    while cut_it_out:
        if len(i) == 0:
            cut_it_out = False
        else:
            while (len(i) != 0) and cut_it_out:
                if i.lower()[0].isalpha() and i.lower()[len(i) - 1].isalpha():
                    cut_it_out = False

                if (not i.lower()[len(i) - 1].isalpha()):
                    i = i[:len(i) - 2]
                if (not i.lower()[0].isalpha()):
                    i = i[1:]

Can anyone help me fix this? Thanks.

Thanks for the interesting answers :). I'd like the code to be more precise, but I can't get rid of an infinite loop.

Can anyone help me figure it out?

all_words = {} # New empty dictionary
for i in given_file:
    if "--" in i:
        split_point = i.index("--")
        part_1 = i[:split_point]
        part_2 = i[split_point + 2:]
        combined_parts = [part_1, part_2]

        given_file.insert(given_file.index(i)+2, str(part_1))
        given_file.insert(given_file.index(part_1)+1, str(part_2))
        #given_file.extend(combined_parts)
        given_file.remove(i)
        continue


    elif len(i) > 0:
        if i.find('0') == -1 and i.find('1') == -1 and i.find('2') == -1 and i.find('3') == -1 and i.find('4') == -1\
            and i.find('5') == -1 and i.find('6') == -1 and i.find('7') == -1 and i.find('8') == -1 and i.find('9') == -1:
            while not i[:1].isalpha():
                i = i[1:]

            while not i[-1:].isalpha():
                i = i[:-1]

            if i.lower() not in all_words:
                all_words[i.lower()] = 1 
            elif i.lower() in all_words:
                all_words[i.lower()] += 1
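The infinite loop most likely comes from the two while loops: for a token with no letters and no digits (for example "..."), slicing eventually leaves i as an empty string, and ""[:1].isalpha() is False, so i = i[1:] repeats forever. Below is a hedged sketch of the same loop with an "i and" guard, which splits on "--" into a fresh list instead of inserting into given_file while iterating over it. The function name count_words and the sample tokens are hypothetical, and it assumes given_file is a list of whitespace-separated tokens.

```python
DIGITS = "0123456789"  # same digits the original find() checks test for


def count_words(given_file):
    """Count words after splitting on '--' and trimming non-letters.

    Sketch only: assumes given_file is a list of whitespace-separated tokens.
    """
    all_words = {}
    for token in given_file:
        # Iterate over the pieces of this token instead of inserting them
        # into given_file while the outer for loop is still running.
        for i in token.split("--"):
            # Same rule as the original: skip anything containing a digit.
            if any(ch in DIGITS for ch in i):
                continue
            # The "i and" guard ends each loop once i is empty; without it,
            # ""[:1].isalpha() is False and i = i[1:] repeats forever.
            while i and not i[:1].isalpha():
                i = i[1:]
            while i and not i[-1:].isalpha():
                i = i[:-1]
            if i:
                key = i.lower()
                all_words[key] = all_words.get(key, 0) + 1
    return all_words


print(count_words(["Well--", "it's", "42", "IT'S", "...", "the", "end."]))
```

Using dict.get with a default of 0 replaces the separate "not in" / "in" branches of the original.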

Answer 1

There are a few problems with your code:

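- The immediate problem is that the second if can strip the last character from a string that contains no letters at all, leaving it empty, and then the third if raises an exception when it indexes the empty string.
- If the last character is not a letter, you remove the last two characters (i[:len(i) - 2]), not just one.
- You don't need the two nested loops, and you can use break instead of that boolean variable.
- If i.lower()[x] is not a letter, then neither is i[x], so the lower() calls are unnecessary; also, it's better to use i[-1] for the last character.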
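A few lines make the first two points concrete. This is an illustrative sketch; word and empty are made-up sample values:

```python
word = "end."

# The question's slice removes TWO characters: the '.' and the 'd'.
too_short = word[:len(word) - 2]
# Removing exactly one trailing character needs word[:-1].
trimmed = word[:-1]

# On an empty string, slicing is safe but indexing is not.
empty = ""
first = empty[:1]          # an empty string, and "".isalpha() is False
error = ""
try:
    empty[0]               # this is what the third if ends up doing
except IndexError as exc:
    error = str(exc)

print(too_short, trimmed, repr(first), error)
```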

After fixing these problems while keeping the same overall idea, your code becomes:

while len(i) > 0:
    if i[0].isalpha() and i[-1].isalpha():
        break
    if not i[-1].isalpha():
        i = i[:-1]
    elif not i[0].isalpha(): # actually, just 'else' would be enough, too
        i = i[1:]
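To check the corrected loop on a few inputs, it can be wrapped in a function. The name strip_edges and the sample words are hypothetical; the body is the loop above, with the elif replaced by else as the comment suggests:

```python
def strip_edges(i):
    """Trim non-letters from both ends of i (the corrected loop above)."""
    while len(i) > 0:
        # Stop as soon as both ends are letters.
        if i[0].isalpha() and i[-1].isalpha():
            break
        if not i[-1].isalpha():
            i = i[:-1]  # drop exactly one trailing character
        else:
            i = i[1:]   # drop one leading character
    return i


for word in ['"Hello,', "(world)", "--", ""]:
    print(repr(strip_edges(word)))
```

An all-punctuation word such as "--" shrinks to an empty string, at which point the while condition fails, so nothing is indexed on an empty string.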

But this is still somewhat hard to follow. I'd suggest using two separate loops, one for each end of the string:

while i and not i[:1].isalpha():
    i = i[1:]
while i and not i[-1:].isalpha():
    i = i[:-1]

Or you could use a regular expression, along these lines:

import re

i = re.sub(r"^[^a-zA-Z]+|[^a-zA-Z]+$", "", i)

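It reads: replace one or more (+) characters that are not ([^...]) letters (a-zA-Z), either right after the start of the string (^) or (|) right before the end of the string ($), with "".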
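One difference from the isalpha() loops: the class [^a-zA-Z] treats only ASCII letters as letters, so accented letters at either end get stripped too. The sketch below compares the answer's pattern with a Unicode-aware variant; the [\W\d_] variant and the helper names strip_ascii and strip_unicode are assumptions, not part of the answer:

```python
import re

# Pattern from the answer: only a-z and A-Z count as letters.
ASCII_EDGES = re.compile(r"^[^a-zA-Z]+|[^a-zA-Z]+$")

# Assumed Unicode-aware variant: [\W\d_] matches characters that are not
# word characters, or are digits or underscores, i.e. roughly "not a letter".
UNICODE_EDGES = re.compile(r"^[\W\d_]+|[\W\d_]+$")


def strip_ascii(word):
    return ASCII_EDGES.sub("", word)


def strip_unicode(word):
    return UNICODE_EDGES.sub("", word)


print(strip_ascii("(hello)"))
print(strip_ascii("¡Olé!"))    # the trailing "é!" is removed as well
print(strip_unicode("¡Olé!"))  # keeps the accented letter
```

The match is only roughly equivalent to str.isalpha(): some numeric characters such as "²" count as word characters but are not matched by \d, so this variant keeps them.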

Answer 2

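I think your problem comes from an overly complicated solution; @tobias_k has already pointed out the errors. In any case, your code is quite inefficient. Try simplifying it, for example like this (I haven't tested it):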

for i in given_file:
    beg=0
    end=len(i)-1
    while beg<=end and not i[beg].isalpha():
        beg=beg+1
    while beg<=end and not i[end].isalpha():
        end=end-1
    res=""
    if beg<=end:
        res=i[beg:end + 1]  # end is an inclusive index, so slice up to end + 1
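Because end points at the last character to keep, the slice has to stop at end + 1; with i[beg:end] the last letter is lost ('"hello!"' would give 'hell'). Here is a hedged sketch of the same index-based approach as a function; the name trim_by_index and the sample inputs are hypothetical:

```python
def trim_by_index(i):
    """Index-based trim: move beg right and end left past non-letters."""
    beg = 0
    end = len(i) - 1  # index of the last character (inclusive)
    while beg <= end and not i[beg].isalpha():
        beg += 1
    while beg <= end and not i[end].isalpha():
        end -= 1
    # When beg > end, i[beg:end + 1] is already "", so no separate check
    # is needed.
    return i[beg:end + 1]


print(trim_by_index('"hello!"'))
print(repr(trim_by_index("?!")))
```

This scans each character at most once and builds a single slice at the end, whereas repeatedly slicing one character off creates a new string on every step.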

IndexError: string index out of range, can't figure out why
