Question

My code:

readfile = open("{}".format(file), "r")

lines = readfile.read().lower().split()

elements = """,.:;|!@#$%^&*"\()`_+=[]{}<>?/~"""
for char in elements:
    lines = lines.replace(char, '')

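This works for removing special characters. But I need help stripping "-" and "'".

So for example "safety-dance" is fine, but "-hi-" is not; likewise "I'll" is fine, but "'hi" is not.

I only need to remove them at the beginning and end of each word.

It's not a string, it's a list.

How do I do this?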

Answer 1

You can try string.punctuation together with strip:

import string

my_string_list = ["-hello-", "safety-dance", "'hi", "I'll", "-hello"]

result = [item.strip(string.punctuation) for item in my_string_list]
print(result)

Result:

['hello', 'safety-dance', 'hi', "I'll", 'hello']
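
Note that string.punctuation makes strip remove every ASCII punctuation character from both ends of a word. If only hyphens and apostrophes should go, as the question describes, a narrower character set can be passed instead. A minimal sketch, where the strip_edges helper and the sample list are made up for illustration:

```python
def strip_edges(words, chars="-'"):
    """Return a new list with leading/trailing `chars` removed from each word."""
    # str.strip only touches the two ends, so inner '-' and "'" are preserved.
    return [word.strip(chars) for word in words]

words = ["-hi-", "safety-dance", "'hi", "I'll", "hello!"]
print(strip_edges(words))
# ['hi', 'safety-dance', 'hi', "I'll", 'hello!']
```

The trailing "!" survives here because it is not in the character set; add it to chars if it should be stripped as well.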

Answer 2

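First, using str.replace in a loop is inefficient. Since strings are immutable, you create a new string on every iteration. You can use str.translate to remove all the unwanted characters in a single pass.

As for removing a dash only when it sits at the boundary of a word, that is exactly what str.strip does.

The characters you want to remove seem to correspond to string.punctuation, with '-' as a special case.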

from string import punctuation

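def remove_special_character(s):
    # Translation table that deletes all punctuation except '-'
    translation = str.maketrans('', '', punctuation.replace('-', ''))
    # Strip boundary dashes from each word, then delete the remaining punctuation
    return ' '.join([w.strip('-') for w in s.split()]).translate(translation)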

polluted_string = '-This $string contain%s ill-desired characters!'
clean_string = remove_special_character(polluted_string)

print(clean_string)

# prints: 'This string contains ill-desired characters'
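
The function above also deletes apostrophes, so "I'll" would come out as "Ill". If inner apostrophes should survive, as the question asks, one possible variant keeps both '-' and "'" out of the translation table and strips them only at word boundaries. This is a sketch rather than the answer's own method; clean_words, KEEP_INSIDE and DELETE_TABLE are hypothetical names, and it returns a list of words since the question works with a list:

```python
from string import punctuation

# Characters assumed to be legitimate inside a word; everything else in
# string.punctuation is deleted outright.
KEEP_INSIDE = "-'"
DELETE_TABLE = str.maketrans('', '', ''.join(c for c in punctuation if c not in KEEP_INSIDE))

def clean_words(text):
    """Lowercase and split `text`, delete unwanted punctuation, strip edge '-' and "'"."""
    # Translate first so that a word like "'hi'," loses its comma before stripping.
    cleaned = (w.translate(DELETE_TABLE).strip(KEEP_INSIDE) for w in text.lower().split())
    return [w for w in cleaned if w]  # drop words that became empty

print(clean_words("-Hi- I'll do the 'safety-dance', ok?"))
# ['hi', "i'll", 'do', 'the', 'safety-dance', 'ok']
```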

If you want to apply this to multiple lines, you can do so with a list comprehension.

lines = [remove_special_character(line) for line in lines]

Finally, to read the file, you should use a with statement.

with open(file, "r") as f:
    lines = [remove_special_character(line) for line in f]
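
Since the question ends up with a list of words rather than a list of lines, the same idea can be sketched end to end for that case. The read_clean_words helper and the temporary demo file are assumptions made for illustration, and the UTF-8 encoding is a guess about the input file:

```python
import os
import string
import tempfile

def read_clean_words(path):
    """Read `path` and return its lowercased words with edge punctuation stripped."""
    with open(path, "r", encoding="utf-8") as f:  # file is closed automatically on exit
        words = f.read().lower().split()
    stripped = (w.strip(string.punctuation) for w in words)
    return [w for w in stripped if w]  # drop words that were pure punctuation

# Demo on a throwaway file so the sketch runs as-is.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("-Hello- world!\nI'll do the safety-dance.\n")
try:
    print(read_clean_words(tmp.name))
finally:
    os.remove(tmp.name)
# ['hello', 'world', "i'll", 'do', 'the', 'safety-dance']
```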

Strip and split: how to strip a list
