I have a list of strings, and I want to remove specific characters from each string. Here is what I have so far:
s = [ "Four score and seven years ago, our fathers brought forth on",
      "this continent a new nation, conceived in liberty and dedicated"]
result = []
for item in s:
    words = item.split()
    for item in words:
        result.append(item)
print(result,'\n')
for item in result:
    g = item.find(',.:;')
    item.replace(item[g],'')
print(result)
The output is:
['Four', 'score', 'and', 'seven', 'years', 'ago,', 'our', 'fathers', 'brought', 'forth', 'on', 'this', 'continent', 'a', 'new', 'nation,', 'conceived', 'in', 'liberty', 'and', 'dedicated']
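In this case, I want the new list to contain all the words, but it should not contain any punctuation other than quotation marks and apostrophes: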
['Four', 'score', 'and', 'seven', 'years', 'ago', 'our', 'fathers', 'brought', 'forth', 'on', 'this', 'continent', 'a', 'new', 'nation', 'conceived', 'in', 'liberty', 'and', 'dedicated']
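Even with the find function, the result appears unchanged. How can I fix this so the list prints without punctuation? How can I improve the code?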
Answer 0: (score: 2)
You can use re.split and specify a regular expression to split on, in this case every character that is not a letter or a digit.
import re

result = []
for item in s:
    # Split on every character that is not a letter or a digit
    words = re.split("[^A-Za-z0-9]", item)  # pass item, not the whole list s
    result.extend(x for x in words if x)  # Include nonempty elements
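Note that the pattern above also splits on apostrophes and quotation marks, which the question wants to keep. A minimal sketch of one way around that, assuming Python 3, is to match the characters to keep with re.findall instead of splitting on the ones to drop. The names WORD_PATTERN and extract_words and the second sample sentence are made up for illustration:

```python
import re

# Hypothetical helper (not from the answer): keep letters, digits,
# apostrophes and double quotes; everything else acts as a separator.
WORD_PATTERN = re.compile(r"[A-Za-z0-9'\"]+")

def extract_words(lines):
    """Return a flat list of words from an iterable of strings."""
    result = []
    for line in lines:
        result.extend(WORD_PATTERN.findall(line))
    return result

s = ["Four score and seven years ago, our fathers brought forth on",
     "this continent a new nation, conceived in liberty and dedicated"]
print(extract_words(s))
print(extract_words(["Don't stop; it's fine."]))  # ["Don't", 'stop', "it's", 'fine']
```

This only handles ASCII letters and digits, like the pattern in the answer; accented characters would be treated as separators.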
Answer 1: (score: 2)
After splitting the string, you can strip off all the characters you want to remove:
result = []
for item in s:
    words = item.split()
    for word in words:
        result.append(word.strip(",."))  # note the addition of .strip(...)
You can add any characters you want removed to the string argument of .strip(), all in a single string. The example above removes commas and periods.
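Keep in mind that .strip() only removes characters from the two ends of a word, so punctuation in the middle is left alone. The sketch below, assuming Python 3, builds the set of characters to strip from string.punctuation minus the apostrophe and double quote. The names TO_STRIP and strip_words and the sample input are illustrative, not part of the answer:

```python
import string

# All ASCII punctuation except the apostrophe and double quote,
# which the question wants to keep.
TO_STRIP = "".join(c for c in string.punctuation if c not in "'\"")

def strip_words(lines):
    """Split each line on whitespace and trim punctuation from word ends."""
    result = []
    for line in lines:
        for word in line.split():
            cleaned = word.strip(TO_STRIP)
            if cleaned:  # skip tokens that were pure punctuation
                result.append(cleaned)
    return result

print(strip_words(["years ago, our fathers;", "it's a well-known fact."]))
# ['years', 'ago', 'our', 'fathers', "it's", 'a', 'well-known', 'fact']
```

The hyphen in well-known survives because it sits inside the word; if interior punctuation must go as well, a replace or translate based approach is a better fit.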
Answer 2: (score: 1)
s = ["Four score and seven years ago, our fathers brought forth on",
     "this continent a new nation, conceived in liberty and dedicated"]

# Remove the characters, then split into words
# (Python 3 form; in Python 2 this was x.translate(None, ',.:;'))
table = str.maketrans('', '', ',.:;')
result = [x.translate(table).split() for x in s]
# Make a list of words instead of a list of lists of words (see http://stackoverflow.com/a/716761/1477364)
result = [inner for outer in result for inner in outer]
print(result)
Output:
['Four', 'score', 'and', 'seven', 'years', 'ago', 'our', 'fathers', 'brought', 'forth', 'on', 'this', 'continent', 'a', 'new', 'nation', 'conceived', 'in', 'liberty', 'and', 'dedicated']
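Answer 3: (score: 1)
Alternatively, you can add a loop inside

for item in result:
    g = item.find(',.:;')
    item.replace(item[g],'')

and handle each character of ,.:; separately. Just add an array of punctuation marks, like

punc = [',','.',':',';']

then iterate over it inside for item in result:, like

    for p in punc:
        item = item.replace(p, '')

So the complete loop is

punc = [',','.',':',';']
for i, item in enumerate(result):
    for p in punc:
        item = item.replace(p, '')  # replace() returns a new string
    result[i] = item  # store the cleaned word back in the list

I tested this and it works.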
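For reference, the loop in the question leaves the list unchanged for two reasons: find(',.:;') looks for that exact four-character substring, not for any one of its characters, and replace() returns a new string instead of modifying the original. A short Python 3 demonstration (the variable names are illustrative only):

```python
word = "ago,"

# find() searches for the whole substring ',.:;' and returns -1 when it is absent.
print(word.find(',.:;'))   # -1

# Strings are immutable: replace() returns a new string and leaves word as is.
cleaned = word.replace(',', '')
print(word)      # ago,
print(cleaned)   # ago

# To change the list, store the new strings back, e.g. with a comprehension.
result = ['years', 'ago,', 'nation,']
result = [w.replace(',', '') for w in result]
print(result)    # ['years', 'ago', 'nation']
```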