I have a file text.txt that contains the following:
apple boy 'cat'
dog, egg fin
goat hat ice!
I need to split the text file on spaces and special characters while ignoring newlines, so that the output is an array like this:
["apple", "boy", "'", "cat", "'", "dog", "egg", "fin", "goat", "hat", "ice", "!"]
But so far my code outputs the following. It returns every character as its own string and even keeps whitespace such as the newlines:
["a", "p", "p", "l", "e", "b", "o", "y", "'", "c", "a", "t", "'", "\n", "d", "o", "g", "e", "g", "g", "f", "i", "n", "\n", "g", "o", "a", "t", "h", "a", "t", "i", "c", "e", "!", "\n"]
Here is my code:
file=open(text.txt)
for i in file:
    i.split(" ")
    b+=i
print b
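What should I do if I am not allowed to import any modules, especially the re module?
Answer 0 (score: 3)
Use a temporary string: wrap every non-alphanumeric character in a space on each side, then split the string at the end.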
lines ="""apple boy 'cat'
dog, egg fin
goat hat ice!"""
out = []
for line in lines.splitlines():
    temp = ""
    for ch in line:
        if ch.isalnum():
            temp += ch
        else:
            temp += " {} ".format(ch)
    out.extend(temp.split())
print(out)
Output:
['apple', 'boy', "'", 'cat', "'", 'dog', ',', 'egg', 'fin', 'goat', 'hat', 'ice', '!']
With your file, just iterate over the file object and apply the same logic:
with open("text.txt") as f:
    out = []
    for line in f:
        temp = ""
        for ch in line:
            if ch.isalnum():
                temp += ch
            else:
                temp += " {} ".format(ch)
        out.extend(temp.split())
You can also use a set of punctuation characters and change the check so that it tests whether the character is in the set:
# the backslash is doubled so the set holds a literal "\" without an invalid escape
st = set("""!"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~""")
with open("text.txt") as f:
    out = []
    for line in f:
        temp = ""
        for ch in line:
            if ch not in st:
                temp += ch
            else:
                temp += " {} ".format(ch)
        out.extend(temp.split())
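If it helps, the same padding idea can be wrapped in a function so it can be tried on a plain string before pointing it at the file. This is only a sketch: the name tokenize and the sample string are made up for illustration, and it pads anything that is neither alphanumeric nor whitespace.

```python
def tokenize(text):
    """Split text into words and single punctuation tokens, without imports."""
    padded = ""
    for ch in text:
        # keep letters, digits and whitespace as they are; pad everything else
        if ch.isalnum() or ch.isspace():
            padded += ch
        else:
            padded += " " + ch + " "
    # split() with no argument splits on any run of whitespace, newlines included
    return padded.split()


# hypothetical sample mirroring the contents of text.txt
sample = "apple boy 'cat'\ndog, egg fin\ngoat hat ice!\n"
print(tokenize(sample))
```

Because split() already treats newlines as whitespace, the whole file can be passed in at once, for example tokenize(f.read()), instead of going line by line.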
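Answer 1 (score: 1)
Put all the lines into a single list of characters (l).
Then check each character: while the current character is alphanumeric (isalnum()), it is appended to a string (comb). When a non-alphanumeric character is found, that string is added to the output list (out), and the non-alphanumeric character is then added on its own. This repeats until an alphanumeric character is found again. Finally, empty strings, spaces and newlines are filtered out of out.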
with open('text.txt') as f:
    # read the whole file as one string (readlines() would give a list of lines)
    l = f.read()
# separates each character into a list
l = list(l)
# output list
out = []
# string in which alphanumerics will be combined
comb = ''
# loops through chars
# comb is added with chars while char is alphanumeric,
# comb is added to out when a non-alphanumeric char is detected
# and then it resets, and the char detected as punc is added as well
for ch in l:
    if ch.isalnum():
        comb += ch
    else:
        out.append(comb)
        out.append(ch)
        comb = ''
# adds the last word in case the file does not end with punctuation or a newline
out.append(comb)
# filters out empty strings, spaces and newlines
out = [s for s in out if s != '' and s != '\n' and s != ' ']
print(out)
Output:
['apple', 'boy', "'", 'cat', "'", 'dog', ',', 'egg', 'fin', 'goat', 'hat', 'ice', '!']
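Note that the array asked for in the question leaves out the comma, while the output above keeps it. If the comma really should be discarded, one possible variation is a single pass that skips certain characters entirely. This is a sketch under that assumption: the function name split_tokens and its drop parameter are invented for illustration and are not part of the answer above.

```python
def split_tokens(text, drop=","):
    """Single pass: collect alphanumeric runs, emit other characters one by one."""
    out = []
    word = ""
    for ch in text:
        if ch.isalnum():
            word += ch
            continue
        # a non-alphanumeric character ends the current word, if there is one
        if word:
            out.append(word)
            word = ""
        # whitespace and anything listed in drop is discarded, the rest is kept
        if not ch.isspace() and ch not in drop:
            out.append(ch)
    # add the last word when the text ends on a letter or digit
    if word:
        out.append(word)
    return out


print(split_tokens("apple boy 'cat'\ndog, egg fin\ngoat hat ice!"))
```

Passing drop="" keeps every punctuation character, which gives the same result as the output shown above.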