Suppose the string in my text file is as follows:
Line 1:
asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'." is something
I want to split the line and form a list like this:
[asfdsafadfa, "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'.", is something]
I tried using shlex.split, but it raised an exception. Here is the code:
import shlex
fil=open("./demoshlex.txt",'r')
line=fil.readline()
print line
print shlex.split(line)
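For context, here is a minimal sketch of why shlex.split fails on this input, assuming the sample line is exactly as shown above. The line contains an odd number of double quotes, so shlex reaches the end of the input while still inside a quotation and raises ValueError. The helper name try_shlex_split is hypothetical and exists only for illustration.

```python
import shlex

line = '''asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'." is something'''


def try_shlex_split(text):
    """Return the shlex tokens, or a description of the error if parsing fails."""
    try:
        return shlex.split(text)
    except ValueError as exc:
        return "ValueError: {}".format(exc)


# An odd count means at least one double quote is never closed.
print(line.count('"'))
print(try_shlex_split(line))
# A line with balanced quotes parses without error.
print(try_shlex_split('a "b c" d'))
```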
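Answer 0 (score: 1)
It looks to me like you only want to split at the first occurrence of " and keep all the remaining " characters in the second element of the output list.
Here is an example using only the standard library, with no imports required: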
result = []
with open('test.txt', 'r') as openfile:
    for line in openfile:
        # strip spaces and \n from the line
        line = line.strip()
        # split the line on "
        my_list = line.split('"')
        # only append first element of the list to the result
        result.append(my_list[0].strip())
        # rebuild the second part, adding back in the "
        remainder = '"' + '"'.join(my_list[1:])
        # append the second part to the result
        result.append(remainder)
print(result)
Output:
['asfdsafadfa', '"Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf \'0000000000000000000000000000000\'."']
Or, if you print the individual elements of the output list:
for e in result:
    print(e)
Output:
asfdsafadfa
"Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'."
[Edit based on comments]
As suggested in the comments, you can use .split('"', 1), for example:
with open('test.txt', 'r') as openfile:
    for line in openfile:
        # strip spaces and \n from the line
        line = line.strip()
        # split the line on " but only the first one
        result = line.split('"', 1)
        # add in the " for the second element
        result[1] = '"' + result[1]
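To see the effect of the second argument (maxsplit) without needing test.txt, here is a small self-contained sketch. It assumes the sample line from the question without the trailing text; the variable names head and tail are only illustrative.

```python
line = '''asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'."'''

# maxsplit=1 stops after the first ", so every later " stays inside tail.
head, tail = line.split('"', 1)

# Put back the " that split() consumed.
result = [head.strip(), '"' + tail]

for element in result:
    print(element)
```

Note that unpacking into head and tail raises ValueError when the line contains no " at all, so real input may need a check such as `if '"' in line` first.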
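[Edit based on the updated question and comments]
Comment from the OP:
I only want the quoted part, i.e. remove "is something" from the second element of the result list and make it the [2] element
Since the question was updated with a trailing "is something" string on the input that must be omitted from the output, the example now looks like this:
with open('test.txt', 'r') as openfile:
    for line in openfile:
        # strip spaces and \n from the line
        line = line.strip()
        # split the line on " but only the first one
        result = line.split('"', 1)
        # add in the " for the second element, remove trailing string
        result[1] = '"{}"'.format(result[1].rsplit('"', 1)[0])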
However, the file may contain multiple lines. In that case you need to build up an output list with one entry per line. The example now looks like this:
result = []
with open('test.txt', 'r') as openfile:
    for line in openfile:
        if '"' in line:
            # we can split the line on "
            line = line.strip().split('"', 1)
            # endswith() also copes with an empty second element
            if line[1].endswith('"'):
                # no trailing string to remove
                # pre-fix second element with "
                line[1] = '"{}'.format(line[1])
            elif '"' in line[1]:
                # trailing string to be removed with .rsplit()[0]
                # post- and pre-fix " for second element
                line[1] = '"{}"'.format(line[1].rsplit('"', 1)[0])
        else:
            # no " in line, return line as one element list
            line = [line.strip()]
        result.append(line)
# result is now a list of lists
for line in result:
    for e in line:
        print(e)
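As a compact alternative sketch (not part of the code above), the three-element list the question asks for can also be built by locating the first and last " with str.find and str.rfind. The function name split_outer_quotes is hypothetical, and stripping the outer parts is an assumption about the desired output.

```python
def split_outer_quotes(line):
    """Split a line into [before, quoted, after] around its outermost double quotes.

    Lines with fewer than two double quotes come back as a one-element list.
    """
    first = line.find('"')
    last = line.rfind('"')
    if first == -1 or first == last:
        return [line.strip()]
    return [
        line[:first].strip(),      # text before the first "
        line[first:last + 1],      # outermost quoted part, quotes included
        line[last + 1:].strip(),   # text after the last "
    ]


sample = '''asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'." is something'''
for part in split_outer_quotes(sample):
    print(part)
```

Slicing keeps the quoted part byte-for-byte intact, which avoids re-adding the quotes by hand after split and rsplit.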
Answer 1 (score: 1)
The best way to do this is with the re module:
import re
s = '''asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'." is something'''
pat = re.compile(
    r'''
    ^       # beginning of a line
    (.*?)   # first part. the *? means non-greedy
    (".*")  # part between the outermost ", ("-included)
    (.*?)   # last part
    $       # end of a line
    ''', re.DOTALL|re.VERBOSE)
pat.match(s).groups()
('asfdsafadfa ',
'"Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf \'0000000000000000000000000000000\'."',
' is something')
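The greedy .* inside the second group is what makes the match span the outermost quotes. The sketch below contrasts it with a non-greedy variant on a shortened, made-up string; the variable names are only illustrative.

```python
import re

s = 'a "b "c" d" e'

# Greedy middle group: runs to the LAST " in the string.
greedy = re.match(r'^(.*?)(".*")(.*?)$', s).groups()

# Non-greedy middle group: stops at the NEXT " instead.
lazy = re.match(r'^(.*?)(".*?")(.*)$', s).groups()

print(greedy)  # ('a ', '"b "c" d"', ' e')
print(lazy)    # ('a ', '"b "', 'c" d" e')
```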
Putting it all together, this becomes:
import re
from io import StringIO
test_str = '''asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'." is something
asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'."
asfdsafadfa Tabvxc avcxsdasaf sadasfdf. sdsadsaf '0000000000000000000000000000000'.
'''
def split_lines(filehandle):
    pat = re.compile(r'''^(.*?)(".*")(.*?)$''', re.DOTALL)
    for line in filehandle:
        match = pat.match(line)
        if match:
            yield match.groups()
        else:
            yield line
with StringIO(test_str) as openfile:
    for line in split_lines(openfile):
        print(line)
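The generator iterates over the open file handle line by line and tries to split each line. If the match succeeds, it yields a tuple containing the parts; otherwise it yields the original string.
In a real program, you can replace StringIO(test_str) with open(filename, 'r').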
The example above prints:
('asfdsafadfa ', '"Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf \'0000000000000000000000000000000\'."', ' is something')
('asfdsafadfa ', '"Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf \'0000000000000000000000000000000\'."', '')
asfdsafadfa Tabvxc avcxsdasaf sadasfdf. sdsadsaf '0000000000000000000000000000000'.
Answer 2 (score: 0)
Your original string looks hard to quote. You can escape the quotes by putting a backslash before each of them, like this:
my_var = "Tabvxc \"avcx\"sdasaf\" sadasfdf. sdsadsaf '0000000000000000000000000000000'."
Then you can split it as follows:
my_var.split('"')
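For illustration, here is what that split produces on the escaped string. This is a sketch only; the variable name parts is an assumption. Each escaped \" is an ordinary " once the literal is parsed, so split cuts at every one of them.

```python
my_var = "Tabvxc \"avcx\"sdasaf\" sadasfdf. sdsadsaf '0000000000000000000000000000000'."

# The string holds three double quotes, so split('"') returns four pieces.
parts = my_var.split('"')

for part in parts:
    print(repr(part))

# Joining the pieces with " reconstructs the original string.
print('"'.join(parts) == my_var)
```

This splits at every quote rather than only the outermost ones, so the pieces would still need to be recombined to get the quoted part as a single element.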