I have this string and I want to split it on ",":
x = 'a, b, c , d , "x,x,2" , hi'
x.split(',')
This is my real string:
x = 'Outward ,Supply , ,Tax Invoice ,IN9195212470,31/12/2019,VPS AGRO & AUTO PVT LTD ,311954,06AAACV9344F1ZA ,"VILLAGE KHANPUR KOLIAN, N.H. 1 ",6 K.M. FRO,KURUKSHETRA ,HARYANA ,136131,VPS AGRO & AUTO PVT LTD ,311954,"VILLAGE KHANPUR KOLIAN, N.H. 1",6 K.M. FRO,KURUKSHETRA ,HARYANA ,136131,503675,SM VAL. GENUINE DIESEL ENG. OIL 1/9 L ,27101980,360,LTR,58204.04,9,5238.36,9,5238.36,0,0,0,0,0,0,0,68680.76, , , , , ,, , ,06AAACW0287A1ZR ,VALVOLINE CUMMINS PVT LTD-AMBALA ,"KHASHRA NO-108/1/2, ", ,AMBALA ,133004,HARYANA , , , , ,'
For the short example, the split returns this result:
['a','b','c','d','"x','x','2', 'hi']
But I want this:
['a', 'b', 'c' , 'd' , '"x,x,2"' , 'hi']
How can I do this in Python?
Please help.
Answer 0 (score: 3)
import shlex
lexer = shlex.shlex('a, b, c , d , "x,x,2" , hi')
lexer.whitespace += ','
print(list(lexer))
Result:
['a', 'b', 'c', 'd', '"x,x,2"', 'hi']
Here is an updated solution for the updated task:
x = 'Outward ,Supply , ,Tax Invoice ,IN9195212470,31/12/2019,VPS AGRO & AUTO PVT LTD ,311954,06AAACV9344F1ZA ,"VILLAGE KHANPUR KOLIAN, N.H. 1 ",6 K.M. FRO,KURUKSHETRA ,HARYANA ,136131,VPS AGRO & AUTO PVT LTD ,311954,"VILLAGE KHANPUR KOLIAN, N.H. 1",6 K.M. FRO,KURUKSHETRA ,HARYANA ,136131,503675,SM VAL. GENUINE DIESEL ENG. OIL 1/9 L ,27101980,360,LTR,58204.04,9,5238.36,9,5238.36,0,0,0,0,0,0,0,68680.76, , , , , ,, , ,06AAACW0287A1ZR ,VALVOLINE CUMMINS PVT LTD-AMBALA ,"KHASHRA NO-108/1/2, ", ,AMBALA ,133004,HARYANA , , , , ,'
import shlex
lexer = shlex.shlex(x)
lexer.whitespace = ','
lexer.whitespace_split = True
print([cell.strip() for cell in lexer])
Result:
['Outward', 'Supply', '', 'Tax Invoice', 'IN9195212470', '31/12/2019', 'VPS AGRO & AUTO PVT LTD', '311954', '06AAACV9344F1ZA', '"VILLAGE KHANPUR KOLIAN, N.H. 1 "', '6 K.M. FRO', 'KURUKSHETRA', 'HARYANA', '136131', 'VPS AGRO & AUTO PVT LTD', '311954', '"VILLAGE KHANPUR KOLIAN, N.H. 1"', '6 K.M. FRO', 'KURUKSHETRA', 'HARYANA', '136131', '503675', 'SM VAL. GENUINE DIESEL ENG. OIL 1/9 L', '27101980', '360', 'LTR', '58204.04', '9', '5238.36', '9', '5238.36', '0', '0', '0', '0', '0', '0', '0', '68680.76', '', '', '', '', '', '', '', '06AAACW0287A1ZR', 'VALVOLINE CUMMINS PVT LTD-AMBALA', '"KHASHRA NO-108/1/2, "', '', 'AMBALA', '133004', 'HARYANA', '', '', '', '']
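The two snippets above configure the lexer differently, and the following sketch may help show why. It wraps both setups in a hypothetical helper, `split_fields` (the name, the `keep_inner_spaces` parameter, and the shortened sample row are assumptions made for illustration, not part of the answer).

```python
import shlex


def split_fields(text, keep_inner_spaces=True):
    """Split on commas outside double quotes (hypothetical helper)."""
    lexer = shlex.shlex(text)
    if keep_inner_spaces:
        # Only the comma separates fields, so 'Tax Invoice' stays in one piece.
        lexer.whitespace = ','
        lexer.whitespace_split = True
    else:
        # The comma is added to the default whitespace, so spaces split too.
        lexer.whitespace += ','
    # Trim the padding that is left around each field.
    return [cell.strip() for cell in lexer]


# Shortened sample in the style of the real string from the question.
row = 'Tax Invoice ,"KHANPUR, N.H. 1",6 K.M. FRO'
print(split_fields(row, keep_inner_spaces=True))
print(split_fields(row, keep_inner_spaces=False))
```

With `keep_inner_spaces=True` the row should come back as three fields, while the other setup breaks 'Tax Invoice' into separate tokens. One caveat of the comma-only setup: a quote is recognized only at the very start of a field, that is, directly after a comma, which holds for the real string but not for the short example with its space before the quote.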
Answer 1 (score: 1)
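No built-in does this without a fair amount of pre- or post-processing of the data.
shlex.split appears to work for this example, but that is deceptive because it splits on whitespace: it fails as soon as two elements are separated only by a comma.
ast.literal_eval does not work because some of the items are not literals.
A csv.reader object comes close with [x.strip() for x in next(csv.reader([x]))], but the quotes are not handled correctly because of the space between the comma and the quote.
A simple state machine that walks over each character does the job, though: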
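x = 'a, b, c , d , "x,x,2" , hi'
in_quote = False
current = []   # characters of the element being built
output = []    # finished elements
for c in x:
    if in_quote:
        # inside quotes: keep everything, including commas and spaces
        current.append(c)
        if c == '"':
            # closing quote: the element is flushed at the next comma
            in_quote = False
        continue
    if c == ",":
        # a comma outside quotes ends the current element
        output.append("".join(current))
        current = []
    elif c == " ":
        # spaces outside quotes are skipped
        pass
    else:
        current.append(c)
        if c == '"':
            in_quote = True
# flush the last element at the end of the string
output.append("".join(current))
print(output)
Result:
['a', 'b', 'c', 'd', '"x,x,2"', 'hi']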
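Just skip spaces and start a new element whenever a comma is encountered, but keep a flag that is set while inside quotes.
Finally, don't forget to flush the last element when the end of the string is reached.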
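On the csv.reader remark earlier in this answer: the csv module has a `skipinitialspace` option that may get around the space between the comma and the quote. The sketch below assumes it is acceptable for the surrounding quotes to be removed from the quoted element, which differs from the output asked for in the question.

```python
import csv

x = 'a, b, c , d , "x,x,2" , hi'

# skipinitialspace=True ignores spaces right after each comma,
# so the quote that follows is treated as the start of a quoted field.
reader = csv.reader([x], skipinitialspace=True)

# Trailing spaces (before the next comma) are still part of each field,
# so they are stripped afterwards.
fields = [field.strip() for field in next(reader)]
print(fields)  # ['a', 'b', 'c', 'd', 'x,x,2', 'hi']
```

Unlike the state machine, this drops the double quotes around x,x,2. If the quotes must be kept, they would have to be added back by hand.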
Answer 2 (score: 1)
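A solution that uses only split. Note that it uses f-strings (Python 3.6+), but the same behavior can be achieved in older versions.
This can be done without regular expressions, as shown below. I have commented the code to explain it: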
# First split by double quote
x = x.split('"')
final_x = []
for i in range(len(x)):
    # We know that if the list element is even then it must be outside double quotes
    if i%2 == 0:
        # Split the list by commas and strip any whitespace
        x_element = x[i].split(',')
        x_element = [el.strip() for el in x_element]
        # extend the list
        final_x.extend(x_element)
    else:
        # This is an odd element of the list, therefore inside quotation.
        # put the string back into quotations
        x_element = f'"{x[i]}"'
        #append this to the final list
        final_x.append(x_element)
# filter out any white spaces left from the various splits
final_x = [el for el in final_x if el !='']
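Note the difference between appending for the odd list elements and extending for the even ones. For an even element the split produces a new list, so we extend the output with it. An odd element is a single new item, so we append it.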
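For Python versions older than 3.6, the same idea might look like the sketch below, with plain string concatenation in place of the f-string. The function name `split_outside_quotes` is made up for illustration, and the sketch assumes the double quotes in the input are balanced.

```python
def split_outside_quotes(text):
    """Split on commas that are outside double quotes (hypothetical helper)."""
    parts = text.split('"')
    result = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            # Even index: outside quotes, so split on commas and trim.
            result.extend(el.strip() for el in part.split(','))
        else:
            # Odd index: inside quotes, so restore the quotes (no f-string).
            result.append('"' + part + '"')
    # Drop the empty strings left over from the splits.
    return [el for el in result if el != '']


print(split_outside_quotes('a, b, c , d , "x,x,2" , hi'))
# ['a', 'b', 'c', 'd', '"x,x,2"', 'hi']
```

As in the original snippet, the final filter also removes fields that were genuinely empty in the input, which matters for data such as the real string in the question.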
Answer 3 (score: 1)
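You can use a regular-expression approach. Note that it relies on the third-party regex module rather than the built-in re, because it uses the (*SKIP)(*FAIL) verbs:
import regex as re
x = 'a, b, c , d , "x,x,2" , hi'
rx = re.compile(
r"""
"[^"]*"(*SKIP)(*FAIL)
|
\s*,\s*
""", re.VERBOSE)
lst = rx.split(x)
print(lst)
This yields
['a', 'b', 'c', 'd', '"x,x,2"', 'hi']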
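If installing the regex module is not an option, a similar split can be sketched with the standard re module. This is a different technique from the one above: instead of skipping quoted spans, it uses a lookahead that accepts a comma only when an even number of double quotes follows it. It assumes the quotes in the input are balanced.

```python
import re

x = 'a, b, c , d , "x,x,2" , hi'

# Match a comma (with surrounding spaces) only if the rest of the string
# contains an even number of double quotes, i.e. the comma is outside quotes.
pattern = re.compile(r'\s*,\s*(?=(?:[^"]*"[^"]*")*[^"]*$)')

parts = pattern.split(x)
print(parts)  # ['a', 'b', 'c', 'd', '"x,x,2"', 'hi']
```

The lookahead rescans the remainder of the string for every comma, so this is likely to be slower than the other approaches on very long inputs.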