From the string
i = "1,'Test','items (one, two, etc.)',1,'long, list'"
I need to extract the following array of strings:
['1', "'Test'", "'items (one, two, etc.)'", '1', "'long, list'"]
using a regular expression. I tried
r=re.split(r',+(?=[^()]*(?:\(|$))', i)
but I only get the following result:
['1', "'Test'", "'items (one, two, etc.)'", '1', "'long", " list'"]
UPD1
NULL should be supported as well. For the input
i = "1,'Test',NULL,'items (one, two, etc.)',1,'long, list'"
the expected result is
['1', "'Test'", 'NULL', "'items (one, two, etc.)'", '1', "'long, list'"]
Answer 0 (score: 4)
In this case you don't need re.split. You can use re.findall in a list comprehension:
>>> [k for j in re.findall(r"(\d)|'([^']*)'",i) for k in j if k]
['1', 'Test', 'items (one, two, etc.)', '1', 'long, list']
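The regex above matches either anything between a pair of single quotes, '([^']*)', or a single digit, (\d). Because (\d) matches one digit at a time, a multi-digit number would come back as separate digits; (\d+) avoids that.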
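If the quotes have to stay on the quoted fields, as in the output the question asks for, a variation on the findall idea might look like the sketch below. The pattern and the helper name split_fields are my own assumptions, not part of the answer above. It also assumes there is no whitespace around the separating commas, no escaped quotes inside a field, and no empty fields.

```python
import re

# Assumed pattern (not from the answer above):
# either a whole single-quoted field, or a run of non-comma characters.
FIELD = re.compile(r"'[^']*'|[^,]+")

def split_fields(text):
    """Return the fields of text, keeping the quotes on quoted fields."""
    return FIELD.findall(text)

i = "1,'Test',NULL,'items (one, two, etc.)',1,'long, list'"
print(split_fields(i))
# ['1', "'Test'", 'NULL', "'items (one, two, etc.)'", '1', "'long, list'"]
```

Because the quoted alternative is tried first, commas inside quotes never start a new field, and an unquoted token such as NULL or a multi-digit number is returned whole.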
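Alternatively, you can use ast.literal_eval, which parses the whole string as a Python tuple literal and returns the numbers as integers rather than strings: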
>>> from ast import literal_eval
>>> literal_eval(i)
(1, 'Test', 'items (one, two, etc.)', 1, 'long, list')
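One thing to keep in mind with literal_eval is that it only accepts Python literals, so a bare NULL in the input is rejected. A small sketch of how that could be guarded against (the wrapper name parse_row is hypothetical):

```python
from ast import literal_eval

def parse_row(text):
    """Parse text as a Python tuple literal; return None if it is not one."""
    try:
        return literal_eval(text)
    except (ValueError, SyntaxError):
        # A bare NULL is a name, not a literal, so literal_eval rejects it.
        return None

print(parse_row("1,'Test','long, list'"))  # (1, 'Test', 'long, list')
print(parse_row("1,'Test',NULL"))          # None
```

So this route probably only fits inputs that are guaranteed to contain nothing but numbers and quoted strings.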
Answer 1 (score: 2)
This is a job for the csv module:
import csv
from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO
line = "1,'Test','items (one, two, etc.)',1,'long, list'"
reader = csv.reader(StringIO(line), quotechar="'")
row = next(reader)
# row == ['1', 'Test', 'items (one, two, etc.)', '1', 'long, list']
The key here is to create a CSV reader that treats the single quote as the quote character.
Answer 2 (score: 1)
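For the NULL input from the question's update, the same reader should work unchanged. A minimal sketch, assuming Python 3's io.StringIO (the helper name parse_line is mine):

```python
import csv
from io import StringIO

def parse_line(line):
    """Split one line with the csv module, treating ' as the quote character."""
    return next(csv.reader(StringIO(line), quotechar="'"))

row = parse_line("1,'Test',NULL,'items (one, two, etc.)',1,'long, list'")
print(row)
# ['1', 'Test', 'NULL', 'items (one, two, etc.)', '1', 'long, list']
```

Note that the reader strips the quotes, so an unquoted NULL and a quoted 'NULL' both come back as the string 'NULL'. If that distinction matters, this approach is not enough on its own.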
You can split on the single quote:
i = "1,'Test','items (one, two, etc.)',1,'long, list'"
print([ele.strip(" ,") for ele in i.split("'") if ele.strip(",")])
['1', 'Test', 'items (one, two, etc.)', '1', 'long, list']
Or use map:
print([ele for ele in map(lambda x: x.strip(", "), i.split("'")) if ele])
Using map with Python 3 is quite efficient:
In [7]: i = "1,'Test','items (one, two, etc.)',1,'long, list'"
In [8]: timeit [ele for ele in map(lambda x: x.strip(", "), i.split("'")) if ele]
1000000 loops, best of 3: 1.5 µs per loop
In [9]: r = re.compile(r"(\d)|'([^']*)'")
In [10]: timeit [k for j in r.findall(i) for k in j if k]
100000 loops, best of 3: 3.92 µs per loop
It is better again with Python 2 and itertools.imap:
In [9]: from itertools import imap
In [10]: timeit [ele for ele in imap(lambda x: x.strip(", "), i.split("'")) if ele]
1000000 loops, best of 3: 871 ns per loop
In [11]: r = re.compile(r"(\d)|'([^']*)'")
In [12]: timeit [k for j in r.findall(i) for k in j if k]
100000 loops, best of 3: 4.27 µs per loop
In [17]: from ast import literal_eval
In [18]: timeit literal_eval(i)
100000 loops, best of 3: 16.2 µs per loop
All of these return the same output, bar literal_eval, which evaluates the digits as integers:
In [19]: literal_eval(i)
Out[19]: (1, 'Test', 'items (one, two, etc.)', 1, 'long, list')
In [20]: [k for j in r.findall(i) for k in j if k]
Out[20]: ['1', 'Test', 'items (one, two, etc.)', '1', 'long, list']
In [21]: [ele for ele in imap(lambda x: x.strip(", "), i.split("'")) if ele]
Out[21]: ['1', 'Test', 'items (one, two, etc.)', '1', 'long, list']
The NULL case is no different for the split-based approach.
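As a quick check of the split-on-quote approach against the NULL input from the question, here is a small sketch. The function name split_on_quotes is hypothetical. It assumes unquoted fields never sit next to each other, which the second call shows is a real limitation.

```python
def split_on_quotes(text):
    """Split on single quotes, then trim separators from each piece."""
    pieces = (piece.strip(", ") for piece in text.split("'"))
    return [piece for piece in pieces if piece]

print(split_on_quotes("1,'Test',NULL,'items (one, two, etc.)',1,'long, list'"))
# ['1', 'Test', 'NULL', 'items (one, two, etc.)', '1', 'long, list']

print(split_on_quotes("1,2,'a'"))
# ['1,2', 'a']  (adjacent unquoted fields are not separated)
```

The strip call would also remove leading or trailing commas and spaces from the quoted text itself, so this is best kept for inputs like the one in the question.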