I am reading a .csv file and storing it in a matrix named csvfile. Its contents look like this (abbreviated; there are dozens of records):
[[' 411-440854-0', ' 411-440824-0', ' 411-441232-0', ' 394-529791', ' 394-529729', ' 394-530626'],
 <...>,
 [' 394-1022430-0', ' 394-1022431-0', ' 394-1022432-0', ' ***another CN with a switch in between'],
 [' 394-833938-0', ' 394-833939-0', ' 394-833940-0'],
 <...>,
 [' 394-1021830-0', ' 394-1021831-0', ' 394-1021832-0', ' ***sectionalizer end connection'],
 [' 394-1022736-0', ' 394-1022737-0', ' 394-1022738-0'],
 <...>,
 [' 394-1986420-0', ' 394-1986419-0', ' 394-1986416-0', ' ***odd BN line, check'],
 [' 394-1986411-0', ' 394-1986415-0', ' 394-1986413-0'],
 <...>,
 [' 394-529865-0', ' 394-529686-0', ' 394-530875-0', ' ***sectionalizer end connection'],
 [' 394-830900-0', ' 394-830904-0', ' 394-830902-0'],
 [' 394-2350772-0', ' 394-2350776-0', ' 394-2350774-0', ' ***sectionalizer present but no end time'],
 <...>]
I am also reading a text file into a variable named textfile. Its contents look like this:
...
object underground_line {
name SPU123-394-1021830-0-sectionalizer;
phases AN;
from SPU123-391-670003;
to SPU123-395-899674_sectionalizernode;
length 26.536;
configuration SPU123-1/0CN15-AN;
}
object underground_line {
name SPU123-394-1021831-0-sectionalizer;
phases BN;
from SPU123-391-670002;
to SPU123-395-899675_sectionalizernode;
length 17.902;
configuration SPU123-1/0CN15-BN;
}
object underground_line {
name SPU123-394-1028883-0-sectionalizer;
phases CN;
from SPU123-391-542651;
to SPU123-395-907325_sectionalizernode;
length 771.777;
configuration SPU123-1CN15-CN;
}
...
I want to check whether part of each name line in textfile (whatever comes after SPU123- and before -0-sectionalizer) is present in the csvfile matrix. If it is not present, I want to do something (increment a counter). I have tried several approaches, including the following:
counter = 0
for noline in textfile:
    if 'name SPU123-' in noline:
        if '-' in noline[23]:
            if ((noline[13:23] not in s[0]) and (noline[13:23] not in s[1]) and (noline[13:23] not in s[2]) for s in csvfile):
                counter = counter+1
        else:
            if ((noline[13:24] not in s[0]) and (noline[13:24] not in s[1]) and (noline[13:-24] not in s[2]) for s in csvfile):
                counter = counter+1
print counter
This does not work. I also tried if any((noline......) in the code sample above, but that did not work either.
Answer 0 (score: 1)
To check whether a string s occurs in any sublist of a list of lists l:
>>> l = [['str', 'foo'], ['bar', 'so']]
>>> s = 'foo'
>>> any(s in x for x in l)
True
>>> s = 'nope'
>>> any(s in x for x in l)
False
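A side note on why the condition in the question always passes: a parenthesized generator expression is itself an object, and such objects are truthy, so `if (... for s in csvfile):` never evaluates the comparisons inside it. A minimal sketch in Python 3 syntax, using made-up sample data (the names `rows`, `gen`, `always_true`, `found` and `missing` are only for illustration):

```python
rows = [['str', 'foo'], ['bar', 'so']]  # made-up sample data

# A generator expression is an object, so bool() of it is True
# even though every comparison inside it would be False.
gen = ('nope' in row for row in rows)
always_true = bool(gen)

# any() consumes the generator and evaluates each comparison.
found = any('nope' in row for row in rows)

# "Not present anywhere" is therefore written as not any(...).
missing = not any('nope' in row for row in rows)

print(always_true, found, missing)
```

Wrapping the generator in any() (or all()) is what turns it into a real yes/no test.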
To apply this in your code (assuming noline[13:23] is the string you want to search for, and that counter should be incremented when it is not in csvfile):
counter = 0
for noline in textfile:
    if 'name SPU123-' in noline:
        # Note: `in x` on a list is an exact match, so cells with leading spaces need stripping first
        if '-' in noline[23]:
            if not any(noline[13:23] in x for x in csvfile) and not any(noline[13:23] + '-0' in x for x in csvfile):
                counter += 1
        else:
            if not any(noline[13:24] in x for x in csvfile) and not any(noline[13:24] + '-0' in x for x in csvfile):
                counter += 1
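As a variation on the loop above, the fixed character offsets can be avoided by cutting the identifier out of the line with string operations. This is only a sketch in Python 3 syntax: the function name `count_missing`, the sample data, and the assumption that every name ends in `-0-sectionalizer` are mine, not part of the question.

```python
def count_missing(text_lines, csv_rows):
    """Count 'name SPU123-...' lines whose ID is absent from csv_rows."""
    prefix = 'name SPU123-'
    counter = 0
    for line in text_lines:
        line = line.strip()
        if not line.startswith(prefix):
            continue
        # Assumed layout: the ID sits between the prefix and '-0-'.
        ident = line[len(prefix):].split('-0-')[0]
        # Accept the ID with or without the trailing '-0' seen in the CSV.
        wanted = {ident, ident + '-0'}
        # Strip each cell because the sample cells carry a leading space.
        if not any(cell.strip() in wanted for row in csv_rows for cell in row):
            counter += 1
    return counter


# Made-up sample in the shape of the question's data.
csv_rows = [[' 394-1021830-0', ' 394-1021831-0', ' 394-1021832-0']]
text_lines = [
    'name SPU123-394-1021830-0-sectionalizer;',
    'name SPU123-394-1021831-0-sectionalizer;',
    'name SPU123-394-1028883-0-sectionalizer;',
]
print(count_missing(text_lines, csv_rows))
```

With this sample only the third name (394-1028883) has no match, so the count is 1.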
Answer 1 (score: 1)
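Because the matrix holds a large number of values, iterating over it for every lookup is slow.
Collect the values into a hash-based container instead (a set in this case, since there is no associated data), because hash table lookups are very fast: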
import re
# strip() drops the leading spaces seen in the sample cells before matching
s = {v.strip() for r in matrix for v in r if re.match(r'\d[-\d]+\d$', v.strip())}  # or any filter more appropriate for your notion of valid identifiers
if noline[13:23] in s:  # parsing the identifiers instead would be more fault-tolerant
    pass  # do something
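Because of the preliminary step of building the set, this only beats the brute-force approach once the data exceeds a certain size.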
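To make the trade-off concrete, the sketch below (Python 3 syntax) builds the set once and checks that it gives the same answers as a brute-force scan. The helper names `normalise`, `in_rows_bruteforce` and `in_rows_set`, the sample rows, and the rule of dropping a trailing `-0` are assumptions made for illustration.

```python
# Made-up sample in the shape of csvfile from the question.
csv_rows = [
    [' 394-1021830-0', ' 394-1021831-0', ' ***comment cell'],
    [' 394-529791', ' 394-529729'],
]


def normalise(cell):
    """Strip whitespace and drop a trailing '-0' (assumed convention)."""
    cell = cell.strip()
    return cell[:-2] if cell.endswith('-0') else cell


# Preliminary step, paid once: every cell goes into a set.
id_set = {normalise(cell) for row in csv_rows for cell in row}


def in_rows_bruteforce(ident):
    # Scans every cell on every call.
    return any(normalise(cell) == ident for row in csv_rows for cell in row)


def in_rows_set(ident):
    # Single hash lookup per call.
    return ident in id_set


for ident in ('394-1021830', '394-529791', '394-1028883'):
    assert in_rows_bruteforce(ident) == in_rows_set(ident)
    print(ident, in_rows_set(ident))
```

The brute-force version touches every cell for each name, while the set version touches every cell once up front and then does one lookup per name, which is why it only pays off when there are enough names to check.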
Answer 2 (score: 0)
import re, itertools
Flatten csvfile; data is an iterator:
data = itertools.chain.from_iterable(csvfile)
Extract the relevant items from data and put them in a set for performance (this avoids iterating over data multiple times):
data_rex = re.compile(r'\d{3}-\d+')
# search() rather than match(), so cells with a leading space (as in the sample) still match
data = {match.group() for match in itertools.imap(data_rex.search, data) if match}
Count the names that are not in data:
def predicate(match, data = data):
    '''Return True if match not found in data'''
    return match.group(1) not in data
# after SPU123- and before -0-
name = re.compile(r'name SPU123-(\d{3}-\d+)-')
names = name.finditer(textfile)
# quantify
print sum(itertools.imap(predicate, names))
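The code above is Python 2 (itertools.imap and the print statement do not exist in Python 3). A rough Python 3 equivalent might look like the sketch below; the wrapper function `count_missing_names` and the sample inputs are hypothetical, textfile is assumed to be a single string, and search() is used instead of match() on the assumption that CSV cells may start with a space.

```python
import itertools
import re


def count_missing_names(csvfile, textfile):
    """Count names in textfile (a string) whose ID is not in csvfile."""
    # Flatten the rows into one iterator of cells.
    cells = itertools.chain.from_iterable(csvfile)
    # Collect the IDs into a set so each later lookup is a hash lookup.
    data_rex = re.compile(r'\d{3}-\d+')
    data = {m.group() for m in map(data_rex.search, cells) if m}
    # Capture what sits after 'SPU123-' and before the next '-'.
    name = re.compile(r'name SPU123-(\d{3}-\d+)-')
    # True counts as 1, so sum() gives the number of missing names.
    return sum(m.group(1) not in data for m in name.finditer(textfile))


# Made-up sample in the shape of the question's data.
csvfile = [[' 394-1021830-0', ' 394-1021831-0', ' ***note']]
textfile = (
    'name SPU123-394-1021830-0-sectionalizer;\n'
    'name SPU123-394-1028883-0-sectionalizer;\n'
)
print(count_missing_names(csvfile, textfile))
```

In this sample only 394-1028883 is absent from the CSV data, so the function returns 1.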