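Here is my problem: I have a dictionary (dico), and for each pair of distinct keys I want to count the number of lines of the file "file.tsv" on which the values of both keys appear. The file looks like this: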
sp_345_4567 pe_645_4567876 ap_456_45678 pe_645_4556789 ...
sp_345_567 pe_645_45678 ...
pe_645_45678 ap_456_345678 ...
sp_345_56789 ap_456_345 ...
pe_645_45678 ap_456_345678 ...
sp_345_56789 ap_456_345 ...
...
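For example, the values of the banana and apple keys both appear on line 1. No matter how many times each occurs on that line, both are present, so line 1 counts once for that pair. I want to do this for every line of the file.
To do this, I append the pattern '_\w+' to each value and match it against each element with re.search.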
import re
import csv
from itertools import product

dico = {
    "banana": "sp_345",
    "apple": "ap_456",
    "pear": "pe_345",
    "cherry": "ap_345",
    "coco": "sp_543",
}

counter = {}
with open("file.tsv") as file:
    reader = csv.reader(file, delimiter="\t")
    for line in reader:
        for key1, key2 in product(dico, dico):
            if key1 >= key2:
                continue
            counter[key1, key2] = 0
            k1 = k2 = False
            for el in line:
                if re.search(dico[key1]+'_\w+', el):
                    k1 = True
                elif re.search(dico[key2]+'_\w+', el):
                    k2 = True
                if k1 and k2:
                    counter[key1, key2] += 1
                    break

for key, val in counter.items():
    print(key, val)
But every count stays at 0:
Apple banana 0
pear banana 0
pear apple 0
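Answer 0 (score: 1)
k1 and k2 can never both be set to True by the same element: both are initialized to False, and because of the elif at most one of them is set to True for any given element.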
    elif re.search(dico[key2]+'_\w+', el):
        k2 = True
should be
    if re.search(dico[key2]+'_\w+', el):
        k2 = True
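To make the difference concrete, here is a minimal sketch. The helper names flags_with_elif and flags_with_if are made up for illustration, and the single-element input assumes the whole line ended up in one field (which can happen if the file is not actually tab-separated):

```python
import re

def flags_with_elif(elements, prefix1, prefix2):
    # Mirrors the question's logic: the second test is skipped
    # whenever the first one matches the same element.
    k1 = k2 = False
    for el in elements:
        if re.search(prefix1 + r'_\w+', el):
            k1 = True
        elif re.search(prefix2 + r'_\w+', el):
            k2 = True
    return k1, k2

def flags_with_if(elements, prefix1, prefix2):
    # Two independent tests: both prefixes are checked on every element.
    k1 = k2 = False
    for el in elements:
        if re.search(prefix1 + r'_\w+', el):
            k1 = True
        if re.search(prefix2 + r'_\w+', el):
            k2 = True
    return k1, k2

# Hypothetical input: one element that contains both identifiers.
one_field = ["sp_345_4567 ap_456_45678"]
print(flags_with_elif(one_field, "sp_345", "ap_456"))
print(flags_with_if(one_field, "sp_345", "ap_456"))
```

With two independent if tests, an element that contains both prefixes sets both flags; with elif it can only ever set the first one.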
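Answer 1 (score: 0)
Your line
    counter[key1, key2] = 0
should only run when (key1, key2) does not have a value yet; as written, it resets the count to 0 on every line of the file. For example, add a test: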
    if (key1, key2) not in counter:
        counter[key1, key2] = 0
Alternatively, set counter[key1, key2] to 0 for all pairs before opening the CSV file. For example:
    counter = {}
    for key1, key2 in product(dico, dico):
        if key1 < key2:
            counter[key1, key2] = 0
    with open("file.tsv") as file:
        ....
Also,
    elif re.search(dico[key2]+'_\w+', el):
should be
    if re.search(dico[key2]+'_\w+', el):
otherwise, once an element matches key1, key2 is never checked for that element.
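Putting both fixes together, something like the following sketch should work. It is not the asker's exact code: count_pairs is a made-up helper name, it uses itertools.combinations over the sorted keys instead of product with a key1 < key2 filter, it applies re.escape to the prefixes, and it takes already-split rows rather than opening "file.tsv" itself. The sample rows are the first six lines from the question with the "..." dropped.

```python
import re
from itertools import combinations

def count_pairs(rows, dico):
    """For each pair of keys, count the rows on which both prefixes occur."""
    # One compiled pattern per key: the prefix, then '_' and word characters.
    patterns = {key: re.compile(re.escape(prefix) + r"_\w+")
                for key, prefix in dico.items()}
    # Initialise every pair exactly once, before looking at any row.
    counter = {pair: 0 for pair in combinations(sorted(dico), 2)}
    for row in rows:
        # Keys whose pattern matches at least one element of this row.
        present = {key for key, pat in patterns.items()
                   if any(pat.search(el) for el in row)}
        for key1, key2 in counter:
            if key1 in present and key2 in present:
                counter[key1, key2] += 1  # each row counts at most once per pair
    return counter

dico = {
    "banana": "sp_345",
    "apple": "ap_456",
    "pear": "pe_345",
    "cherry": "ap_345",
    "coco": "sp_543",
}
rows = [
    ["sp_345_4567", "pe_645_4567876", "ap_456_45678", "pe_645_4556789"],
    ["sp_345_567", "pe_645_45678"],
    ["pe_645_45678", "ap_456_345678"],
    ["sp_345_56789", "ap_456_345"],
    ["pe_645_45678", "ap_456_345678"],
    ["sp_345_56789", "ap_456_345"],
]
for pair, count in count_pairs(rows, dico).items():
    print(pair, count)
```

To run it on the real file, the rows can come from csv.reader(file, delimiter="\t") as in the question. Note that with the dico shown in the question, "pear" maps to "pe_345" while the sample lines contain "pe_645", so pear never matches on this sample.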