I have a large list of strings like this:
res = ["FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'KIDS' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'FANTASY' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME =='Mumbai' & EVENT_GENRE == 'FESTIVAL' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'New Delhi' & EVENT_GENRE == 'WORKSHOP' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'EXHIBITION' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Bangalore' & FAV_GENRE == '|DRAMA|'",
"FAV_VENUE_CITY_NAME == 'Mumbai' & & FAV_GENRE == '|ACTION|ADVENTURE|SCI-FI|'",
"FAV_VENUE_CITY_NAME == 'Bangalore' & FAV_GENRE == '|COMEDY|'",
"FAV_VENUE_CITY_NAME == 'Bangalore' & FAV_GENRE == 'DRAMA' & FAV_LANGUAGE == 'English'",
"FAV_VENUE_CITY_NAME == 'New Delhi' & FAV_LANGUAGE == 'Hindi' & count_EVENT_LANGUAGE >= 1"]
Right now I am extracting the field names with:
res = [re.split(r'[(==)(>=)]', x)[0].strip() for x in re.split('[&($#$)]', whereFields)]
res = [x for x in list(set(res)) if x]
Output: ['FAV_GENRE', 'FAV_LANGUAGE', 'FAV_VENUE_CITY_NAME', 'count_EVENT_GENRE', 'EVENT_GENRE', 'count_EVENT_LANGUAGE']
Then, following "filter out some items from a list and store in different arrays in python", I get these values:
FAV_VENUE_CITY_NAME = ['New Delhi', 'Mumbai', 'Bangalore']
FAV_GENRE = ['|DRAMA|', '|COMEDY|', '|ACTION|ADVENTURE|SCI-FI|', 'DRAMA']
EVENT_GENRE = ['FESTIVAL', 'WORKSHOP', 'FANTASY', 'KIDS', 'EXHIBITION']
FAV_LANGUAGE = ['English', 'Hindi']
count_on_field = ['EVENT_GENRE', 'EVENT_LANGUAGE']
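Now I want to create a dictionary whose keys are the field names in res and whose values are the results obtained via the link above.
Alternatively, is there a way to turn each item of the list res into its own separate list?
Something like: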
res = ['FAV_GENRE', 'FAV_LANGUAGE', 'FAV_VENUE_CITY_NAME', 'count_EVENT_GENRE', 'EVENT_GENRE','count_EVENT_LANGUAGE']
for i in range(len(res)):
    res[i] = list(res[i])  # make each item an empty list that keeps its name
so that they become:
FAV_VENUE_CITY_NAME = []
EVENT_GENRE = []
FAV_GENRE = []
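FAV_LANGUAGE = []
Then I would fill each of these lists with its values using the method from the link above.
After that I would create a dictionary, much like the line below builds a dictionary keyed by index: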
a = [51,27,13,56]
b = dict(enumerate(a))
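# d = dict{key = each list name from the res list, value = the values in each individual list}
Or, if possible, please suggest something that works directly on the res list at the top: how can I build a dict whose keys are the field names and whose values are the values found in each row?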
Output: d = {'FAV_VENUE_CITY_NAME':['Mumbai','New Delhi','Bangalore'], 'EVENT_GENRE':['KIDS','FANTASY','FESTIVAL','WORKSHOP','EXHIBITION'], 'FAV_GENRE':['|DRAMA|','|ACTION|ADVENTURE|SCI-FI|','|COMEDY|','DRAMA'], 'FAV_LANGUAGE':['English','Hindi']}
count_EVENT_GENRE >= 1 and count_EVENT_LANGUAGE >= 1 should not be in that dictionary; they should go into a list instead:
count_on_fields = ['EVENT_GENRE','EVENT_LANGUAGE']
If anyone has a better idea or suggestion, please help.
Answer 0 (score: 1)
Here you go:
Create a list containing all the values:
values=[
FAV_GENRE,
FAV_LANGUAGE,
FAV_VENUE_CITY_NAME,
EVENT_GENRE,
count_on_field
]
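Then create your dictionary as suggested in this answer:
d = dict(zip(res, values))
Note that the order of the lists does matter, of course: zip pairs the i-th name in res with the i-th entry of values.
I haven't tested this because my battery is about to run out. I hope it does what you need.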
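A minimal sketch of the zip approach, assuming the value lists from the question are already available. The names keys and count_on_fields are introduced here for illustration only. Filtering the count_* names out of res first keeps the keys and the values the same length and in the same order, which zip relies on.

```python
# Value lists copied from the question.
FAV_VENUE_CITY_NAME = ['New Delhi', 'Mumbai', 'Bangalore']
FAV_GENRE = ['|DRAMA|', '|COMEDY|', '|ACTION|ADVENTURE|SCI-FI|', 'DRAMA']
EVENT_GENRE = ['FESTIVAL', 'WORKSHOP', 'FANTASY', 'KIDS', 'EXHIBITION']
FAV_LANGUAGE = ['English', 'Hindi']

res = ['FAV_GENRE', 'FAV_LANGUAGE', 'FAV_VENUE_CITY_NAME',
       'count_EVENT_GENRE', 'EVENT_GENRE', 'count_EVENT_LANGUAGE']

# Hypothetical helper names: the count_* entries go to their own list,
# with the 'count_' prefix removed.
count_on_fields = [name[len('count_'):] for name in res
                   if name.startswith('count_')]
keys = [name for name in res if not name.startswith('count_')]

# The values must be listed in the same order as keys,
# because zip pairs the items position by position.
values = [FAV_GENRE, FAV_LANGUAGE, FAV_VENUE_CITY_NAME, EVENT_GENRE]
d = dict(zip(keys, values))

print(d)
print(count_on_fields)
```

If keys and values ever differ in length, zip silently stops at the shorter one, so it may be worth comparing len(keys) and len(values) before building the dictionary.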
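Answer 1 (score: 1)
I think you will have a hard time using the lists you got from the regex, because there is no way to tie the values back to their keys. It is probably easiest to start from the original list and whittle it down step by step.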
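from itertools import chain

# Split every condition string on '&' and flatten the pieces into one list.
res1 = [s.split('&') for s in res]
res2 = list(chain(*res1))
# Put spaces around the operators, then trim each piece.
res3 = [item.replace('==', ' == ').replace('>=', ' >= ').strip() for item in res2]
# Split into at most three tokens so values containing spaces ('New Delhi')
# stay whole; 'if item' skips the empty piece left by a doubled '& &'.
res4 = [item.split(None, 2) for item in res3 if item]
res5 = [(item[0], item[-1]) for item in res4]

temp_dict = dict()
temp_set = set()
for key, value in res5:
    if key.startswith('count'):
        # count_* conditions only contribute their field name
        temp_set.add(key.replace('count_', ''))
    else:
        clean_value = value.replace("'", "")
        temp_dict.setdefault(key, set()).add(clean_value)

output_dict = {key: list(value) for key, value in temp_dict.items()}
output_list = list(temp_set)
print(output_dict)
print(output_list)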
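You can try printing the intermediate results (res1 through res5) to see what is going on.
For production use, especially if you are going to process a much larger res, you should change each list comprehension into a generator expression and change res2 = list(chain(*res1)) to res2 = chain.from_iterable(res1).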
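The generator-based variant could look roughly like the sketch below. The function name iter_pairs and the shortened sample list are assumptions made for illustration. The sketch splits on a bare '&' and limits the token split to three parts so that values containing spaces stay whole.

```python
from itertools import chain

# Shortened sample of the list from the question.
res = [
    "FAV_VENUE_CITY_NAME == 'New Delhi' & EVENT_GENRE == 'WORKSHOP' & count_EVENT_GENRE >= 1",
    "FAV_VENUE_CITY_NAME == 'Mumbai' & & FAV_GENRE == '|ACTION|ADVENTURE|SCI-FI|'",
    "FAV_VENUE_CITY_NAME == 'Bangalore' & FAV_GENRE == 'DRAMA' & FAV_LANGUAGE == 'English'",
]


def iter_pairs(conditions):
    """Lazily yield (field, raw_value) pairs; hypothetical helper."""
    # Flatten all '&'-separated pieces without building intermediate lists.
    parts = chain.from_iterable(s.split('&') for s in conditions)
    # Normalise spacing around the operators and trim each piece.
    spaced = (p.replace('==', ' == ').replace('>=', ' >= ').strip() for p in parts)
    # At most three tokens: field, operator, value (which may contain spaces).
    tokens = (p.split(None, 2) for p in spaced if p)
    return ((t[0], t[-1]) for t in tokens)


output_dict = {}
count_fields = set()
for key, value in iter_pairs(res):
    if key.startswith('count_'):
        count_fields.add(key[len('count_'):])
    else:
        output_dict.setdefault(key, set()).add(value.strip("'"))

print(output_dict)
print(count_fields)
```

Nothing is materialised until the for loop consumes the pairs, so memory use stays roughly constant however long res is. The trade-off is that the intermediate stages can no longer be printed and inspected as lists.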
Answer 2 (score: 1)
Below is an IPython session that shows how to build the dictionary from your data:
In [1]: from re import split
In [2]: from itertools import chain
In [3]: data = ["FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'KIDS' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'FANTASY' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'FESTIVAL' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'New Delhi' & EVENT_GENRE == 'WORKSHOP' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' && EVENT_GENRE == 'EXHIBITION' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Bangalore' & FAV_GENRE == '|DRAMA|'",
"FAV_VENUE_CITY_NAME == 'Mumbai' & & FAV_GENRE == '|ACTION|ADVENTURE|SCI-FI|'",
"FAV_VENUE_CITY_NAME == 'Bangalore' & FAV_GENRE == '|COMEDY|'",
"FAV_VENUE_CITY_NAME == 'Bangalore' & FAV_GENRE == 'DRAMA' & FAV_LANGUAGE == 'English'",
"FAV_VENUE_CITY_NAME == 'New Delhi' & FAV_LANGUAGE == 'Hindi' & count_EVENT_LANGUAGE >= 1"]
In [4]: d = {}
In [5]: for elt in chain(*(split(' *& *', rec) for rec in data)):
   ...:     if not elt: continue
   ...:     k, v = split(' *[=>]= *', elt)
   ...:     v = v.strip("'")
   ...:     if k not in d: d[k] = []
   ...:     if v not in d[k]: d[k].append(v)
   ...:
In [6]: d
Out[6]:
{'EVENT_GENRE': ['KIDS', 'FANTASY', 'FESTIVAL', 'WORKSHOP', 'EXHIBITION'],
'FAV_GENRE': ['|DRAMA|', '|ACTION|ADVENTURE|SCI-FI|', '|COMEDY|', 'DRAMA'],
'FAV_LANGUAGE': ['English', 'Hindi'],
'FAV_VENUE_CITY_NAME': ['Mumbai', 'New Delhi', 'Bangalore'],
'count_EVENT_GENRE': ['1'],
'count_EVENT_LANGUAGE': ['1']}
In [7]:
In [7]: count_fields = []
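In [8]: for k in list(d):  # iterate over a copy of the keys so entries can be deleted
   ...:     if k[:6] == 'count_':
   ...:         # no need to test for duplicates because dict keys are unique
   ...:         count_fields.append(k[6:])
   ...:         del d[k]
   ...: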
In [9]:
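For use outside an interactive session, the same steps could be wrapped in a function along these lines. This is only a sketch: the name build_field_dict and the shortened sample data are assumptions, and the count_* keys are separated with comprehensions rather than deleted in place.

```python
import re

# Shortened sample of the data from the question.
data = [
    "FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'KIDS' & count_EVENT_GENRE >= 1",
    "FAV_VENUE_CITY_NAME == 'Mumbai' & & FAV_GENRE == '|ACTION|ADVENTURE|SCI-FI|'",
    "FAV_VENUE_CITY_NAME == 'New Delhi' & FAV_LANGUAGE == 'Hindi' & count_EVENT_LANGUAGE >= 1",
]


def build_field_dict(conditions):
    """Return (fields, count_fields) parsed from the condition strings."""
    d = {}
    for rec in conditions:
        for elt in re.split(r'\s*&\s*', rec):
            if not elt:
                # an empty piece appears when '&' is doubled ('& &')
                continue
            k, v = re.split(r'\s*[=>]=\s*', elt.strip())
            v = v.strip("'")
            values = d.setdefault(k, [])
            if v not in values:
                values.append(v)
    # Build the two results from d instead of mutating it while iterating.
    count_fields = [k[len('count_'):] for k in d if k.startswith('count_')]
    fields = {k: v for k, v in d.items() if not k.startswith('count_')}
    return fields, count_fields


fields, count_fields = build_field_dict(data)
print(fields)
print(count_fields)
```

Because the values are kept in lists rather than sets, they stay in the order in which they first appear in the data.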