df1
Ticker Category
0 XOM Group 1
1 CVX Group 1
2 RDSA-GB Group 2
3 BP-GB Group 1, Group 2
4 EQNR-NO Group 3
5 FP-FR Group 4
6 ENI-IT Group 3, Group 4
7 COP Group 5
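The result I want: create "Ticker" lists based on the "Category" column, with each list named after its "Category" value and spaces replaced by "_".
Second, if a "Category" holds two values, e.g. "US Major, Euro Major", how do I make sure the "Ticker" appears in both Category lists?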
Group_1 = ['XOM','CVX','BP-GB']
Group_2 = ['RDSA-GB','BP-GB']
Group_3 = ['EQNR-NO','ENI-IT']
Group_4 = ['FP-FR','ENI-IT']
Group_5 = ['COP']
Thanks!
Answer 0 (score: 1)
You say lists, but I think you mean a dictionary? If so, try this:
import pandas as pd
df = pd.DataFrame([["XOM","US Major"],
["CVX","US Major"],
["RDSA-GB","Euro Major"],
["BP-GB","Euro Major"],
["EQNR-NO","Euro Major"]],columns=["Ticker","Category"])
df_to_lists = df.groupby("Category")["Ticker"].apply(list)
lists_to_dict = dict(df_to_lists)
print(lists_to_dict)
Output:
{'Euro Major': ['RDSA-GB', 'BP-GB', 'EQNR-NO'], 'US Major': ['XOM', 'CVX']}
If you don't want a dictionary, df_to_lists on its own prints:
Category
Euro Major [RDSA-GB, BP-GB, EQNR-NO]
US Major [XOM, CVX]
Name: Ticker, dtype: object
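For reference, here is a minimal sketch of the same groupby applied to the question's df1 (rebuilt by hand here, since the answer used its own sample), plus the underscore renaming the question asks for. It shows that a plain groupby keeps a combined value such as "Group 1, Group 2" as its own key, so on its own it does not cover the second part of the question.

```python
import pandas as pd

# The question's df1, rebuilt by hand
df1 = pd.DataFrame({
    "Ticker": ["XOM", "CVX", "RDSA-GB", "BP-GB", "EQNR-NO", "FP-FR", "ENI-IT", "COP"],
    "Category": ["Group 1", "Group 1", "Group 2", "Group 1, Group 2",
                 "Group 3", "Group 4", "Group 3, Group 4", "Group 5"],
})

# Same groupby as in the answer, converted with Series.to_dict()
groups = df1.groupby("Category")["Ticker"].apply(list).to_dict()

# Replace spaces with underscores in the keys
groups = {cat.replace(" ", "_"): tickers for cat, tickers in groups.items()}
print(groups)
```

Here BP-GB lands only under the key 'Group_1,_Group_2', not in Group_1 and Group_2.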
Answer 1 (score: 0)
You can also use a loop like this (assuming my df is your df1):
lists_with_unique_vals = dict()
for cat in df.Category.unique():
    lists_with_unique_vals[cat.replace(' ', '_')] = list(df[df['Category'] == cat]['Ticker'].unique())
The result:
>>> print(lists_with_unique_vals)
{'US_Major': ['XOM', 'CVX'], 'Euro_Major': ['RDSA-GB', 'BP-GB', 'EQNR-NO']}
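The loop above does not split combined values such as "Group 1, Group 2". A minimal sketch in the same loop style that does, run on the question's df1 (rebuilt by hand here); it walks the rows, splits each Category on commas, and uses dict.setdefault to create each list on first use. The "skip if already present" check is an assumption meant to mimic the .unique() call in the answer.

```python
import pandas as pd

# The question's df1, rebuilt by hand
df1 = pd.DataFrame({
    "Ticker": ["XOM", "CVX", "RDSA-GB", "BP-GB", "EQNR-NO", "FP-FR", "ENI-IT", "COP"],
    "Category": ["Group 1", "Group 1", "Group 2", "Group 1, Group 2",
                 "Group 3", "Group 4", "Group 3, Group 4", "Group 5"],
})

lists_with_unique_vals = {}
for ticker, category in zip(df1["Ticker"], df1["Category"]):
    # "Group 1, Group 2" -> ["Group 1", "Group 2"]
    for cat in (c.strip() for c in category.split(",")):
        key = cat.replace(" ", "_")
        bucket = lists_with_unique_vals.setdefault(key, [])
        if ticker not in bucket:  # keep each ticker once per category
            bucket.append(ticker)

print(lists_with_unique_vals)
```

With the question's data this yields exactly the five lists the question lists as Group_1 to Group_5, with BP-GB and ENI-IT each appearing in two of them.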
Answer 2 (score: 0)
Building on @nassiam's code, to handle the case where a row can have multiple categories:
import pandas as pd
df = pd.DataFrame([["XOM", "US Major"],
                   ["CVX", "US Major"],
                   ["RDSA-GB", "Euro Major"],
                   ["BP-GB", "Euro Major"],
                   ["EQNR-NO", "Euro Major"],
                   ["ABC-XYZ", "Euro Major, US Major"],
                   ["DEF-GHI", "Euro Major, US Major"]], columns=["Ticker", "Category"])
df_to_lists = df.groupby("Category")["Ticker"].apply(list)
lists_to_dict = dict(df_to_lists)
print(lists_to_dict)
# Up to here this is the same code @nassiam posted
# To handle multi-valued categories: add the tickers of a combined key such as
# "Euro Major, US Major" to each single category, then drop the combined key.
# Iterate over a copy of the keys, because the dict is modified inside the loop.
for key in list(lists_to_dict):
    categories = [x.strip() for x in key.split(',')]
    if len(categories) > 1:
        for cat in categories:
            if cat in lists_to_dict:
                lists_to_dict[cat] += lists_to_dict[key]
            else:
                # Copy, so that two categories never share one list object
                lists_to_dict[cat] = list(lists_to_dict[key])
        lists_to_dict.pop(key, None)
# To replace spaces with underscores: build a new dict instead of renaming keys while iterating
lists_to_dict = {key.replace(" ", "_"): tickers for key, tickers in lists_to_dict.items()}
print(lists_to_dict)
This assumes the first column, Ticker, has unique values. Otherwise, deduplicate the Tickers when appending the lists.
I hope this helps.
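A more compact alternative, not taken from any of the answers: split the Category strings into lists, use DataFrame.explode (available since pandas 0.25) to get one row per (Ticker, single category) pair, drop duplicate pairs to cover the non-unique Ticker case, then group as before. This is a sketch run on the question's df1, rebuilt by hand here.

```python
import pandas as pd

# The question's df1, rebuilt by hand
df1 = pd.DataFrame({
    "Ticker": ["XOM", "CVX", "RDSA-GB", "BP-GB", "EQNR-NO", "FP-FR", "ENI-IT", "COP"],
    "Category": ["Group 1", "Group 1", "Group 2", "Group 1, Group 2",
                 "Group 3", "Group 4", "Group 3, Group 4", "Group 5"],
})

category_lists = (
    df1.assign(Category=df1["Category"].str.split(","))  # "Group 1, Group 2" -> ["Group 1", " Group 2"]
       .explode("Category")                               # one row per single category
       .reset_index(drop=True)
       .assign(Category=lambda d: d["Category"].str.strip().str.replace(" ", "_", regex=False))
       .drop_duplicates()                                 # same Ticker twice in one category
       .groupby("Category", sort=False)["Ticker"]         # sort=False keeps first-seen order
       .apply(list)
       .to_dict()
)
print(category_lists)
```

Keeping the result as a dict keyed by "Group_1", "Group_2", ... is usually easier to work with than creating separate variables with those names.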