How to create multiple lists from DataFrame column values

Time: 2019-09-19 13:56:54

Tags: python

df1

    Ticker          Category
0      XOM           Group 1
1      CVX           Group 1
2  RDSA-GB           Group 2
3    BP-GB  Group 1, Group 2
4  EQNR-NO           Group 3
5    FP-FR           Group 4
6   ENI-IT  Group 3, Group 4
7      COP           Group 5

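The result I want is a set of lists of "Ticker" values built from the "Category" column, with each list named after its "Category" value and spaces replaced by "_".

Second, if there are instances where "Category" has two values, e.g. "US Major, Euro Major", how do I make sure the "Ticker" appears in both Category lists?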

Group_1 = ['XOM','CVX','BP-GB']
Group_2 = ['RDSA-GB','BP-GB']
Group_3 = ['EQNR-NO','ENI-IT']
Group_4 = ['FP-FR','ENI-IT']
Group_5 = ['COP']

Thanks!

3 Answers:

Answer 0: (score: 1)

You said lists, but I think you mean a dictionary? If so, try the following:

import pandas as pd

df =  pd.DataFrame([["XOM","US Major"],
["CVX","US Major"],
["RDSA-GB","Euro Major"],
["BP-GB","Euro Major"],
["EQNR-NO","Euro Major"]],columns=["Ticker","Category"])

df_to_lists = df.groupby("Category")["Ticker"].apply(list)
lists_to_dict = dict(df_to_lists)
print(lists_to_dict)

Output:

{'Euro Major': ['RDSA-GB', 'BP-GB', 'EQNR-NO'], 'US Major': ['XOM', 'CVX']}

If you don't want a dictionary, df_to_lists outputs:

Category
Euro Major    [RDSA-GB, BP-GB, EQNR-NO]
US Major                     [XOM, CVX]
Name: Ticker, dtype: object
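
Note that this groupby treats a combined value such as "Group 1, Group 2" as a category of its own. As a minimal sketch, assuming pandas 0.25 or later (for `Series.explode`) and using the df1 data from the question, the Category column could be split on commas first so that each Ticker is listed under every category it names. The names `exploded` and `groups` are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Ticker": ["XOM", "CVX", "RDSA-GB", "BP-GB", "EQNR-NO", "FP-FR", "ENI-IT", "COP"],
    "Category": ["Group 1", "Group 1", "Group 2", "Group 1, Group 2",
                 "Group 3", "Group 4", "Group 3, Group 4", "Group 5"],
})

# Split "Group 1, Group 2" into ["Group 1", " Group 2"], then give each
# category its own row (explode repeats the Ticker for every category).
exploded = (
    df1.assign(Category=df1["Category"].str.split(","))
       .explode("Category")
       .reset_index(drop=True)
)

# Trim the whitespace left by the split and replace spaces with underscores.
exploded["Category"] = (
    exploded["Category"].str.strip().str.replace(" ", "_", regex=False)
)

# One list of Tickers per category name.
groups = exploded.groupby("Category")["Ticker"].apply(list).to_dict()
print(groups)
```

An individual list can then be looked up by name, for example `groups["Group_1"]`, which is usually easier to work with than creating separate variables.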

Answer 1: (score: 0)

You can also use a loop like this (assuming my df is your df1):

lists_with_unique_vals = dict()
for cat in df.Category.unique():
    lists_with_unique_vals[cat.replace(' ', '_')] = list(df[df['Category']==cat]['Ticker'].unique())

The result looks like this:

>> print(lists_with_unique_vals)
{'US_Major': ['XOM', 'CVX'], 'Euro_Major': ['RDSA-GB', 'BP-GB', 'EQNR-NO']}
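
This loop compares each Category string as a whole, so a Ticker whose Category is "Group 1, Group 2" would land under that combined key only. A rough sketch of a loop that splits such values follows. Here `rows` is a stand-in for `zip(df1["Ticker"], df1["Category"])`, and `lists_by_category` is a name made up for illustration:

```python
from collections import defaultdict

# Stand-in for zip(df1["Ticker"], df1["Category"]) from the question.
rows = [
    ("XOM", "Group 1"),
    ("CVX", "Group 1"),
    ("RDSA-GB", "Group 2"),
    ("BP-GB", "Group 1, Group 2"),
    ("EQNR-NO", "Group 3"),
    ("FP-FR", "Group 4"),
    ("ENI-IT", "Group 3, Group 4"),
    ("COP", "Group 5"),
]

lists_by_category = defaultdict(list)
for ticker, category in rows:
    # A Category cell may hold several comma-separated names.
    for name in category.split(","):
        key = name.strip().replace(" ", "_")
        # Skip repeats so each Ticker appears once per category.
        if ticker not in lists_by_category[key]:
            lists_by_category[key].append(ticker)

# Convert to a plain dict so missing keys raise KeyError again.
lists_by_category = dict(lists_by_category)
print(lists_by_category)
```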

Answer 2: (score: 0)

Following on from @nassiam's code, to handle the case where a row may have multiple categories:

import pandas as pd

df = pd.DataFrame([["XOM", "US Major"],
                   ["CVX", "US Major"],
                   ["RDSA-GB", "Euro Major"],
                   ["BP-GB", "Euro Major"],
                   ["EQNR-NO", "Euro Major"],
                   ["ABC-XYZ", "Euro Major, US Major"],
                   ["DEF-GHI", "Euro Major, US Major"]],
                  columns=["Ticker", "Category"])

df_to_lists = df.groupby("Category")["Ticker"].apply(list)
lists_to_dict = dict(df_to_lists)
print(lists_to_dict)

# Up to here it is the same code as @nassiam pointed out

# To handle multiple-valued categories.
# Iterate over a snapshot of the keys, because the dict is modified in the loop.
for key in list(lists_to_dict.keys()):
    categories = [x.strip() for x in key.split(',')]
    if len(categories) > 1:
        tickers = lists_to_dict.pop(key)
        for cat in categories:
            # Build a new list so that categories never share one list object.
            lists_to_dict[cat] = lists_to_dict.get(cat, []) + tickers

# To replace spaces with underscores
lists_to_dict = {key.replace(" ", "_"): value for key, value in lists_to_dict.items()}

This assumes the first column, Ticker, has unique values. Otherwise, make the Ticker values unique when appending the lists. I hope this helps.
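
On the uniqueness caveat: one possible way to drop repeated Tickers while keeping their order is `dict.fromkeys`, which relies on dicts preserving insertion order (Python 3.7+). A small sketch with made-up data, where `merged` is a hypothetical list produced by the merging step:

```python
# Hypothetical merged list in which "BP-GB" was appended twice.
merged = ["RDSA-GB", "BP-GB", "EQNR-NO", "BP-GB"]

# dict keys are unique and keep insertion order, so later repeats are dropped.
unique_tickers = list(dict.fromkeys(merged))
print(unique_tickers)
```

Using `set(merged)` would also remove duplicates, but it does not guarantee the original order.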