Below is one column of a pandas DataFrame. I want to get all the unique keys that appear in it, such as Style, Color, Flavor, Size, Scent Name, ...
Style: Lovenest - Newborn Pillow|||Color: Gray #Style and Color from this row
Style: Baby Calendula Face Cream #Style from this row
Color: Brown #Color from this row
Color: Matrix|||Item Package Quantity: 1 #Color & Item Package Quantity from this row
Color: Matrix|||Item Package Quantity: 1 #Color & Item Package Quantity from this row
Flavor: Baby Colic Babies Magic Tea|||Size: 1 Pack #Flavor & Size from this row
Scent Name: Sensitive|||Size: 100 Count (Pack of 6) #Scent Name & Size from this row
Scent Name: Sensitive|||Size: 100 Count (Pack of 6) #Scent Name & Size from this row
In [3]: df['variations'].head()
Out[3]:
0 Style: Lovenest - Newborn Pillow|||Color: Gray
1 Style: Lovenest - Newborn Pillow|||Color: Gray
2 Style: Lovenest - Newborn Pillow|||Color: Gray
3 Style: Lovenest - Newborn Pillow|||Color: Gray
4 Flavor: Baby Colic Babies Magic Tea|||Size: 1 Pack
Name: variations, dtype: object
Expected output: ['Style', 'Color', 'Flavor', 'Size']
Answer 0 (score: 2)
The following code should work:
df_new = df['variations'].apply(lambda x: pd.Series({x.split(':')[0]:x.split(':')[1] for x in x.split('|||')}) if pd.notnull(x) else '')
The column names of df_new are the unique keys.
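As a rough, self-contained sketch of the same idea (parse each cell into a key/value dict, then read the keys off the column names), something like the following might work. It swaps the `apply` call for a list of dicts passed to `pd.DataFrame`, and the sample data and the `parse_variations` helper are hypothetical, made up here to mirror the rows in the question. It assumes every `|||`-separated item contains a colon.

```python
import pandas as pd

# Hypothetical sample mirroring the rows shown in the question.
df = pd.DataFrame({
    "variations": [
        "Style: Lovenest - Newborn Pillow|||Color: Gray",
        "Style: Baby Calendula Face Cream",
        "Color: Brown",
        "Color: Matrix|||Item Package Quantity: 1",
        "Flavor: Baby Colic Babies Magic Tea|||Size: 1 Pack",
        "Scent Name: Sensitive|||Size: 100 Count (Pack of 6)",
        None,  # a missing value, dropped below
    ]
})


def parse_variations(cell):
    """Turn 'Key: value|||Key: value' into a {key: value} dict.

    Assumes every item contains a colon; an item without one would
    raise a ValueError when unpacked.
    """
    pairs = (item.split(":", 1) for item in cell.split("|||"))
    return {key.strip(): value.strip() for key, value in pairs}


# One dict per non-null row; DataFrame takes the union of keys as columns.
rows = [parse_variations(cell) for cell in df["variations"].dropna()]
df_new = pd.DataFrame(rows)

unique_keys = list(df_new.columns)
print(unique_keys)
```

Splitting on the first colon only (`split(":", 1)`) keeps values that themselves contain a colon intact, and `dropna()` sidesteps the mixed return type that the `else ''` branch would introduce.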
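Answer 1 (score: 0)
You can use the string methods to split each value on a pattern.
Then split each resulting string into a key-value pair and take the key part.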
In[1]: df['variations'].str.split(r'\|\|\|').apply(lambda items: [item.split(':')[0] for item in items])
Out[1]:
0 [Style, Color]
1 [Style, Color]
2 [Style, Color]
3 [Style, Color]
4 [Style, Color]
Name: item, dtype: object
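Edit: I see you changed the input and the expected output. If you are trying to get the set of all keys in the column, you can do it with pandas methods instead of writing an explicit row-by-row loop.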
In[1]: keys_list = df['variations'].apply(lambda x: [y.split(': ')[0] for y in x.split('|||')]).tolist()
In[2]: list(set([key for keys in keys_list for key in keys]))
Out[2]: ['Flavor', 'Item Package Quantity', 'Size', 'Color', 'Style', 'Scent Name']
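For comparison, here is a minimal sketch that stays inside the pandas string accessor and uses `explode` to flatten the per-row lists, assuming a pandas version that has `Series.explode` (0.25 or later). The sample Series is hypothetical and only mirrors the question's rows. Unlike `set`, `unique()` keeps the keys in order of first appearance.

```python
import pandas as pd

# Hypothetical sample mirroring the rows shown in the question.
variations = pd.Series(
    [
        "Style: Lovenest - Newborn Pillow|||Color: Gray",
        "Style: Baby Calendula Face Cream",
        "Color: Brown",
        "Color: Matrix|||Item Package Quantity: 1",
        "Flavor: Baby Colic Babies Magic Tea|||Size: 1 Pack",
        "Scent Name: Sensitive|||Size: 100 Count (Pack of 6)",
        None,  # missing values are dropped before splitting
    ],
    name="variations",
)

keys = (
    variations.dropna()
    .str.split(r"\|\|\|")   # escaped pipes: a multi-character pattern is treated as a regex
    .explode()              # one 'Key: value' item per row
    .str.split(":", n=1)    # split on the first colon only
    .str[0]                 # keep the key part
    .str.strip()
)

unique_keys = keys.unique().tolist()
print(unique_keys)
```

Dropping nulls up front avoids the `AttributeError` that calling `x.split` on a `NaN` cell would raise.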
Answer 2 (score: 0)
In [25]: data = []
    ...: for x in df['variations']:
    ...:     if pd.notnull(x):  # skip missing values
    ...:         d = {item.split(':')[0]: item.split(':')[1] for item in x.split('|||')}
    ...:         vals = d.keys()
    ...:         data.extend(vals)
    ...:
    ...: print(list(set(data)))
['Style', 'Material', 'Number of Items', 'Pattern', .....