I have a column of lists in a pandas DataFrame:
0: [car, telephone]
1: [computer, beach, book, language]
2: [rice, bus, street]
Each row holds one list, and the lists have different lengths. I also have a dictionary:
dict = {'car': 'transport',
'rice': 'food',
'book': 'reading'
}
After that I flattened the dictionary:
d = {val:key for key, lst in dict.items() for val in lst}
I want to iterate over all the items in each list and create a column like this. This is the desired output:
index col1 col2
0: [car, telephone], transport
1: [computer, beach, book, language], reading
2: [rice, bus, street], food
I tried:
df['col2'] = data_df['col1'].index.map(d)
But I do not get the desired output.
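Answer 0 (score: 1)
You can use .explode to get one row per list item, translate the items with the dictionary, and then assign the matches back to the original rows by index:
Sample data: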
import pandas as pd
data = {'id': {0: 1, 1: 2, 2: 3}, 'col': {0: ['car', 'telephone'], 1: ['computer', 'beach', 'book', 'language'], 2: ['rice', 'bus', 'street']}}
df = pd.DataFrame(data)
dct = {'car': 'transport', 'rice':'food', 'book':'reading'}
Code:
df2 = df.explode('col')
df2['col2'] = df2['col'].replace(dct)
df['col2'] = df2[~df2['col'].eq(df2['col2'])]['col2']
Output:
   id                                col       col2
0   1                   [car, telephone]  transport
1   2  [computer, beach, book, language]    reading
2   3                [rice, bus, street]       food
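The filter above works when each row contains at most one item from the dictionary. If a row had two matching items, the exploded frame would carry duplicate index labels for that row and the assignment back to df would be ambiguous. The following is a minimal sketch of a variant, assuming the first match in each row is the one wanted; it uses Series.map and groupby(level=0).first() instead of replace and a boolean filter, and a row with no match is left as a missing value.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'col': [['car', 'telephone'],
            ['computer', 'beach', 'book', 'language'],
            ['rice', 'bus', 'street']],
})
dct = {'car': 'transport', 'rice': 'food', 'book': 'reading'}

# One row per list item; the original row label is kept as the index.
exploded = df['col'].explode()

# map() yields NaN for items that are not keys of dct.
mapped = exploded.map(dct)

# first() skips missing values, so each original row gets the category
# of its first matching item (assumption: the first match should win).
df['col2'] = mapped.groupby(level=0).first()
```

Because the grouped result has one entry per original index label, it aligns with df directly.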
Answer 1 (score: 1)
You can use apply with a custom function:
import pandas as pd
df = pd.DataFrame([{'col1': ['car', 'telephone']}, {'col1': ['computer', 'beach', 'book', 'language']}, {'col1': ['rice', 'bus', 'street']}])
def get_col2(lst):
    # Return the value of the first dictionary key that appears in the list;
    # falls through and returns None when no key matches.
    d = {'car': 'transport', 'rice': 'food', 'book': 'reading'}
    for k, v in d.items():
        if k in lst:
            return v
df['col2'] = df['col1'].apply(get_col2)
Output:
| | col1 | col2 |
|---|---|---|
| 0 | ['car', 'telephone'] | transport |
| 1 | ['computer', 'beach', 'book', 'language'] | reading |
| 2 | ['rice', 'bus', 'street'] | food |
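Note that get_col2 walks the dictionary and tests each key against the list, so when a list contains several keys the result follows the dictionary's order, not the list's. Here is a sketch of an alternative, assuming the first item of the list that has a category should win; first_category is a hypothetical helper name, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col1': [['car', 'telephone'],
                            ['computer', 'beach', 'book', 'language'],
                            ['rice', 'bus', 'street']]})
d = {'car': 'transport', 'rice': 'food', 'book': 'reading'}

def first_category(items, mapping=d):
    """Return the category of the first item found in mapping, else None."""
    # One dictionary lookup per list item, in list order.
    return next((mapping[item] for item in items if item in mapping), None)

df['col2'] = df['col1'].apply(first_category)
```

For the sample data both functions give the same column; they differ only for lists that contain more than one dictionary key.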