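I have a DataFrame df_a with a column named "Language" that holds an array (list) of languages per row. I want to create another array column, LanguageCode, from each language and its associated language code.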
import pandas as pd
df_a = pd.DataFrame({'Language':[['cantonese', 'japanese',
'mandarin','american'],['mandarin','english'],
['american', 'mandarin','cantonese']]})
df_a
Language LanguageCode
0 [cantonese, japanese, mandarin, american] [zh_yue, ja, cmn, us]
1 [mandarin, english] [cmn, en]
2 [american, mandarin, cantonese] [us, cmn, zh_yue]
Answer 0 (score: 0)
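I'm assuming you have a dictionary that maps each language to its language code; with that, you can use map to look up the codes.
Please check whether this helps: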
import pandas as pd
import numpy as np
df_a = pd.DataFrame({'Language':[['cantonese', 'japanese',
'mandarin','american'],['mandarin','english'],
['american', 'mandarin','cantonese']]})
# Hypothetical dictionary mapping each language to its code
lang_codes = {'cantonese': 'zh_yue','japanese': 'ja', 'mandarin': 'cmn','american': 'us','english': 'en'}
# Look up the code for every language in each row's list
df_a['Language Code'] = [list(map(lambda x: lang_codes[x], row)) for row in df_a.Language]
# Get the result as a numpy array (an object array holding one list per row)
language_code = np.array(df_a['Language Code'])
type(language_code)  # numpy.ndarray
Your DataFrame will then be:
Language Language Code
0 [cantonese, japanese, mandarin, american] [zh_yue, ja, cmn, us]
1 [mandarin, english] [cmn, en]
2 [american, mandarin, cantonese] [us, cmn, zh_yue]
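One thing to watch: the lang_codes[x] lookup above raises a KeyError if a row contains a language that is missing from the dictionary. Here is a minimal sketch of a more forgiving variant, assuming you would rather get a placeholder value than an exception. The helper to_codes and its default parameter are hypothetical names for illustration, not part of pandas.

```python
import pandas as pd

df_a = pd.DataFrame({'Language': [['cantonese', 'japanese', 'mandarin', 'american'],
                                  ['mandarin', 'english'],
                                  ['american', 'mandarin', 'cantonese']]})

# Same hypothetical dictionary as in the answer above
lang_codes = {'cantonese': 'zh_yue', 'japanese': 'ja', 'mandarin': 'cmn',
              'american': 'us', 'english': 'en'}


def to_codes(languages, mapping, default=None):
    """Map each language name to its code; unknown names become `default`."""
    # dict.get returns `default` instead of raising KeyError
    return [mapping.get(lang, default) for lang in languages]


# Build the new column row by row
df_a['Language Code'] = [to_codes(row, lang_codes) for row in df_a['Language']]
print(df_a)
```

If a missing language should be treated as an error instead, keep the plain lang_codes[x] lookup so the problem surfaces right away.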