Create another numpy.array in a pandas DataFrame based on a condition

Time: 2020-10-05 23:44:09

Tags: python pandas numpy-ndarray

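I have a DataFrame df_a with a column named "Language" that holds numpy arrays. I want to create another numpy array column, LanguageCode, based on each language and the language code associated with it.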

df_a = pd.DataFrame({'Language':[['cantonese', 'japanese', 
                 'mandarin','american'],['mandarin','english'], 
                 ['american', 'mandarin','cantonese']]})

df_a

     Language                                  LanguageCode
0   [cantonese, japanese, mandarin, american]  [zh_yue,ja,cmn,us]
1   [mandarin, english]                        [cmn,en]
2   [american, mandarin, cantonese]            [us,cmn,zh_yue]

1 Answer:

Answer 0 (score: 0)

I'm assuming you have a dictionary that associates each language with its language code; you can then apply it with map.

Please check whether this helps:

Assuming:

import pandas as pd
import numpy as np

df_a = pd.DataFrame({'Language':[['cantonese', 'japanese', 
                 'mandarin','american'],['mandarin','english'], 
                 ['american', 'mandarin','cantonese']]})

#this is the hypothetical dictionary
lang_codes = {'cantonese': 'zh_yue','japanese': 'ja', 'mandarin': 'cmn','american': 'us','english': 'en'}

What you can do:

df_a['Language Code'] = [list(map(lambda x: lang_codes[x], row)) for row in df_a.Language]
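One caveat with the line above: lang_codes[x] raises a KeyError if a row contains a language that is not in the dictionary. A minimal sketch of a more forgiving variant is below. The helper name to_codes and its default parameter are hypothetical (not from the question), and the dictionary is the same assumed one as above.

```python
import pandas as pd

df_a = pd.DataFrame({'Language': [['cantonese', 'japanese', 'mandarin', 'american'],
                                  ['mandarin', 'english'],
                                  ['american', 'mandarin', 'cantonese']]})

# Same hypothetical dictionary as in the answer
lang_codes = {'cantonese': 'zh_yue', 'japanese': 'ja', 'mandarin': 'cmn',
              'american': 'us', 'english': 'en'}

def to_codes(languages, mapping, default=None):
    """Hypothetical helper: look up each language, using `default` for unknown ones."""
    return [mapping.get(lang, default) for lang in languages]

# Apply the helper to every row of the Language column
df_a['Language Code'] = df_a['Language'].apply(lambda row: to_codes(row, lang_codes))
```

Whether to fall back to None or to fail loudly on an unknown language depends on your data; keep lang_codes[x] if a missing entry should be treated as an error.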

Checking:

#getting the numpy array format
language_code = np.array(df_a['Language Code'])

type(language_code)

numpy.ndarray
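Note that because the rows have different lengths, the result is a one-dimensional array of dtype object whose elements are still Python lists, not a 2-D array. A small sketch of how you might inspect that and, if needed, turn each row into its own numpy array (the name row_arrays is just for illustration):

```python
import numpy as np
import pandas as pd

df_a = pd.DataFrame({'Language Code': [['zh_yue', 'ja', 'cmn', 'us'],
                                       ['cmn', 'en'],
                                       ['us', 'cmn', 'zh_yue']]})

# The column is stored as an object array with one entry per row
language_code = df_a['Language Code'].to_numpy()
print(language_code.dtype)   # object
print(language_code.shape)   # (3,)

# Each entry is still a plain list; convert row by row if arrays are needed
row_arrays = [np.array(codes) for codes in language_code]
```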

Your DataFrame will be:

    Language                                    Language Code
0   [cantonese, japanese, mandarin, american]   [zh_yue, ja, cmn, us]
1   [mandarin, english]                         [cmn, en]
2   [american, mandarin, cantonese]             [us, cmn, zh_yue]