Lookup using a dictionary

Time: 2017-08-22 15:25:51

Tags: python python-3.x pandas dataframe

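I am trying to add two new columns to an existing pandas DataFrame. I have implemented this with a Python function that uses a chain of if/else statements, but I don't think that is the best way to do it. Could I achieve the same result with a dictionary or some other approach?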

I am using the code below to add the new columns:

import pandas as pd
df = pd.DataFrame( {"col_1": [1234567, 45677890, 673214, 6709,98765,'',876543]} )
def func(col_1):
    col_1=str(col_1)

    if col_1=="":
        return "NA",""
    elif col_1[0:3]=='123':
        return "some_text_1 "," other_text_1"
    elif col_1[0:3]=='456':
        return "some_text_2 ","other_text_2"
    elif col_1[0:2]=='67':
        return "some_text_3 ","other_text_3"
    elif col_1[0:1]=='9':
        return "some_text_4 ","other_text_4"
    else:
        return "Other","Other"

df["col_2"],df["col_3"]=zip(*df["col_1"].map(func))
print(df)


        col_1         col_2          col_3
    0   1234567  some_text_1    other_text_1
    1  45677890  some_text_2    other_text_2
    2    673214  some_text_3    other_text_3
    3      6709  some_text_3    other_text_3
    4     98765  some_text_4    other_text_4
    5                      NA               
    6    876543         Other          Other    

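Since I have so many if/else branches, I would like to know the best way to get the same result. Any pointers on using a dictionary or any other method would be much appreciated.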

1 answer:

Answer 0: (score: 2)

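Your approach is likely to be slow because it is not vectorized: `map` calls the Python function once per row. Here is an alternative: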

temp = df['col_1'].astype(str)
df = df.assign(col_2='Other', col_3='Other')
df.loc[temp.str[0] == '9', ['col_2', 'col_3']] = ('some_text_4 ', 'other_text_4')
df.loc[temp.str[0:2] == '67', ['col_2', 'col_3']] = ('some_text_3 ', 'other_text_3')
df.loc[temp.str[0:3] == '456', ['col_2', 'col_3']] = ('some_text_2 ', 'other_text_2')
df.loc[temp.str[0:3] == '123', ['col_2', 'col_3']] = ('some_text_1 ', 'other_text_1')
df.loc[temp == "", ['col_2', 'col_3']] = ("NA", "")
>>> df
      col_1         col_2         col_3
0   1234567  some_text_1   other_text_1
1  45677890  some_text_2   other_text_2
2    673214  some_text_3   other_text_3
3      6709  some_text_3   other_text_3
4     98765  some_text_4   other_text_4
5                      NA              
6    876543         Other         Other

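The idea is to reverse the order of your if/else branches so that the lowest-priority rule is applied first. Each later assignment takes precedence and may overwrite the result of the ones above it.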
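
Since the question asks about a dictionary, the same reversed-order idea could be driven by one. The following is only a sketch under some assumptions: the `rules` dictionary and the `as_text` name are hypothetical and not part of the original code, the labels are written without the stray spaces that appear in the question's strings, and the dictionary is assumed to keep insertion order (guaranteed from Python 3.7).

```python
import pandas as pd

df = pd.DataFrame({"col_1": [1234567, 45677890, 673214, 6709, 98765, "", 876543]})

# Hypothetical rule table: prefix -> (col_2 value, col_3 value), listed from
# highest to lowest priority, mirroring the if/elif chain in the question.
rules = {
    "123": ("some_text_1", "other_text_1"),
    "456": ("some_text_2", "other_text_2"),
    "67": ("some_text_3", "other_text_3"),
    "9": ("some_text_4", "other_text_4"),
}

as_text = df["col_1"].astype(str)

# Start from the fallback value used by the final `else` branch.
df = df.assign(col_2="Other", col_3="Other")

# Apply the lowest-priority rule first so that higher-priority rules,
# applied later, overwrite it wherever both match.
for prefix, labels in reversed(list(rules.items())):
    df.loc[as_text.str.startswith(prefix), ["col_2", "col_3"]] = list(labels)

# The empty-string case is handled last because it has the highest priority.
df.loc[as_text == "", ["col_2", "col_3"]] = ["NA", ""]

print(df)
```

With this layout, adding or reordering a rule means editing the dictionary only; the loop stays the same. It still performs one vectorized pass per rule, so it should suit a small number of prefixes rather than thousands.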