Question

I have the following dataframe df:

Customer_ID  2015  2016  2017  Year_joined_mailing
ABC             5     6    10                 2015
BCD             6     7     3                 2016
DEF            10     4     5                 2017
GHI             8     7    10                 2016

I want to look up each customer's value in the year they joined the mailing list and save it in a new column.

The output would be:

Customer_ID  2015  2016  2017  Year_joined_mailing  Purchases_1st_year
ABC             5     6    10                 2015                   5
BCD             6     7     3                 2016                   7
DEF            10     4     5                 2017                   5
GHI             8     7    10                 2016                   7

I've found some solutions in Python that mimic VLOOKUP, but none of them use the headers of other columns as the lookup key.

Answer 1

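Use pd.DataFrame.lookup (deprecated in pandas 1.2 and removed in pandas 2.0, so this only runs on older versions).
Keep in mind that I'm assuming Customer_ID is the index.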

df.lookup(df.index, df.Year_joined_mailing)

array([5, 7, 5, 7])

df.assign(
    Purchases_1st_year=df.lookup(df.index, df.Year_joined_mailing)
)

             2015  2016  2017  Year_joined_mailing  Purchases_1st_year
Customer_ID                                                           
ABC             5     6    10                 2015                   5
BCD             6     7     3                 2016                   7
DEF            10     4     5                 2017                   5
GHI             8     7    10                 2016                   7
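Because DataFrame.lookup no longer exists in pandas 2.x, here is a minimal sketch of the same row-by-row lookup using Index.get_indexer plus NumPy fancy indexing. It assumes, as above, that Customer_ID is the index and that every Year_joined_mailing value matches a column label of the same type; the explicit -1 check is there because NumPy would otherwise read -1 as "last column" and silently return a wrong value.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with Customer_ID as the index.
df = pd.DataFrame(
    {
        2015: [5, 6, 10, 8],
        2016: [6, 7, 4, 7],
        2017: [10, 3, 5, 10],
        "Year_joined_mailing": [2015, 2016, 2017, 2016],
    },
    index=pd.Index(["ABC", "BCD", "DEF", "GHI"], name="Customer_ID"),
)

# Position of each row's join year among the column labels; -1 means "not found".
col_pos = df.columns.get_indexer(df["Year_joined_mailing"])
if (col_pos == -1).any():
    # NumPy would treat -1 as the last column, so fail loudly instead.
    raise KeyError("some Year_joined_mailing values are not column labels")

# Fancy indexing: take element (i, col_pos[i]) for every row i.
df["Purchases_1st_year"] = df.to_numpy()[np.arange(len(df)), col_pos]
```

The whole lookup happens in one vectorized step, so it scales better than a per-row Python loop.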

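However, you have to be careful when the column names might be strings while the first-year column holds integers...

The nuclear option, which makes sure both sides of the comparison have the same type: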

df.assign(
    Purchases_1st_year=df.rename(columns=str).lookup(
        df.index, df.Year_joined_mailing.astype(str)
    )
)

             2015  2016  2017  Year_joined_mailing  Purchases_1st_year
Customer_ID                                                           
ABC             5     6    10                 2015                   5
BCD             6     7     3                 2016                   7
DEF            10     4     5                 2017                   5
GHI             8     7    10                 2016                   7
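To see why the string conversion matters, here is a small sketch with a hypothetical variant of the data whose year headers are strings (as they would be after reading a CSV header row) while Year_joined_mailing holds integers. The integer 2015 never equals the string "2015", so a label lookup finds nothing until both sides share a type.

```python
import pandas as pd

# Hypothetical variant: string headers, integer join years.
df = pd.DataFrame(
    {"2015": [5, 6], "2016": [6, 7], "Year_joined_mailing": [2015, 2016]},
    index=pd.Index(["ABC", "BCD"], name="Customer_ID"),
)

# No integer year matches a string header, so every position is -1.
raw_pos = df.columns.get_indexer(df["Year_joined_mailing"])

# Casting the lookup values to str makes them comparable with the headers.
str_pos = df.columns.get_indexer(df["Year_joined_mailing"].astype(str))
```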

Answer 2

You can use apply on each row:

df['Purchases_1st_year'] = df.apply(lambda x: x[x['Year_joined_mailing']], axis=1)
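A self-contained sketch of the apply approach, assuming Customer_ID is the index and the year headers are integers like the join years. With axis=1, each row arrives as a Series indexed by the column labels, so the row's own Year_joined_mailing value can be used as a label into that same row.

```python
import pandas as pd

# Sample data from the question, with Customer_ID as the index.
df = pd.DataFrame(
    {
        2015: [5, 6, 10, 8],
        2016: [6, 7, 4, 7],
        2017: [10, 3, 5, 10],
        "Year_joined_mailing": [2015, 2016, 2017, 2016],
    },
    index=pd.Index(["ABC", "BCD", "DEF", "GHI"], name="Customer_ID"),
)

# row[row["Year_joined_mailing"]] reads the cell under that row's join-year column.
df["Purchases_1st_year"] = df.apply(
    lambda row: row[row["Year_joined_mailing"]], axis=1
)
```

This is easy to read, but apply with axis=1 calls the lambda once per row, so it is slower than a vectorized lookup on large frames.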

Answer 3

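I would do it like this, assuming the column headers and Year_joined_mailing have the same data type and that every Year_joined_mailing value is a valid column. If the data types differ, you can convert them by adding str() or int() in the appropriate places.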

df['Purchases_1st_year'] = [df[df['Year_joined_mailing'][i]][i] for i in df.index]

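What we do here is iterate over the dataframe's index, read the 'Year_joined_mailing' field for each index value, use it to pick the column we want, and then select that same index from the column. The results are collected in a list, which is assigned to our new column 'Purchases_1st_year'.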
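The same iteration can be written with df.at, pandas' single-cell accessor, which makes the two lookups per row explicit. This is a sketch under the same assumptions as the answer (Customer_ID as the index, integer headers matching integer join years); df.at is swapped in here for the chained df[col][i] form.

```python
import pandas as pd

# Sample data from the question, with Customer_ID as the index.
df = pd.DataFrame(
    {
        2015: [5, 6, 10, 8],
        2016: [6, 7, 4, 7],
        2017: [10, 3, 5, 10],
        "Year_joined_mailing": [2015, 2016, 2017, 2016],
    },
    index=pd.Index(["ABC", "BCD", "DEF", "GHI"], name="Customer_ID"),
)

# For each customer i: first read their join year, then read the cell
# in that year's column for the same customer.
df["Purchases_1st_year"] = [
    df.at[i, df.at[i, "Year_joined_mailing"]] for i in df.index
]
```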

If the values in your 'Year_joined_mailing' column are not necessarily valid column names, try:

from numpy import nan

new_col = []
for i in df.index:
    try:
        new_col.append(df[df['Year_joined_mailing'][i]][i])
    except KeyError:  # raised when the year is not a column label
        new_col.append(nan)  # or whatever null value you want here
df['Purchases_1st_year'] = new_col

This longer snippet does the same thing, but it won't break if a 'Year_joined_mailing' value is not in df.columns.
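A vectorized alternative that also tolerates missing years is sketched below, using pd.factorize plus DataFrame.reindex (reindexing on a year with no matching column produces an all-NaN column). The data is the question's, except that DEF's join year is changed to a hypothetical 2018 to exercise the missing case; it also assumes Year_joined_mailing itself contains no missing values, since factorize codes those as -1.

```python
import numpy as np
import pandas as pd

# Question's data, except DEF joined in 2018, which has no column.
df = pd.DataFrame(
    {
        2015: [5, 6, 10, 8],
        2016: [6, 7, 4, 7],
        2017: [10, 3, 5, 10],
        "Year_joined_mailing": [2015, 2016, 2018, 2016],
    },
    index=pd.Index(["ABC", "BCD", "DEF", "GHI"], name="Customer_ID"),
)

# codes[i] is the position of row i's year within `years` (the distinct years).
codes, years = pd.factorize(df["Year_joined_mailing"])

# One column per distinct year; a year with no matching column becomes all NaN.
aligned = df.reindex(columns=years).to_numpy()

# Pick element (i, codes[i]) for each row; DEF ends up NaN instead of raising.
df["Purchases_1st_year"] = aligned[np.arange(len(df)), codes]
```

Because the missing year introduces NaN, the new column comes out as float rather than int.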
