I am new to Python.
My data frame has two columns and multiple rows:
Customer_Acquired_Date|Customer_mobile_number
1/20/2017|100000001
2/2/2017|100000002
2/12/2017|100000001
2/23/2017|100000004
3/1/2017|100000005
3/7/2017|100000004
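I want to add a column named "RepeatOrNew". Each cell in this new column should look up the customer's mobile number in the cells above it in the adjacent column. If the number already appears there, the cell should say "Repeat"; if it does not, the cell should say "New".
Output: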
Customer_Acquired_Date|Customer_mobile_number|RepeatOrNew
1/20/2017|100000001|New
2/2/2017|100000002|New
2/12/2017|100000001|Repeat
2/23/2017|100000004|New
3/1/2017|100000005|New
3/7/2017|100000004|Repeat
I am completely blank on where to start. Please assist.
Thanks, Ninad.
Answer 0 (score: 0)
You can combine grouping, via the GroupBy cumcount method, with numpy's where function to get the desired output. The following should be a decent starting point:
import pandas as pd
import numpy as np
from io import StringIO
data_stream = StringIO("""Customer_Acquired_Date|Customer_mobile_number
1/20/2017|100000001
2/2/2017|100000002
2/12/2017|100000001
2/23/2017|100000004
3/1/2017|100000005
3/7/2017|100000004""")
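# Parse the pipe-delimited sample into a DataFrame
customers = pd.read_table(data_stream, sep="|", header=0)
# Number the rows within each mobile-number group: 0 for the first occurrence, 1 for the next, ...
counter = customers.groupby('Customer_mobile_number').cumcount()
# First occurrence -> 'New', any later occurrence -> 'Repeat'
customers['RepeatOrNew'] = np.where(counter == 0, 'New', 'Repeat')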
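As an aside, it may help to look at the intermediate counter on its own. cumcount numbers the rows inside each group starting from 0, so the first time a mobile number appears it gets 0 and every later appearance gets a higher number. A minimal sketch using the same six mobile numbers (the name frame is only illustrative):

```python
import pandas as pd

# Same mobile numbers as in the question, in the same row order
frame = pd.DataFrame({
    "Customer_mobile_number": [100000001, 100000002, 100000001,
                               100000004, 100000005, 100000004],
})

# Position of each row within its group of identical mobile numbers
counter = frame.groupby("Customer_mobile_number").cumcount()
print(counter.tolist())  # [0, 0, 1, 0, 0, 1]
```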
Alternatively, the same assignment as a one-liner:
customers['RepeatOrNew'] = customers.groupby('Customer_mobile_number').cumcount().apply(lambda x: 'New' if x == 0 else 'Repeat')
This should produce something like:
Customer_Acquired_Date Customer_mobile_number RepeatOrNew
0 1/20/2017 100000001 New
1 2/2/2017 100000002 New
2 2/12/2017 100000001 Repeat
3 2/23/2017 100000004 New
4 3/1/2017 100000005 New
5 3/7/2017 100000004 Repeat
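If only the New/Repeat distinction matters and not how many times a number has been seen, a possibly simpler variant is the Series.duplicated method, which with its default keep='first' marks every occurrence after the first as True. This is a sketch of an alternative rather than the approach above, and it builds the frame by hand instead of reading it from text:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
customers = pd.DataFrame({
    "Customer_Acquired_Date": ["1/20/2017", "2/2/2017", "2/12/2017",
                               "2/23/2017", "3/1/2017", "3/7/2017"],
    "Customer_mobile_number": [100000001, 100000002, 100000001,
                               100000004, 100000005, 100000004],
})

# True for every row whose mobile number already appeared in an earlier row
is_repeat = customers["Customer_mobile_number"].duplicated()

customers["RepeatOrNew"] = np.where(is_repeat, "Repeat", "New")
print(customers)
```

Both variants depend on row order, so they assume the rows are already sorted by acquisition date, as they are in the sample.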
I hope this proves useful.