Row by row (pandas): if column A == 'Something' and column B > 25, then column C = "Category"

Time: 2017-05-30 22:34:01

Tags: python pandas if-statement return rows

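I'm trying to write a Python script that uses pandas to work through a file row by row. If column A == 'Something' and column B > 25, I want it to write "Category" into column C.

I have a Month column and a Day column. So, for example:

When Month == August and Day >= 25, then Week = August 25

I've tried a couple of things, but neither of them worked...

First I tried: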

import os               ### OS library is imported.
import pandas as pd     ### Pandas library is imported as 'pd'.

counter = 1             ### Counter starts at the first iteration.

while os.path.exists("CSV-Iteration-"'{0}'"/".format(counter)):     ### Runs the loop until all iteration's folders have been processed.

    a = pd.read_csv("output-"'{0}'".csv".format(counter))           ### Sets 'a' dataframe as holding data from a CSV file.
    a['Week'] = ""

    a[(a['Month'] is 'June') & (a['Day'] < 25)]['Week'] = 'June 18'
    a[(a['Month'] is 'June') & (a['Day'] >= 25)]['Week'] = 'June 25'
    a[(a['Month'] is 'July') & (a['Day'] < 2)]['Week'] = 'June 25'
    a[(a['Month'] is 'July') & (a['Day'] >= 2) & (a['Day'] < 9)]['Week'] = 'July 2'
    a[(a['Month'] is 'July') & (a['Day'] >= 9) & (a['Day'] < 16)]['Week'] = 'July 9'
    a[(a['Month'] is 'July') & (a['Day'] >= 16) & (a['Day'] < 23)]['Week'] = 'July 16'
    a[(a['Month'] is 'July') & (a['Day'] >= 23) & (a['Day'] < 30)]['Week'] = 'July 23'
    a[(a['Month'] is 'July') & (a['Day'] >= 31) & (a['Day'] < 16)]['Week'] = 'July 30'
    a[(a['Month'] is 'August') & (a['Day'] < 6)]['Week'] = 'July 30'

    a[(a['Month'] is 'August') & (a['Day'] >= 6) & (a['Day'] < 13)]['Week'] = 'August 6'
    a[(a['Month'] is 'August') & (a['Day'] >= 13) & (a['Day'] < 20)]['Week'] = 'August 13'
    a[(a['Month'] is 'August') & (a['Day'] >= 20) & (a['Day'] < 27)]['Week'] = 'August 20'
    a[(a['Month'] is 'August') & (a['Day'] >= 27)]['Week'] = 'August 27'
    a[(a['Month'] is 'September') & (a['Day'] < 3)]['Week'] = 'August 27'

    a[(a['Month'] is 'September') & (a['Day'] >= 3) & (a['Day'] < 10)]['Week'] = 'September 3'
    a[(a['Month'] is 'September') & (a['Day'] >= 10) & (a['Day'] < 17)]['Week'] = 'September 10'
    a[(a['Month'] is 'September') & (a['Day'] >= 17) & (a['Day'] < 24)]['Week'] = 'September 17'
    a[(a['Month'] is 'September') & (a['Day'] >= 24)] = 'September 24'

    a[(a['Month'] is 'October') & (a['Day'] >= 1) & (a['Day'] < 8)]['Week'] = 'October 1'
    a[(a['Month'] is 'October') & (a['Day'] >= 8) & (a['Day'] < 15)]['Week'] = 'October 8'
    a[(a['Month'] is 'October') & (a['Day'] >= 15) & (a['Day'] < 22)]['Week'] = 'October 15'
    a[(a['Month'] is 'October') & (a['Day'] >= 22) & (a['Day'] < 29)]['Week'] = 'October 22'
    a[(a['Month'] is 'October') & (a['Day'] >= 29)]['Week'] = 'October 29'
    a[(a['Month'] is 'November') & (a['Day'] < 5)]['Week'] = 'October 29'

    a[(a['Month'] is 'November') & (a['Day'] >= 5) & (a['Day'] < 12)]['Week'] = 'November 5'
    a[(a['Month'] is 'November') & (a['Day'] >= 12) & (a['Day'] < 19)]['Week'] = 'November 12'
    a[(a['Month'] is 'November') & (a['Day'] >= 19) & (a['Day'] < 26)]['Week'] = 'November 19'
    a[(a['Month'] is 'November') & (a['Day'] >= 26)]['Week'] = 'November 26'
    a[(a['Month'] is 'December') & (a['Day'] < 3)]['Week'] = 'November 26'

    a[(a['Month'] is 'December') & (a['Day'] >= 3) & (a['Day'] < 10)]['Week'] = 'December 3'
    a[(a['Month'] is 'December') & (a['Day'] >= 10) & (a['Day'] < 17)]['Week'] = 'December 10'
    a[(a['Month'] is 'December') & (a['Day'] >= 17) & (a['Day'] < 24)]['Week'] = 'December 17'
    a[(a['Month'] is 'December') & (a['Day'] >= 24) & (a['Day'] < 31)]['Week'] = 'December 24'
    a[(a['Month'] is 'December') & (a['Day'] >= 31)]['Week'] = 'December 31'
    a[(a['Month'] is 'January') & (a['Day'] < 7)]['Week'] = 'December 31'

    a[(a['Month'] is 'January') & (a['Day'] >= 7) & (a['Day'] < 14)]['Week'] = 'January 7'
    a[(a['Month'] is 'January') & (a['Day'] >= 14) & (a['Day'] < 21)]['Week'] = 'January 14'
    a[(a['Month'] is 'January') & (a['Day'] >= 21) & (a['Day'] < 28)]['Week'] = 'January 21'
    a[(a['Month'] is 'January') & (a['Day'] >= 28)]['Week'] = 'January 28'

    a.to_csv("TESToutput-"'{0}'".csv".format(counter), index=False)         ### 'a' dataframe becomes 'TESToutput-#.csv' and does not print fields for indexing (index=False).

    counter += 1        ### Adds 1 to the counter.

print 'Date Corrections - All Done!'

Then I tried:

import os               ### OS library is imported.
import pandas as pd     ### Pandas library is imported as 'pd'.

counter = 1             ### Counter starts at the first iteration.

while os.path.exists("CSV-Iteration-"'{0}'"/".format(counter)):     ### Runs the loop until all iteration's folders have been processed.

    a = pd.read_csv("output-"'{0}'".csv".format(counter))           ### Sets 'a' dataframe as holding data from a CSV file.
    a['Week'] = ""

    def this_week (row):
        if row[(a['Month'] is 'June') + (a['Day'] < 25)]:
            return 'June 18'
        if row[(a['Month'] is 'June') + (a['Day'] >= 25)]:
            return 'June 25'
        if row[(a['Month'] is 'July') + (a['Day'] < 2)]:
            return 'June 25'
        if row[(a['Month'] is 'July') + (a['Day'] >= 2) + (a['Day'] < 9)]:
            return 'July 2'
        if row[(a['Month'] is 'July') + (a['Day'] >= 9) + (a['Day'] < 16)]:
            return 'July 9'
        if row[(a['Month'] is 'July') + (a['Day'] >= 16) + (a['Day'] < 23)]:
            return 'July 16'
        if row[(a['Month'] is 'July') + (a['Day'] >= 23) + (a['Day'] < 30)]:
            return 'July 23'
        if row[(a['Month'] is 'July') + (a['Day'] >= 31) + (a['Day'] < 16)]:
            return 'July 30'
        if row[(a['Month'] is 'August') + (a['Day'] < 6)]:
            return 'July 30'
        if row[(a['Month'] is 'August') + (a['Day'] >= 6) + (a['Day'] < 13)]:
            return 'August 6'
        if row[(a['Month'] is 'August') + (a['Day'] >= 13) + (a['Day'] < 20)]:
            return 'August 13'
        if row[(a['Month'] is 'August') + (a['Day'] >= 20) + (a['Day'] < 27)]:
            return 'August 20'
        if row[(a['Month'] is 'August') + (a['Day'] >= 27)]:
            return 'August 27'
        if row[(a['Month'] is 'September') + (a['Day'] < 3)]:
            return 'August 27'
        if row[(a['Month'] is 'September') + (a['Day'] >= 3) + (a['Day'] < 10)]:
            return 'September 3'
        if row[(a['Month'] is 'September') + (a['Day'] >= 10) + (a['Day'] < 17)]:
            return 'September 10'
        if row[(a['Month'] is 'September') + (a['Day'] >= 17) + (a['Day'] < 24)]:
            return 'September 17'
        if row[(a['Month'] is 'September') + (a['Day'] >= 24)]:
            return 'September 24'
        if row[(a['Month'] is 'October') + (a['Day'] >= 1) + (a['Day'] < 8)]:
            return 'October 1'
        if row[(a['Month'] is 'October') + (a['Day'] >= 8) + (a['Day'] < 15)]:
            return 'October 8'
        if row[(a['Month'] is 'October') + (a['Day'] >= 15) + (a['Day'] < 22)]:
            return 'October 15'
        if row[(a['Month'] is 'October') + (a['Day'] >= 22) + (a['Day'] < 29)]:
            return 'October 22'
        if row[(a['Month'] is 'October') + (a['Day'] >= 29)]:
            return 'October 29'
        if row[(a['Month'] is 'November') + (a['Day'] < 5)]:
            return 'October 29'
        if row[(a['Month'] is 'November') + (a['Day'] >= 5) + (a['Day'] < 12)]:
            return 'November 5'
        if row[(a['Month'] is 'November') + (a['Day'] >= 12) + (a['Day'] < 19)]:
            return 'November 12'
        if row[(a['Month'] is 'November') + (a['Day'] >= 19) + (a['Day'] < 26)]:
            return 'November 19'
        if row[(a['Month'] is 'November') + (a['Day'] >= 26)]:
            return 'November 26'
        if row[(a['Month'] is 'December') + (a['Day'] < 3)]:
            return 'November 26'
        if row[(a['Month'] is 'December') + (a['Day'] >= 3) + (a['Day'] < 10)]:
            return 'December 3'
        if row[(a['Month'] is 'December') + (a['Day'] >= 10) + (a['Day'] < 17)]:
            return 'December 10'
        if row[(a['Month'] is 'December') + (a['Day'] >= 17) + (a['Day'] < 24)]:
            return 'December 17'
        if row[(a['Month'] is 'December') + (a['Day'] >= 24) + (a['Day'] < 31)]:
            return 'December 24'
        if row[(a['Month'] is 'December') + (a['Day'] >= 31)]:
            return 'December 31'
        if row[(a['Month'] is 'January') + (a['Day'] < 7)]:
            return 'December 31'
        if row[(a['Month'] is 'January') + (a['Day'] >= 7) + (a['Day'] < 14)]:
            return 'January 7'
        if row[(a['Month'] is 'January') + (a['Day'] >= 14) + (a['Day'] < 21)]:
            return 'January 14'
        if row[(a['Month'] is 'January') + (a['Day'] >= 21) + (a['Day'] < 28)]:
            return 'January 21'
        if row[(a['Month'] is 'January') + (a['Day'] >= 28)]:
            return 'January 28'

    a['Week'] = a.apply (lambda row: this_week (row), axis=1)

    a.to_csv("TESToutput-"'{0}'".csv".format(counter), index=False)         ### 'a' dataframe becomes 'TESToutput-#.csv' and does not print fields for indexing (index=False).

    counter += 1        ### Adds 1 to the counter.

print 'Date Corrections - All Done!'

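The second one gave me this error: "IndexingError: ('Unalignable boolean Series key provided', u'occurred at index 0')"

I'm new to Python, so I put these together based on what I've read in forums. If there's a simpler way, or a correction or addition that would make either of these two scripts work, please let me know.

Thanks!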

====================================================

BREAK - new information below

This is what the data file looks like (minus the extra columns).

Month       Day C.Sym   F.Sym   D.Sym
September   3   1               1
September   27  1       
October     14          1   
October     15          1   
October     17          1   
October     21          1   
October     29          1   
November    30  1               
December    16          1       1
December    17          1           
December    27          1   
January     6           1   
January     8   1               
January     20          1   

I want to add a column that checks the Month and Day columns to assign a "Week", i.e. as below:

Month       Day C.Sym   F.Sym   D.Sym    Week
September   3   1               1        Sept 3
September   27  1                        Sept 24
October     14          1                Oct 8
October     15          1                Oct 15
October     17          1                Oct 15
October     21          1                Oct 15
October     29          1                Oct 29
November    30  1                        Oct 29
December    16          1       1        Dec 10
December    17          1                Dec 10
December    27          1                Dec 24
January     6           1                Dec 31
January     8   1                        Jan 7
January     20          1                Jan 14
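
The Week labels in the first script all appear to fall on Sundays, so one possible way to derive the column is to build a real date and step back to the most recent Sunday rather than listing every range by hand. The following is only a sketch under a loud assumption: the file contains no year, so it guesses that June through December belong to 2017 and January to 2018 (the calendar in which June 18 is a Sunday). The sample rows and the helper names `year`, `date`, `days_since_sunday` and `week_start` are illustrative, not part of the original data file.

```python
import pandas as pd

# A few sample rows in the same shape as the data file above.
df = pd.DataFrame({
    "Month": ["September", "October", "December", "January"],
    "Day": [27, 14, 16, 6],
})

# ASSUMPTION: the file has no year column. June-December are treated as 2017
# and January as 2018; adjust this mapping if the real data differs.
year = df["Month"].map(lambda m: 2018 if m == "January" else 2017)

# Build a proper datetime from "Month Day Year" strings.
date = pd.to_datetime(
    df["Month"] + " " + df["Day"].astype(str) + " " + year.astype(str),
    format="%B %d %Y",
)

# dayofweek is Monday=0 ... Sunday=6, so (dayofweek + 1) % 7 is the number
# of days elapsed since the most recent Sunday (0 when the date is a Sunday).
days_since_sunday = (date.dt.dayofweek + 1) % 7
week_start = date - pd.to_timedelta(days_since_sunday, unit="D")

# Label each row with the month name and day of its week's Sunday.
df["Week"] = week_start.dt.strftime("%B ") + week_start.dt.day.astype(str)
print(df)
```

Labels come out with full month names, as in the script ("September 24" rather than "Sept 24"). Under this Sunday rule, November 30 would map to November 26 and December 17 to December 17, which agrees with the conditions in the script but not with those two rows of the hand-written table.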

An example of an elif that I'd now like to incorporate:

    elif a[(a['Month'] is 'January') & (a['Day'] >= 14) & (a['Day'] < 21)]:
        ['Week'] = 'January 14'

I hope this is more specific and more helpful...

1 Answer:

Answer 0: (score: 0)

>>> df = pd.DataFrame({'column_A': ['something', 'day', 'something'], 'column_B' : [30, 40, 10]})
>>> df
    column_A  column_B
0  something        30
1        day        40
2  something        10


>>> df = df.assign(column_C=((df.column_A == 'something') & (df.column_B > 25)))
>>> df.column_C.replace(True, 'something', inplace=True)
>>> df
    column_A  column_B   column_C
0  something        30  something
1        day        40      False
2  something        10      False
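
As a possible variation on the approach above, the label can be written directly with `.loc`, so that rows which do not match keep a chosen default instead of `False`. This sketch also touches on what the attempts in the question seem to trip over: `a['Month'] is 'June'` is an identity test that yields a single Python bool rather than a row-wise mask, and `a[mask]['Week'] = ...` assigns into a temporary copy. The empty-string default and the `'category'` label are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "column_A": ["something", "day", "something"],
    "column_B": [30, 40, 10],
})

# Use == for an element-wise comparison. Each condition is parenthesised
# because & binds more tightly than == and >.
mask = (df["column_A"] == "something") & (df["column_B"] > 25)

df["column_C"] = ""                      # assumed default for non-matching rows
df.loc[mask, "column_C"] = "category"    # one .loc call, no chained indexing

print(df)
```

The same pattern would carry over to the question's data, for example `a.loc[(a['Month'] == 'June') & (a['Day'] < 25), 'Week'] = 'June 18'`.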