Pandas: update the value of column 'E' if column 'C' has the value 'x'

Time: 2019-02-21 17:01:56

Tags: python pandas

I have a dataframe like this:

           A           B       C       D  E
0      NSPNT  'ACTENRGY'  'XD01'   'DSU'  F
1      NSPNT  'ACTENRGY'  'XD21'   'DSU'  F
2      NSPNT  'ACTENRGY'  'XD22'   'DSU'  F
3      NSPNT  'ACTENRGY'  'XD23'   'DSU'  F
4      NSPNT  'ACTENRGY'  'XD24'   'DSU'  F
5      NSPNT  'ACTENRGY'  'XD25'   'DSU'  F
6      NSPNT  'ACTENRGY'  'XD01'   'DSU'  F
7      NSPNT  'ACTENRGY'   'ACK'  'MISC'  F
8      NSPNT  'ACTENRGY'   'ACU'  'MISC'  F
9      NSPNT  'ACTENRGY'   'ACK'  'MISC'  F
10     NSPNT  'ACTENRGY'   'ACU'  'MISC'  F
11     NSPNT  'ACTENRGY'   'ACK'  'MISC'  F
12     NSPNT  'ACTENRGY'   'ACU'  'MISC'  F
13     NSPNT  'ACTENRGY'   'ACF'  'MISC'  F
14     NSPNT  'ACTENRGY'   'ASF'  'MISC'  F
15     NSPNT  'ACTENRGY'   'DEF'  'MISC'  F
16     NSPNT  'ACTENRGY'   'RLR'   'RLR'  T

What I want to achieve is: wherever column 'C' == 'ACK', set column 'E' to T. So far I have tried the following:

import os
import pandas as pd


source_folder = 'D:/NSSCDB/STTS_RCL_Export/'

def editNSPNT():

    for somefile in os.listdir(source_folder):
        if (somefile.startswith('nsscdb_output_dts') and
                somefile.endswith('.txt')):

            df = pd.read_csv(source_folder + somefile, encoding='utf-8', names = ['A','B','C','D','E'], header=4)
            #for x in df['C']:
                #if (x == 'ACK'):
                    #df['E'] = 'T'
            #df.E = ["T" if x == "ACK" for x in df.C]
            df.loc[(df.C=='ACK')]['E'] = 'T'

            print(df)



def main():

    editNSPNT()


if __name__== "__main__":
    main()   

Neither of the approaches I tried works. Can someone tell me what I am doing wrong? Thanks.

3 Answers:

Answer 0 (score: 3)

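Could the single quotes in the data be causing the problem? The values in column C appear to contain literal quote characters, so the comparison string has to include them: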

df.loc[df['C'] == "'ACK'",'E'] = 'T'

Wrapping the value in double quotes, so that the single quotes are part of the compared string, gives:

        A           B       C       D  E
0   NSPNT  'ACTENRGY'  'XD01'   'DSU'  F
1   NSPNT  'ACTENRGY'  'XD21'   'DSU'  F
2   NSPNT  'ACTENRGY'  'XD22'   'DSU'  F
3   NSPNT  'ACTENRGY'  'XD23'   'DSU'  F
4   NSPNT  'ACTENRGY'  'XD24'   'DSU'  F
5   NSPNT  'ACTENRGY'  'XD25'   'DSU'  F
6   NSPNT  'ACTENRGY'  'XD01'   'DSU'  F
7   NSPNT  'ACTENRGY'   'ACK'  'MISC'  T
8   NSPNT  'ACTENRGY'   'ACU'  'MISC'  F
9   NSPNT  'ACTENRGY'   'ACK'  'MISC'  T
10  NSPNT  'ACTENRGY'   'ACU'  'MISC'  F
11  NSPNT  'ACTENRGY'   'ACK'  'MISC'  T
12  NSPNT  'ACTENRGY'   'ACU'  'MISC'  F
13  NSPNT  'ACTENRGY'   'ACF'  'MISC'  F
14  NSPNT  'ACTENRGY'   'ASF'  'MISC'  F
15  NSPNT  'ACTENRGY'   'DEF'  'MISC'  F
16  NSPNT  'ACTENRGY'   'RLR'   'RLR'  T
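
To check whether embedded quotes really are the cause, it can help to look at the raw values and count matches both ways. The following is a minimal sketch on a small hypothetical frame (not the asker's file); it also shows one possible alternative, comparing against a quote-stripped view of the column with `str.strip`, which leaves the stored values unchanged.

```python
import pandas as pd

# Hypothetical sample mirroring the question's data: the values in
# column C contain literal single-quote characters.
df = pd.DataFrame({
    "C": ["'XD01'", "'ACK'", "'ACU'", "'ACK'"],
    "E": ["F", "F", "F", "F"],
})

# Printing the values as a list makes the embedded quotes visible.
print(df["C"].tolist())

# Comparing against the bare string matches nothing ...
bare_matches = int((df["C"] == "ACK").sum())
# ... while including the quotes in the literal matches two rows.
quoted_matches = int((df["C"] == "'ACK'").sum())

# Alternative: build the mask from a quote-stripped view of the column.
mask = df["C"].str.strip("'") == "ACK"
df.loc[mask, "E"] = "T"
print(df)
```

If the quotes are not wanted in the data at all, stripping them once right after loading may be simpler than quoting every comparison value.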

Answer 1 (score: 2)

A solution using numpy.where():

import numpy as np

df['E'] = np.where(df['C'].eq("'ACK'"), 'T', df['E'])
print(df)

Output:

        A           B       C       D  E
0   NSPNT  'ACTENRGY'  'XD01'   'DSU'  F
1   NSPNT  'ACTENRGY'  'XD21'   'DSU'  F
2   NSPNT  'ACTENRGY'  'XD22'   'DSU'  F
3   NSPNT  'ACTENRGY'  'XD23'   'DSU'  F
4   NSPNT  'ACTENRGY'  'XD24'   'DSU'  F
5   NSPNT  'ACTENRGY'  'XD25'   'DSU'  F
6   NSPNT  'ACTENRGY'  'XD01'   'DSU'  F
7   NSPNT  'ACTENRGY'   'ACK'  'MISC'  T
8   NSPNT  'ACTENRGY'   'ACU'  'MISC'  F
9   NSPNT  'ACTENRGY'   'ACK'  'MISC'  T
10  NSPNT  'ACTENRGY'   'ACU'  'MISC'  F
11  NSPNT  'ACTENRGY'   'ACK'  'MISC'  T
12  NSPNT  'ACTENRGY'   'ACU'  'MISC'  F
13  NSPNT  'ACTENRGY'   'ACF'  'MISC'  F
14  NSPNT  'ACTENRGY'   'ASF'  'MISC'  F
15  NSPNT  'ACTENRGY'   'DEF'  'MISC'  F
16  NSPNT  'ACTENRGY'   'RLR'   'RLR'  T
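
As a self-contained illustration of the same idea, here is a sketch on a small hypothetical frame. It applies `np.where(condition, value_if_true, value_if_false)` as above and, for comparison, `Series.mask`, which replaces values where the condition is True and keeps the rest. The sample data and variable names are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the same quoting as the question's data.
df = pd.DataFrame({
    "C": ["'XD01'", "'ACK'", "'RLR'"],
    "E": ["F", "F", "T"],
})

is_ack = df["C"].eq("'ACK'")

# np.where builds a new array: 'T' where the condition holds,
# otherwise the existing value from column E.
via_where = np.where(is_ack, "T", df["E"])

# Series.mask does the same on the pandas side and returns a Series.
via_mask = df["E"].mask(is_ack, "T")

df["E"] = via_where
print(df)
```

Both variants compute the whole column and assign it back, so rows that do not match (including the existing T in the last row) keep their original value.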

Answer 2 (score: 1)

You only need to correct the loc call:

df.loc[df.C == "'ACK'", 'E'] = 'T'

The result is:

        A           B       C       D  E
0   NSPNT  'ACTENRGY'  'XD01'   'DSU'  F
1   NSPNT  'ACTENRGY'  'XD21'   'DSU'  F
2   NSPNT  'ACTENRGY'  'XD22'   'DSU'  F
3   NSPNT  'ACTENRGY'  'XD23'   'DSU'  F
4   NSPNT  'ACTENRGY'  'XD24'   'DSU'  F
5   NSPNT  'ACTENRGY'  'XD25'   'DSU'  F
6   NSPNT  'ACTENRGY'  'XD01'   'DSU'  F
7   NSPNT  'ACTENRGY'   'ACK'  'MISC'  T
8   NSPNT  'ACTENRGY'   'ACU'  'MISC'  F
9   NSPNT  'ACTENRGY'   'ACK'  'MISC'  T
10  NSPNT  'ACTENRGY'   'ACU'  'MISC'  F
11  NSPNT  'ACTENRGY'   'ACK'  'MISC'  T
12  NSPNT  'ACTENRGY'   'ACU'  'MISC'  F
13  NSPNT  'ACTENRGY'   'ACF'  'MISC'  F
14  NSPNT  'ACTENRGY'   'ASF'  'MISC'  F
15  NSPNT  'ACTENRGY'   'DEF'  'MISC'  F
16  NSPNT  'ACTENRGY'   'RLR'   'RLR'  T

With your original code (setting aside the single-quote issue), you first slice the dataframe:

df.loc[(df.C=='ACK')]

and then assign a value to column E of that sliced dataframe:

['E'] = 'T'

In other words, you are updating the slice, which is a temporary copy of the matching rows, and not the dataframe itself.

From the Pandas documentation:

  .loc[] is primarily label based, but may also be used with a boolean array.

Breaking down the code:

df.loc[df.C == "'ACK'", 'E']

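df.C == "'ACK'" returns a boolean array that identifies the rows, and the string 'E' identifies the column. Because both are passed to a single .loc call, the new value is written directly into the dataframe, with no intermediate slice.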
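
The difference between the two forms can be seen side by side in a short sketch. It uses a small hypothetical frame and assumes the default pandas settings, under which the chained form only emits a warning; the warning is silenced here so the example runs quietly.

```python
import warnings

import pandas as pd

# Hypothetical sample with the same quoting as the question's data.
df = pd.DataFrame({
    "C": ["'ACK'", "'ACU'", "'ACK'"],
    "E": ["F", "F", "F"],
})
mask = df["C"] == "'ACK'"

# Chained form from the question: df.loc[mask] yields a new object,
# and the assignment lands on that temporary object.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas warns about this pattern
    df.loc[mask]["E"] = "T"
after_chained = df["E"].tolist()  # df itself is unchanged

# Single .loc call with both the row mask and the column label.
df.loc[mask, "E"] = "T"
after_single = df["E"].tolist()

print(after_chained)
print(after_single)
```

Without the `warnings` block, pandas reports the chained assignment with a warning, which is usually the first hint that an update went to a copy.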