I have a DataFrame like this:
A B C D E
0 NSPNT 'ACTENRGY' 'XD01' 'DSU' F
1 NSPNT 'ACTENRGY' 'XD21' 'DSU' F
2 NSPNT 'ACTENRGY' 'XD22' 'DSU' F
3 NSPNT 'ACTENRGY' 'XD23' 'DSU' F
4 NSPNT 'ACTENRGY' 'XD24' 'DSU' F
5 NSPNT 'ACTENRGY' 'XD25' 'DSU' F
6 NSPNT 'ACTENRGY' 'XD01' 'DSU' F
7 NSPNT 'ACTENRGY' 'ACK' 'MISC' F
8 NSPNT 'ACTENRGY' 'ACU' 'MISC' F
9 NSPNT 'ACTENRGY' 'ACK' 'MISC' F
10 NSPNT 'ACTENRGY' 'ACU' 'MISC' F
11 NSPNT 'ACTENRGY' 'ACK' 'MISC' F
12 NSPNT 'ACTENRGY' 'ACU' 'MISC' F
13 NSPNT 'ACTENRGY' 'ACF' 'MISC' F
14 NSPNT 'ACTENRGY' 'ASF' 'MISC' F
15 NSPNT 'ACTENRGY' 'DEF' 'MISC' F
16 NSPNT 'ACTENRGY' 'RLR' 'RLR' T
What I want is to set column 'E' to T wherever column 'C' == 'ACK'. So far I have tried the following:
import os
import pandas as pd
source_folder = 'D:/NSSCDB/STTS_RCL_Export/'
def editNSPNT():
    for somefile in os.listdir(source_folder):
        if (somefile.startswith(('nsscdb_output_dts')) and
                somefile.endswith(('.txt'.lower()))):
            df = pd.read_csv(source_folder + somefile, encoding='utf-8', names = ['A','B','C','D','E'], header=4)
            #for x in df['C']:
                #if (x == 'ACK'):
                    #df['E'] = 'T'
            #df.E = ["T" if x == "ACK" for x in df.C]
            df.loc[(df.C=='ACK')]['E'] = 'T'
            print(df)
def main():
    editNSPNT()
if __name__== "__main__":
    main()
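Neither of the two approaches I have tried works. Can someone tell me what I am doing wrong? Thanks.
Answer 0: (score: 3)
Could the single quotes in the data be causing the problem?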
df.loc[df['C'] == "'ACK'",'E'] = 'T'
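Use double quotes around the value, so that the single quotes stored in the data are part of the string being compared. Result: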
A B C D E
0 NSPNT 'ACTENRGY' 'XD01' 'DSU' F
1 NSPNT 'ACTENRGY' 'XD21' 'DSU' F
2 NSPNT 'ACTENRGY' 'XD22' 'DSU' F
3 NSPNT 'ACTENRGY' 'XD23' 'DSU' F
4 NSPNT 'ACTENRGY' 'XD24' 'DSU' F
5 NSPNT 'ACTENRGY' 'XD25' 'DSU' F
6 NSPNT 'ACTENRGY' 'XD01' 'DSU' F
7 NSPNT 'ACTENRGY' 'ACK' 'MISC' T
8 NSPNT 'ACTENRGY' 'ACU' 'MISC' F
9 NSPNT 'ACTENRGY' 'ACK' 'MISC' T
10 NSPNT 'ACTENRGY' 'ACU' 'MISC' F
11 NSPNT 'ACTENRGY' 'ACK' 'MISC' T
12 NSPNT 'ACTENRGY' 'ACU' 'MISC' F
13 NSPNT 'ACTENRGY' 'ACF' 'MISC' F
14 NSPNT 'ACTENRGY' 'ASF' 'MISC' F
15 NSPNT 'ACTENRGY' 'DEF' 'MISC' F
16 NSPNT 'ACTENRGY' 'RLR' 'RLR' T
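A minimal sketch of why the quotes matter, assuming the file is comma-separated and wraps its fields in single quotes. The inline sample below is hypothetical, not the asker's actual file. With default parsing the quotes stay in the values, so 'ACK' matches nothing while "'ACK'" matches. As an alternative, passing quotechar="'" to read_csv strips the quotes at load time.

```python
import io
import pandas as pd

# Hypothetical two-row sample that mimics the quoted fields in the question
raw = "NSPNT,'ACTENRGY','XD01','DSU',F\nNSPNT,'ACTENRGY','ACK','MISC',F\n"
cols = ['A', 'B', 'C', 'D', 'E']

# Default parsing keeps the single quotes as part of each value
df = pd.read_csv(io.StringIO(raw), names=cols)
no_match = int((df['C'] == 'ACK').sum())    # literal without the quotes
match = int((df['C'] == "'ACK'").sum())     # literal including the quotes

# Alternative: declare ' as the quote character so values load unquoted
df_clean = pd.read_csv(io.StringIO(raw), names=cols, quotechar="'")
df_clean.loc[df_clean['C'] == 'ACK', 'E'] = 'T'

print(no_match, match)
print(df_clean)
```

Which option fits depends on whether the quotes need to survive when the data is written back out.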
Answer 1: (score: 2)
A solution using numpy.where():
import numpy as np
df.E = np.where(df.C.eq("'ACK'"), 'T', df.E)
print(df)
Output:
A B C D E
0 NSPNT 'ACTENRGY' 'XD01' 'DSU' F
1 NSPNT 'ACTENRGY' 'XD21' 'DSU' F
2 NSPNT 'ACTENRGY' 'XD22' 'DSU' F
3 NSPNT 'ACTENRGY' 'XD23' 'DSU' F
4 NSPNT 'ACTENRGY' 'XD24' 'DSU' F
5 NSPNT 'ACTENRGY' 'XD25' 'DSU' F
6 NSPNT 'ACTENRGY' 'XD01' 'DSU' F
7 NSPNT 'ACTENRGY' 'ACK' 'MISC' T
8 NSPNT 'ACTENRGY' 'ACU' 'MISC' F
9 NSPNT 'ACTENRGY' 'ACK' 'MISC' T
10 NSPNT 'ACTENRGY' 'ACU' 'MISC' F
11 NSPNT 'ACTENRGY' 'ACK' 'MISC' T
12 NSPNT 'ACTENRGY' 'ACU' 'MISC' F
13 NSPNT 'ACTENRGY' 'ACF' 'MISC' F
14 NSPNT 'ACTENRGY' 'ASF' 'MISC' F
15 NSPNT 'ACTENRGY' 'DEF' 'MISC' F
16 NSPNT 'ACTENRGY' 'RLR' 'RLR' T
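For anyone who wants to run the np.where() approach in isolation, here is a small self-contained sketch on a hypothetical three-row frame (the values are made up for illustration). It also shows Series.mask(), a pandas-native alternative that is not part of the answer above but should give the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; column C keeps the single quotes, as in the question
df = pd.DataFrame({'C': ["'XD01'", "'ACK'", "'ACU'"],
                   'E': ['F', 'F', 'F']})
cond = df['C'].eq("'ACK'")

# np.where picks 'T' where cond is True and the existing E value elsewhere
via_where = pd.Series(np.where(cond, 'T', df['E']), index=df.index)

# Series.mask replaces values where cond is True and keeps the rest
via_mask = df['E'].mask(cond, 'T')

print(via_where.tolist())
print(via_mask.tolist())
```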
Answer 2: (score: 1)
Just correcting the loc() call:
df.loc[df.C == "'ACK'", 'E'] = 'T'
The result is:
A B C D E
0 NSPNT 'ACTENRGY' 'XD01' 'DSU' F
1 NSPNT 'ACTENRGY' 'XD21' 'DSU' F
2 NSPNT 'ACTENRGY' 'XD22' 'DSU' F
3 NSPNT 'ACTENRGY' 'XD23' 'DSU' F
4 NSPNT 'ACTENRGY' 'XD24' 'DSU' F
5 NSPNT 'ACTENRGY' 'XD25' 'DSU' F
6 NSPNT 'ACTENRGY' 'XD01' 'DSU' F
7 NSPNT 'ACTENRGY' 'ACK' 'MISC' T
8 NSPNT 'ACTENRGY' 'ACU' 'MISC' F
9 NSPNT 'ACTENRGY' 'ACK' 'MISC' T
10 NSPNT 'ACTENRGY' 'ACU' 'MISC' F
11 NSPNT 'ACTENRGY' 'ACK' 'MISC' T
12 NSPNT 'ACTENRGY' 'ACU' 'MISC' F
13 NSPNT 'ACTENRGY' 'ACF' 'MISC' F
14 NSPNT 'ACTENRGY' 'ASF' 'MISC' F
15 NSPNT 'ACTENRGY' 'DEF' 'MISC' F
16 NSPNT 'ACTENRGY' 'RLR' 'RLR' T
With your original code, you first slice the DataFrame (leaving the single quotes aside for the moment):
df.loc[(df.C=='ACK')]
and then assign a value to column E of that sliced DataFrame:
['E'] = 'T'
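In other words, you are updating the slice (a temporary copy), not the DataFrame itself.
From the Pandas documentation:
.loc is primarily label based, but may also be used with a boolean array.
Breaking down the code: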
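The difference can be checked on a small hypothetical frame (the values are made up for illustration). This is a sketch rather than a reproduction of the asker's file: the chained form leaves df untouched, while the single .loc call updates it. Depending on the pandas version, the chained line may emit a warning, which is silenced here.

```python
import warnings
import pandas as pd

# Hypothetical frame; column C keeps the single quotes, as in the question
df = pd.DataFrame({'C': ["'XD01'", "'ACK'", "'ACU'"],
                   'E': ['F', 'F', 'F']})
mask = df['C'] == "'ACK'"

# Chained indexing: df.loc[mask] builds a new object and the
# assignment lands on that object, not on df
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas may warn about chained assignment
    df.loc[mask]['E'] = 'T'
after_chained = df['E'].tolist()

# Single .loc call: rows and column are selected together on df itself
df.loc[mask, 'E'] = 'T'
after_loc = df['E'].tolist()

print(after_chained)
print(after_loc)
```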
df.loc[df.C == "'ACK'", 'E']
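df.C == "'ACK'"
returns a boolean array that identifies the rows, and the string 'E'
identifies the column that receives the new value directly, with no intermediate slice.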
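The two parts of the indexer can be inspected separately. The sketch below uses a hypothetical four-row frame (the values are made up for illustration) and prints the boolean mask before applying the assignment.

```python
import pandas as pd

# Hypothetical frame; column C keeps the single quotes, as in the question
df = pd.DataFrame({'C': ["'XD25'", "'ACK'", "'ACU'", "'ACK'"],
                   'E': ['F', 'F', 'F', 'F']})

# Row selector: one True/False entry per row
mask = df.C == "'ACK'"
print(mask.tolist())

# Row selector plus column label in a single .loc call
selected_index = df.loc[mask, 'E'].index.tolist()  # rows that will be written
df.loc[mask, 'E'] = 'T'
print(df)
```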