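I have a task to create a column in my DataFrame based on the file that was used to create that DataFrame. The code sample below works, but I think there is a better way to do it. I'm fairly sure I can skip the step of creating the column and setting it to zero (dfp['F'] = 0), and the function could probably be made cleaner.
How would you optimize this code?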
import pandas as pd
import numpy as np
dfp = pd.DataFrame({'A' : [np.NaN,np.NaN,3,4,5,5,3,1,5,np.NaN],
'B' : [1,0,3,5,0,0,np.NaN,9,0,0],
'C' : ['AA1233445','A9875', 'rmacy','Idaho Rx','Ab123455','TV192837','RX','Ohio Drugs','RX12345','USA Pharma'],
'D' : [123456,123456,1234567,12345678,12345,12345,12345678,123456789,1234567,np.NaN],
'E' : ['Assign','Unassign','Assign','Ugly','Appreciate','Undo','Assign','Unicycle','Assign','Unicorn',]})
print(dfp)
file2 = r'desktop\somefolder\foo.txt'
def filename():
    if 'foo' in file2.lower():
        return 'foo'
    elif 'bar' in file2.lower():
        return 'bar'
dfp['F'] = 0
dfp['F'] = dfp['F'] = filename()
print(dfp)
PS: I usually read the data into the DataFrame with pd.read_excel(), which is why the function works from the file name. I'm also using pandas version 0.19.2.
Answer 0 (score: 2)
There's no need to do this for each row. You can compute the value once and fill the whole column with it.
Using the re module:
import re
fnames = re.findall('(foo|bar)', file2)
fname = fnames[0] if fnames else None
dfp['F'] = fname
dfp
A B C D E F
0 NaN 1.0 AA1233445 123456.0 Assign foo
1 NaN 0.0 A9875 123456.0 Unassign foo
2 3.0 3.0 rmacy 1234567.0 Assign foo
3 4.0 5.0 Idaho Rx 12345678.0 Ugly foo
4 5.0 0.0 Ab123455 12345.0 Appreciate foo
5 5.0 0.0 TV192837 12345.0 Undo foo
6 3.0 NaN RX 12345678.0 Assign foo
7 1.0 9.0 Ohio Drugs 123456789.0 Unicycle foo
8 5.0 0.0 RX12345 1234567.0 Assign foo
9 NaN 0.0 USA Pharma NaN Unicorn foo
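The regex above is case-sensitive, while the question's function lowercases the path before checking it. A minimal sketch of a case-insensitive variant follows. The helper name source_label and its labels parameter are hypothetical, introduced only for illustration. Unlike the question's function, which always checks 'foo' before 'bar', this returns whichever label appears first in the path.

```python
import re


def source_label(path, labels=('foo', 'bar')):
    """Return the first known label found in path (case-insensitive), or None.

    Hypothetical helper: the name and the `labels` parameter are assumptions
    made for this sketch, not part of the original question or answer.
    """
    # Escape each label so it is matched literally, then join as alternatives.
    pattern = '|'.join(re.escape(label) for label in labels)
    match = re.search(pattern, path, flags=re.IGNORECASE)
    # Normalise to lower case so 'FOO.txt' and 'foo.txt' give the same label.
    return match.group(0).lower() if match else None


file2 = r'desktop\somefolder\foo.txt'
label = source_label(file2)
print(label)
```

Because assigning a scalar to a column broadcasts it to every row, dfp['F'] = source_label(file2) would then fill the column in one step, with no need for dfp['F'] = 0 first.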
Answer 1 (score: 0)
I may have missed the point, but here is how you can assign the file name to a column:
import pandas as pd
import numpy as np
dfp = pd.DataFrame({'A' : [np.NaN,np.NaN,3,4,5,5,3,1,5,np.NaN],
'B' : [1,0,3,5,0,0,np.NaN,9,0,0],
'C' : ['AA1233445','A9875', 'rmacy','Idaho Rx','Ab123455','TV192837','RX','Ohio Drugs','RX12345','USA Pharma'],
'D' : [123456,123456,1234567,12345678,12345,12345,12345678,123456789,1234567,np.NaN],
'E' : ['Assign','Unassign','Assign','Ugly','Appreciate','Undo','Assign','Unicycle','Assign','Unicorn',]})
file2 = r'desktop\somefolder\foo.txt'
filename = file2.split('\\')[-1].split('.')[0]
dfp['F'] = filename
print(dfp)
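As an alternative to splitting the string by hand, the standard library's pathlib can extract the file name without its extension. This is a sketch that assumes Python 3.4 or later and a Windows-style path like the one in the question; PureWindowsPath parses backslash separators on any operating system.

```python
from pathlib import PureWindowsPath

file2 = r'desktop\somefolder\foo.txt'

# .stem is the final path component with its last suffix removed.
stem = PureWindowsPath(file2).stem
print(stem)
```

One difference from split('.')[0]: for a name with several dots, such as report.v2.xlsx, .stem removes only the last suffix and gives report.v2, whereas split('.')[0] gives report. The result can be assigned in the same way, dfp['F'] = stem.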