Optimizing how a Pandas column is set using a function

Time: 2017-04-13 15:13:04

Tags: python pandas dataframe

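I have a task to create a column in my DataFrame based on the file that was used to create that DataFrame. I can do this with the code sample below, but I think there is a better way. I'm fairly sure I can skip the step of creating the column and setting it to zero (dfp['F'] = 0), and the function could probably be cleaner.

How would you optimize this code?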

import pandas as pd
import numpy as np
dfp = pd.DataFrame({'A' : [np.nan,np.nan,3,4,5,5,3,1,5,np.nan],
                    'B' : [1,0,3,5,0,0,np.nan,9,0,0],
                    'C' : ['AA1233445','A9875', 'rmacy','Idaho Rx','Ab123455','TV192837','RX','Ohio Drugs','RX12345','USA Pharma'],
                    'D' : [123456,123456,1234567,12345678,12345,12345,12345678,123456789,1234567,np.nan],
                    'E' : ['Assign','Unassign','Assign','Ugly','Appreciate','Undo','Assign','Unicycle','Assign','Unicorn',]})
print(dfp)

file2 = r'desktop\somefolder\foo.txt'
def filename():
    if 'foo' in file2.lower():
        return 'foo'
    elif 'bar' in file2.lower():
        return 'bar'

dfp['F'] = 0
dfp['F'] = dfp['F'] = filename()

print(dfp)

PS: I usually read the data into the DataFrame with pd.read_excel(), which is why the function works from the file name. I'm also using pandas version 0.19.2.

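2 Answers:

Answer 0: (score: 2)

There is no need to do this for each row. You can compute the value once and fill the whole column with it.

Using the re module: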

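import re

# Collect every 'foo' or 'bar' in the path; lower-casing keeps the
# match case-insensitive, as in the question's filename() function
fnames = re.findall('(foo|bar)', file2.lower())
fname = fnames[0] if fnames else None

# Assigning a scalar fills every row of the new column
dfp['F'] = fname

dfp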

     A    B           C            D           E    F
0  NaN  1.0   AA1233445     123456.0      Assign  foo
1  NaN  0.0       A9875     123456.0    Unassign  foo
2  3.0  3.0       rmacy    1234567.0      Assign  foo
3  4.0  5.0    Idaho Rx   12345678.0        Ugly  foo
4  5.0  0.0    Ab123455      12345.0  Appreciate  foo
5  5.0  0.0    TV192837      12345.0        Undo  foo
6  3.0  NaN          RX   12345678.0      Assign  foo
7  1.0  9.0  Ohio Drugs  123456789.0    Unicycle  foo
8  5.0  0.0     RX12345    1234567.0      Assign  foo
9  NaN  0.0  USA Pharma          NaN     Unicorn  foo
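
The same idea can also be wrapped in a function that takes the path as an argument instead of reading a global variable. The following is only a minimal sketch: `tag_from_path` and its `tags` parameter are hypothetical names chosen for illustration, and the three-row DataFrame is a stand-in for the one in the question.

```python
import re

import pandas as pd


def tag_from_path(path, tags=("foo", "bar")):
    """Return the first tag found in path (case-insensitive), or None."""
    # Build an alternation such as 'foo|bar' from the allowed tags
    pattern = "|".join(re.escape(t) for t in tags)
    match = re.search(pattern, path, flags=re.IGNORECASE)
    return match.group(0).lower() if match else None


file2 = r"desktop\somefolder\foo.txt"
dfp = pd.DataFrame({"A": [1, 2, 3]})  # stand-in for the question's DataFrame

# A scalar on the right-hand side is broadcast to every row,
# so no placeholder column (dfp['F'] = 0) is needed first.
dfp["F"] = tag_from_path(file2)
print(dfp)
```

Passing the path in makes the function reusable for each file read with pd.read_excel(), and returning None when nothing matches leaves the column empty rather than raising an error.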

Answer 1: (score: 0)

I may have missed the point, but here is how you can assign the file name to the column:

import pandas as pd
import numpy as np
dfp = pd.DataFrame({'A' : [np.nan,np.nan,3,4,5,5,3,1,5,np.nan],
                    'B' : [1,0,3,5,0,0,np.nan,9,0,0],
                    'C' : ['AA1233445','A9875', 'rmacy','Idaho Rx','Ab123455','TV192837','RX','Ohio Drugs','RX12345','USA Pharma'],
                    'D' : [123456,123456,1234567,12345678,12345,12345,12345678,123456789,1234567,np.nan],
                    'E' : ['Assign','Unassign','Assign','Ugly','Appreciate','Undo','Assign','Unicycle','Assign','Unicorn',]})
file2 = r'desktop\somefolder\foo.txt'
# Take the last path component, then keep the part before the first dot
filename = file2.split('\\')[-1].split('.')[0]
# Assigning a scalar fills every row of the new column
dfp['F'] = filename
print(dfp)
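
As a possible alternative to splitting the string by hand, the standard-library pathlib module can extract the base name. This is a sketch, not part of the original answer; `source_name` is a name introduced here for illustration.

```python
from pathlib import PureWindowsPath

file2 = r"desktop\somefolder\foo.txt"

# PureWindowsPath parses backslash-separated paths on any operating system;
# .stem is the final path component without its last suffix.
source_name = PureWindowsPath(file2).stem
print(source_name)
```

The result can then be assigned with dfp['F'] = source_name as above. One difference from the split-based version: for a name with several dots, such as report.v2.xlsx, .stem keeps everything before the last dot (report.v2), whereas split('.')[0] keeps only the part before the first dot (report).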