Editing values

Time: 2016-02-29 20:24:10

Tags: python pandas

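Add a new column named ADJ_HDI to the homework2 DataFrame: it should equal the HDI value if HDI is greater than 0.5, and zero otherwise.

We have spent hours trying to work out the syntax for this with no luck. Can anyone help?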

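3 answers:

Answer 0: (score: 0)

Try this, assuming your HDI values are in a column named "HDI" and you want a new column that equals HDI when HDI is greater than 0.5, and 0 otherwise.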

def adj_hdi(row):
    # Keep the HDI value when it exceeds 0.5; otherwise return 0.
    hdi = row['HDI']
    if hdi > .5:
        return hdi
    else:
        return 0

# axis=1 calls adj_hdi once per row.
mydataframe['ADJ_HDI'] = mydataframe.apply(adj_hdi, axis=1)
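
As a quick check of the row-wise approach, here is a minimal sketch on made-up data. The values and the name `df` are assumptions; only the column names come from the question. It includes the boundary value 0.5 and a NaN, both of which fail the `> .5` test and therefore end up as 0.

```python
import numpy as np
import pandas as pd

# Made-up sample values; only the column name "HDI" comes from the question.
df = pd.DataFrame({"HDI": [0.9, 0.3, np.nan, 0.5]})

def adj_hdi(row):
    # Keep the HDI value when it exceeds 0.5; otherwise return 0.
    hdi = row["HDI"]
    return hdi if hdi > 0.5 else 0

# axis=1 passes each row to adj_hdi as a Series.
df["ADJ_HDI"] = df.apply(adj_hdi, axis=1)
print(df)
```

Only the first row keeps its HDI value here; 0.3, NaN and exactly 0.5 all map to 0.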

Answer 1: (score: 0)

Alternative solution:

homework2['ADJ_HDI'] = 0
homework2.loc[(homework2['HDI'] > 0.5), ['ADJ_HDI']] = homework2['HDI']
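
To make the two steps above easier to follow, here is a small sketch that names the boolean mask explicitly. The sample values are made up, and this variant differs slightly from the answer: it uses a scalar column label, a float default of 0.0 (so the column dtype stays float), and applies the same mask on the right-hand side. Treat it as one possible spelling, not the only correct one.

```python
import numpy as np
import pandas as pd

# Made-up sample values; only the column names come from the question.
homework2 = pd.DataFrame({"HDI": [0.9, 0.3, np.nan, 0.7]})

# Boolean Series: True where HDI exceeds 0.5 (NaN compares as False).
mask = homework2["HDI"] > 0.5

# Start every row at 0.0, then overwrite only the rows selected by the mask.
homework2["ADJ_HDI"] = 0.0
homework2.loc[mask, "ADJ_HDI"] = homework2.loc[mask, "HDI"]
print(homework2)
```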

Answer 2: (score: 0)

I think you can use numpy.where, which is a very fast solution:

homework2['ADJ_HDI'] = np.where(homework2['HDI'] > .5, homework2['HDI'], 0)
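
For reference, `np.where(condition, x, y)` picks elementwise from `x` where the condition is True and from `y` otherwise. The sketch below uses made-up values and, as a comparison that is not part of this answer, also shows pandas' own `Series.where`, which keeps values where the condition holds and substitutes the second argument elsewhere.

```python
import numpy as np
import pandas as pd

# Made-up sample values for illustration.
hdi = pd.Series([0.9, 0.3, np.nan, 0.7], name="HDI")

# NumPy version: returns a plain ndarray.
via_numpy = np.where(hdi > 0.5, hdi, 0)

# pandas version: returns a Series that keeps the original index.
via_pandas = hdi.where(hdi > 0.5, 0)

print(via_numpy)
print(via_pandas)
```

Both give 0 for the NaN entry, because `NaN > 0.5` evaluates to False.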

Timings

import pandas as pd
import numpy as np

homework2 = pd.DataFrame({"A": [10, 8, 1, 1, 2, 2, 2],
                           "HDI": [25, np.nan, 2.3, 2.4, 1.2, 0.3, 5.7]})

# To test with 7k rows, uncomment the line below
#homework2 = pd.concat([homework2]*1000).reset_index(drop=True)
print(homework2)
h = homework2.copy()
h1 = homework2.copy()
def a(mydataframe):
    def adj_hdi(row):
        hdi = row['HDI']
        if hdi>.5:
            return hdi
        else:
            return 0
    mydataframe['ADJ_HDI'] = mydataframe.apply(lambda row: adj_hdi(row), axis = 1)
    return mydataframe

def b(homework2):
    homework2['ADJ_HDI'] = 0
    homework2.loc[(homework2['HDI'] > 0.5), ['ADJ_HDI']] = homework2['HDI']
    return homework2

def c(homework2):
    homework2['ADJ_HDI'] = np.where(homework2['HDI'] > .5, homework2['HDI'], 0)
    return homework2 

print(a(homework2))
print(b(h))
print(c(h1))

len(homework2) = 7

In [2]: %timeit a(homework2)
1000 loops, best of 3: 376 µs per loop

In [3]: %timeit b(h)
The slowest run took 4.62 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 1.49 ms per loop

In [4]: %timeit c(h1)
The slowest run took 5.52 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 283 µs per loop

len(homework2) = 7k

In [7]: %timeit a(homework2)
10 loops, best of 3: 106 ms per loop

In [8]: %timeit b(h)
100 loops, best of 3: 2.63 ms per loop

In [9]: %timeit c(h1)
The slowest run took 5.30 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 324 µs per loop
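
If you want to reproduce this kind of comparison outside IPython, the standard-library `timeit` module can be used directly. The sketch below is an assumption-laden stand-in for the `%timeit` runs above: the function names `with_apply` and `with_where`, the repeat counts, and the choice to copy the frame on each call are all mine, and the numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({"HDI": [25, np.nan, 2.3, 2.4, 1.2, 0.3, 5.7]})
big = pd.concat([base] * 1000).reset_index(drop=True)  # 7,000 rows

def with_apply(df):
    # Row-wise Python function, as in Answer 0.
    df = df.copy()
    df["ADJ_HDI"] = df.apply(
        lambda row: row["HDI"] if row["HDI"] > 0.5 else 0, axis=1)
    return df

def with_where(df):
    # Vectorized numpy.where, as in Answer 2.
    df = df.copy()
    df["ADJ_HDI"] = np.where(df["HDI"] > 0.5, df["HDI"], 0)
    return df

timings = {}
for fn in (with_apply, with_where):
    # Best of 3 repeats, 3 calls each, reported as seconds per call.
    runs = timeit.repeat(lambda: fn(big), number=3, repeat=3)
    timings[fn.__name__] = min(runs) / 3
    print("%s: %.6f s per call" % (fn.__name__, timings[fn.__name__]))
```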