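Add a new column named ADJ_HDI to the homework2 dataframe: if the HDI value is greater than .5, it should equal the HDI value; otherwise it should equal zero.
We have spent hours trying to work out the syntax for this without any luck. Can anyone help?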
Answer 0 (score: 0)
Try this, assuming your HDI values are in a column named "HDI" and you want a new column that equals HDI when HDI > 0.5, and 0 otherwise.
def adj_hdi(row):
    hdi = row['HDI']
    if hdi > .5:
        return hdi
    else:
        return 0
mydataframe['ADJ_HDI'] = mydataframe.apply(lambda row: adj_hdi(row), axis=1)
Answer 1 (score: 0)
An alternative solution:
homework2['ADJ_HDI'] = 0
homework2.loc[(homework2['HDI'] > 0.5), ['ADJ_HDI']] = homework2['HDI']
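To make the masking step easier to follow, here is a minimal sketch on made-up data. The sample values are an assumption for illustration; only the names homework2, HDI and ADJ_HDI come from the question. It also initialises the column with 0.0 rather than 0 so the column is already float when the HDI values are written in, which is a small deviation from the code above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
homework2 = pd.DataFrame({"HDI": [0.9, 0.3, 0.5, 0.7]})

# Boolean mask: True where HDI is strictly greater than 0.5.
mask = homework2["HDI"] > 0.5

# Start from a float column of zeros so the dtype already matches HDI.
homework2["ADJ_HDI"] = 0.0

# Overwrite only the rows selected by the mask with their HDI value.
homework2.loc[mask, "ADJ_HDI"] = homework2.loc[mask, "HDI"]

print(homework2)
```

Rows where the mask is False, including the row where HDI is exactly 0.5, keep the initial zero.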
Answer 2 (score: 0)
I think you can use numpy.where.
A very fast solution:
homework2['ADJ_HDI'] = np.where(homework2['HDI'] > .5, homework2['HDI'], 0)
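One thing worth knowing about this approach is how it treats missing values. The sketch below uses a small made-up frame (the values are an assumption) to show that a NaN in HDI fails the > .5 comparison and therefore ends up as 0 in ADJ_HDI, not as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data containing one missing value.
homework2 = pd.DataFrame({"HDI": [25, np.nan, 2.3, 0.3]})

# Comparisons against NaN evaluate to False, so the NaN row is not selected.
cond = homework2["HDI"] > .5

# Take HDI where the condition holds, 0 everywhere else.
homework2["ADJ_HDI"] = np.where(cond, homework2["HDI"], 0)

print(homework2)
```

If missing HDI values should stay missing in the new column, the condition would need to handle them explicitly; the question does not say which behaviour is wanted.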
Timings:
import pandas as pd
import numpy as np

homework2 = pd.DataFrame({"A": [10, 8, 1, 1, 2, 2, 2],
                          "HDI": [25, np.nan, 2.3, 2.4, 1.2, 0.3, 5.7]})
# to test with 7k rows, uncomment the line below
#homework2 = pd.concat([homework2]*1000).reset_index(drop=True)
print(homework2)
# separate copies so each approach works on its own frame
h = homework2.copy()
h1 = homework2.copy()

# Answer 0: row-wise apply
def a(mydataframe):
    def adj_hdi(row):
        hdi = row['HDI']
        if hdi > .5:
            return hdi
        else:
            return 0
    mydataframe['ADJ_HDI'] = mydataframe.apply(lambda row: adj_hdi(row), axis=1)
    return mydataframe

# Answer 1: initialise to 0, then overwrite via a boolean mask
def b(homework2):
    homework2['ADJ_HDI'] = 0
    homework2.loc[(homework2['HDI'] > 0.5), ['ADJ_HDI']] = homework2['HDI']
    return homework2

# Answer 2: vectorised numpy.where
def c(homework2):
    homework2['ADJ_HDI'] = np.where(homework2['HDI'] > .5, homework2['HDI'], 0)
    return homework2

print(a(homework2))
print(b(h))
print(c(h1))
len(homework2) = 7:
In [2]: %timeit a(homework2)
1000 loops, best of 3: 376 µs per loop
In [3]: %timeit b(h)
The slowest run took 4.62 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 1.49 ms per loop
In [4]: %timeit c(h1)
The slowest run took 5.52 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 283 µs per loop
len(homework2) = 7k:
In [7]: %timeit a(homework2)
10 loops, best of 3: 106 ms per loop
In [8]: %timeit b(h)
100 loops, best of 3: 2.63 ms per loop
In [9]: %timeit c(h1)
The slowest run took 5.30 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 324 µs per loop
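The timings only matter if the three approaches produce the same column, so a quick sanity check is worth having. The sketch below is my own addition, not part of the benchmark: the helper names via_apply, via_loc and via_where are hypothetical, each returns the new column instead of modifying the frame, and the zeros are written as 0.0 to keep everything float.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 8, 1, 1, 2, 2, 2],
                   "HDI": [25, np.nan, 2.3, 2.4, 1.2, 0.3, 5.7]})

def via_apply(frame):
    # Row-wise: keep HDI when it is above .5, else 0.0.
    return frame.apply(lambda row: row["HDI"] if row["HDI"] > .5 else 0.0, axis=1)

def via_loc(frame):
    # Start from zeros and overwrite the rows selected by the mask.
    out = pd.Series(0.0, index=frame.index)
    mask = frame["HDI"] > .5
    out.loc[mask] = frame.loc[mask, "HDI"]
    return out

def via_where(frame):
    # Vectorised selection with numpy.where.
    return pd.Series(np.where(frame["HDI"] > .5, frame["HDI"], 0), index=frame.index)

# Convert each result to a float array and compare against the first one.
results = [f(df).to_numpy(dtype=float) for f in (via_apply, via_loc, via_where)]
all_equal = all(np.array_equal(results[0], r) for r in results[1:])
print(all_equal)
```

On this sample the three columns match, with the NaN row mapped to 0 in every case.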