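I have a DataFrame with 9 rows. I want to multiply the first three rows by one value, the next three rows by a second value, and the last three rows by a third value.
I am using these variables: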
import pandas as pd
df = pd.DataFrame([[i] * 5 for i in range(9)], columns=list('ABCDE'))
a = pd.Series(range(3))
print(df)
   A  B  C  D  E
0  0  0  0  0  0
1  1  1  1  1  1
2  2  2  2  2  2
3  3  3  3  3  3
4  4  4  4  4  4
5  5  5  5  5  5
6  6  6  6  6  6
7  7  7  7  7  7
8  8  8  8  8  8
I was able to get it working like this:
for i, e in a.items():
    # rows [start, end) form the i-th block of three rows
    start, end = i * len(a), (i + 1) * len(a)
    df.iloc[start:end] *= e
print(df)
    A   B   C   D   E
0   0   0   0   0   0
1   0   0   0   0   0
2   0   0   0   0   0
3   3   3   3   3   3
4   4   4   4   4   4
5   5   5   5   5   5
6  12  12  12  12  12
7  14  14  14  14  14
8  16  16  16  16  16
Answer 0 (score: 3)
You can use a numpy reshape:
import numpy as np
df.loc[:, :] = (df.values.reshape(3, df.size // 3) * np.arange(3)[:, None]).reshape(df.shape)
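To make the reshape idea easier to follow, here is a minimal sketch that spells out each step. The helper name scale_blocks_reshape is hypothetical, and the sketch assumes the rows divide evenly into len(factors) contiguous blocks and that all columns are numeric. Unlike the one-liner, it returns a new DataFrame instead of assigning in place.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[i] * 5 for i in range(9)], columns=list('ABCDE'))
a = pd.Series(range(3))


def scale_blocks_reshape(frame, factors):
    """Multiply each contiguous block of rows by its own factor (hypothetical helper)."""
    n_blocks = len(factors)
    if len(frame) % n_blocks != 0:
        raise ValueError("rows must divide evenly into len(factors) blocks")
    values = frame.to_numpy()
    # One row per block: each row holds every cell of that block, flattened.
    blocks = values.reshape(n_blocks, -1)
    # A column vector of factors broadcasts across each flattened block.
    scaled = blocks * np.asarray(factors)[:, None]
    # Restore the original shape and labels.
    return pd.DataFrame(scaled.reshape(values.shape),
                        index=frame.index, columns=frame.columns)


result = scale_blocks_reshape(df, a)
print(result)
```

Using -1 in reshape lets numpy infer the block width, which avoids the float division problem entirely.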
Answer 1 (score: 2)
Another solution is to multiply df, using mul, by a numpy array expanded with numpy.repeat:
print (df.mul(np.repeat(a.index.values, [3] * len(a)), axis=0))
    A   B   C   D   E
0   0   0   0   0   0
1   0   0   0   0   0
2   0   0   0   0   0
3   3   3   3   3   3
4   4   4   4   4   4
5   5   5   5   5   5
6  12  12  12  12  12
7  14  14  14  14  14
8  16  16  16  16  16
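Note that the line above repeats a.index.values, which matches the multipliers only because a is range(3), so its index and values coincide. As a hedged sketch for arbitrary multipliers, the version below repeats the Series values instead and derives the block size from the lengths. The factors Series and the variable names are made up for illustration, and the sketch assumes the rows divide evenly into blocks.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[i] * 5 for i in range(9)], columns=list('ABCDE'))
factors = pd.Series([10, 1, -1])  # hypothetical multipliers, one per block

# Number of consecutive rows that share one factor (assumes an even split).
rows_per_block = len(df) // len(factors)

# Expand to one factor per row: [10, 10, 10, 1, 1, 1, -1, -1, -1]
row_factors = np.repeat(factors.to_numpy(), rows_per_block)

# axis=0 aligns the array with the rows rather than the columns.
scaled = df.mul(row_factors, axis=0)
print(scaled)
```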
Timings (len(df)=9):
In [20]: %timeit (df.mul(np.repeat(a.index.values, [3] * len(a)), axis=0))
The slowest run took 6.12 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 197 µs per loop
In [21]: %%timeit
...: df.loc[:, :] = (df.values.reshape(3, df.size / 3) * np.arange(3)[:, None]).reshape(df.shape)
__main__:257: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
The slowest run took 6.16 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 199 µs per loop
Code for timings (len(df)=90k):
df = pd.DataFrame([[i] * 5 for i in range(9)], columns=list('ABCDE'))
df = pd.concat([df]*10000).reset_index(drop=True)
a = pd.Series(range(30000))  # one factor per block of 3 rows (90,000 rows / 3)
print (df)
Timings (len(df)=90k):
In [24]: %timeit (df.mul(np.repeat(a.index.values, [3] * len(a)), axis=0))
100 loops, best of 3: 3.58 ms per loop
In [33]: %%timeit
...: df.loc[:, :] = (df.values.reshape(3, df.size / 3) * np.arange(3)[:, None]).reshape(df.shape)
...:
__main__:257: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
100 loops, best of 3: 10.9 ms per loop
In [34]: %%timeit
...: df.iloc[:, :] = (df.values.reshape(3, df.size / 3) * np.arange(3)[:, None]).reshape(df.shape)
...:
__main__:257: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
100 loops, best of 3: 10.9 ms per loop
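If you want to rerun a comparison like this outside IPython, here is a rough sketch using the standard timeit module. It is not the benchmark that produced the numbers above: the frame size (9,000 rows), the function names, and the repeat count are my own assumptions, both functions return a new frame rather than assigning in place, and the results will vary by machine and library version. It also checks that the two approaches agree before timing them.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame([[i] * 5 for i in range(9)], columns=list('ABCDE'))
df = pd.concat([base] * 1000).reset_index(drop=True)  # 9,000 rows (assumed size)
a = pd.Series(range(len(df) // 3))                    # one factor per block of 3 rows


def with_repeat():
    # Expand the factors to one per row, then multiply row-wise.
    return df.mul(np.repeat(a.to_numpy(), 3), axis=0)


def with_reshape():
    # Flatten each block into one row, broadcast the factors, restore the shape.
    values = df.to_numpy()
    scaled = values.reshape(len(a), -1) * a.to_numpy()[:, None]
    return pd.DataFrame(scaled.reshape(values.shape),
                        index=df.index, columns=df.columns)


# Sanity check: both approaches should produce identical numbers.
same = np.array_equal(with_repeat().to_numpy(), with_reshape().to_numpy())
print("results match:", same)

repeat_seconds = timeit.timeit(with_repeat, number=20)
reshape_seconds = timeit.timeit(with_reshape, number=20)
print("repeat :", repeat_seconds / 20, "s per call")
print("reshape:", reshape_seconds / 20, "s per call")
```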