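I am currently computing the cash-to-price ratio for roughly 19,000 companies over the past 10 years. Everything is in a single DataFrame with more than 20 variables. The problem I want to solve is restarting the rolling sum whenever a new ticker symbol begins. The way I have coded it below, the first three entries for a new ticker also sum the Q_Cashflow values of the ticker that precedes it in the column.
My code is: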
df['K_Cashflow'] = df.Q_Cashflow.rolling(4).sum()
df['cash-to-price'] = df['K_Cashflow']/df['Market Cap']
The output is:
Ticker Symbol |Q_Cashflow |Market Cap |cash-to-price |K_Cashflow|
44 ADCT.1 | 16.9 |709.0700 |0.120157 | 85.2 |
45 ADCT.1 | 102.2 |718.7700 |0.310948 | 223.5 |
46 ADCT.1 | 136.6 |1231.5240 |0.260815 | 321.2 |
47 AAL | 456.0 |3034.1766 |0.234561 | 711.7 |
48 AAL | 1173.0 |2258.1468 |0.827138 | 1867.8 |
49 AAL | 1090.0 |2088.2862 |1.367437 | 2855.6 |
50 AAL | 1241.0 |2597.5755 |1.524499 | 3960.0 |
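For K_Cashflow, rows 47:50 should be NaN. How can I change the first three K_Cashflow entries to NaN for each distinct ticker?
Answer 0 (score: 3)
One way to do this is to create a rank column within each ticker and then set the rows with the three lowest ranks to NaN. For example: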
import numpy as np
import pandas as pd
df = pd.DataFrame({
'ticker': ['a'] * 7 + ['b'] * 10,
'cash_flow': range(17),
})
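# Create the rank within each ticker. Note that rank() orders by value, so it
# matches row order here only because cash_flow increases within each ticker.
df['rank'] = df.groupby('ticker')['cash_flow'].rank()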
# Set the first 3 instances of each ticker to nan
df.loc[df['rank'] < 4, ['cash_flow']] = np.nan
df
ticker cash_flow rank
0 a NaN 1.0
1 a NaN 2.0
2 a NaN 3.0
3 a 3.0 4.0
4 a 4.0 5.0
5 a 5.0 6.0
6 a 6.0 7.0
7 b NaN 1.0
8 b NaN 2.0
9 b NaN 3.0
10 b 10.0 4.0
11 b 11.0 5.0
12 b 12.0 6.0
13 b 13.0 7.0
14 b 14.0 8.0
15 b 15.0 9.0
16 b 16.0 10.0
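As an alternative to masking rows after the fact, the rolling sum itself can be computed per ticker, so each ticker's window starts fresh and the first three rows come out as NaN on their own. The following is only a sketch: it reuses the column names and the seven sample rows from the question, and it assumes the rows are already in chronological order within each ticker.

```python
import pandas as pd

# Sample rows copied from the question's output (illustrative subset only)
df = pd.DataFrame({
    "Ticker Symbol": ["ADCT.1"] * 3 + ["AAL"] * 4,
    "Q_Cashflow": [16.9, 102.2, 136.6, 456.0, 1173.0, 1090.0, 1241.0],
    "Market Cap": [709.07, 718.77, 1231.524,
                   3034.1766, 2258.1468, 2088.2862, 2597.5755],
})

# Rolling 4-quarter sum computed inside each ticker group.
# transform() returns a result aligned to the original index,
# so it can be assigned straight back as a column.
df["K_Cashflow"] = (
    df.groupby("Ticker Symbol")["Q_Cashflow"]
      .transform(lambda s: s.rolling(4).sum())
)

# The ratio is NaN wherever the 4-quarter window is incomplete
df["cash-to-price"] = df["K_Cashflow"] / df["Market Cap"]

print(df)
```

With this sample, the three ADCT.1 rows and the first three AAL rows have NaN in K_Cashflow, and only the fourth AAL row gets a value. If the data is not sorted by date within each ticker, it would need to be sorted first.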