I have a pandas DataFrame like this:
import pandas as pd
raw_data = [{'Date': '1-10-19', 'Price':7, 'Check': 0},
{'Date': '2-10-19','Price':8.5, 'Check': 0},
{'Date': '3-10-19','Price':9, 'Check': 1},
{'Date': '4-10-19','Price':50, 'Check': 1},
{'Date': '5-10-19','Price':80, 'Check': 1},
{'Date': '6-10-19','Price':100, 'Check': 1}]
df = pd.DataFrame(raw_data)
df = df.set_index('Date')  # set_index returns a new DataFrame, so assign it back
This is what it looks like:
         Price  Check
Date
1-10-19    7.0      0
2-10-19    8.5      0
3-10-19    9.0      1
4-10-19   50.0      1
5-10-19   80.0      1
6-10-19  100.0      1
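Now, for each row where "Check" is 1, I want to count how many rows back I have to go to reach a row whose price is less than 10% of that row's price. For example, for the sixth row, where the price is 100, I want to walk back through the previous rows and count until the price is below 10 (10% of 100). In this case that is 3 rows back, where the price is 9. I then want to save the result in a new column.
The final result should look like this: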
         Price  Check  Rows_till_small
Date
1-10-19    7.0      0              NaN
2-10-19    8.5      0              NaN
3-10-19    9.0      1              NaN
4-10-19   50.0      1              NaN
5-10-19   80.0      1                4
6-10-19  100.0      1                3
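I have thought about using some kind of rolling function, but I don't think that is possible. I have also considered iterating over the whole DataFrame with iterrows or itertuples, but I can't think of a way to do that without it being extremely inefficient.
Answer 0: (score: 1)
You can solve the problem like this: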
import pandas as pd

raw_data = [{'Date': '1-10-19', 'Price': 7, 'Check': 0},
            {'Date': '2-10-19', 'Price': 8.5, 'Check': 0},
            {'Date': '3-10-19', 'Price': 9, 'Check': 1},
            {'Date': '4-10-19', 'Price': 50, 'Check': 1},
            {'Date': '5-10-19', 'Price': 80, 'Check': 1},
            {'Date': '6-10-19', 'Price': 100, 'Check': 1}]
df = pd.DataFrame(raw_data)
new_column = [None] * len(df)  # placeholder values for the new column
for i in range(len(df)):
    if df['Check'][i] == 1:
        percent_10 = df['Price'][i] * 0.1  # 10% of the current row's price
        # walk backwards through the previous rows
        for j in range(i - 1, -1, -1):
            if df['Price'][j] < percent_10:
                new_column[i] = i - j  # number of rows back
                break
df["New"] = new_column  # add new column
print(df)
I hope this answer is useful; feel free to ask if anything is unclear.
Answer 1: (score: 1)
Try this:
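import numpy as np

# assumes df still has its default integer index (no set_index)
# diff.loc[k, i] is True when Price[k] exceeds 10% of Price[i]
diff = df['Price'].apply(lambda x: x > (df['Price'] * .1))
RTS = []
for i in range(len(df)):
    check = diff[i]
    ind = check.idxmax()  # first row whose price exceeds 10% of row i's price
    if ind != 0:
        val = (i - ind) + 1
    else:
        val = np.nan
    RTS.append(val)
df['Rows_till_small'] = RTS
print(df)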
Output:
      Date  Price  Check  Rows_till_small
0  1-10-19    7.0      0              NaN
1  2-10-19    8.5      0              NaN
2  3-10-19    9.0      1              NaN
3  4-10-19   50.0      1              NaN
4  5-10-19   80.0      1              4.0
5  6-10-19  100.0      1              3.0
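If the Python-level loops turn out to be too slow, one possible alternative is NumPy broadcasting. The following is only a minimal sketch and is not taken from either answer: the helper name rows_till_small and its frac parameter are made up for illustration. It assumes that "the nearest earlier row whose price is below 10% of the current price" is what is wanted, it only fills rows where Check is 1, and it works by position, so it also runs with Date as the index.

```python
import numpy as np
import pandas as pd


def rows_till_small(df, frac=0.1):
    """Hypothetical helper: rows back to the nearest earlier price below frac * price."""
    price = df["Price"].to_numpy(dtype=float)
    pos = np.arange(len(price))
    # below[i, j]: row j comes before row i and its price is under frac * price[i]
    below = (price[None, :] < frac * price[:, None]) & (pos[None, :] < pos[:, None])
    # position of the nearest such earlier row, or -1 when none exists
    last = np.where(below, pos[None, :], -1).max(axis=1)
    dist = (pos - last).astype(float)
    # blank out rows without a match and rows where Check is not 1
    dist[(last < 0) | (df["Check"].to_numpy() != 1)] = np.nan
    return pd.Series(dist, index=df.index, name="Rows_till_small")


df = pd.DataFrame({
    "Date": ["1-10-19", "2-10-19", "3-10-19", "4-10-19", "5-10-19", "6-10-19"],
    "Price": [7, 8.5, 9, 50, 80, 100],
    "Check": [0, 0, 1, 1, 1, 1],
}).set_index("Date")
df["Rows_till_small"] = rows_till_small(df)
print(df)
```

Note that this builds an n-by-n boolean matrix, so memory grows quadratically with the number of rows. For very long frames the explicit backward loop, which stops at the first match, may still be the better choice.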