Question

I am playing with the very nice code provided by @piRSquared, which can be seen below.

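I added another condition, if row[col2] == 4000, which is met only once in the additional column I added. As expected, the function with this extra condition yields only one row, because the condition occurs only once.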

My question is how to modify the code so that it then yields another row once there has been a move >= move_size.

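The desired output is two rows: one where row['B'] == 4000 (which the code now produces) and another where col A has moved >= move_size. I think of these as a trade entry and exit, so it would be good to have an order ID in another dataframe column, df['C'], as in the desired output shown below.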

Code from the original post:

#starting python community conventions
import numpy as np
import pandas as pd

# n is number of observations
n = 5000

day = pd.to_datetime(['2013-02-06'])
# irregular seconds spanning 28800 seconds (8 hours)
seconds = np.random.rand(n) * 28800 * pd.Timedelta(1, 's')
# start at 8 am
start = pd.offsets.Hour(8)
# irregular timeseries
tidx = day + start + seconds
tidx = tidx.sort_values()

s = pd.Series(np.random.randn(n), tidx, name='A').cumsum()
s.plot()

Slightly modified generator function:

def mover_df(df, col,col2, move_size=10):
    ref = None
    for i, row in df.iterrows():
        #added test condition for new col2 signal column
        if row[col2] == 4000:
            if ref is None or (abs(ref - row.loc[col]) >= move_size):
                yield row
                ref = row.loc[col]

Generating the data:

df = s.to_frame()
df['B'] = range(0,len(df))

moves_df = pd.concat(mover_df(df, 'A','B', 3), axis=1).T

Current output:

                                  A         B
2013-02-06 14:30:43.874386317   -50.136432  4000.0

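Desired output:

(The values in cols A and B of the second row will be whatever the code generates; I just added random values to show the format I am interested in. Col C is the trade ID, and it increases by 1 for every two rows.)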

                                  A         B       C
2013-02-06 14:30:43.874386317   -50.136432  4000.0  1
2013-02-06 14:30:43.874386317   -47.136432  6000.0  1

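I have been trying to code this for several hours (not helped by the kids running around during the school holidays...) and would appreciate any help. Input from @piRSquared would be great, but I appreciate that people are busy.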

Answer 1

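I don't have much experience with generators or pandas, but does this work? My data output is different because of the random seed, so I'm not sure.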
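On the point about outputs differing from run to run: one way to make results comparable is to seed NumPy before building the series. This is only a sketch. The helper name make_series and the seed value 0 are my own assumptions, and the index is built from a scalar Timestamp, which is a small variation on the question's setup.

```python
import numpy as np
import pandas as pd


def make_series(n=5000, seed=0):
    # Hypothetical helper: the seed value is arbitrary, it only has to be fixed.
    np.random.seed(seed)
    # irregular offsets spanning 28800 seconds (8 hours), starting at 8 am
    offsets = pd.to_timedelta(np.random.rand(n) * 28800, unit='s')
    tidx = (pd.Timestamp('2013-02-06 08:00') + offsets).sort_values()
    return pd.Series(np.random.randn(n), tidx, name='A').cumsum()


s1 = make_series(50)
s2 = make_series(50)
```

With the same seed, s1 and s2 hold identical values, so anyone running the generator on them should see the same rows.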

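I changed the generator to handle the alternate case as well: once the first row, where row[col2] == 4000, has been found, it looks for the next row that meets the move condition. Calling the generator twice should therefore give both values: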

def mover_df(df, col, col2, move_size=10, found=False):
    ref = None
    for i, row in df.iterrows():
        #added test condition for new col2 signal column
        if row[col2] == 4000:
            if ref is None or (abs(ref - row.loc[col]) >= move_size):
                yield row
                found = True   # flag that we found the first row we want
                ref = row.loc[col]
        elif found:  # if we found the first row, find the second meeting the condition
            if ref is None or (abs(ref - row.loc[col]) >= move_size):
                yield row

Then you can use it like this:

data_generator = mover_df(df, 'A', 'B', 3)
moves_df = pd.concat([next(data_generator), next(data_generator)], axis=1).T
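Calling next() twice raises StopIteration if the generator has fewer than two rows to give. A possible alternative, sketched here on a small made-up frame (the toy data and the use of itertools.islice are my assumptions, not part of the answer above), is to cap the generator at two rows:

```python
from itertools import islice

import pandas as pd


def mover_df(df, col, col2, move_size=10):
    # Same logic as the generator above, with `found` kept as a local flag.
    ref, found = None, False
    for _, row in df.iterrows():
        if row[col2] == 4000:
            if ref is None or abs(ref - row[col]) >= move_size:
                yield row
                found = True
                ref = row[col]
        elif found and abs(ref - row[col]) >= move_size:
            yield row


# Toy data (hypothetical): entry signal at index 1, first qualifying move at index 3.
toy = pd.DataFrame({'A': [10, 11, 12, 15, 20], 'B': [0, 4000, 2, 3, 4]})

# Take at most two rows; islice simply stops early if fewer are available.
rows = list(islice(mover_df(toy, 'A', 'B', 3), 2))
moves_df = pd.concat(rows, axis=1).T if rows else toy.iloc[0:0]
```

The cap matters because this generator never resets ref: left to run, it would also yield index 4, whose value has moved even further from the entry.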

Answer 2

I edited mover_df as shown below.

Note: I changed the 4000 condition to % 1000 == 0 to provide a few more samples.

def mover_df(df, move_col, look_col, move_size=10):
    ref, seen = None, False
    for i, row in df.iterrows():
        #added test condition for new col2 signal column
        look_cond = row[look_col] % 1000 == 0
        if look_cond and not seen:
            yield row
            ref, seen = row.loc[move_col], True
        elif seen:
            move_cond = (abs(ref - row.loc[move_col]) >= move_size)
            if move_cond:
                yield row
                ref, seen = None, False

df = s.to_frame()
df['B'] = range(0,len(df))

moves_df = pd.concat(mover_df(df, 'A','B', 3), axis=1).T

print(moves_df)

                                       A       B
2013-02-06 08:00:03.264481639   0.554390     0.0
2013-02-06 08:04:26.609855185  -2.479520    35.0
2013-02-06 09:38:07.962175581 -15.042391  1000.0
2013-02-06 09:40:50.737806497 -18.385956  1026.0
2013-02-06 11:13:03.018013689 -29.074125  2000.0
2013-02-06 11:14:30.980633575 -32.221009  2019.0
2013-02-06 12:49:41.432845325 -35.048040  3000.0
2013-02-06 12:50:28.098114592 -38.881795  3012.0
2013-02-06 14:27:15.008225195  13.437165  4000.0
2013-02-06 14:27:32.790466500   9.513736  4003.0

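Caveat
This will keep looking for an exit until it finds one or reaches the end of the dataframe, even if another potential entry point comes up in the meantime. Meaning, in my example, I look at every 1000th row and enter. Then I look for the point where the move is at least move_size and exit. If I don't find such a move before the next 1000-row marker comes around, I ignore that marker and keep looking for an exit.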

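The philosophy is that if I am in a trade, I have to get out of it. I don't want to enter another trade until I have resolved the one I am still in.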
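For the trade ID column C asked for in the question, one option is to number the yielded rows in pairs, since entries and exits strictly alternate. The following is only a sketch on a small made-up frame (the toy data and the integer-division trick are my assumptions); it also shows the caveat in action.

```python
import numpy as np
import pandas as pd


def mover_df(df, move_col, look_col, move_size=10):
    # Same entry/exit logic as the generator in this answer.
    ref, seen = None, False
    for _, row in df.iterrows():
        if not seen:
            if row[look_col] % 1000 == 0:
                yield row  # entry
                ref, seen = row[move_col], True
        elif abs(ref - row[move_col]) >= move_size:
            yield row  # exit
            ref, seen = None, False


# Toy data (hypothetical): the B == 1000 marker at index 2 arrives while the
# first trade is still open, so it is skipped.
toy = pd.DataFrame({'A': [0, 1, 2, 5, 5, 6], 'B': [0, 1, 1000, 3, 2000, 5]})

moves = pd.concat(list(mover_df(toy, 'A', 'B', 3)), axis=1).T
# Rows alternate entry, exit, entry, ... so each pair shares one trade ID.
moves['C'] = np.arange(len(moves)) // 2 + 1
```

Here the rows at index 0 and 3 form trade 1, and the row at index 4 opens trade 2, which never finds an exit before the data ends. An odd number of rows therefore means the last trade is still open.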

Path-dependent slicing: modifying the function code
