Question

I have the following DataFrame:

      id  sub_id  timestamp            dist     time_dif     speed     status
   1   1   1      2016-07-01 00:01:00  20       00:01:00     0.0075    True
   2   1   1      2016-07-01 00:01:59  29       00:00:59     0.3450    True
   3   1   1      2016-07-01 00:03:00  30       00:01:00     0.0987    True
   4   1   2      2016-07-01 00:03:59  21       00:59:00     0.5319    True
   5   1   2      2016-07-01 00:05:00  40       00:01:00     0.0076    False

In the DataFrame above, status is False whenever dist > 30.
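The rule above can be written as a single vectorized comparison. This is a minimal sketch that assumes the column names from the table; the constant `LIMIT` is a name introduced here for illustration only.

```python
import pandas as pd

LIMIT = 30  # threshold distance from the question; the name is illustrative

# Only the dist column is needed to derive status
df = pd.DataFrame({"dist": [20, 29, 30, 21, 40]})

# status is True while dist stays within the limit, False once it exceeds it
df["status"] = df["dist"] <= LIMIT
print(df)
```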

I would like suggestions for a function or method so that whenever status is False, meaning dist > 30 (row 5 in the DataFrame above), I can do the following:

Handling the row where status = False (row 5)

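The value under "dist" in row 5 (where status = False and dist = 40) becomes 30, because 30 is the threshold distance and cannot be exceeded. That leaves 40 - 30 = 10, and this extra 10 should be carried over to the next row.

"status" becomes True (since dist = 30).

"speed" stays the same.

"id" and "sub_id" stay the same.

"time_dif" gets a new value: since row 5 has both a speed and a distance, the time can be calculated from them.

"timestamp" should change as well: once time_dif is calculated, it can be added to the "timestamp" value of row 4 to get the new timestamp for row 5.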
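The time recalculation described above might look like the sketch below. The question does not state the units of speed, so this assumes time = dist / speed with the result in seconds. The helper name `recompute_time` and the speed value 0.5 are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import pandas as pd

def recompute_time(prev_timestamp, dist, speed):
    """Return (time_dif, timestamp), ASSUMING time = dist / speed in seconds."""
    time_dif = pd.to_timedelta(dist / speed, unit="s")
    return time_dif, prev_timestamp + time_dif

prev = pd.Timestamp("2016-07-01 00:03:59")  # timestamp of row 4
# dist is the capped value 30; speed=0.5 is an illustrative value, not from the table
time_dif, timestamp = recompute_time(prev, dist=30, speed=0.5)
print(time_dif, timestamp)
```

If the real data uses different units, only the `unit` argument and the formula inside the helper would need to change.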

Handling the following row (row 6)

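Now, whenever dist > 30 / status = False, a new row (row 6 here) should be inserted into the DataFrame right after it, so that any extra distance from the previous row goes into this new row.

In the example above, "dist" in row 6 is (40 - 30), i.e. 10.

"id" stays the same.

"sub_id" becomes 3 (incremented by 1).

Since 10 is less than 30, "status" should be True.

"speed" stays the same.

"time_dif" is calculated again from the "dist" and "speed" values in row 6.

"timestamp" is likewise calculated by adding "time_dif" to the "timestamp" of the previous row.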
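Because each new timestamp is the previous timestamp plus the row's own time_dif, the timestamps of the split rows can be chained with a cumulative sum. A small sketch, with durations that are made up for illustration:

```python
import pandas as pd

# Timestamp of the last unchanged row (row 4 in the question)
start = pd.Timestamp("2016-07-01 00:03:59")

# Illustrative durations for rows 5 and 6; real values would come from dist and speed
time_dif = pd.Series(pd.to_timedelta([60, 20], unit="s"))

# Each timestamp is the start plus the running total of the durations so far
timestamps = start + time_dif.cumsum()
print(timestamps)
```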

The remaining rows in the DataFrame stay as they are until another row with status False is encountered.

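There may also be cases where "dist" = 70. In that case the row with dist = 70 should get dist = 30, and the next row would get dist = 40. That is still greater than 30, so it should keep only 30, and the remaining 10 should be inserted into the row after it.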
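The repeated capping described for dist = 70 comes down to splitting a number into pieces no larger than the threshold. A minimal sketch, where the function name `split_dist` is hypothetical:

```python
def split_dist(dist, limit=30):
    """Split dist into consecutive pieces, none larger than limit."""
    pieces = []
    while dist > limit:
        pieces.append(limit)  # this row is capped at the limit
        dist -= limit         # the excess is carried to the next row
    pieces.append(dist)       # the last piece is whatever remains
    return pieces

print(split_dist(70))  # [30, 30, 10]
print(split_dist(40))  # [30, 10]
print(split_dist(20))  # [20]
```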

Please let me know if anything is unclear. Thanks in advance.

Answer 1

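I haven't included the changes to the distance, time, and speed fields yet, but the idea should be similar. Let me know if this works and I'll try to add edits from there. Since modifying an object while you iterate over it is generally a bad idea, I create a new DataFrame to store the changes.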

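import pandas as pd

df2 = pd.DataFrame(columns=df.columns)
limit = 30
index = 0
# index=False drops the row label, so tuple positions line up with df.columns
for row in df.itertuples(index=False):
    if not row.status:
        temp_row = list(row)
        temp_row[3] = limit  # 3 is the position of the dist column
        temp_row[6] = True   # 6 is the position of the status column
        df2.loc[index] = temp_row
        index += 1
        # second row: carry the excess distance over and bump sub_id
        temp_row[3] = row.dist - limit
        temp_row[6] = temp_row[3] <= limit  # True when the remainder is within the limit
        temp_row[1] = row.sub_id + 1  # 1 is the position of the sub_id column
        df2.loc[index] = temp_row
    else:
        df2.loc[index] = list(row)
    index += 1
df2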
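The loop above splits a row only once, so a remainder larger than the limit (the dist = 70 case from the question) would still end up with status False. One possible way to extend it is sketched below. The function name `split_rows` and the sample data are hypothetical. time_dif, speed, and timestamp are left out of the sample and would simply be copied through unchanged, which matches the scope of the answer so far.

```python
import pandas as pd

def split_rows(df, limit=30):
    """Return a new DataFrame where no row has dist above limit.

    Rows that exceed the limit are split into several rows; each extra
    row gets sub_id + 1. Every other column is copied unchanged.
    """
    out = []
    for rec in df.to_dict("records"):
        remaining = rec["dist"]
        sub_id = rec["sub_id"]
        while True:
            piece = min(remaining, limit)
            # Copy the original record and override only the split fields
            out.append({**rec, "sub_id": sub_id, "dist": piece, "status": True})
            remaining -= piece
            if remaining <= 0:
                break
            sub_id += 1
    return pd.DataFrame(out, columns=df.columns)

# Illustrative data: the second row exceeds the limit more than twice over
df = pd.DataFrame({
    "id": [1, 1],
    "sub_id": [1, 2],
    "dist": [20, 70],
    "status": [True, False],
})
result = split_rows(df)
print(result)
```

Collecting the rows in a list and building the DataFrame once at the end avoids growing `df2` with `.loc` on every iteration, which tends to be slow on larger inputs.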

How do I split a value and insert new rows in a pandas DataFrame?
