Question

Here is a sample of my df:

pd.DataFrame([["1", "2"], ["1", "2"], ["3", "other_value"]],
                     columns=["a", "b"])
    a   b
0   1   2
1   1   2
2   3   other_value

This is what I want to end up with:

pd.DataFrame([["1", "2"], ["1", "2"], ["3", "other_value"], ["3", "row_duplicated_with_edits_in_this_column"]],
                     columns=["a", "b"])
    a   b
0   1   2
1   1   2
2   3   other_value
3   3   row_duplicated_with_edits_in_this_column

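The rule is to use the apply method and run some checks (I leave the checks out for simplicity). Under certain conditions, for some rows inside the applied function, I want to copy the row, edit the copy, and end up with both rows in the df.

Something like this: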

def f(row):
    if condition:
        row["a"] = 3
    elif condition:
        row["a"] = 4
    elif condition:
        row_duplicated = row.copy()
        row_duplicated["a"] = 5  # I also need this row to be included in the df

    return row

df.apply(f, axis=1)

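I don't want to store the duplicated rows somewhere in my class and append them at the end. I want to do it on the fly.

I have seen "pandas: apply function to DataFrame that can return multiple rows", but I am not sure whether groupby can help me here.

Thanks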

Answer 1

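Here is one way, using df.iterrows in a list comprehension. The function returns either the row or the row together with its copy, and the pieces are then concatenated.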

def func(row):
    if row['a'] == "3":
        row2 = row.copy()
        # make edits to row2
        return pd.concat([row, row2], axis=1)
    return row

pd.concat([func(row) for _, row in df.iterrows()], ignore_index=True, axis=1).T

   a            b
0  1            2
1  1            2
2  3  other_value
3  3  other_value

I found that in my case it works better without ignore_index=True, because I later merge 2 dfs.
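For reference, here is a self-contained sketch of the same iterrows idea that actually applies an edit to the copied row. It is a variant rather than the answer's exact code: it collects rows in a flat list instead of concatenating column-wise, the helper name expand_row is hypothetical, and the edit value is simply taken from the desired output in the question.

```python
import pandas as pd

df = pd.DataFrame([["1", "2"], ["1", "2"], ["3", "other_value"]],
                  columns=["a", "b"])

def expand_row(row):
    """Return [row], or [row, edited_copy] when the (assumed) condition holds."""
    if row["a"] == "3":  # assumed condition, as in the answer above
        row2 = row.copy()
        row2["b"] = "row_duplicated_with_edits_in_this_column"
        return [row, row2]
    return [row]

# Flatten the per-row lists, then build the frame from the Series.
rows = [r for _, row in df.iterrows() for r in expand_row(row)]
result = pd.DataFrame(rows).reset_index(drop=True)
print(result)
```

Dropping the reset_index call keeps the original label on the copied row, which may be the more convenient choice if the frame is merged with another one later.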

Answer 2

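Your logic seems mostly vectorisable. Since the order of rows in your output appears to matter, you can shift the default RangeIndex of the new rows by 0.5 and then use sort_index.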

def row_appends(x):
    newrows = x.loc[x['a'].isin(['3', '4', '5'])].copy()
    newrows.loc[x['a'] == '3', 'b'] = 10  # make conditional edit
    newrows.loc[x['a'] == '4', 'b'] = 20  # make conditional edit
    newrows.index = newrows.index + 0.5
    return newrows

res = pd.concat([df, df.pipe(row_appends)])\
        .sort_index().reset_index(drop=True)

print(res)

   a            b
0  1            2
1  1            2
2  3  other_value
3  3           10
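To see why the 0.5 shift matters, here is a minimal sketch in which the row to duplicate sits in the middle of the frame. The row order, the condition a == "3", and the edit value are assumptions chosen for illustration; they are not taken from the answer.

```python
import pandas as pd

# Same rows as in the question, reordered so the "3" row is in the middle.
df = pd.DataFrame([["1", "2"], ["3", "other_value"], ["1", "2"]],
                  columns=["a", "b"])

# Copy the rows to duplicate and edit the copies (assumed condition).
dupes = df.loc[df["a"] == "3"].copy()
dupes["b"] = "row_duplicated_with_edits_in_this_column"

# Label 1 becomes 1.5, so the copy sorts directly after its source row.
dupes.index = dupes.index + 0.5

res = pd.concat([df, dupes]).sort_index().reset_index(drop=True)
print(res)
```

The edited copy ends up right below the row it came from instead of at the bottom of the frame.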

Answer 3

I would vectorize it, handling each case in turn:

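# use .loc so the assignment hits df itself rather than a temporary copy
df.loc[df_condition_1, "a"] = 3
df.loc[df_condition_2, "a"] = 4

# copy the rows to duplicate, then edit the copies
duplicates = df[df_condition_3].copy()
duplicates["a"] = 5

# then append the edited copies to the original rows
pd.concat([df, duplicates])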

Does this solution fit your needs?
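A runnable sketch of this mask-based approach might look as follows. The three conditions are hypothetical stand-ins, since the question leaves the real checks out, and the replacement values are placeholders. The masks are computed before any edit and combined so that they behave like the if/elif chain in the question.

```python
import pandas as pd

df = pd.DataFrame([["1", "2"], ["1", "2"], ["3", "other_value"]],
                  columns=["a", "b"])

# Hypothetical checks; the question does not say what the conditions are.
cond_1 = df["b"] == "2"
cond_2 = df["b"] == "unused_value"
cond_3 = df["a"] == "3"

# Mirror if/elif: a later branch only applies where no earlier one matched.
branch_1 = cond_1
branch_2 = cond_2 & ~cond_1
branch_3 = cond_3 & ~cond_1 & ~cond_2

df.loc[branch_1, "a"] = "first_branch"   # placeholder edit
df.loc[branch_2, "a"] = "second_branch"  # placeholder edit

# Copy the rows of the third branch and edit only the copies.
duplicates = df.loc[branch_3].copy()
duplicates["b"] = "row_duplicated_with_edits_in_this_column"

result = pd.concat([df, duplicates], ignore_index=True)
print(result)
```

Here the copies are appended at the end; if they need to sit next to their source rows, the index shift from Answer 2 can be combined with this.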

Duplicate a row based on a condition in the pandas apply method
