Question

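I'm trying to duplicate the rows of a pandas DataFrame (pandas v0.23.4, Python v3.7.1) based on the int value in one of its columns. I'm applying the code from this question, but I get the following dtype casting error: TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'. Basically, I don't understand why this code is trying to cast to int32.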

Starting from this:

import pandas as pd

dummy_dict = {'c1': ['a','b','c'],
              'c2': [0,1,2],
              'c3': ['textA','textB','textC']}
dummy_df = pd.DataFrame(dummy_dict)

    c1  c2  c3
0   a   0   textA
1   b   1   textB
2   c   2   textC

I'm doing this:

dummy_df_test = dummy_df.reindex(dummy_df.index.repeat(dummy_df['c2']))

I want to end up with the following, but instead I get the error above:

    c1  c2  c3
0   a   0   textA
1   b   1   textB
2   c   2   textC
3   c   2   textC
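For reference, a minimal sketch of what the reindex approach yields once the dtype error is out of the way. The `.astype(np.intp)` cast is an assumption added here so the snippet runs regardless of the platform's native integer width. Note that a count of 0 drops row `a` entirely, so even without the error the result would not match the desired output.

```python
import numpy as np
import pandas as pd

dummy_df = pd.DataFrame({'c1': ['a', 'b', 'c'],
                         'c2': [0, 1, 2],
                         'c3': ['textA', 'textB', 'textC']})

# Assumed precaution: cast the repeat counts to numpy's index integer type
# (np.intp) so no int64 -> int32 'safe' cast is needed.
counts = dummy_df['c2'].astype(np.intp)

# Each index label is repeated c2 times; a count of 0 removes the row.
result = dummy_df.reindex(dummy_df.index.repeat(counts))
print(result)
```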

Answer 1

Just a workaround:

pd.concat([dummy_df[dummy_df.c2.eq(0)],dummy_df.loc[dummy_df.index.repeat(dummy_df.c2)]])
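The workaround keeps the zero-count rows once and appends the repeated rows. A sketch of the same idea spelled out step by step; the `np.intp` cast and the final stable `sort_index` are additions (not part of the answer) so the snippet is platform-safe and keeps the original row order even when zero-count rows are not at the top of the frame.

```python
import numpy as np
import pandas as pd

dummy_df = pd.DataFrame({'c1': ['a', 'b', 'c'],
                         'c2': [0, 1, 2],
                         'c3': ['textA', 'textB', 'textC']})

# Rows whose count is 0 would vanish under index.repeat, so keep them once.
zero_rows = dummy_df[dummy_df.c2.eq(0)]

# All other rows are repeated c2 times (cast to np.intp as a precaution).
repeated = dummy_df.loc[dummy_df.index.repeat(dummy_df.c2.astype(np.intp))]

# Added step: a stable sort by index restores the original row order.
result = pd.concat([zero_rows, repeated]).sort_index(kind='mergesort')
print(result)
```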

Another great suggestion, from @Wen:

dummy_df.reindex(dummy_df.index.repeat(dummy_df['c2'].clip(lower=1)))
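`clip(lower=1)` raises every count below 1 to 1, so zero-count rows survive exactly once. The sketch below adds two assumptions of its own: an `np.intp` cast to sidestep the dtype error, and `reset_index(drop=True)` so the index reads 0..3 as in the desired output.

```python
import numpy as np
import pandas as pd

dummy_df = pd.DataFrame({'c1': ['a', 'b', 'c'],
                         'c2': [0, 1, 2],
                         'c3': ['textA', 'textB', 'textC']})

# Counts of 0 become 1, so every row appears at least once.
counts = dummy_df['c2'].clip(lower=1).astype(np.intp)

# Repeat the labels, pull the rows, then renumber the index 0..n-1.
result = dummy_df.reindex(dummy_df.index.repeat(counts)).reset_index(drop=True)
print(result)
```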

Answer 2

I believe the answer to why this happens can be found here: https://github.com/numpy/numpy/issues/4384

Specifying the dtype as int32 should fix the problem highlighted in the original question.
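A minimal sketch of that suggestion, assuming (as the linked issue suggests) that numpy wants the repeat counts in its native index integer type, which is int32 on the affected build. Casting to int32 explicitly works on both 32- and 64-bit builds because int32 to int64 is a 'safe' cast; it also assumes all counts fit in int32.

```python
import numpy as np
import pandas as pd

dummy_df = pd.DataFrame({'c1': ['a', 'b', 'c'],
                         'c2': [0, 1, 2],
                         'c3': ['textA', 'textB', 'textC']})

# Give the repeat counts an explicit int32 dtype instead of the default int64.
counts = dummy_df['c2'].astype(np.int32)
print(counts.dtype)

result = dummy_df.reindex(dummy_df.index.repeat(counts))
print(result)
```

Using `np.intp` instead of `np.int32` would be the platform-neutral variant of the same idea.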

Answer 3

In the first attempt every row is duplicated, while in the second only the row with index 2 is. Both rely on the concat function.

        A       B      C
0  'name1'   'foo'  'bar'
1  'name2'   'foo'  'bar'
2  'name3'   'foo'  'bar'
3  'name4'   'foo'  'bar'

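df = dummy_df  # the DataFrame from the question
print(df)

# Every row duplicated; ignore_index=True renumbers the result 0..n-1
df2 = pd.concat([df]*2, ignore_index=True)
print(df2)

# Only the row at position 2 duplicated (original labels kept)
df3 = pd.concat([df, df.iloc[[2]]])
print(df3)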

Pass ignore_index=True (as for df2) if you intend to reset the index at the end. The printed output:

  c1  c2     c3
0  a   0  textA
1  b   1  textB
2  c   2  textC

  c1  c2     c3
0  a   0  textA
1  b   1  textB
2  c   2  textC
3  a   0  textA
4  b   1  textB
5  c   2  textC

  c1  c2     c3
0  a   0  textA
1  b   1  textB
2  c   2  textC
2  c   2  textC
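If you want df3 renumbered as well, a sketch using `reset_index(drop=True)` after the concat; `drop=True` discards the old labels instead of keeping them as a column. The DataFrame here is rebuilt from the question's sample so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'c1': ['a', 'b', 'c'],
                   'c2': [0, 1, 2],
                   'c3': ['textA', 'textB', 'textC']})

# Duplicate the row at position 2, then renumber the index 0..n-1.
df3 = pd.concat([df, df.iloc[[2]]]).reset_index(drop=True)

# For df2, ignore_index=True already renumbers during the concat.
df2 = pd.concat([df] * 2, ignore_index=True)
print(df3)
```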

How to repeat pandas DataFrame records based on a column value
