pandas DataFrame index gets messed up and rows are duplicated when adding a column from another DataFrame

Date: 2018-08-13 22:33:26

Tags: python-3.x pandas

I have two DataFrames, df_orders and df_sku_list, with the same number of rows.

I want to add a new column to df_orders that is exactly the same as the single column in df_sku_list.

I reset the index on both DataFrames:

>>> df_orders.reset_index(drop=True)
       OrderNo  PledgeID   ReferrerID FulfillmentStatus                FundingDate  PaymentMethod      ...       ShippingZip/PostalCode ShippingCountry skucount ArticleNo ArticleName  NumberOfItems
0     6895.000  24042541          nan            Placed  2018-08-06 06:23:55 -0700            nan      ...                     ="11201"   United States        5         0           0              0
1     6895.000  24042541          nan            Placed  2018-08-06 06:23:55 -0700            nan      ...                     ="11201"   United States        5         0           0              0
2     6895.000  24042541          nan            Placed  2018-08-06 06:23:55 -0700            nan      ...                     ="11201"   United States        5         0           0              0
3     6895.000  24042541          nan            Placed  2018-08-06 06:23:55 -0700            nan      ...                     ="11201"   United States        5         0           0              0
4     6895.000  24042541          nan            Placed  2018-08-06 06:23:55 -0700            nan      ...                     ="11201"   United States        5         0           0              0
5     6894.000  24040780 18497676.000            Placed  2018-08-05 22:28:46 -0700            nan      ...                     ="27517"   United States        3         0           0              0
6     6894.000  24040780 18497676.000            Placed  2018-08-05 22:28:46 -0700            nan      ...                     ="27517"   United States        3         0           0              0
7     6894.000  24040780 18497676.000            Placed  2018-08-05 22:28:46 -0700            nan      ...                     ="27517"   United States        3         0           0              0
8     6893.000  24040663          nan            Placed  2018-08-05 21:59:40 -0700            nan      ...                    ="880000"         Vietnam        4         0           0              0
9     6893.000  24040663          nan            Placed  2018-08-05 21:59:40 -0700            nan      ...                    ="880000"         Vietnam        4         0           0              0
10    6893.000  24040663          nan            Placed  2018-08-05 21:59:40 -0700            nan      ...                    ="880000"         Vietnam        4         0           0              0
11    6893.000  24040663          nan            Placed  2018-08-05 21:59:40 -0700            nan      ...                    ="880000"         Vietnam        4         0           0              0
12    6892.000  24040660          nan            Placed  2018-08-05 21:58:46 -0700            nan      ...                    ="388598"       Singapore        5         0           0              0
13    6892.000  24040660          nan            Placed  2018-08-05 21:58:46 -0700            nan      ...                    ="388598"       Singapore        5         0           0              0
14    6892.000  24040660          nan            Placed  2018-08-05 21:58:46 -0700            nan      ...                    ="388598"       Singapore        5         0           0              0
15    6892.000  24040660          nan            Placed  2018-08-05 21:58:46 -0700            nan      ...                    ="388598"       Singapore        5         0           0              0
16    6892.000  24040660          nan            Placed  2018-08-05 21:58:46 -0700            nan      ...                    ="388598"       Singapore        5         0           0              0
17    6891.000  24040072          nan            Placed  2018-08-05 19:26:49 -0700            nan      ...                     ="94107"   United States        3         0           0              0

And this is the result of resetting the index on df_sku_list:

>>> df_sku_list.reset_index(drop=True)
       SKU_list
0      SKU00066
1      SKU00067
2      SKU00066
3      SKU00067
4      SKU00078
5      SKU00066
6      SKU00074
7      SKU00074
8      SKU00066
9      SKU00066
10     SKU00074
11     SKU00074
12     SKU00067
13     SKU00066
14     SKU00067
15     SKU00066
16     SKU00078
17     SKU00067
18     SKU00074
19     SKU00074
20     SKU00067
21     SKU00074
22     SKU00074
23     SKU00066
24     SKU00074
25     SKU00074
26     SKU00067
27     SKU00066
28     SKU00074
29     SKU00074
...         ...

I tried adding the only column of df_sku_list:

>>> df_orders['SKU_list'] = df_sku_list['SKU_list']
>>> df_orders
      OrderNo  PledgeID   ReferrerID FulfillmentStatus                FundingDate  PaymentMethod    ...    ShippingCountry skucount ArticleNo ArticleName NumberOfItems  SKU_list
0    6895.000  24042541          nan            Placed  2018-08-06 06:23:55 -0700            nan    ...      United States        5         0           0             0  SKU00066
0    6895.000  24042541          nan            Placed  2018-08-06 06:23:55 -0700            nan    ...      United States        5         0           0             0  SKU00066
0    6895.000  24042541          nan            Placed  2018-08-06 06:23:55 -0700            nan    ...      United States        5         0           0             0  SKU00066
0    6895.000  24042541          nan            Placed  2018-08-06 06:23:55 -0700            nan    ...      United States        5         0           0             0  SKU00066
0    6895.000  24042541          nan            Placed  2018-08-06 06:23:55 -0700            nan    ...      United States        5         0           0             0  SKU00066
1    6894.000  24040780 18497676.000            Placed  2018-08-05 22:28:46 -0700            nan    ...      United States        3         0           0             0  SKU00067
1    6894.000  24040780 18497676.000            Placed  2018-08-05 22:28:46 -0700            nan    ...      United States        3         0           0             0  SKU00067
1    6894.000  24040780 18497676.000            Placed  2018-08-05 22:28:46 -0700            nan    ...      United States        3         0           0             0  SKU00067
2    6893.000  24040663          nan            Placed  2018-08-05 21:59:40 -0700            nan    ...            Vietnam        4         0           0             0  SKU00066
2    6893.000  24040663          nan            Placed  2018-08-05 21:59:40 -0700            nan    ...            Vietnam        4         0           0             0  SKU00066
2    6893.000  24040663          nan            Placed  2018-08-05 21:59:40 -0700            nan    ...            Vietnam        4         0           0             0  SKU00066
2    6893.000  24040663          nan            Placed  2018-08-05 21:59:40 -0700            nan    ...            Vietnam        4         0           0             0  SKU00066
3    6892.000  24040660          nan            Placed  2018-08-05 21:58:46 -0700            nan    ...          Singapore        5         0           0             0  SKU00067
3    6892.000  24040660          nan            Placed  2018-08-05 21:58:46 -0700            nan    ...          Singapore        5         0           0             0  SKU00067
3    6892.000  24040660          nan            Placed  2018-08-05 21:58:46 -0700            nan    ...          Singapore        5         0           0             0  SKU00067
3    6892.000  24040660          nan            Placed  2018-08-05 21:58:46 -0700            nan    ...          Singapore        5         0           0             0  SKU00067
3    6892.000  24040660          nan            Placed  2018-08-05 21:58:46 -0700            nan    ...          Singapore        5         0           0             0  SKU00067
4    6891.000  24040072          nan            Placed  2018-08-05 19:26:49 -0700            nan    ...      United States        3         0           0             0  SKU00078
4    6891.000  24040072          nan            Placed  2018-08-05 19:26:49 -0700            nan    ...      United States        3         0           0             0  SKU00078
4    6891.000  24040072          nan            Placed  2018-08-05 19:26:49 -0700            nan    ...      United States        3         0           0             0  SKU00078
5    6890.000  24039921 18497676.000            Placed  2018-08-05 18:56:15 -0700            nan    ...      United States        3         0           0             0  SKU00066
5    6890.000  24039921 18497676.000            Placed  2018-08-05 18:56:15 -0700            nan    ...      United States        3         0           0             0  SKU00066
5    6890.000  24039921 18497676.000            Placed  2018-08-05 18:56:15 -0700            nan    ...      United States        3         0           0             0  SKU00066
6    6888.000  24039345 18497676.000            Placed  2018-08-05 17:07:14 -0700            nan    ...        Switzerland        3         0           0             0  SKU00074

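As you can see, the first five rows of df_orders have the same order number. That is not always the case; I used a repeat function to duplicate rows as needed. For some reason, adding the new column seems to revert the index of df_orders to what it was before the reset.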
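A likely explanation, shown as a minimal sketch with small made-up frames (the values are hypothetical, not the asker's data): reset_index returns a new DataFrame instead of modifying the original, so calling df_orders.reset_index(drop=True) on its own only prints a reset copy. The duplicate labels stay in df_orders, and assigning a Series aligns on those labels.

```python
import pandas as pd

# Hypothetical stand-ins: df_orders already carries duplicate index labels,
# as it would after the row-repeating step; the values are made up.
df_orders = pd.DataFrame({"OrderNo": [6895, 6895, 6894]}, index=[0, 0, 1])
df_sku_list = pd.DataFrame({"SKU_list": ["SKU00066", "SKU00067", "SKU00066"]})

# reset_index returns a NEW DataFrame; calling it without reassigning the
# result leaves df_orders (and its duplicate labels) unchanged.
df_orders.reset_index(drop=True)
print(df_orders.index.tolist())  # [0, 0, 1]

# Assigning a Series aligns on index labels: both rows labelled 0 receive
# the value stored under label 0 in df_sku_list.
df_orders["SKU_list"] = df_sku_list["SKU_list"]
aligned = df_orders["SKU_list"].tolist()
print(aligned)  # ['SKU00066', 'SKU00066', 'SKU00067']

# Reassigning the result really replaces the index with 0..n-1, so label
# alignment now pairs the rows by position.
df_orders = df_orders.reset_index(drop=True)
df_orders["SKU_list"] = df_sku_list["SKU_list"]
fixed = df_orders["SKU_list"].tolist()
print(fixed)  # ['SKU00066', 'SKU00067', 'SKU00066']
```

This reproduces the pattern in the output above, where every row labelled 0 received SKU00066 and every row labelled 1 received SKU00067.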

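To duplicate the rows, I used the following code in my script. skucount is a column of integers, and the command below repeats each row that many times. I'm not sure whether this helps, but I'm including it because it may be the source of the problem: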

df_orders = df_orders.loc[np.repeat(df_orders.index.values,df_orders['skucount'])]
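The repeat step can be checked in isolation. This sketch uses a hypothetical three-order frame (not the asker's data) to show that .loc with repeated labels keeps those labels in the index, and that the index only becomes 0..n-1 once the result of reset_index(drop=True) is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical three-order frame; skucount says how many rows each order needs.
df_orders = pd.DataFrame({"OrderNo": [6895, 6894, 6893], "skucount": [2, 1, 3]})

# The line from the question: .loc with repeated labels copies each row
# skucount times, but the repeated labels are carried into the new index.
df_orders = df_orders.loc[np.repeat(df_orders.index.values, df_orders["skucount"])]
print(df_orders.index.tolist())  # [0, 0, 1, 2, 2, 2]

# Keeping the result of reset_index(drop=True) gives a clean 0..n-1 index.
df_orders = df_orders.reset_index(drop=True)
print(df_orders.index.tolist())  # [0, 1, 2, 3, 4, 5]
```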

1 answer:

Answer 0 (score: 1)

If I understand correctly, use the array of values from df_sku_list['SKU_list']:

df_orders['SKU_list'] = df_sku_list['SKU_list'].values

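This ignores the index and adds a column to df_orders exactly like the single column in df_sku_list: .values is a plain NumPy array with no index, so pandas assigns it by position instead of aligning on index labels. It must therefore contain exactly one value per row of df_orders.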
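A small sketch of the answer's approach, again with hypothetical frames: because the array carries no index, it is written row by row even though df_orders still has duplicate labels, and a length mismatch raises ValueError instead of silently misaligning.

```python
import pandas as pd

# Hypothetical frames: df_orders has duplicate labels, df_sku_list has 0..n-1.
df_orders = pd.DataFrame({"OrderNo": [6895, 6895, 6894]}, index=[0, 0, 1])
df_sku_list = pd.DataFrame({"SKU_list": ["SKU00066", "SKU00067", "SKU00066"]})

# .values yields a plain NumPy array (newer pandas also offers .to_numpy()),
# so the values are assigned in order, ignoring both indexes.
df_orders["SKU_list"] = df_sku_list["SKU_list"].values
print(df_orders["SKU_list"].tolist())  # ['SKU00066', 'SKU00067', 'SKU00066']

# The array needs exactly one element per row; a shorter one is rejected
# and no column is added.
length_error = None
try:
    df_orders["Too_short"] = df_sku_list["SKU_list"].values[:2]
except ValueError as exc:
    length_error = exc
print(length_error)
```

Reassigning the reset frame first (df_orders = df_orders.reset_index(drop=True)) and then assigning the Series should give the same column while also leaving df_orders with a clean 0..n-1 index.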