I have two DataFrames, df_orders and df_sku_list, which have the same number of rows.
I want to add a new column to df_orders that is exactly the same as the single column in df_sku_list.
I reset the index on both DataFrames:
>>> df_orders.reset_index(drop=True)
OrderNo PledgeID ReferrerID FulfillmentStatus FundingDate PaymentMethod ... ShippingZip/PostalCode ShippingCountry skucount ArticleNo ArticleName NumberOfItems
0 6895.000 24042541 nan Placed 2018-08-06 06:23:55 -0700 nan ... ="11201" United States 5 0 0 0
1 6895.000 24042541 nan Placed 2018-08-06 06:23:55 -0700 nan ... ="11201" United States 5 0 0 0
2 6895.000 24042541 nan Placed 2018-08-06 06:23:55 -0700 nan ... ="11201" United States 5 0 0 0
3 6895.000 24042541 nan Placed 2018-08-06 06:23:55 -0700 nan ... ="11201" United States 5 0 0 0
4 6895.000 24042541 nan Placed 2018-08-06 06:23:55 -0700 nan ... ="11201" United States 5 0 0 0
5 6894.000 24040780 18497676.000 Placed 2018-08-05 22:28:46 -0700 nan ... ="27517" United States 3 0 0 0
6 6894.000 24040780 18497676.000 Placed 2018-08-05 22:28:46 -0700 nan ... ="27517" United States 3 0 0 0
7 6894.000 24040780 18497676.000 Placed 2018-08-05 22:28:46 -0700 nan ... ="27517" United States 3 0 0 0
8 6893.000 24040663 nan Placed 2018-08-05 21:59:40 -0700 nan ... ="880000" Vietnam 4 0 0 0
9 6893.000 24040663 nan Placed 2018-08-05 21:59:40 -0700 nan ... ="880000" Vietnam 4 0 0 0
10 6893.000 24040663 nan Placed 2018-08-05 21:59:40 -0700 nan ... ="880000" Vietnam 4 0 0 0
11 6893.000 24040663 nan Placed 2018-08-05 21:59:40 -0700 nan ... ="880000" Vietnam 4 0 0 0
12 6892.000 24040660 nan Placed 2018-08-05 21:58:46 -0700 nan ... ="388598" Singapore 5 0 0 0
13 6892.000 24040660 nan Placed 2018-08-05 21:58:46 -0700 nan ... ="388598" Singapore 5 0 0 0
14 6892.000 24040660 nan Placed 2018-08-05 21:58:46 -0700 nan ... ="388598" Singapore 5 0 0 0
15 6892.000 24040660 nan Placed 2018-08-05 21:58:46 -0700 nan ... ="388598" Singapore 5 0 0 0
16 6892.000 24040660 nan Placed 2018-08-05 21:58:46 -0700 nan ... ="388598" Singapore 5 0 0 0
17 6891.000 24040072 nan Placed 2018-08-05 19:26:49 -0700 nan ... ="94107" United States 3 0 0 0
This is the result of resetting the index on df_sku_list:
>>> df_sku_list.reset_index(drop=True)
SKU_list
0 SKU00066
1 SKU00067
2 SKU00066
3 SKU00067
4 SKU00078
5 SKU00066
6 SKU00074
7 SKU00074
8 SKU00066
9 SKU00066
10 SKU00074
11 SKU00074
12 SKU00067
13 SKU00066
14 SKU00067
15 SKU00066
16 SKU00078
17 SKU00067
18 SKU00074
19 SKU00074
20 SKU00067
21 SKU00074
22 SKU00074
23 SKU00066
24 SKU00074
25 SKU00074
26 SKU00067
27 SKU00066
28 SKU00074
29 SKU00074
... ...
I tried adding the only column of df_sku_list:
>>> df_orders['SKU_list'] = df_sku_list['SKU_list']
>>> df_orders
OrderNo PledgeID ReferrerID FulfillmentStatus FundingDate PaymentMethod ... ShippingCountry skucount ArticleNo ArticleName NumberOfItems SKU_list
0 6895.000 24042541 nan Placed 2018-08-06 06:23:55 -0700 nan ... United States 5 0 0 0 SKU00066
0 6895.000 24042541 nan Placed 2018-08-06 06:23:55 -0700 nan ... United States 5 0 0 0 SKU00066
0 6895.000 24042541 nan Placed 2018-08-06 06:23:55 -0700 nan ... United States 5 0 0 0 SKU00066
0 6895.000 24042541 nan Placed 2018-08-06 06:23:55 -0700 nan ... United States 5 0 0 0 SKU00066
0 6895.000 24042541 nan Placed 2018-08-06 06:23:55 -0700 nan ... United States 5 0 0 0 SKU00066
1 6894.000 24040780 18497676.000 Placed 2018-08-05 22:28:46 -0700 nan ... United States 3 0 0 0 SKU00067
1 6894.000 24040780 18497676.000 Placed 2018-08-05 22:28:46 -0700 nan ... United States 3 0 0 0 SKU00067
1 6894.000 24040780 18497676.000 Placed 2018-08-05 22:28:46 -0700 nan ... United States 3 0 0 0 SKU00067
2 6893.000 24040663 nan Placed 2018-08-05 21:59:40 -0700 nan ... Vietnam 4 0 0 0 SKU00066
2 6893.000 24040663 nan Placed 2018-08-05 21:59:40 -0700 nan ... Vietnam 4 0 0 0 SKU00066
2 6893.000 24040663 nan Placed 2018-08-05 21:59:40 -0700 nan ... Vietnam 4 0 0 0 SKU00066
2 6893.000 24040663 nan Placed 2018-08-05 21:59:40 -0700 nan ... Vietnam 4 0 0 0 SKU00066
3 6892.000 24040660 nan Placed 2018-08-05 21:58:46 -0700 nan ... Singapore 5 0 0 0 SKU00067
3 6892.000 24040660 nan Placed 2018-08-05 21:58:46 -0700 nan ... Singapore 5 0 0 0 SKU00067
3 6892.000 24040660 nan Placed 2018-08-05 21:58:46 -0700 nan ... Singapore 5 0 0 0 SKU00067
3 6892.000 24040660 nan Placed 2018-08-05 21:58:46 -0700 nan ... Singapore 5 0 0 0 SKU00067
3 6892.000 24040660 nan Placed 2018-08-05 21:58:46 -0700 nan ... Singapore 5 0 0 0 SKU00067
4 6891.000 24040072 nan Placed 2018-08-05 19:26:49 -0700 nan ... United States 3 0 0 0 SKU00078
4 6891.000 24040072 nan Placed 2018-08-05 19:26:49 -0700 nan ... United States 3 0 0 0 SKU00078
4 6891.000 24040072 nan Placed 2018-08-05 19:26:49 -0700 nan ... United States 3 0 0 0 SKU00078
5 6890.000 24039921 18497676.000 Placed 2018-08-05 18:56:15 -0700 nan ... United States 3 0 0 0 SKU00066
5 6890.000 24039921 18497676.000 Placed 2018-08-05 18:56:15 -0700 nan ... United States 3 0 0 0 SKU00066
5 6890.000 24039921 18497676.000 Placed 2018-08-05 18:56:15 -0700 nan ... United States 3 0 0 0 SKU00066
6 6888.000 24039345 18497676.000 Placed 2018-08-05 17:07:14 -0700 nan ... Switzerland 3 0 0 0 SKU00074
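As you can see, the first five rows of df_orders share the same order number. That is not always the case: I used repeat to duplicate rows as needed. For some reason, when I try to add the new column, the index of df_orders seems to revert to what it was before the reset.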
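A likely explanation for the index "reverting" is that `reset_index` returns a new DataFrame instead of modifying the existing one, unless the result is assigned back (or `inplace=True` is passed). The `>>> df_orders.reset_index(drop=True)` call above only displays a reset copy, so df_orders itself keeps its old labels. A minimal sketch using a hypothetical toy frame (not the real data):

```python
import pandas as pd

# Hypothetical toy frame whose index has duplicate labels,
# similar to df_orders after rows were repeated
df = pd.DataFrame({"OrderNo": [6895, 6895, 6894]}, index=[0, 0, 1])

# reset_index returns a NEW DataFrame; calling it without assigning
# the result leaves df unchanged
df.reset_index(drop=True)
print(df.index.tolist())  # [0, 0, 1]

# Assigning the result back gives df a fresh 0..n-1 index
df = df.reset_index(drop=True)
print(df.index.tolist())  # [0, 1, 2]
```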
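To duplicate the rows, I used this code in my script. skucount is a column of integers, and the command below repeats each row that many times. I'm not sure whether this helps, but I'm including it because it may be the cause of the problem: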
df_orders = df_orders.loc[np.repeat(df_orders.index.values,df_orders['skucount'])]
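The repeat line above copies index labels along with the rows, so every label appears skucount times. When a Series is assigned as a column, pandas aligns it on index labels rather than on row position, which would give every row sharing a label the same SKU, matching the output above. A sketch with hypothetical stand-ins for the two frames, also showing that resetting the index right after repeating makes label alignment line up with row order:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df_orders and df_sku_list
orders = pd.DataFrame({"OrderNo": [6895, 6894], "skucount": [2, 1]})
skus = pd.DataFrame({"SKU_list": ["SKU00066", "SKU00067", "SKU00078"]})

# Same repeat as in the question: each index label appears skucount times
repeated = orders.loc[np.repeat(orders.index.values, orders["skucount"])]
print(repeated.index.tolist())  # [0, 0, 1]

# Column assignment aligns on labels, so both rows labelled 0 get skus.loc[0]
aligned = repeated.copy()
aligned["SKU_list"] = skus["SKU_list"]
print(aligned["SKU_list"].tolist())  # ['SKU00066', 'SKU00066', 'SKU00067']

# Resetting the index immediately after repeating yields unique labels 0..n-1,
# so label alignment now matches the row order
fixed = orders.loc[np.repeat(orders.index.values, orders["skucount"])].reset_index(drop=True)
fixed["SKU_list"] = skus["SKU_list"]
print(fixed["SKU_list"].tolist())  # ['SKU00066', 'SKU00067', 'SKU00078']
```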
Answer:
If I understand correctly, use the array of values from df_sku_list['SKU_list']:
df_orders['SKU_list'] = df_sku_list['SKU_list'].values
This ignores the index and adds a column to df_orders that is exactly like the single column in df_sku_list.
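A sketch of the answer's positional assignment on hypothetical toy frames. It uses `.to_numpy()`, the newer equivalent of `.values`, and adds an explicit row-count check (an addition not in the answer) so a mismatch fails with a clear message. A plain NumPy array carries no index, so pandas assigns it row by row even when df_orders still has duplicate labels:

```python
import pandas as pd

# Hypothetical frames: df_orders still has duplicate labels, df_sku_list is 0..n-1
df_orders = pd.DataFrame({"OrderNo": [6895, 6895, 6894]}, index=[0, 0, 1])
df_sku_list = pd.DataFrame({"SKU_list": ["SKU00066", "SKU00067", "SKU00078"]})

# Strip the index so pandas assigns by position instead of by label
skus = df_sku_list["SKU_list"].to_numpy()

# Positional assignment only makes sense if both frames have the same row count
if len(skus) != len(df_orders):
    raise ValueError(f"row count mismatch: {len(df_orders)} orders vs {len(skus)} SKUs")

df_orders["SKU_list"] = skus
print(df_orders["SKU_list"].tolist())  # ['SKU00066', 'SKU00067', 'SKU00078']
```

Because this bypasses alignment entirely, it relies on both frames already being in the intended row order.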