I have the following pandas Series:
new_orders_list
Out[853]:
Cluster 1 [525, 526, 533]
Cluster 2 [527, 528, 532]
Cluster 3 [519, 534, 535]
Cluster 4 [530]
Cluster 5 [529, 531]
Cluster 6 [520, 521, 524]
After some slicing of a DataFrame, I also have two more Series:
condition
Out[854]:
5 525
Name: order_id, dtype: object
condition2
Out[855]:
Clusters
Cluster 6 1
Name: quant_bought, dtype: int64
Now I want to add the value 525 from the condition Series to new_orders_list at position Cluster 6 (the index of the condition2 Series), and remove 525 from position Cluster 1. So it should look like this:
Cluster 1 [526, 533]
Cluster 2 [527, 528, 532]
Cluster 3 [519, 534, 535]
Cluster 4 [530]
Cluster 5 [529, 531]
Cluster 6 [520, 521, 524, 525]
I am using the following Python code, but it appends a new row instead of adding to the values already stored under Cluster 6:
new_orders_list.append(pd.Series(condition.values ,index =
condition2.index))
Cluster 1 [525, 526, 533]
Cluster 2 [527, 528, 532]
Cluster 3 [519, 534, 535]
Cluster 4 [530]
Cluster 5 [529, 531]
Cluster 6 [520, 521, 524]
Cluster 6 525
Answer 0 (score: 1)
You can try this solution.
A new Series holding the data to remove is created, called remseries.
The values in the lists of the Series new_orders_list are integers, while the values of the other Series are strings, so all values are converted to strings.
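The reason for the conversion can be seen in a small sketch. It assumes that condition really holds the string '525' (its dtype is object) while the lists hold integers; the objects below are rebuilt by hand for illustration and are not the asker's real data.

```python
import pandas as pd

# Hand-built stand-ins for the question's data (assumed values)
orders = [525, 526, 533]                                      # ints, as in Cluster 1
condition = pd.Series(['525'], index=[5], name='order_id')    # string, dtype object

value = condition.iloc[0]

# A str never compares equal to an int, so filtering on the raw value removes nothing
unchanged = [k for k in orders if k != value]

# Casting to int first makes the comparison behave as intended
filtered = [k for k in orders if k != int(value)]

print(unchanged)
print(filtered)
```

Either side can be converted; the answer converts the list items to strings and back, while the sketch casts the single order id to int.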
Then the rows are selected as a subset with isin, and the values are added and removed.
print new_orders_list
Clusters
Cluster 1 [525, 526, 533]
Cluster 2 [527, 528, 532]
Cluster 3 [519, 534, 535]
Cluster 4 [530]
Cluster 5 [529, 531]
Cluster 6 [520, 521, 524]
Name: no, dtype: object
print condition
5 525
Name: order_id, dtype: object
print condition2
Clusters
Cluster 6 1
Name: quant_bought, dtype: int64
#create new Series for remove
remseries = pd.Series(condition.values, index = ['Cluster 1'], name='rem')
print remseries
Cluster 1 525
Name: rem, dtype: object
#create dataframe from series
df = new_orders_list.reset_index()
print df
Clusters no
0 Cluster 1 [525, 526, 533]
1 Cluster 2 [527, 528, 532]
2 Cluster 3 [519, 534, 535]
3 Cluster 4 [530]
4 Cluster 5 [529, 531]
5 Cluster 6 [520, 521, 524]
#convert values in list from int to string
df['no'] = df['no'].apply(lambda x: [str(i) for i in x])
#add and remove items
df.loc[df['Clusters'].isin(condition2.index.tolist()), 'no'] = \
    df['no'].apply(lambda x: x + condition.values.tolist())
df.loc[df['Clusters'].isin(remseries.index.tolist()), 'no'] = \
    df['no'].apply(lambda x: [k for k in x if k != ''.join(remseries.values)])
#check types of values in list
print [ type(x) for x in df['no'][0]]
[<type 'str'>, <type 'str'>]
#convert values in list from string to int
df['no'] = df['no'].apply(lambda x: [int(i) for i in x])
print df
Clusters no
0 Cluster 1 [526, 533]
1 Cluster 2 [527, 528, 532]
2 Cluster 3 [519, 534, 535]
3 Cluster 4 [530]
4 Cluster 5 [529, 531]
5 Cluster 6 [520, 521, 524, 525]
#check types of values in list
print [ type(x) for x in df['no'][0]]
[<type 'int'>, <type 'int'>]
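For comparison, the same result can probably be reached without the round trip through strings, by casting the order id to int once and rebuilding the two affected lists. This is only a sketch in Python 3 with hand-built data: move_order is a hypothetical helper name, and 'Cluster 1' is passed explicitly as the source, just as remseries hard-codes it above.

```python
import pandas as pd

# Hand-built copies of the Series from the question (assumed values)
new_orders_list = pd.Series(
    {
        'Cluster 1': [525, 526, 533],
        'Cluster 2': [527, 528, 532],
        'Cluster 3': [519, 534, 535],
        'Cluster 4': [530],
        'Cluster 5': [529, 531],
        'Cluster 6': [520, 521, 524],
    },
    name='no',
).rename_axis('Clusters')
condition = pd.Series(['525'], index=[5], name='order_id')
condition2 = pd.Series(
    [1], index=pd.Index(['Cluster 6'], name='Clusters'), name='quant_bought'
)


def move_order(orders, order_id, source, target):
    """Return a new Series with order_id moved from source to target.

    Hypothetical helper: the input Series and its lists are left untouched.
    """
    updated = {}
    for label, items in orders.items():
        # Drop the id only from the source cluster
        kept = [k for k in items if not (label == source and k == order_id)]
        # Append the id only to the target cluster
        if label == target:
            kept = kept + [order_id]
        updated[label] = kept
    return pd.Series(updated, name=orders.name).rename_axis(orders.index.name)


result = move_order(
    new_orders_list,
    order_id=int(condition.iloc[0]),   # '525' -> 525
    source='Cluster 1',
    target=condition2.index[0],        # 'Cluster 6'
)
print(result)
```

Building a new Series from a dict avoids assigning a list into a single cell, which pandas can interpret as an attempt to set several values at once.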