Conditionally filtering rows of one df on specific columns shared with a subset of another df

Time: 2018-05-04 14:24:57

Tags: python python-3.x pandas dataframe

Suppose we have a DataFrame df1:

import pandas as pd

df1 = pd.DataFrame(
{'col1': {0: 500.0, 1: 500.0, 2: 833.3, 3: 500.0, 4: 833.3, 5: 500.0, 6: 833.3},
'col2': {0: 1833.3, 1: 1000.0, 2: 1833.3, 3: 2666.7, 4: 1833.3, 5: 3500.0, 6: 1000.0},
'col3': {0: 250.0, 1: 250.0, 2: 30.0, 3: 30.0, 4: 30.0, 5: 103.3, 6: 176.7},
'col4': {0: 3.4, 1: 4.0, 2: 2.2, 3: 3.4, 4: 2.2, 5: 4.0, 6: 3.4},
'col5': {0: 0.25, 1: 0.15, 2: 0.1, 3: 0.25, 4: 0.25, 5: 0.1, 6: 0.1},
'col6': {0: 364, 1: 937, 2: 579, 3: 313, 4: 600, 5: 49, 6: 13}})

and a DataFrame df2:

df2 = pd.DataFrame(
{'col1': {0: 833.3, 1: 500.0, 2: 500.0, 3: 500.0, 4: 500.0, 5: 500.0, 6: 500.0, 7: 833.3, 8: 500.0, 9: 833.3, 10: 500.0, 11: 500.0, 12: 833.3, 13: 833.3, 14: 833.3},
'col2': {0: 1833.3, 1: 1000.0, 2: 1833.3, 3: 3500.0, 4: 3500.0, 5: 1000.0, 6: 2666.7, 7: 1833.3, 8: 2666.7, 9: 1000.0, 10: 2666.7, 11: 2666.7, 12: 1000.0, 13: 1833.3, 14: 1833.3},
'col3': {0: 30.0, 1: 250.0, 2: 250.0, 3: 103.3, 4: 176.7, 5: 103.3, 6: 30.0, 7: 103.3, 8: 30.0, 9: 176.7, 10: 250.0, 11: 103.3, 12: 30.0, 13: 30.0, 14: 250.0},
'col4': {0: 2.2, 1: 4.0, 2: 3.4, 3: 4.0, 4: 2.2, 5: 2.8, 6: 2.8, 7: 2.8, 8: 3.4, 9: 3.4, 10: 2.8, 11: 2.8, 12: 3.4, 13: 2.2, 14: 2.8}, 
'col5': {0: 0.25, 1: 0.15, 2: 0.25, 3: 0.1, 4: 0.2, 5: 0.15, 6: 0.15, 7: 0.25, 8: 0.25, 9: 0.1, 10: 0.15, 11: 0.1, 12: 0.15, 13: 0.1, 14: 0.2}})

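What is the most pythonic way to drop the rows of df2 in which col1, col2, col3 and col4 (up to coln) have the same values as the corresponding columns of df1? I do not want to merge the DataFrames, only to drop any row (possibly several) of df2 whose tuple of values over the columns of interest also appears in df1.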

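The only approach I can think of is:

# Note: each column is tested independently of the others,
# and the matching rows are kept rather than dropped.
new_df = df2.loc[df2['col1'].isin(df1['col1']) &
                 df2['col2'].isin(df1['col2']) &
                 df2['col3'].isin(df1['col3']) &
                 df2['col4'].isin(df1['col4']) &
                 df2['col5'].isin(df1['col5'])]

which is rather cumbersome for larger datasets and more columns.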

Any good ideas?

2 answers:

Answer 0: (score: 2)

You can use isin. Before that, build a key from all the values in col1 through coln (convert each value to str and join them together).
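The code that accompanied this answer did not survive in this copy, so the following is only a minimal sketch of the idea described above, not the answerer's own code. The two small frames, the `cols` list, the `make_key` helper and the `|` separator are all assumptions made for illustration.

```python
import pandas as pd

# Small hypothetical frames standing in for df1 and df2 from the question.
df1 = pd.DataFrame({"col1": [500.0, 833.3],
                    "col2": [1000.0, 1833.3],
                    "col3": [250.0, 30.0]})
df2 = pd.DataFrame({"col1": [833.3, 500.0, 500.0],
                    "col2": [1833.3, 1000.0, 1833.3],
                    "col3": [30.0, 250.0, 250.0]})

cols = ["col1", "col2", "col3"]  # columns of interest (assumed)


def make_key(df, cols, sep="|"):
    """Hypothetical helper: one string key per row, built from the chosen columns."""
    return df[cols].astype(str).apply(sep.join, axis=1)


key1 = make_key(df1, cols)
key2 = make_key(df2, cols)

# Keep only the rows of df2 whose key does not occur among the keys of df1.
result = df2[~key2.isin(key1)]
print(result)
```

Because the key is built from string representations, two values compare equal only if they print identically, and the separator should be a character that cannot occur in the values themselves.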

Answer 1: (score: 2)

You can build an index from the columns of interest in each DataFrame and use pd.Index.difference to extract the result.

A benefit of this method is that it requires no number-to-string conversion.
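The original code for this answer is not available in this copy. As a rough sketch of an index-difference approach, assuming the index is a MultiIndex built with `pd.MultiIndex.from_frame` (the answer does not say how it was built), one possibility is shown below. The sample frames and the `cols` list are hypothetical.

```python
import pandas as pd

# Small hypothetical frames standing in for df1 and df2 from the question.
df1 = pd.DataFrame({"col1": [500.0, 833.3],
                    "col2": [1000.0, 1833.3],
                    "col3": [250.0, 30.0]})
df2 = pd.DataFrame({"col1": [833.3, 500.0, 500.0],
                    "col2": [1833.3, 1000.0, 1833.3],
                    "col3": [30.0, 250.0, 250.0]})

cols = ["col1", "col2", "col3"]  # columns of interest (assumed)

# One tuple per row, held in a MultiIndex; the numbers stay numeric.
idx1 = pd.MultiIndex.from_frame(df1[cols])
idx2 = pd.MultiIndex.from_frame(df2[cols])

# Row tuples that occur in df2 but not in df1 (unique values).
only_in_df2 = idx2.difference(idx1)

# A boolean mask keeps df2's original row order, index and any duplicate rows.
result = df2[idx2.isin(only_in_df2)]
print(result)
```

Filtering through a boolean mask rather than selecting by the difference directly is a design choice in this sketch: `difference` returns unique values, so the mask is what preserves repeated rows of df2.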