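I have two lists of DataFrames with the same structure, and I am trying to remove each DataFrame from list_a if at least one value of its col_a also appears in col_a of any DataFrame in list_b. I have made several attempts but have not found anything that actually works. My approach may be wrong, so a pointer in the right direction is appreciated!
Approach: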
for df_a in list_a:
    for df_b in list_b:
        temp = df_a[~df_a['col_a'].isin([df_b['col_a']])]  # error 'list indices must be integers or slices, not
        if len(temp.index) > 0:
            list_a.remove(df_a)
list_a[0]
          col_a         temp
877  12/17/2019  0.300807486
886  12/31/2019  0.143508662

list_a[1]
         col_a         temp
651  7/27/2019  0.435680418
660  8/10/2019  0.229333215

list_b[0]
         col_a         temp
1   12/31/2019  0.843356517
10   1/14/2020  0.846720719

list_omit[0]
        col_a         temp
1  12/17/2019  0.600807486
2  12/31/2019  0.143508662
Result: because list_a[0] and list_b[0] share the date 12/31/2019, list_a[0] should be removed from list_a and added to an "omit" list of DataFrames.
To reproduce:
import numpy as np
import pandas as pd
temp = list(range(0, 2))
list_a = []
list_b = []
for l in temp:
    df = pd.DataFrame(np.random.randint(0,100,size=(2, 2)), columns=list(['col_a','temp']))
    list_a.append(df)
for l in temp:
    df = pd.DataFrame(np.random.randint(0,100,size=(2, 2)), columns=list(['col_a','temp']))
    list_b.append(df)
print(list_a)
print(list_b)
Thank you for your help.
Answer 0 (score: 1)
You can use a modified version of this solution:
import numpy as np
import pandas as pd
# Fix the seed so the sample data is reproducible
np.random.seed(2020)
temp = list(range(0, 2))
list_a = []
list_b = []
for l in temp:
    df = pd.DataFrame(np.random.randint(0,20,size=(3, 2)), columns=list(['col_a','temp']))
    list_a.append(df)
for l in temp:
    df = pd.DataFrame(np.random.randint(0,30,size=(2, 2)), columns=list(['col_a','temp']))
    list_b.append(df)
print(list_a)
print(list_b)
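Create a set of all the values that should trigger exclusion, i.e. every col_a value found in any DataFrame of list_b: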
b = set([y for x in list_b for y in x['col_a']])
print (b)
{3, 28, 5, 23}
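To make this step concrete, here is a minimal sketch on hand-written data. The DataFrames below are assumptions chosen for illustration, not the random data generated above. It builds the set with a set comprehension and then checks one DataFrame's col_a against it with Series.isin.

```python
import pandas as pd

# Hypothetical data, chosen only to illustrate the step.
list_b = [
    pd.DataFrame({"col_a": [3, 28], "temp": [1, 2]}),
    pd.DataFrame({"col_a": [5, 23], "temp": [3, 4]}),
]

# Collect every col_a value found in any DataFrame of list_b.
b = {value for df_b in list_b for value in df_b["col_a"]}

df_a = pd.DataFrame({"col_a": [0, 3, 3], "temp": [8, 3, 7]})

# isin flags each row whose col_a appears in b;
# any() collapses those flags into a single True/False.
mask = df_a["col_a"].isin(b)
has_overlap = bool(mask.any())
```

Collecting the values once means each DataFrame in list_a needs only a single isin call, instead of one comparison per DataFrame in list_b.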
Then, in a loop, build one new list with the DataFrames to exclude and another with the DataFrames of list_a to keep:
exclude = []
a = []
for df_a in list_a:
    if df_a['col_a'].isin(b).any():
        exclude.append(df_a)
    else:
        a.append(df_a)
print (exclude)
[   col_a  temp
0      0     8
1      3     3
2      3     7]
print (a)
[   col_a  temp
0     16     0
1     10     9
2     19    11]
Another idea, using list comprehensions:
exclude = [df_a for df_a in list_a if df_a['col_a'].isin(b).any()]
print (exclude)
[   col_a  temp
0      0     8
1      3     3
2      3     7]
new_a = [df_a for df_a in list_a if not df_a['col_a'].isin(b).any()]
print (new_a)
[   col_a  temp
0     16     0
1     10     9
2     19    11]
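For completeness, the same idea can be sketched as one helper applied to the date data from the question. The function name split_by_overlap and its column parameter are hypothetical, introduced only for this example, and the sketch assumes col_a has the same dtype in both lists (plain strings here).

```python
import pandas as pd


def split_by_overlap(list_a, list_b, column="col_a"):
    """Split list_a into (kept, omitted) by overlap with list_b.

    A DataFrame is omitted when at least one of its `column` values
    also appears in `column` of any DataFrame in list_b.
    """
    # All values seen in list_b, collected once up front.
    seen = {value for df_b in list_b for value in df_b[column]}
    kept, omitted = [], []
    for df_a in list_a:
        if df_a[column].isin(seen).any():
            omitted.append(df_a)
        else:
            kept.append(df_a)
    return kept, omitted


# Sample data copied from the question (dates kept as strings).
list_a = [
    pd.DataFrame({"col_a": ["12/17/2019", "12/31/2019"],
                  "temp": [0.300807486, 0.143508662]}, index=[877, 886]),
    pd.DataFrame({"col_a": ["7/27/2019", "8/10/2019"],
                  "temp": [0.435680418, 0.229333215]}, index=[651, 660]),
]
list_b = [
    pd.DataFrame({"col_a": ["12/31/2019", "1/14/2020"],
                  "temp": [0.843356517, 0.846720719]}, index=[1, 10]),
]

kept, omitted = split_by_overlap(list_a, list_b)
```

Building two new lists, rather than calling list_a.remove(df_a) inside the loop, avoids mutating list_a while iterating over it, which can cause elements to be skipped.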