My dataset looks like this:
Col1 Col2 Col3
A 10 x1
B 100 x2
C 1000 x3
This is what the output I currently get looks like:
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9
A 10 x1 Empty Empty Empty Empty Empty Empty
B 100 x2 Empty Empty Empty Empty Empty Empty
C 1000 x3 Empty Empty Empty Empty Empty Empty
A 10 x1 B 100 x2 Empty Empty Empty
B 100 x2 C 1000 x3 Empty Empty Empty
A 10 x1 B 100 x2 C 1000 x3
Thanks to help from this site, that output can be produced with:
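import itertools
import pandas as pd

# Flatten every combination of rows, for each combination size from 1 to len(df)
arr = list(itertools.chain.from_iterable(
    [[value for row in combo for value in row]
     for combo in itertools.combinations(df.values.tolist(), r)]
    for r in range(1, len(df) + 1)
))
pd.DataFrame(arr)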
But if the dataset looks like this instead,
Col1 Col2 Col3 Structure
A 10 x1 1
B 100 x2 1
C 1000 x3 2
the output must be this:
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9 Answer
A 10 x1 Empty Empty Empty Empty Empty Empty No
B 100 x2 Empty Empty Empty Empty Empty Empty No
C 1000 x3 Empty Empty Empty Empty Empty Empty Yes
A 10 x1 B 100 x2 Empty Empty Empty Yes
B 100 x2 C 1000 x3 Empty Empty Empty No
A 10 x1 B 100 x2 C 1000 x3 No
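The row containing A and B is "Yes" because together they make up one whole structure, and C on its own is "Yes" because it is the only member of its structure. All other rows (for example A alone, B alone, or A B C together) are "No" because they either cover only part of a structure or mix different structures. How can I get the table shown above?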
The code
arr = list(itertools.chain.from_iterable(
    [[j for i in el for j in i] for el in itertools.combinations(df.values.tolist(), i)]
    for i in range(1, len(df) + 1)
))
pd.DataFrame(arr)
gives me this output:
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9
A 10 x1 Empty Empty Empty Empty Empty Empty
B 100 x2 Empty Empty Empty Empty Empty Empty
C 1000 x3 Empty Empty Empty Empty Empty Empty
A 10 x1 B 100 x2 Empty Empty Empty
B 100 x2 C 1000 x3 Empty Empty Empty
A 10 x1 B 100 x2 C 1000 x3
How do I add the "Answer" column to this output to get the final table?
Answer 0 (score: 1)
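Because of how the DataFrame is laid out, we know that once itertools.combinations is applied, the Structure column first appears at column position 3 (the fourth column) and then again every four columns after that: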
0 1 2 3 4 5 6 7 8 9 10 11
0 A 10 x1 1 None NaN None NaN None NaN None NaN
1 B 100 x2 1 None NaN None NaN None NaN None NaN
2 C 1000 x3 2 None NaN None NaN None NaN None NaN
3 A 10 x1 1 B 100.0 x2 1.0 None NaN None NaN
4 A 10 x1 1 C 1000.0 x3 2.0 None NaN None NaN
5 B 100 x2 1 C 1000.0 x3 2.0 None NaN None NaN
6 A 10 x1 1 B 100.0 x2 1.0 C 1000.0 x3 2.0
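The frame above is never constructed explicitly in this answer. A minimal sketch of how it could be built, assuming the sample data from the question and assuming it is stored under the name `out` (the names `rows`, `arr` and `structure_cols` are only illustrative):

```python
import itertools

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Col1": ["A", "B", "C"],
    "Col2": [10, 100, 1000],
    "Col3": ["x1", "x2", "x3"],
    "Structure": [1, 1, 2],
})

rows = df.values.tolist()

# For every combination size r, flatten each combination of rows into one list.
arr = [
    [value for row in combo for value in row]
    for r in range(1, len(df) + 1)
    for combo in itertools.combinations(rows, r)
]

# Shorter rows are padded with missing values on the right.
out = pd.DataFrame(arr)

# Each source row contributes 4 values, so Structure sits at positions 3, 7, 11.
structure_cols = out.columns[3::4]
print(list(structure_cols))  # [3, 7, 11]
print(out.shape)             # (7, 12)
```

The slice `3::4` only works because Structure is the last of four columns in `df`; with a different column order or count, the start and step would need to change accordingly.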
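We can use this to select only the Structure columns, check whether each row contains every member of a single group, and then drop those columns: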
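import numpy as np

# out is assumed to be pd.DataFrame(arr), built from the df that includes Structure
# Number of rows in each Structure group, here {1: 2, 2: 1}
checker = df.groupby('Structure').size().to_dict()
def helper(row):
    u = row.dropna().values  # Structure values present in this combination
    # True only if all values match and the whole group is present
    return (len(np.unique(u)) == 1) and (checker[u[0]] == len(u))
s = out[out.columns[3::4]].apply(helper, axis=1).replace({False: 'No', True: 'Yes'})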
0 No
1 No
2 Yes
3 Yes
4 No
5 No
6 No
dtype: object
To drop the Structure columns and attach the result to the DataFrame:
out.drop(columns=out.columns[3::4]).assign(final=s)
0 1 2 4 5 6 8 9 10 final
0 A 10 x1 None NaN None None NaN None No
1 B 100 x2 None NaN None None NaN None No
2 C 1000 x3 None NaN None None NaN None Yes
3 A 10 x1 B 100.0 x2 None NaN None Yes
4 A 10 x1 C 1000.0 x3 None NaN None No
5 B 100 x2 C 1000.0 x3 None NaN None No
6 A 10 x1 B 100.0 x2 C 1000.0 x3 No
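As an alternative to slicing by column position, the label can be computed while the combinations are being built. This is only a sketch under the assumption that the data matches the question's sample; it is not the method used above, and the names `group_sizes`, `value_cols`, `records` and `result` are hypothetical:

```python
import itertools

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Col1": ["A", "B", "C"],
    "Col2": [10, 100, 1000],
    "Col3": ["x1", "x2", "x3"],
    "Structure": [1, 1, 2],
})

# How many rows belong to each Structure value.
group_sizes = df["Structure"].value_counts().to_dict()
value_cols = ["Col1", "Col2", "Col3"]
width = len(value_cols) * len(df)  # widest possible flattened row

records = []
for r in range(1, len(df) + 1):
    for idx in itertools.combinations(df.index, r):
        subset = df.loc[list(idx)]
        structures = set(subset["Structure"])
        # "Yes" only when the combination is exactly one complete group.
        complete = (
            len(structures) == 1
            and group_sizes[next(iter(structures))] == len(idx)
        )
        flat = subset[value_cols].values.ravel().tolist()
        padding = [None] * (width - len(flat))
        records.append(flat + padding + ["Yes" if complete else "No"])

columns = [f"Col{i}" for i in range(1, width + 1)] + ["Answer"]
result = pd.DataFrame(records, columns=columns)
print(result["Answer"].tolist())
# ['No', 'No', 'Yes', 'Yes', 'No', 'No', 'No']
```

Because Structure is read by name from `df` rather than by position in the flattened frame, this variant does not depend on Structure being every fourth column, and it yields the Col1 to Col9 plus Answer layout requested in the question.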