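I can't seem to find the right words to type into Stack Overflow, and I couldn't find the answer in code I've written before, so I have no choice but to ask again.
I'm trying to expand a DataFrame over all combinations of the values in a particular column:
Note: pandas version 0.23.4
Given the following DataFrame: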
name num
A 1X,2Y,3Z
B 1X,2Y,3Z
C 9Z
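For anyone who wants to reproduce this, a minimal sketch of the input, assuming the num column holds plain comma-separated strings (the variable name df matches the code further down):

```python
import pandas as pd

# Rebuild the example frame; `num` is assumed to be a column of
# comma-separated strings, not lists.
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "num": ["1X,2Y,3Z", "1X,2Y,3Z", "9Z"],
})
```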
I'm trying to get:
name num
A 1X,2Y
A 1X,3Z
A 2Y,3Z
B 1X,2Y
B 1X,3Z
B 2Y,3Z
C 9Z
Here is what I (thought) was the right approach:
s = (pd.DataFrame(df.pop('num').values.tolist(), index=df.index)
       .stack()
       .reset_index(level=1, drop=True)
       .rename('num')
       .astype(str)
    )
df = df.join(s)
Answer 0 (score: 3)
Here is an approach based on itertools:
from itertools import chain, combinations
import pandas as pd
# split the strings by "," and
# extract all length 2 combinations from the strings
l = df.num.str.split(',').apply(combinations, r=2).map(list)
# construct a dataframe from the result
out = pd.DataFrame({'name':df.name.repeat(l.str.len()),
'num':list(chain.from_iterable(l.values))})
# join the tuples containing each combination
out['num'] = out.num.str.join(', ')
name num
0 A 1X, 2Y
0 A 1X, 3Z
0 A 2Y, 3Z
1 B 1X, 2Y
1 B 1X, 3Z
1 B 2Y, 3Z
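Row C is missing from this output because itertools.combinations yields nothing when asked for pairs from a single item, so that row is repeated zero times. A small sketch of that behavior, using only the standard library (the variable names are illustrative):

```python
from itertools import combinations

# Three items give three unordered pairs
pairs = list(combinations(["1X", "2Y", "3Z"], r=2))
print(pairs)   # [('1X', '2Y'), ('1X', '3Z'), ('2Y', '3Z')]

# A single item cannot form a pair, so the result is empty
single = list(combinations(["9Z"], r=2))
print(single)  # []
```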
Update
If an entry contains only one item:
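# join each pair right away; fall back to the original list when
# there are no pairs (i.e. the entry has a single item)
l = df.num.str.split(',').apply(lambda x: list(map(', '.join, combinations(x, r=2))) or x)
# flatten the per-row lists and repeat each name once per resulting row
out = pd.DataFrame({'name':df.name.repeat(l.str.len()),
'num':list(chain.from_iterable(l))})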
name num
0 A 1X, 2Y
0 A 1X, 3Z
0 A 2Y, 3Z
1 B 1X, 2Y
1 B 1X, 3Z
1 B 2Y, 3Z
2 C 9Z
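The same idea can also be sketched as a plain Python loop, which avoids the .str accessor on lists and should not depend on the pandas version. This is only an illustration under some assumptions: expand_pairs and its parameters are hypothetical names, row D is an extra made-up entry added to exercise the two-item case, and the separator is kept as "," (no space) to match the output requested in the question.

```python
from itertools import combinations
import pandas as pd

def expand_pairs(df, key="name", col="num", sep=","):
    """Hypothetical helper: one row per 2-item combination of `col`."""
    rows = []
    for name, value in zip(df[key], df[col]):
        items = value.split(sep)
        # Join each pair back into a string; entries with fewer than
        # two items produce no pairs, so keep the original value.
        combos = [sep.join(c) for c in combinations(items, 2)] or [value]
        rows.extend((name, c) for c in combos)
    return pd.DataFrame(rows, columns=[key, col])

# Row D is not in the question; it is added here to cover an entry
# with exactly two items.
df = pd.DataFrame({"name": ["A", "B", "C", "D"],
                   "num": ["1X,2Y,3Z", "1X,2Y,3Z", "9Z", "4W,5V"]})
out = expand_pairs(df)
print(out)
```

Unlike the versions above, this returns a fresh 0..n-1 index rather than repeating the original row labels.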