I have the following data frame.
   a1                  f1                a0
0  [5261, 5247, 5246]  [526, 557, 5246]  [1, 32, 5261]
1  [521, 5547, 5246]   'NaN'             [61, 5247, 246]
I want to join [5261, 5247, 5246] with [526, 557, 5246] so that the resulting array has no duplicates.
Required answer: [5261, 5247, 5246, 526, 557].
The same applies to the remaining pairs:
[5261, 5247, 5246] with 'NaN'
[521, 5547, 5246] with [526, 557, 5246]
[521, 5547, 5246] with 'NaN'
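These results need to be stored somewhere, and the resulting arrays (4 in total) then need to go through the same process with 'a0'.
I have tried many approaches, but none of them worked. Any help is appreciated.
Thanks, Sonia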
Answer 0 (score: 1)
I would try to get the data into tidy format (a term worth looking up; I believe it was coined by the R community).
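For readers who start from a DataFrame rather than a Series, here is a minimal sketch of one way to reach a similar long layout. It assumes the cells hold Python lists and that 'NaN' is the literal string shown in the question. The frame `df` and the names `row`, `column`, and `value` are illustrative, not part of the original post.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frame; 'NaN' is assumed to be
# a plain string placeholder, as it appears in the question.
df = pd.DataFrame({
    "a1": [[5261, 5247, 5246], [521, 5547, 5246]],
    "f1": [[526, 557, 5246], "NaN"],
    "a0": [[1, 32, 5261], [61, 5247, 246]],
})

# melt stacks the three columns into (row, column, value) records;
# explode then gives every list element a row of its own.
tidy = (
    df.reset_index()
      .melt(id_vars="index", var_name="column", value_name="value")
      .rename(columns={"index": "row"})
      .explode("value")
)

# Drop the 'NaN' placeholder, which explode leaves untouched.
tidy = tidy[tidy["value"] != "NaN"].reset_index(drop=True)
print(tidy)
```

Unlike a double explode on a Series, this keeps the original row label next to every value, which makes it possible to tell the two `a1` lists apart later.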
In [58]: s = pd.Series({'a1': [['5261', '5247', '5246'], ['521', '5547', '5246']], 'f1': [['526', '557', '5246']], 'a0': [['1', '32', '26'], ['61', '47', '246']]})
In [59]: s
Out[59]:
a1 [[5261, 5247, 5246], [521, 5547, 5246]]
f1 [[526, 557, 5246]]
a0 [[1, 32, 26], [61, 47, 246]]
dtype: object
In [60]: s.explode()
Out[60]:
a1 [5261, 5247, 5246]
a1 [521, 5547, 5246]
f1 [526, 557, 5246]
a0 [1, 32, 26]
a0 [61, 47, 246]
dtype: object
In [61]: s.explode().explode()
Out[61]:
a1 5261
a1 5247
a1 5246
a1 521
a1 5547
a1 5246
f1 526
f1 557
f1 5246
a0 1
a0 32
a0 26
a0 61
a0 47
a0 246
dtype: object
In [62]: s.index
Out[62]: Index(['a1', 'f1', 'a0'], dtype='object')
In [63]: s.values
Out[63]: array([list([['5261', '5247', '5246'], ['521', '5547', '5246']]), list([['526', '557', '5246']]), list([['1', '32', '26'], ['61', '47', '246']])], dtype=object)
In [68]: d = s.explode().explode()
In [69]: d = d.reset_index()
In [70]: d
Out[70]:
index 0
0 a1 5261
1 a1 5247
2 a1 5246
3 a1 521
4 a1 5547
5 a1 5246
6 f1 526
7 f1 557
8 f1 5246
9 a0 1
10 a0 32
11 a0 26
12 a0 61
13 a0 47
14 a0 246
In [71]: d.columns = ['A', 'B']  # any names will do, but to_parquet needs string column names
In [72]: d.to_parquet('here.parquet')
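The session above stops at storing the tidy frame. For the pairing the question actually asks for (each `a1` list joined with each `f1` entry, duplicates dropped, order kept), a minimal sketch in plain Python could look like the following. It works on the question's lists directly, because the flattened frame `d` no longer records which inner list a value came from. The helper `union_keep_order` is a hypothetical name, and treating 'NaN' as an empty list is an assumption.

```python
from itertools import product

# Values copied from the question; 'NaN' is assumed to mean "nothing to join".
a1 = [[5261, 5247, 5246], [521, 5547, 5246]]
f1 = [[526, 557, 5246], "NaN"]


def union_keep_order(left, right):
    """Concatenate two lists, keeping only the first occurrence of each value."""
    right = right if isinstance(right, list) else []  # 'NaN' placeholder -> empty
    # dict.fromkeys preserves insertion order, so duplicates are dropped in place.
    return list(dict.fromkeys(left + right))


# product yields the four (a1, f1) pairs in the order listed in the question.
results = [union_keep_order(a, f) for a, f in product(a1, f1)]

for r in results:
    print(r)
```

If the second step in the question means joining each of these four results with the `a0` lists, the same helper can be applied again with `a0` in place of `f1`; that reading is a guess, since the question does not spell it out.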