
Question

I have a dataframe that looks like this:

from      to         cc      extra columns
-------------------------------------------
1          2          3           sth
1|1        4                      sth
3          1|2       4|5          sth

I want a new dataframe that has a separate row for every pipe-delimited value, like this:

from       to        cc       extra columns
--------------------------------------------
1           2         3          sth
1           4                    sth
1           4                    sth
3           1         4          sth
3           2         4          sth
3           1         5          sth
3           2         5          sth

Can anyone help me with this?

Thanks!

Answer 1

An inelegant but working solution:

.data
enter: .asciiz "Please enter your integer:\n"
binaryI: .asciiz "\nHere is the input in binary: "
hexI: .asciiz "\n\nHere is the input in hexadecimal: "
binaryO: .asciiz "\n\nHere is the output in binary: "
hexO: .asciiz "\n\nHere is the output in hexadecimal: "

.text 

prompt: 
li $v0, 4   
la $a0, enter
syscall

li $v0, 5   
syscall
add $s2, $0, $v0

li $v0, 4
la $a0, binaryI
syscall

li $v0, 35  
move $a0, $s2
syscall

li $v0, 4
la $a0, hexI
syscall

li $v0, 34  
move $a0, $s2
syscall

addi $t0, $0, 7
srl $s0, $s2, 12
and $s0, $s0, $t0

li $v0, 4
la $a0, binaryO
syscall

li $v0, 35
move $a0, $s0
syscall

li $v0, 4
la $a0, hexO
syscall

li $v0, 34
move $a0, $s0
syscall

li $v0, 1   
add $a0, $0, $s0
syscall

li $v0, 10
syscall

Answer 2

`numpy`

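`split` the target column on the separator, `concatenate` the resulting pieces into one column, and `repeat` the rows of the rest of the array to match.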

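import numpy as np
import pandas as pd

def explode(v, i, sep='|'):
    """Split column i of the 2-D array v on sep, giving one row per piece."""
    v = v.astype(str)
    n, m = v.shape
    a = v[:, i]                                # column to split
    bslc = np.r_[0:i, i+1:m]                   # indices of the other columns
    asrt = np.append(i, bslc).argsort()        # restores the original column order
    b = v[:, bslc]
    a = np.char.split(a, sep)                  # object array holding one list per row
    A = np.concatenate(a)[:, None]             # all pieces stacked into a single column
    counts = [len(x) for x in a.tolist()]      # number of pieces in each row
    rpt = np.arange(n).repeat(counts)          # row index repeated once per piece
    return np.concatenate([A, b[rpt]], axis=1)[:, asrt]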

pd.DataFrame(
    explode(explode(explode(df.values, 0), 1), 2),
    columns=df.columns
)

  from to cc extra_columns
0    1  2  3           sth
1    1  4              sth
2    1  4              sth
3    3  1  4           sth
4    3  1  5           sth
5    3  2  4           sth
6    3  2  5           sth
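The split / concatenate / repeat steps may be easier to follow on a single column. The following is only an illustrative sketch, not part of the answer's function: the arrays `col` and `rest` are made-up sample data, and `np.char.split` is used as the public spelling of the split routine.

```python
import numpy as np

# Made-up sample data: one column to explode and one remaining column.
col = np.array(['1', '1|1', '3'])
rest = np.array([['2'], ['4'], ['1|2']])

pieces = np.char.split(col, '|')              # object array of lists, one per row
counts = [len(p) for p in pieces.tolist()]    # how many pieces each row produced
flat = np.concatenate(pieces.tolist())        # every piece in one flat array
rpt = np.arange(len(col)).repeat(counts)      # source row index for each piece

# Pair each piece with the matching (repeated) row of the other column.
result = np.column_stack([flat, rest[rpt, 0]])
print(result)
```

Here the second row contributes two pieces, so its index appears twice in `rpt` and its remaining columns are duplicated accordingly.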

Timing
On the given data:
10x faster

On the given data repeated 1000 times:
100x faster

Answer 3

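The simplest approach is to step outside pandas, do the work with plain lists, and then load the new data back into a pandas DataFrame. Here we split every cell of a row into a list and use `product` from `itertools` to build all the combinations.

Note that I use letters rather than numbers as the data.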

import pandas as pd
from itertools import product

df = pd.DataFrame([['a',  'b', 'c', 'xtra'],
                   ['a|a','d',  '', 'xtra'],
                   ['c',  'a|b','d|e','xtra']],
                   columns=['From','To','CC','extra_cols'])

split_data = []
for row in df.values.tolist():
    split_data.extend(list(product(*[item.split('|') for item in row])))

new_df = pd.DataFrame(split_data, columns=['From','To','CC','extra_cols'])

> new_df
#   From To CC extra_cols
# 0    a  b  c       xtra
# 1    a  d          xtra
# 2    a  d          xtra
# 3    c  a  d       xtra
# 4    c  a  e       xtra
# 5    c  b  d       xtra
# 6    c  b  e       xtra
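To see where rows 3 to 6 come from, it may help to run the same split-and-`product` step on the last row in isolation. This is a small sketch using only the standard library; the variable names `row`, `cells`, and `combos` are chosen here for illustration.

```python
from itertools import product

row = ['c', 'a|b', 'd|e', 'xtra']

# Each cell becomes a list of its pipe-delimited pieces.
cells = [item.split('|') for item in row]

# product picks one piece from every cell, in every possible combination.
combos = list(product(*cells))
for combo in combos:
    print(combo)
# ('c', 'a', 'd', 'xtra')
# ('c', 'a', 'e', 'xtra')
# ('c', 'b', 'd', 'xtra')
# ('c', 'b', 'e', 'xtra')
```

An empty cell splits to a list holding one empty string (`''.split('|')` gives `['']`), so a row with a blank `CC` value is kept rather than dropped from the product.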

Pandas dataframe: pipe-delimited values in cells

3 answers:
