I have an array of lists of unequal length; what is the fastest way to compute the length of each?

Time: 2016-11-10 00:54:50

Tags: python pandas numpy

Consider the array a

import numpy as np

# dtype=object is needed for ragged lists on NumPy >= 1.24
a = np.array([
        [list(range(np.random.randint(4, 10))) for _ in range(10)],
        [list(range(np.random.randint(4, 10))) for _ in range(10)]
    ], dtype=object).T

print(a)

[[[0, 1, 2, 3, 4, 5, 6] [0, 1, 2, 3, 4, 5]]
 [[0, 1, 2, 3, 4, 5, 6, 7] [0, 1, 2, 3, 4, 5, 6]]
 [[0, 1, 2, 3, 4, 5, 6, 7, 8] [0, 1, 2, 3, 4, 5, 6, 7, 8]]
 [[0, 1, 2, 3, 4, 5, 6, 7, 8] [0, 1, 2, 3, 4]]
 [[0, 1, 2, 3, 4, 5] [0, 1, 2, 3, 4, 5, 6, 7]]
 [[0, 1, 2, 3, 4, 5] [0, 1, 2, 3, 4, 5, 6]]
 [[0, 1, 2, 3, 4] [0, 1, 2, 3, 4]]
 [[0, 1, 2, 3, 4, 5, 6, 7, 8] [0, 1, 2, 3, 4, 5, 6, 7, 8]]
 [[0, 1, 2, 3, 4, 5, 6, 7, 8] [0, 1, 2, 3, 4, 5]]
 [[0, 1, 2, 3, 4, 5, 6, 7] [0, 1, 2, 3, 4, 5, 6]]]

I would like the output to look like this

[[7 6]
 [8 7]
 [9 9]
 [9 5]
 [6 8]
 [6 7]
 [5 5]
 [9 9]
 [9 6]
 [8 7]]

2 answers:

Answer 0 (score: 3)

To make this as efficient as possible, I suggest using a generator:

mygen = (map(len, row) for row in a)

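This way you do not have to compute everything at once; each row's lengths are computed only when you ask for them. That said, I have no CPU benchmarks to back this up.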
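As a minimal sketch of how such a generator might be consumed: the small 3x2 array below is an illustrative assumption, not the data from the question. Nothing is measured until a row is pulled from `mygen`, and the remaining rows can be materialized into an array once the full result is needed.

```python
import numpy as np

# Illustrative input (an assumption): a 3x2 object array whose cells are lists.
a = np.empty((3, 2), dtype=object)
cells = [[0, 1, 2], [0, 1], [0], [0, 1, 2, 3], [0, 1], [0, 1, 2]]
for idx, cell in zip(np.ndindex(a.shape), cells):
    a[idx] = cell

# Lazy: no len() call happens on this line.
mygen = (map(len, row) for row in a)

# Pull the lengths of the first row on demand.
first_row = list(next(mygen))

# Materialize the remaining rows only when they are actually needed.
rest = np.array([list(row) for row in mygen])

print(first_row)
print(rest)
```

Note that the generator defers the work rather than removing it: if every length is needed anyway, the total number of `len` calls is the same as in the eager versions.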

Answer 1 (score: 1)

Method 1
pandas

import pandas as pd

def pir1(a):
    # .str.len() also works on list-valued cells, not only on strings
    return pd.Series(a.ravel()).str.len().values.reshape(a.shape)

Method 2
itertools, map, len

import itertools

def pir2(a):
    # flatten the rows, take len of every cell, then restore the 2-D shape
    return np.array(
        list(map(len, itertools.chain.from_iterable(a)))).reshape(a.shape)

Method 3
@Marcin

def marcin(a):
    # eager version of the generator answer: one inner list of lengths per row
    return np.array([[i for i in map(len, row)] for row in a])
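A quick sanity check may be useful before comparing speed. The sketch below rebuilds the example array from the lengths printed in the question and confirms that two of the methods reproduce the desired output. It is only a sketch: `pir1` is left out so that NumPy is the only dependency, and `marcin` is written with balanced brackets, which is my reading of the intent.

```python
import itertools
import numpy as np

# Desired output from the question, reused here to rebuild the input array.
expected = np.array([[7, 6], [8, 7], [9, 9], [9, 5], [6, 8],
                     [6, 7], [5, 5], [9, 9], [9, 6], [8, 7]])

# Fill an object array cell by cell so that each cell holds a plain list.
a = np.empty(expected.shape, dtype=object)
for idx in np.ndindex(expected.shape):
    a[idx] = list(range(expected[idx]))

def pir2(a):
    flat = itertools.chain.from_iterable(a)
    return np.array(list(map(len, flat))).reshape(a.shape)

def marcin(a):
    return np.array([[len(cell) for cell in row] for row in a])

# Both methods should reproduce the desired output exactly.
assert np.array_equal(pir2(a), expected)
assert np.array_equal(marcin(a), expected)
```

The same comparison should apply to `pir1` when pandas is installed.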

Small array

n, m = 10, 2
a = np.array(
    [[list(range(np.random.randint(1, 21))) for _ in range(m)] for _ in range(n)],
    dtype=object,  # needed for ragged lists on NumPy >= 1.24
)


Large array

n, m = 1000, 20
a = np.array(
    [[list(range(np.random.randint(1, 21))) for _ in range(m)] for _ in range(n)],
    dtype=object,  # needed for ragged lists on NumPy >= 1.24
)


Very large array

n, m = 10000, 200
a = np.array(
    [[list(range(np.random.randint(1, 21))) for _ in range(m)] for _ in range(n)],
    dtype=object,  # needed for ragged lists on NumPy >= 1.24
)

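To reproduce timings for these sizes locally, something like the following could be used. It is a rough sketch, not the harness used for the original answer: `make_ragged` and `time_methods` are hypothetical helpers introduced here, `pir1` is only included when pandas is importable, and the very large case is omitted from the loop to keep the run short. No timings are quoted because they depend on the machine and library versions.

```python
import itertools
import timeit
import numpy as np

def make_ragged(n, m, seed=0):
    """Hypothetical helper: n x m object array of lists, lengths drawn from 1..20."""
    rng = np.random.default_rng(seed)
    a = np.empty((n, m), dtype=object)
    for idx in np.ndindex(n, m):
        a[idx] = list(range(rng.integers(1, 21)))
    return a

def pir2(a):
    flat = itertools.chain.from_iterable(a)
    return np.array(list(map(len, flat))).reshape(a.shape)

def marcin(a):
    return np.array([[len(cell) for cell in row] for row in a])

methods = {"pir2": pir2, "marcin": marcin}
try:
    import pandas as pd
    methods["pir1"] = lambda a: pd.Series(a.ravel()).str.len().values.reshape(a.shape)
except ImportError:
    pass  # pandas is optional for this sketch

def time_methods(a, number=5):
    """Return the best-of-3 time in seconds per call for each method on array a."""
    return {
        name: min(timeit.repeat(lambda: fn(a), number=number, repeat=3)) / number
        for name, fn in methods.items()
    }

# Add (10000, 200) to this list to cover the very large case as well.
for n, m in [(10, 2), (1000, 20)]:
    timings = time_methods(make_ragged(n, m))
    for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{n}x{m} {name}: {seconds * 1e6:.1f} us per call")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; the ranking of the methods may still change with array size.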