Question

I have a list of 5 items:

[6,4,2,4,6]

If I need the list to be 10 items long, with the same sum and the same relative distribution, it would look like:

[3,3,2,2,1,1,2,2,3,3]

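That case is easy: I doubled the number of items, so I split each item of the original list in two.

I am trying to find a way to do this for an arbitrary target length, for example 3, 7, 12, 999...

Obviously it is usually not possible to do this perfectly (as it is in the simple example), so I am looking for a best-fit solution.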

Answer 1

IIUC, use np.repeat (this works when the target length is an integer multiple of the original length):

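import numpy as np

a1 = np.array([6, 4, 2, 4, 6])
targ_len = 10  # target length; must be an integer multiple of len(a1)
len_arr = len(a1)
factor = targ_len // len_arr  # np.repeat requires an integer repeat count
a2 = np.repeat(a1, factor) / factor  # split each item into `factor` equal parts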
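
As a minimal sketch, the np.repeat idea can be wrapped in a function that refuses target lengths it cannot handle. The name `repeat_resample` and the choice to raise `ValueError` are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def repeat_resample(values, target_len):
    """Split each item into equal parts; only valid for integer multiples."""
    values = np.asarray(values, dtype=float)
    factor, remainder = divmod(target_len, len(values))
    if factor < 1 or remainder != 0:
        raise ValueError("target_len must be a positive integer multiple of len(values)")
    # Repeat each item `factor` times, then divide so the total is unchanged.
    return np.repeat(values, factor) / factor

a2 = repeat_resample([6, 4, 2, 4, 6], 10)
print(a2)        # [3. 3. 2. 2. 1. 1. 2. 2. 3. 3.]
print(a2.sum())  # 22.0
```

Target lengths such as 3 or 7 are rejected here, which is why a different approach is needed for the general case.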

Answer 2

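There are many ways to do what you ask, depending on which properties of the distribution you want to preserve. One approach is to treat it as an interpolation problem and try to preserve the fraction of the distribution that lies up to a given (relative) x coordinate. We can take the cumulative sum of the values, interpolate it at the desired target points, and then take the differences of the result: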

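import numpy as np

def resampler(orig, N):
    orig = np.asarray(orig)                    # accept lists as well as arrays
    x = np.arange(len(orig) + 1)               # edges of the original bins: 0, 1, ..., len(orig)
    y = np.insert(orig.cumsum(), 0, 0)         # cumulative sum at each edge, starting at 0
    target = np.linspace(0, len(orig), N+1)    # N+1 evenly spaced edges for the new bins
    integ = np.interp(target, xp=x, fp=y)      # cumulative sum interpolated at the new edges
    result = np.diff(integ)                    # amount that falls into each new bin
    return result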

which gives me

In [137]: x = np.array([6,4,2,4,6])

In [138]: resampler(x, 5)
Out[138]: array([ 6.,  4.,  2.,  4.,  6.])

In [139]: resampler(x, 10)
Out[139]: array([ 3.,  3.,  2.,  2.,  1.,  1.,  2.,  2.,  3.,  3.])

In [140]: resampler(x, 3)
Out[140]: array([ 8.66666667,  4.66666667,  8.66666667])

In [141]: resampler(x, 3).sum()
Out[141]: 22.0

In [142]: resampler(x, 20)
Out[142]: 
array([ 1.5,  1.5,  1.5,  1.5,  1. ,  1. ,  1. ,  1. ,  0.5,  0.5,  0.5,
        0.5,  1. ,  1. ,  1. ,  1. ,  1.5,  1.5,  1.5,  1.5])

In [143]: resampler(x, 13)
Out[143]: 
array([ 2.30769231,  2.30769231,  2.        ,  1.53846154,  1.53846154,
        0.92307692,  0.76923077,  0.92307692,  1.53846154,  1.53846154,
        2.        ,  2.30769231,  2.30769231])

In [144]: resampler(x, 13).sum()
Out[144]: 22.0
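
The session above can be condensed into a quick check for the target lengths mentioned in the question (3, 7, 12, 999). This is only a sketch: it repeats the `resampler` function so the block runs on its own, and the symmetry check assumes a symmetric input like the one in the question.

```python
import numpy as np

def resampler(orig, N):
    # Same approach as in the answer: interpolate the cumulative sum
    # at N+1 evenly spaced edges, then difference it.
    orig = np.asarray(orig, dtype=float)
    x = np.arange(len(orig) + 1)
    y = np.insert(orig.cumsum(), 0, 0)
    target = np.linspace(0, len(orig), N + 1)
    return np.diff(np.interp(target, x, y))

x = np.array([6, 4, 2, 4, 6])
for n in (3, 7, 12, 999):
    out = resampler(x, n)
    assert len(out) == n                    # requested length
    assert np.isclose(out.sum(), x.sum())   # total is preserved
    assert np.allclose(out, out[::-1])      # symmetric input stays symmetric
```

The sum is preserved because the differences telescope: they add up to the last interpolated cumulative value minus the first, which is the original total minus zero.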

Convert a list into a list with more or fewer items, preserving the sum of the list and the relative differences between items
