Question

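Using PySpark.

Follow-up: I think I just need to know how to select the n elements that come after a given element in a list, and join them with the list itself.

For example, suppose you have the list 'a', 'b', 'c', 'd', 'e', 'f', 'g':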

+-------+-----+
| _index| item|
+-------+-----+
|   0   |   a |
|   1   |   b |
|   2   |   c |
|   3   |   d |
|   4   |   e |
|   5   |   f |
|   6   |   g |
+-------+-----+

with indices 0 to 6. We then want to join the list itself with the n=3 elements after 'c', and get:

+--------+-------+-------+
| _index | item1 | item2 |
+--------+-------+-------+
|   3    |   d   |   d   |
|   4    |   e   |   e   |
|   5    |   f   |   f   |
+--------+-------+-------+

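Below is a related piece of code. Can it be modified to select the elements within n positions after A and join them with the list that contains A? I'm new to Spark and need some help. Thanks!

Suppose we have many lists. We first find an element in these lists that satisfies condition1, and give it the alias A.

If we randomly pick another element after A's index (within some index distance, e.g. 1-3) and then join it with the list that contains A, we can do the following.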

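import random
from pyspark.sql.functions import col

df.where(
    (col('condition1') == 0)  # finds an element satisfying some condition, name it as 'A'
).alias('A').join(
    df.alias('B'),
    # randomly pick another element after 'A' within index distance 1 to 3
    # and join it with the list that contains 'A'
    # (random.randint is inclusive at both ends and runs once on the driver,
    # so every row gets the same offset)
    (col('A.ListId') == col('B.ListId'))
    & ((random.randint(1, 3) + col('A._index')) == col('B._index'))
)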

Answer 1

Here is an example of a possible workaround you can apply:

git rev-list --parents [...]

So I think the only missing part, besides your join, is getting the integer from A's index.
