Question

Hi, I have a data frame that looks like this:

    starttime                     endtime                                        positions
0   2019-05-16 05:34:26.870 2019-05-16 05:34:41.721 [7, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24...
1   2019-05-16 05:33:56.143 2019-05-16 05:34:10.995 [9, 11, 12, 15, 16, 17, 18, 19, 20, 21, 22, 23...
2   2019-05-16 05:33:35.659 2019-05-16 05:33:50.510 [13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 2...
3   2019-05-16 05:33:04.933 2019-05-16 05:33:19.784 [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,...
4   2019-05-16 05:34:11.507 2019-05-16 05:34:26.358 [3, 4, 9, 10, 11, 12, 15, 16, 17, 18, 19, 20, ...

I want to filter the rows so that only those whose list contains consecutive values, i.e. has the form list(range(min(val),max(val))), are kept.

I tried

df[df["positions"] == list(range(min(df["positions"],max(df["positions"]))))]

But I get the following error:

ValueError: Lengths must match to compare

Is this because each list has a different length? If so, how can I fix it?

Answer 1

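One approach is to use .apply on the list column, so that each list is compared only against its own range:

df['positions'].apply(lambda x: x == list(range(min(x), max(x) + 1)))

Minimal example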

import pandas as pd

# Example input
df = pd.DataFrame({'starttime': list(range(3)), 
                   'endtime': list(range(1, 4)), 
                   'positions': None})

# Manually insert lists into the 'positions' column entries
df.iat[0, 2] = [1, 4, 9]
df.iat[1, 2] = list(range(6))
df.iat[2, 2] = list(range(-4, 3))

# Get a boolean Series
df['positions'].apply(lambda x: x == list(range(min(x), max(x) + 1)))

0    False
1     True
2     True
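
To actually keep only the matching rows, the boolean Series can be used as a mask on the data frame. The following is a minimal sketch rather than part of the original answer: the helper name is_consecutive is hypothetical, and treating an empty list as non-consecutive is an assumption (min and max raise ValueError on an empty list, so some choice has to be made).

```python
import pandas as pd


def is_consecutive(values):
    """Return True if values equals list(range(min(values), max(values) + 1))."""
    # Assumption: an empty list counts as non-consecutive;
    # min() and max() would raise ValueError on it otherwise.
    if not values:
        return False
    return list(values) == list(range(min(values), max(values) + 1))


# Same sample data as in the minimal example above
df = pd.DataFrame({
    "starttime": [0, 1, 2],
    "endtime": [1, 2, 3],
    "positions": [[1, 4, 9], list(range(6)), list(range(-4, 3))],
})

# Build the boolean mask row by row, then keep only the rows where it is True
mask = df["positions"].apply(is_consecutive)
filtered = df[mask]
print(filtered)
```

With the sample data, only the rows at index 1 and 2 remain. Note that this check also requires the list to be sorted in ascending order and free of duplicates, since it compares against the exact output of range.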
