遍历列表并比较值

时间:2020-06-26 15:11:07

标签: python for-loop pyspark apache-spark-sql

im试图遍历日期列表的列,并说第二个日期是否比第一个日期晚10分钟或更长,则返回“ 1”,否则返回“ 0”,如果第三个日期比第二个日期晚10分钟或更大,则提示“ '1'否则'0'等。

很抱歉,如果我已经回答了这个问题,我似乎找不到任何帮助。

列表大小各不相同。有人知道我该怎么做吗?

df = df_data_collective.groupBy("customer_id").agg(
    F.expr("collect_list(start_dt)").alias("start_times")
)

这将输出客户ID和喜欢的日期时间列表

['2020-04-02T08:15:50+01:00', '2020-04-02T08:15:53+01:00', '2020-04-02T08:15:56+01:00', '2020-04-02T08:16:01+01:00', '2020-04-02T08:16:07+01:00', '2020-04-02T08:21:05+01:00', '2020-04-02T08:21:17+01:00', '2020-04-02T08:21:30+01:00', '2020-04-02T08:21:43+01:00', '2020-04-02T08:21:49+01:00', '2020-04-02T08:22:11+01:00', '2020-04-02T08:22:16+01:00', '2020-04-02T08:24:02+01:00', '2020-04-02T08:24:09+01:00', '2020-04-02T08:24:37+01:00', '2020-04-02T08:36:26+01:00', '2020-04-02T08:39:25+01:00', '2020-04-02T08:39:41+01:00', '2020-04-02T08:39:52+01:00', '2020-04-02T08:40:18+01:00', '2020-04-02T08:40:27+01:00', '2020-04-02T08:40:33+01:00', '2020-04-02T08:40:49+01:00', '2020-04-02T08:41:03+01:00', '2020-04-02T08:41:29+01:00', '2020-04-02T08:42:00+01:00', '2020-04-02T08:42:23+01:00', '2020-04-02T08:42:57+01:00', '2020-04-02T08:44:43+01:00', '2020-04-02T08:44:49+01:00']

我对for循环有非常基本的了解,但是仍在培训中,希望了解是否有人可以提供任何建议?

3 个答案:

答案 0 :(得分:0)

from datetime import datetime, timedelta

dt_str_list = ['2020-04-02T08:15:50+01:00', '2020-04-02T08:15:53+01:00',
               '2020-04-02T08:15:56+01:00', '2020-04-02T08:16:01+01:00',
               '2020-04-02T08:16:07+01:00', '2020-04-02T08:21:05+01:00',
               '2020-04-02T08:21:17+01:00', '2020-04-02T08:21:30+01:00',
               '2020-04-02T08:21:43+01:00', '2020-04-02T08:21:49+01:00',
               '2020-04-02T08:22:11+01:00', '2020-04-02T08:22:16+01:00',
               '2020-04-02T08:24:02+01:00', '2020-04-02T08:24:09+01:00',
               '2020-04-02T08:24:37+01:00', '2020-04-02T08:36:26+01:00',
               '2020-04-02T08:39:25+01:00', '2020-04-02T08:39:41+01:00',
               '2020-04-02T08:39:52+01:00', '2020-04-02T08:40:18+01:00',
               '2020-04-02T08:40:27+01:00', '2020-04-02T08:40:33+01:00',
               '2020-04-02T08:40:49+01:00', '2020-04-02T08:41:03+01:00',
               '2020-04-02T08:41:29+01:00', '2020-04-02T08:42:00+01:00',
               '2020-04-02T08:42:23+01:00', '2020-04-02T08:42:57+01:00',
               '2020-04-02T08:44:43+01:00', '2020-04-02T08:44:49+01:00']


dt_list = [datetime.strptime(dt_str, '%Y-%m-%dT%H:%M:%S%z')
           for dt_str in dt_str_list]

minute_10 = timedelta(minutes=10)
flags = [1 if dt_list[i] - dt_list[i-1] > minute_10 else 0
         for i in range(1, len(dt_list))]

答案 1 :(得分:0)

您可以使用MySecondClass::MySecondClass(PyObject* p){ // get the attribute from p; equivalent of cpp_attr = p.attr PyObject* cpp_attr = PyObject_getAttrString(p, (char*)"attr")); // somehow get back the pointer to MyClass object created in function1 } 方法:

str.split()

输出:

from datetime import datetime, timedelta

lst = ['2020-04-02T08:15:50+01:00', '2020-04-02T08:15:53+01:00', '2020-04-02T08:15:56+01:00', '2020-04-02T08:16:01+01:00', '2020-04-02T08:16:07+01:00', '2020-04-02T08:21:05+01:00', '2020-04-02T08:21:17+01:00', '2020-04-02T08:21:30+01:00', '2020-04-02T08:21:43+01:00', '2020-04-02T08:21:49+01:00', '2020-04-02T08:22:11+01:00', '2020-04-02T08:22:16+01:00', '2020-04-02T08:24:02+01:00', '2020-04-02T08:24:09+01:00', '2020-04-02T08:24:37+01:00', '2020-04-02T08:36:26+01:00', '2020-04-02T08:39:25+01:00', '2020-04-02T08:39:41+01:00', '2020-04-02T08:39:52+01:00', '2020-04-02T08:40:18+01:00', '2020-04-02T08:40:27+01:00', '2020-04-02T08:40:33+01:00', '2020-04-02T08:40:49+01:00', '2020-04-02T08:41:03+01:00', '2020-04-02T08:41:29+01:00', '2020-04-02T08:42:00+01:00', '2020-04-02T08:42:23+01:00', '2020-04-02T08:42:57+01:00', '2020-04-02T08:44:43+01:00', '2020-04-02T08:44:49+01:00']

def s(d):
    h,m,s = d.split(':',2)
    h = int(h[-2:])*60*60
    m = int(m)*60
    s = int(s[:2])
    return h+m+s

c = [1 if s(lst[i-1])-s(d) >= 600 and i else 0 for i,d in enumerate(lst)]

print(c)

答案 2 :(得分:0)

首先,必须将 start_dt 转换为 timestamp 格式,然后在收集列表之后,我们可以应用 {{1} } 功能与 transform(with index as i) 一起获得所需的输出。 (从 unix_timestamp 开始可用转换)

spark2.4