I have a dataset that looks like this representative sample (it is the result set of this query):
time
2012-02-01 23:43:16.9088243 <--
2012-02-01 23:43:16.9093561
2012-02-01 23:43:16.9098879

2012-02-01 23:43:17.1018243 <--
2012-02-01 23:43:17.1023561
2012-02-01 23:43:17.1028879

2012-02-01 23:43:17.2018243 <--
2012-02-01 23:43:17.2023561
2012-02-01 23:43:17.2028879
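The result contains millions of rows, so we now need a way to thin it out before we can analyze it.
As you can see, the first three rows in the example above are within a thousandth of a second of each other, the next three rows follow roughly a tenth of a second later, and the last three are again separated from the previous group by a tenth of a second. I have added blank lines (not in the original data) to illustrate this.
I need a query that identifies the timestamps that are more than a thousandth of a second after the previous timestamp. The resulting output (assuming the first group of three is also a tenth of a second away from whatever preceded it) would be: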
2012-02-01 23:43:16.9088243
2012-02-01 23:43:17.1018243
2012-02-01 23:43:17.2018243
I know I probably need some kind of Row_Number function and partitioning, but I can't quite wrap my head around it.
Answer 0: (score: 1)
You can use lag(), which returns the value from the previous row, so each timestamp can be compared with the one before it.
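As a rough sketch of the lag() idea, the snippet below runs a standard SQL window query against an in-memory SQLite database from Python. The table name events, the columns ts and ticks, and the helper to_ticks are all hypothetical and not taken from the question. It also assumes an SQLite build with window function support (3.25 or later). Timestamps are stored as integer 100 ns ticks so that the seven fractional digits are compared exactly; on another database engine the interval arithmetic would look different.

```python
import sqlite3
from datetime import datetime

TICKS_PER_SECOND = 10_000_000  # 100 ns ticks, matching the 7 fractional digits
EPOCH = datetime(1970, 1, 1)


def to_ticks(text):
    """Convert 'YYYY-MM-DD HH:MM:SS.fffffff' to integer 100 ns ticks since 1970."""
    whole, _, frac = text.partition(".")
    parsed = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S")
    seconds = int((parsed - EPOCH).total_seconds())
    return seconds * TICKS_PER_SECOND + int(frac.ljust(7, "0")[:7])


# Sample rows copied from the question.
rows = [
    "2012-02-01 23:43:16.9088243",
    "2012-02-01 23:43:16.9093561",
    "2012-02-01 23:43:16.9098879",
    "2012-02-01 23:43:17.1018243",
    "2012-02-01 23:43:17.1023561",
    "2012-02-01 23:43:17.1028879",
    "2012-02-01 23:43:17.2018243",
    "2012-02-01 23:43:17.2023561",
    "2012-02-01 23:43:17.2028879",
]

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the original text plus its tick count.
conn.execute("CREATE TABLE events (ts TEXT, ticks INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)", [(t, to_ticks(t)) for t in rows]
)

# LAG() yields the previous row's ticks in timestamp order. The gap is NULL
# for the very first row, which is kept as the start of the first group.
QUERY = """
SELECT ts
FROM (
    SELECT ts,
           ticks - LAG(ticks) OVER (ORDER BY ticks) AS gap
    FROM events
)
WHERE gap IS NULL OR gap > :threshold
ORDER BY ts
"""

one_millisecond = TICKS_PER_SECOND // 1000
group_starts = [r[0] for r in conn.execute(QUERY, {"threshold": one_millisecond})]

for ts in group_starts:
    print(ts)
```

No Row_Number or partitioning is needed here: a single ordered window is enough, because each row only has to be compared with its immediate predecessor.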