Comparing a DataFrame of datetimes with a DataFrame of periods

Asked: 2019-05-28 17:10:49

Tags: python python-3.x pandas dataframe datetime

I keep running into a simple pandas DataFrame problem; maybe someone has come across this before...

Thanks in advance :)

I have two DataFrames, df1 and df2:

df1

unique_id    timestamp
1            2019-01-21
2            2019-02-01
3            2019-04-05
4            2019-05-01
5            2019-05-12
...          ...

df2

classification     from            to
A                  2019-01-05      2019-02-02
B                  2019-02-03      2019-02-28
C                  2019-03-01      2019-04-05
D                  2019-04-06      2019-05-03
E                  2019-05-04      2019-05-31
...                ...             ...

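My goal is to compare each timestamp in df1 against each date interval in df2, so that every unique_id in df1 is assigned the corresponding classification from df2.

I am trying something like this:

df1.loc[(df1['timestamp'] > df2['from']) & (df1['timestamp'] < df2['to']), 'class'] = df2['classification']

This always raises ValueError: Can only compare identically-labeled Series objects, even though both columns have exactly the same dtype, datetime64[ns] ...

Expected output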

unique_id         timestamp        classification
1                 2019-01-21       A
2                 2019-02-01       A
3                 2019-04-05       C
4                 2019-05-01       D
5                 2019-05-12       E
...               ...              ...

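3 Answers:

Answer 0: (score: 0)

Personally, I would convert the timestamps to Unix timestamps.

from time import mktime
# Convert each timestamp to an integer Unix timestamp (interpreted as local time)
df1['timestamp'] = [int(mktime(ts.timetuple())) for ts in df1['timestamp']]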

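Do the same for df2 to get your start and end timestamps, then retry the df1.loc[(df1['timestamp'] > df2['from']) & (df1['timestamp'] < df2['to']), 'class'] = df2['classification'] expression you wrote. Keep in mind that pandas still requires both Series to carry identical index labels for this comparison, so the conversion alone may not remove the error message.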
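For reference, the conversion to Unix timestamps can also be done without a Python loop. The following is only a sketch on a three-row sample of df1; the column name unix_ts is made up for illustration. Unlike mktime, which interprets the value as local time, this version treats the timestamps as UTC.

```python
import pandas as pd

# Small sample mirroring df1 from the question.
df1 = pd.DataFrame({
    "unique_id": [1, 2, 3],
    "timestamp": pd.to_datetime(["2019-01-21", "2019-02-01", "2019-04-05"]),
})

# Whole seconds elapsed since the Unix epoch, timestamps treated as UTC.
epoch = pd.Timestamp("1970-01-01")
df1["unix_ts"] = (df1["timestamp"] - epoch) // pd.Timedelta("1s")  # hypothetical column name

print(df1)
```

The same expression can be applied to the from and to columns of df2.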

Answer 1: (score: 0)

Try:
import numpy as np
Now, instead of
df1['timestamp'] > df2['from']
try:
np.greater(df1['timestamp'], df2['from'])
It looks like you are trying to get a True/False answer.
You may want to take a look here: https://docs.scipy.org/doc/numpy/reference/routines.logic.html
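If the goal is to test every timestamp against every interval, one possible extension of the NumPy idea is broadcasting on the raw arrays, which sidesteps index alignment entirely. This is a rough sketch on the sample data from the question, assuming inclusive bounds and intervals that do not overlap; it builds a boolean matrix of size len(df1) x len(df2), so it is only suitable when that fits in memory.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "unique_id": [1, 2, 3, 4, 5],
    "timestamp": pd.to_datetime(
        ["2019-01-21", "2019-02-01", "2019-04-05", "2019-05-01", "2019-05-12"]),
})
df2 = pd.DataFrame({
    "classification": ["A", "B", "C", "D", "E"],
    "from": pd.to_datetime(
        ["2019-01-05", "2019-02-03", "2019-03-01", "2019-04-06", "2019-05-04"]),
    "to": pd.to_datetime(
        ["2019-02-02", "2019-02-28", "2019-04-05", "2019-05-03", "2019-05-31"]),
})

# Plain NumPy arrays carry no index, so no label alignment takes place.
ts = df1["timestamp"].to_numpy()[:, None]    # shape (n_rows, 1)
start = df2["from"].to_numpy()[None, :]      # shape (1, n_intervals)
end = df2["to"].to_numpy()[None, :]

# inside[i, j] is True when timestamp i lies in interval j (bounds inclusive).
inside = (ts >= start) & (ts <= end)

# Index of the first matching interval per row; rows without a match get None.
first_match = inside.argmax(axis=1)
labels = df2["classification"].to_numpy(dtype=object)[first_match]
df1["classification"] = np.where(inside.any(axis=1), labels, None)

print(df1)
```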

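Answer 2: (score: 0)

You are mixing up the indexes of the two DataFrames. The syntax you propose compares them row by row, matching on index labels. You can see this with the following reduced DataFrames (of different sizes):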

import pandas as pd

df1 = pd.DataFrame(
    [[1, "2019-01-21"],
    [2, "2019-02-01"],
    [3, "2019-04-05"],
    [4, "2019-04-05"],
    [5, "2019-04-05"],
    [6, "2019-04-05"],
    [7, "2019-05-01"],
    [8, "2019-05-12"]],
    columns=["unique_id", "timestamp"])

df2 = pd.DataFrame([
    ["A", "2019-01-05", "2019-02-02"],
    ["D", "2019-04-06", "2019-05-03"],
    ["C", "2019-03-01", "2019-04-05"],
    ["B", "2019-02-03", "2019-02-28"],
    ["E", "2019-05-04", "2019-05-31"],],
    columns=["classification", "from", "to"])

# Comparison of two Series taken from different DataFrames
print((df1['timestamp'] > df2['from']))

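This raises the error:

ValueError: Can only compare identically-labeled Series objects

Here, you want to compare based on the matching datetime interval, so you need to treat the two DataFrames separately. To convert the string data to dates, use pandas.to_datetime (see the pandas documentation).

Here is one way to do it: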

# import modules
import pandas as pd

df1 = pd.DataFrame(
    [[1, "2019-01-21"],
    [2, "2019-02-01"],
    [3, "2019-04-05"],
    [4, "2019-04-05"],
    [5, "2019-04-05"],
    [6, "2019-04-05"],
    [7, "2019-05-01"],
    [8, "2019-05-12"]],
    columns=["unique_id", "timestamp"])

df2 = pd.DataFrame([
    ["A", "2019-01-05", "2019-02-02"],
    ["D", "2019-04-06", "2019-05-03"],
    ["C", "2019-03-01", "2019-04-05"],
    ["B", "2019-02-03", "2019-02-28"],
    ["E", "2019-05-04", "2019-05-31"],],
    columns=["classification", "from", "to"])

# convert to datetime
df1["timestamp"] = pd.to_datetime(df1["timestamp"], format="%Y-%m-%d")
df2[["from", "to"]] = df2[["from", "to"]].apply(pd.to_datetime, format="%Y-%m-%d")

# Try to compare 2 different dataframes
# print((df1['timestamp'] > df2['from']))

class_column = []
for index, row in df1.iterrows():
    class_fd2 = df2[(df2["from"] <= row["timestamp"]) & (df2["to"] >= row["timestamp"])]["classification"].values[0]
    class_column.append(class_fd2)
df1["class1"] = class_column
print(df1)
#    unique_id  timestamp class1
# 0          1 2019-01-21      A
# 1          2 2019-02-01      A
# 2          3 2019-04-05      C
# 3          4 2019-04-05      C
# 4          5 2019-04-05      C
# 5          6 2019-04-05      C
# 6          7 2019-05-01      D
# 7          8 2019-05-12      E

You can also put this in a function and apply it to df1:

def set_class(row):
    return df2[(df2["from"] <= row["timestamp"]) & (
        df2["to"] >= row["timestamp"])]["classification"].values[0]
# Process
df1["class2"] = df1.apply(set_class, axis=1)
print(df1)
#    unique_id  timestamp class1 class2
# 0          1 2019-01-21      A      A
# 1          2 2019-02-01      A      A
# 2          3 2019-04-05      C      C
# 3          4 2019-04-05      C      C
# 4          5 2019-04-05      C      C
# 5          6 2019-04-05      C      C
# 6          7 2019-05-01      D      D
# 7          8 2019-05-12      E      E
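
Both versions above filter df2 once per row of df1. For larger inputs, a possible alternative (not part of the original answer) is pandas.merge_asof, sketched here on the same sample data. It assumes the intervals in df2 do not overlap; timestamps that fall outside every interval end up with a missing classification instead of raising an IndexError.

```python
import pandas as pd

df1 = pd.DataFrame({
    "unique_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "timestamp": pd.to_datetime([
        "2019-01-21", "2019-02-01", "2019-04-05", "2019-04-05",
        "2019-04-05", "2019-04-05", "2019-05-01", "2019-05-12"]),
})
df2 = pd.DataFrame({
    "classification": ["A", "D", "C", "B", "E"],
    "from": pd.to_datetime(
        ["2019-01-05", "2019-04-06", "2019-03-01", "2019-02-03", "2019-05-04"]),
    "to": pd.to_datetime(
        ["2019-02-02", "2019-05-03", "2019-04-05", "2019-02-28", "2019-05-31"]),
})

# merge_asof requires both frames to be sorted by their join keys.
left = df1.sort_values("timestamp")
right = df2.sort_values("from")

# For each timestamp, pick the interval with the latest "from" <= timestamp.
merged = pd.merge_asof(left, right, left_on="timestamp", right_on="from")

# Blank out matches where the timestamp lies past the end of that interval.
in_range = merged["timestamp"] <= merged["to"]
merged["classification"] = merged["classification"].where(in_range)

result = merged[["unique_id", "timestamp", "classification"]].sort_values("unique_id")
print(result)
```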