Question

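I'm trying to put pandas DataFrames into a dictionary, not the other way around.

I tried to put a list of DataFrame chunks as a value in a dictionary, and Python raises an error without any explanation.

Here is what I'm trying to do:

I imported a Messenger chatlog CSV file into a pandas DataFrame, managed to split it by date, and put all the pieces in a list.

Now I want to iterate over this list and split each piece further: if the chat goes silent for more than 15 minutes, it is split into two parts. For each date I want to build a list of chat chunks, then put them in a dictionary where the key is the date and the value is that list of chunks.

Then suddenly Python raises an error. Below is the code where I'm stuck, followed by the error.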

import pandas as pd
from datetime import datetime

# Get chatlog and turn it into Pandas Dataframe
ktlk_csv = pd.read_csv(r'''C:\Users\Jaepil\PycharmProjects\test_pycharm/5years.csv''', encoding="utf-8")
df = pd.DataFrame(ktlk_csv)

# Change "Date" column from String to DateTime 
df["Date"] = pd.to_datetime(df["Date"])

# Make a column "time_diff" holding the difference in timestamps between consecutive chats.
df["time_diff"] = df["Date"].diff()
df["time_diff"] = df["time_diff"].dt.total_seconds()

# Criteria to split chat chunks 
chunk_tolerance = 900 # 900: 15min of silence splits a chat
chunk_min = 5 # a chat less than 5 min is not a chunk. 

# Split a chatlog by date. (1st split)
df_byDate = []
for group in df.groupby(lambda x: df["Date"][x].day):
    df_byDate.append(group)

# Iterate over the list of split chats and split them into many chunks
df_chunk = {}
for day in df_byDate:
    table = day[1]
    list_of_daily_chunks = []
    for group in table.groupby(lambda x: table["time_diff"][x] < chunk_tolerance ):
        list_of_daily_chunks.append(group)

    # It does NOT return any error up to this point. 

    key = table.loc[:, "Date"].dt.date[0].strftime("%Y-%m-%d")
    df_chunk[key] = list_of_daily_chunks

This raises the following error:

C:/Users/Jaepil/PycharmProjects/test_pycharm/PYNEER_KatalkBot_-_CSV_to_Chunk.py
Traceback (most recent call last):
  File "C:/Users/Jaepil/PycharmProjects/test_pycharm/PYNEER_KatalkBot_-_CSV_to_Chunk.py", line 32, in <module>
    key = table.loc[:, "Date"].dt.date[0].strftime("%Y-%m-%d")
  File "C:\Users\Jaepil\Anaconda3\lib\site-packages\pandas\core\series.py", line 601, in __getitem__
    result = self.index.get_value(self, key)
  File "C:\Users\Jaepil\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2477, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas\_libs\index.pyx", line 98, in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4404)
  File "pandas\_libs\index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4087)
  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)
  File "pandas\_libs\hashtable_class_helper.pxi", line 759, in pandas._libs.hashtable.Int64HashTable.get_item (pandas\_libs\hashtable.c:14031)
  File "pandas\_libs\hashtable_class_helper.pxi", line 765, in pandas._libs.hashtable.Int64HashTable.get_item (pandas\_libs\hashtable.c:13975)
KeyError: 0

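What am I doing wrong? At first I got an error saying that a Series object cannot be hashed, so I changed the key to a string. But now I get a different error.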

"Series objects are mutable and cannot be hashed" error

Answer 1

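The KeyError: 0 comes from the [0]: on a Series with an integer index, [0] looks up the label 0, not the first position. Each group returned by groupby keeps the row labels of the original DataFrame, so only the group that contains row 0 has that label. I think you need to change: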

key = table.loc[:, "Date"].dt.date[0].strftime("%Y-%m-%d")

Either convert to strings first with strftime, then select the first value by position with iat:

key = table["Date"].dt.strftime("%Y-%m-%d").iat[0]
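A small reproduction of the failure and the fix, using made-up sample data and grouping by calendar date for illustration: the second group keeps row labels 2 and 3, so the label lookup [0] fails, while iat[0] returns its first date.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2017-03-01 10:00", "2017-03-01 10:05",
        "2017-03-02 09:00", "2017-03-02 09:30",
    ])
})

# Each group keeps the original row labels: the second group has labels 2 and 3.
groups = [table for _, table in df.groupby(df["Date"].dt.date)]
second = groups[1]

try:
    _ = second["Date"].dt.date[0]  # label lookup for 0, which this group lacks
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Positional access works regardless of the labels.
key = second["Date"].dt.strftime("%Y-%m-%d").iat[0]
print(label_lookup_failed, key)  # True 2017-03-02
```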

Or use iloc to select the first row, with get_loc giving the position of the Date column:

key = table.iloc[0, table.columns.get_loc("Date")].strftime("%Y-%m-%d")
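Separately from the KeyError, the grouping in the question may not split the data the way the description asks: grouping on .day puts the same day of different months into one group, and grouping on time_diff < chunk_tolerance yields at most two groups (True and False) instead of consecutive chunks. A common alternative is to flag each message that follows a gap longer than the tolerance and take a cumulative sum of those flags as a chunk id. The sketch below uses hypothetical sample data and a hypothetical Message column, and it leaves out the chunk_min rule.

```python
import pandas as pd

# Hypothetical sample data standing in for the chatlog CSV.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2017-03-01 10:00", "2017-03-01 10:05", "2017-03-01 11:00",
        "2017-03-02 09:00", "2017-03-02 09:10",
    ]),
    "Message": ["a", "b", "c", "d", "e"],
})

chunk_tolerance = 900  # seconds of silence that start a new chunk (15 min)

df_chunk = {}
# Group by the calendar date, not .day, so March 1 and April 1 stay apart.
for date, table in df.groupby(df["Date"].dt.date):
    gaps = table["Date"].diff().dt.total_seconds()
    # A message starts a new chunk when the gap before it exceeds the tolerance;
    # the cumulative sum turns those starts into consecutive chunk ids.
    chunk_id = (gaps > chunk_tolerance).cumsum()
    key = date.strftime("%Y-%m-%d")
    df_chunk[key] = [chunk for _, chunk in table.groupby(chunk_id)]

for key, chunks in df_chunk.items():
    print(key, [len(c) for c in chunks])
# 2017-03-01 [2, 1]
# 2017-03-02 [2]
```

The result is a dictionary whose values are lists of DataFrames, keyed by date strings.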

How do I make a dictionary with lists of pandas DataFrames as values?
