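I'm trying to put pandas DataFrames into a dictionary, not the other way around.
I tried to store a list of DataFrame chunks as a value in a dictionary, and Python raised an error that I can't make sense of.
Here is what I'm trying to do:
I imported a messenger chat log CSV file into a pandas DataFrame, managed to split it by date, and put all the pieces in a list.
Now I want to iterate over this list and split it further: whenever the chat goes silent for more than 15 minutes, it is split into two chunks. I want to build another list of chat chunks for each date and then put those lists in a dictionary, where the key is the date and the value is the list of chunks for that date.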
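To make the goal concrete, here is a rough sketch of the structure I'm after, on made-up timestamps. The `toy` frame, the `Message` column, and the `chunks_by_date` name are invented for illustration, and labelling chunks with `cumsum` is just one assumed way to do it, not what my real code does.

```python
import pandas as pd

# Made-up chat timestamps, for illustration only.
toy = pd.DataFrame({
    "Date": pd.to_datetime([
        "2018-01-01 09:00", "2018-01-01 09:05", "2018-01-01 09:40",
        "2018-01-02 10:00", "2018-01-02 10:10",
    ]),
    "Message": ["a", "b", "c", "d", "e"],
})

chunk_tolerance = 900  # seconds of silence that start a new chunk

chunks_by_date = {}
for date, day in toy.groupby(toy["Date"].dt.date):
    # Seconds since the previous message on the same date (NaN for the first row).
    gap = day["Date"].diff().dt.total_seconds()
    # Every gap above the tolerance bumps the chunk label by one.
    chunk_id = (gap > chunk_tolerance).cumsum()
    chunks_by_date[date.strftime("%Y-%m-%d")] = [
        chunk for _, chunk in day.groupby(chunk_id)
    ]
```

With this toy data, the first date should end up with two chunks (the 35-minute gap splits it) and the second date with one.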
My actual code raises an error, though. Below is the code where I'm stuck, followed by the error it returns.
import pandas as pd
from datetime import datetime
# Get chatlog and turn it into Pandas Dataframe
ktlk_csv = pd.read_csv(r'''C:\Users\Jaepil\PycharmProjects\test_pycharm/5years.csv''', encoding="utf-8")
df = pd.DataFrame(ktlk_csv)
# Change "Date" column from String to DateTime
df["Date"] = pd.to_datetime(df["Date"])
# Make a column "time_diff" holding the difference in timestamps between consecutive chats.
df["time_diff"] = df["Date"].diff()
df["time_diff"] = df["time_diff"].dt.total_seconds()
# Criteria to split chat chunks
chunk_tolerance = 900 # 900: 15min of silence splits a chat
chunk_min = 5 # a chat less than 5 min is not a chunk.
# Split a chatlog by date. (1st split)
df_byDate = []
for group in df.groupby(lambda x: df["Date"][x].day):
    df_byDate.append(group)
# Iterate over the list of per-date chats and split each one into chunks
df_chunk = {}
for day in df_byDate:
    table = day[1]
    list_of_daily_chunks = []
    for group in table.groupby(lambda x: table["time_diff"][x] < chunk_tolerance):
        list_of_daily_chunks.append(group)
    # It does NOT return any error up to this point.
    key = table.loc[:, "Date"].dt.date[0].strftime("%Y-%m-%d")
    df_chunk[key] = list_of_daily_chunks
This returns the following error:
C:/Users/Jaepil/PycharmProjects/test_pycharm/PYNEER_KatalkBot_-_CSV_to_Chunk.py
Traceback (most recent call last):
  File "C:/Users/Jaepil/PycharmProjects/test_pycharm/PYNEER_KatalkBot_-_CSV_to_Chunk.py", line 32, in <module>
    key = table.loc[:, "Date"].dt.date[0].strftime("%Y-%m-%d")
  File "C:\Users\Jaepil\Anaconda3\lib\site-packages\pandas\core\series.py", line 601, in __getitem__
    result = self.index.get_value(self, key)
  File "C:\Users\Jaepil\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2477, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas\_libs\index.pyx", line 98, in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4404)
  File "pandas\_libs\index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4087)
  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)
  File "pandas\_libs\hashtable_class_helper.pxi", line 759, in pandas._libs.hashtable.Int64HashTable.get_item (pandas\_libs\hashtable.c:14031)
  File "pandas\_libs\hashtable_class_helper.pxi", line 765, in pandas._libs.hashtable.Int64HashTable.get_item (pandas\_libs\hashtable.c:13975)
KeyError: 0
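What am I doing wrong? At first I got an error saying that Series objects cannot be hashed, so I converted the key to a string. Now I get this different error instead.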
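While narrowing this down, I put together a small sketch on made-up data that seems to raise the same `KeyError: 0`. It assumes the cause is that `[0]` on a Series with an integer index looks up the label 0 rather than the first row, and that the per-date tables keep their original row labels. The Series `s` and its index values are invented for illustration, so this may or may not match what happens in my real data.

```python
import pandas as pd

# Made-up Series whose index labels start at 3, as a later per-date table might.
s = pd.Series(
    pd.to_datetime(["2018-01-02 10:00", "2018-01-02 10:10"]),
    index=[3, 4],
)

try:
    s.dt.date[0]  # label-based lookup: no row is labelled 0
    raised = False
except KeyError:
    raised = True

# Position-based lookup returns the first row regardless of its label.
first_by_position = s.dt.date.iloc[0].strftime("%Y-%m-%d")
```

If this is the same problem, replacing `[0]` with `.iloc[0]` on line 32 might be enough, but I would like to understand whether that is the right fix.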