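I have a dataframe showing item_number, number_picked, and date_expected, and I would like to add a new column that is automatically populated with the day of the week corresponding to each date (the dataset is large, so I cannot label rows individually).
I have tried to make sure the queried data comes through in a date format, but I am not sure whether that succeeded. It raises no errors, but the column is still listed as "object". I also tried using dataframe.dt.datetime and dataframe.dt.day_name to do this, to no avail.
I tried starting the query in the two ways shown below: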
SQL = ('SELECT item_number AS UPC, quantity_picked, date_expec AS date_expected FROM [Data] ORDER BY [date_expected] ASC')
SQL = ('SELECT item_number AS UPC, quantity_picked, CAST(date_expec AS date) AS date_expected FROM [Data] ORDER BY [date_expected] ASC')
I tried every combination of the two queries above with the two approaches below to add a new day-of-week column to the dataframe:
practice_df = pd.read_sql_query(SQL, con=sql_conn, parse_dates={'date_expected':'%Y%m%d'})
practice_df['day_of_week'] = practice_df['date_expected'].dt.day_name()
print(practice_df)
practice_df = pd.read_sql_query(SQL, con=sql_conn, parse_dates={'date_expected':'%Y%m%d'})
practice_df['date_num'] = practice_df.append(pd.to_datetime(practice_df['date_expected']))
practice_df['day_of_week'] = practice_df['date_expected'].dt.day_name()
print(practice_df)
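As another attempt, I stripped the second block of code down one piece at a time and found that removing the parse_dates argument from the line that converts the query result into a dataframe, along with all the other lines, allowed the code to run without errors. I then tried the following...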
practice_df = pd.read_sql_query(SQL, con=sql_conn)
practice_df['date_num'] = practice_df.append(pd.to_datetime(practice_df['date_expected']))
practice_df['day_of_week'] = practice_df.append(practice_df['date_num'].dt.day_name())
print(practice_df)
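After studying the pd.read_sql_query and series.dt.datetime documentation, and looking at the following posted and answered questions for guidance, I tried to come up with a solution on my own: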
How does parse_dates work with pd.read_sql_query
Create a day-of-week column in a Pandas dataframe using Python
With either query option combined with the second dataframe option, I get this error message:
File "...anaconda3\lib\site-packages\numpy\core\shape_base.py", line 283, in vstack
return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
MemoryError
When I use the first option to create the dataframe and add the new column, the data prints as:
UPC quantity_picked date_expected day_of_week
0 0001111085148 1.0 NaT NaN
1 0001111086984 1.0 NaT NaN
2 0001111088636 1.0 NaT NaN
3 0001111097045 1.0 NaT NaN
4 0001450002690 1.0 NaT NaN
5 0001600012479 1.0 NaT NaN
6 0003800019891 1.0 NaT NaN
7 0004450034115 1.0 NaT NaN
8 0005100021165 1.0 NaT NaN
When I ran the last dataframe attempt listed above, I received the following error:
File "...lib\site-packages\pandas\core\internals\managers.py", line 1325, in _make_na_block
block_values = np.empty(block_shape, dtype=dtype)
MemoryError
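Is there a simpler way to do this, or is there something I am missing? Any guidance would be greatly appreciated.
Answer 0 (score: 0)
You can handle this directly in SQL Server using DATENAME: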
SELECT
item_number AS UPC,
quantity_picked,
date_expec AS date_expected,
DATENAME(dw, date_expec) AS day_of_week
FROM [Data]
ORDER BY [date_expected]
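If you would rather do this on the pandas side, the following is a minimal sketch, not a tested fix for your data. It assumes date_expected comes back as ISO-style strings such as 2019-07-01; the sample rows are hypothetical and stand in for the query result. The NaT values in the question probably mean the '%Y%m%d' format did not match the stored values. Separately, DataFrame.append adds rows rather than a column (and was removed in pandas 2.0), which is a likely source of the MemoryError; assigning the converted Series directly avoids it.

```python
import pandas as pd

# Hypothetical sample standing in for the result of pd.read_sql_query;
# the date strings are an assumption, the real column may be formatted differently.
practice_df = pd.DataFrame({
    "UPC": ["0001111085148", "0001111086984", "0001111088636"],
    "quantity_picked": [1.0, 1.0, 1.0],
    "date_expected": ["2019-07-01", "2019-07-02", "2019-07-06"],
})

# Convert the column in place; values that cannot be parsed become NaT
# instead of raising an error.
practice_df["date_expected"] = pd.to_datetime(
    practice_df["date_expected"], errors="coerce"
)

# Assign the day names directly as a new column (no append needed).
practice_df["day_of_week"] = practice_df["date_expected"].dt.day_name()

print(practice_df.dtypes)
print(practice_df)
```

If the column still shows NaT after this, print a few raw values from the query before converting to see which format they actually use.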