How to label a column with the day of the week for each date

Time: 2019-06-24 14:17:42

Tags: python sql-server pandas dataframe datetime

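I have a dataframe with the columns item_number, number_picked, and date_expected, and I want to add a new column that is filled in automatically with the day of the week corresponding to each date (the dataset is large, so labeling rows by hand is not an option).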

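I have tried to make sure the queried data comes through as a date type, but I am not sure it worked. No error is raised, yet the column is still listed as "object". I have also tried dataframe.dt.datetime and dataframe.dt.day_name, to no avail.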

I tried writing the query in the two ways shown below:

SQL = ('SELECT item_number AS UPC, quantity_picked, date_expec AS date_expected FROM [Data] ORDER BY [date_expected] ASC')

SQL = ('SELECT item_number AS UPC, quantity_picked, CAST(date_expec AS date) AS date_expected FROM [Data] ORDER BY [date_expected] ASC')

I tried every combination of the two queries above with the two approaches below to add a new day-of-week column to the dataframe:

practice_df = pd.read_sql_query(SQL, con=sql_conn, parse_dates={'date_expected':'%Y%m%d'})
practice_df['day_of_week'] = practice_df['date_expected'].dt.day_name()
print(practice_df)

practice_df = pd.read_sql_query(SQL, con=sql_conn, parse_dates={'date_expected':'%Y%m%d'})
practice_df['date_num'] = practice_df.append(pd.to_datetime(practice_df['date_expected']))
practice_df['day_of_week'] = practice_df['date_expected'].dt.day_name()
print(practice_df)

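As another attempt, I stripped the second block of code down one piece at a time and found that removing the parse_dates argument from the line that turns the query result into a dataframe, along with all the other lines, let the code run without errors. I then tried the following: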

practice_df = pd.read_sql_query(SQL, con=sql_conn)
practice_df['date_num'] = practice_df.append(pd.to_datetime(practice_df['date_expected']))
practice_df['day_of_week'] = practice_df.append(practice_df['date_num'].dt.day_name())
print(practice_df)

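I tried to work out a solution on my own after studying the pd.read_sql_query and series.dt.datetime documentation and looking at the following answered questions for guidance: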

How does parse_dates work with pd.read_sql_query

Create a day-of-week column in a Pandas dataframe using Python

With either query option and the second dataframe option, I get this error message:

  File "...anaconda3\lib\site-packages\numpy\core\shape_base.py", line 283, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)

MemoryError

With the first option for creating the dataframe and adding the new column, the data prints as:

                 UPC  quantity_picked date_expected  day_of_week
0      0001111085148              1.0           NaT          NaN
1      0001111086984              1.0           NaT          NaN
2      0001111088636              1.0           NaT          NaN
3      0001111097045              1.0           NaT          NaN
4      0001450002690              1.0           NaT          NaN
5      0001600012479              1.0           NaT          NaN
6      0003800019891              1.0           NaT          NaN
7      0004450034115              1.0           NaT          NaN
8      0005100021165              1.0           NaT          NaN
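
One possible cause of the all-NaT column, offered as a guess rather than a diagnosis: if the driver returns the dates as text such as 2019-06-24, they do not match the '%Y%m%d' format passed to parse_dates, and values that fail to parse are coerced to NaT. The sketch below reproduces that behavior with pd.to_datetime. The sample strings and the assumed date layout are invented for illustration and are not taken from the original data.

```python
import pandas as pd

# Hypothetical sample values; the real column contents are unknown.
raw = pd.Series(["2019-06-24", "2019-06-25"])

# A format without separators does not match these strings,
# so every value is coerced to NaT.
mismatched = pd.to_datetime(raw, format="%Y%m%d", errors="coerce")

# A format that matches the assumed layout parses cleanly.
matched = pd.to_datetime(raw, format="%Y-%m-%d")

print(mismatched.isna().all())         # True
print(matched.dt.day_name().tolist())  # ['Monday', 'Tuesday']
```

Printing practice_df['date_expected'].head() before any parsing would show which layout the column really uses.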

When I run the last attempt listed above, I get the following error:

  File "...lib\site-packages\pandas\core\internals\managers.py", line 1325, in _make_na_block
    block_values = np.empty(block_shape, dtype=dtype)

MemoryError

Is there a simpler way to solve this, or is there something I am missing? Any guidance would be greatly appreciated.

1 Answer:

Answer 0: (score: 0)

You can handle this directly in SQL Server using DATENAME:

SELECT
    item_number AS UPC,
    quantity_picked,
    date_expec AS date_expected,
    DATENAME(dw, date_expec) AS day_of_week
FROM [Data]
ORDER BY [date_expected]
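
If the day name is needed on the pandas side instead, a minimal sketch follows. It assumes the query result is already loaded into a dataframe with a date_expected column; the small frame here is made-up stand-in data, not the original table. The new column is assigned directly from the Series. DataFrame.append, as used in the question, adds rows rather than a column (and was removed in pandas 2.0), which is a plausible source of the MemoryError on a large dataset.

```python
import pandas as pd

# Stand-in for the frame returned by pd.read_sql_query (values are made up).
practice_df = pd.DataFrame({
    "UPC": ["0001111085148", "0001111086984"],
    "quantity_picked": [1.0, 1.0],
    "date_expected": ["2019-06-24", "2019-06-29"],
})

# Convert the column to datetime64; errors="coerce" turns any
# unparseable value into NaT instead of raising.
practice_df["date_expected"] = pd.to_datetime(
    practice_df["date_expected"], errors="coerce"
)

# Assign the new column directly from the Series; no append is needed.
practice_df["day_of_week"] = practice_df["date_expected"].dt.day_name()

print(practice_df.dtypes)
print(practice_df)
```

Checking practice_df.dtypes after the conversion confirms whether date_expected is datetime64 rather than object before .dt is used.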