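I have two tables. The first contains SCHEDULE_DATE (more than 300,000 records) and WORK_WEEK_CODE; the second contains WORK_WEEK_CODE, START_DATE and END_DATE. The first table has duplicate schedule dates, while the second holds 3,200 unique values. I need to populate WORK_WEEK_CODE in table 1 with the WORK_WEEK_CODE from table 2, based on the date range that each schedule date falls into. Samples of both tables are shown below.
I was able to do this with an arcpy.da.UpdateCursor and a nested arcpy.da.SearchCursor, but with this many records it takes a very long time. Any suggestions for a better (and faster) approach would be greatly appreciated.
Note: the date fields are stored as strings.
Table 1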
SCHEDULE_DATE,WORK_WEEK_CODE
20160219
20160126
20160219
20160118
20160221
20160108
20160129
20160201
20160214
20160127
Table 2
WORK_WEEK_CODE,START_DATE,END_DATE
1601,20160104,20160110
1602,20160111,20160117
1603,20160118,20160124
1604,20160125,20160131
1605,20160201,20160207
1606,20160208,20160214
1607,20160215,20160221
Answer 0: (score: 0)
You can use Pandas DataFrames for a more efficient approach. Here is one way to do it with Pandas. Hope this helps:
import pandas as pd
# Load both tables into Pandas DataFrames (here they are read from CSV files)
Table1 = pd.read_csv('Table1.csv')
Table2 = pd.read_csv('Table2.csv')
# Add a constant key to both tables so that every row of Table1 pairs with every row of Table2
Table1['key'] = 1
Table2['key'] = 1
# Join the two tables on the shared key; the result has len(Table1) * len(Table2) rows.
# WORK_WEEK_CODE exists in both tables, so the merged frame holds WORK_WEEK_CODE_x
# (from Table1) and WORK_WEEK_CODE_y (from Table2).
mergeddf = pd.merge(Table1, Table2, how='left', on='key')
# Convert the string dates to actual datetimes
mergeddf['SCHEDULE_DATE'] = pd.to_datetime(mergeddf['SCHEDULE_DATE'], format='%Y%m%d')
mergeddf['START_DATE'] = pd.to_datetime(mergeddf['START_DATE'], format='%Y%m%d')
mergeddf['END_DATE'] = pd.to_datetime(mergeddf['END_DATE'], format='%Y%m%d')
# Keep only the rows whose SCHEDULE_DATE falls inside [START_DATE, END_DATE]
result = mergeddf[(mergeddf['SCHEDULE_DATE'] >= mergeddf['START_DATE']) & (mergeddf['SCHEDULE_DATE'] <= mergeddf['END_DATE'])]
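One caveat with the key-based join above: at the sizes mentioned in the question (300,000 rows against 3,200 ranges) the intermediate frame would hold roughly 960 million rows before filtering, which may not fit in memory. A possible alternative, sketched here and not part of the original answer, is pd.merge_asof, which matches each schedule date to the latest START_DATE at or before it without building the full product. The sketch assumes the date ranges in table 2 do not overlap and that all date columns are strings in YYYYMMDD form; the helper name assign_work_week and the temporary columns _date and _row are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question; dates are strings, as stated there.
table1 = pd.DataFrame({"SCHEDULE_DATE": [
    "20160219", "20160126", "20160219", "20160118", "20160221",
    "20160108", "20160129", "20160201", "20160214", "20160127",
]})
table2 = pd.DataFrame({
    "WORK_WEEK_CODE": [1601, 1602, 1603, 1604, 1605, 1606, 1607],
    "START_DATE": ["20160104", "20160111", "20160118", "20160125",
                   "20160201", "20160208", "20160215"],
    "END_DATE": ["20160110", "20160117", "20160124", "20160131",
                 "20160207", "20160214", "20160221"],
})


def assign_work_week(schedule, weeks):
    """Hypothetical helper: attach WORK_WEEK_CODE to every SCHEDULE_DATE.

    Assumes the [START_DATE, END_DATE] ranges in `weeks` do not overlap.
    """
    left = schedule[["SCHEDULE_DATE"]].copy()
    left["_date"] = pd.to_datetime(left["SCHEDULE_DATE"], format="%Y%m%d")
    left["_row"] = range(len(left))  # remember the original row order

    right = weeks[["WORK_WEEK_CODE", "START_DATE", "END_DATE"]].copy()
    right["START_DATE"] = pd.to_datetime(right["START_DATE"], format="%Y%m%d")
    right["END_DATE"] = pd.to_datetime(right["END_DATE"], format="%Y%m%d")

    # merge_asof needs both sides sorted by their join keys.
    matched = pd.merge_asof(
        left.sort_values("_date"),
        right.sort_values("START_DATE"),
        left_on="_date",
        right_on="START_DATE",
        direction="backward",  # latest START_DATE that is <= the schedule date
    )

    # A backward match only guarantees START_DATE <= date, so blank out any
    # code whose END_DATE lies before the schedule date (a gap between ranges).
    in_range = matched["_date"] <= matched["END_DATE"]
    matched["WORK_WEEK_CODE"] = matched["WORK_WEEK_CODE"].where(in_range)

    # Restore the original row order and drop the temporary columns.
    matched = matched.sort_values("_row").reset_index(drop=True)
    return matched[["SCHEDULE_DATE", "WORK_WEEK_CODE"]]


result = assign_work_week(table1, table2)
```

Dates that fall outside every range end up with a missing WORK_WEEK_CODE instead of being dropped, so the result keeps one row per input row and can be written back to table 1 in the same order.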