我正在用python编写一个脚本,该脚本使两个表彼此相对。如果满足条件,则脚本将对从属表进行更新。
到目前为止,我的python代码:
def updatedata():
for y in range(updatetable.shape[0]):
for x in range(mastertable.shape[0]):
if updatetable[y].s_date <= mastertable[x].index <= updatetable[y].e_date:
mastertable[x].field2 = updatetable[y]. field2
mastertable[y].field3 = updatetable[y]. field3
我也有这种迭代技术:
for index, row in mastertable.iterrows():
print (row['Value'], index)
for index, row in updatetable.iterrows():
print (row['field1'], row['field2'])
我正在遵循如何在VBA中编写代码:
For x = 1 to lastrow_update
for y = 1 to lastrow_master
if update(x,1) <= master(y,1) and master(y,1) <= update(x,2) then
master (y,2) = update(x,3)
我在python代码中遇到错误。 1)如何为“ for循环”创建两个控制变量 2)如何在比赛后退出内循环以减少运行时间
def updatedata()错误
Traceback (most recent call last):
File "/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2525, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/user1/Desktop/project4.py", line 41, in <module>
updatedata()
File "/Users/user1/Desktop/project4.py", line 20, in updatedata
if presidents_data[y].tookoffice <= sp500[x].index <= presidents_data[y].leftoffice:
File "/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2139, in __getitem__
return self._getitem_column(key)
File "/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2146, in _getitem_column
return self._get_item_cache(key)
File "/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 1842, in _get_item_cache
values = self._data.get(item)
File "/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 3843, in get
loc = self.items.get_loc(item)
File "/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2527, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0
数据样本:
president tookoffice leftoffice party
0 Lyndon B. Johnson 1963-11-22 1969-01-20 Democratic
1 Franklin D. Roosevelt 1933-03-04 1945-04-12 Democratic
2 Herbert Hoover 1929-03-04 1933-03-04 Republican
3 Warren G. Harding 1921-03-04 1923-08-02 Republican
4 Barack Obama 2009-01-20 2017-01-20 Democratic
Value president party_of_president
Date
1871-01-01 4.44 president party_of_president
1871-02-01 4.50 president party_of_president
1871-03-01 4.61 president party_of_president
1871-04-01 4.74 president party_of_president
1871-05-01 4.86 president party_of_president
答案 0 :(得分:0)
使用整数索引按行建立索引时,您需要.iloc
:
if updatetable.iloc[y].s_date <= mastertable.iloc[x].index <= updatetable.iloc[y].e_date:
语法updatetable.iloc [y]的意思是“获取名为y的列”,在这种情况下,y应该是“ president”或您拥有列的另一个字符串。
答案 1 :(得分:0)
考虑后退方向使用熊猫的merge_asof
(即“间隔合并”),或者使用 leftoffice 等效使用前进方向:
merge_df = pd.merge_asof(value_df, pres_df, left_on='Date', right_on='tookoffice',
suffixes=['','_'], direction='backward')
merge_df = pd.merge_asof(value_df, pres_df, left_on='Date', right_on='leftoffice',
suffixes=['','_'], direction='forward')
下面是使用随机数据进行演示的示例,该示例镜像了发布的数据。对于以下解决方案,必须完成两件事:
数据
from io import StringIO
import numpy as np
import pandas as pd
txt = '''
president tookoffice leftoffice party
0 "Lyndon B. Johnson" "1963-11-22" "1969-01-20" Democratic
1 "Franklin D. Roosevelt" "1933-03-04" "1945-04-12" Democratic
2 "Herbert Hoover" "1929-03-04" "1933-03-04" Republican
3 "Warren G. Harding" "1921-03-04" "1923-08-02" Republican
4 "Barack Obama" "2009-01-20" "2017-01-20" Democratic'''
pres_df = pd.read_table(StringIO(txt), sep="\s+", index_col=[0],
parse_dates=['tookoffice', 'leftoffice'])
pres_df = pres_df.sort_values(['tookoffice', 'leftoffice'])
np.random.seed(7012018) # SEEDED FOR REPRODUCIBILITY
value_df = pd.DataFrame({'Value': 4 + abs(np.random.randn(1765)),
'president': 'president',
'party_of_president': 'party_of_president'},
columns=['Value', 'president', 'party_of_president'],
index=pd.date_range('1871-01-01', '2018-01-01', freq='MS'))\
.rename_axis('Date')
value_df = value_df.reset_index()
合并
merge_df = pd.merge_asof(value_df, pres_df, left_on='Date', right_on='tookoffice',
suffixes=['','_'], direction='backward')
# UPDATE NEEDED COLUMNS TO ADJACENT COLUMNS
merge_df['president'] = merge_df['president_']
merge_df['party_of_president'] = merge_df['party']
merge_df['president'] = merge_df['president_']
merge_df['party_of_president'] = merge_df['party']
# CLEAN UP (IN CASE PRESIDENT DF IS NOT EXHAUSTIVE BETWEEN 1871-2018)
mask = ~merge_df['Date'].between(merge_df['tookoffice'], merge_df['leftoffice'])
merge_df.loc[mask, 'president'] = np.nan
merge_df.loc[mask, 'party_of_president'] = np.nan
# SUBSET FIRST 4 COLUMNS AND SET INDEX
merge_df = merge_df[merge_df.columns[:4]].set_index('Date')
输出
print(merge_df.shape) # SAME SHAPE AS ORIGINAL value_df
# (1765, 3)
# FIRST 20 RECORDS
print(merge_df.head(20))
# Value president party_of_president
# Date
# 1871-01-01 4.859688 NaN NaN
# 1871-02-01 4.309355 NaN NaN
# 1871-03-01 5.003074 NaN NaN
# 1871-04-01 4.769772 NaN NaN
# 1871-05-01 5.765133 NaN NaN
# 1871-06-01 5.408663 NaN NaN
# 1871-07-01 4.177684 NaN NaN
# 1871-08-01 5.980318 NaN NaN
# 1871-09-01 5.029296 NaN NaN
# 1871-10-01 4.604133 NaN NaN
# 1871-11-01 4.691276 NaN NaN
# 1871-12-01 5.387712 NaN NaN
# 1872-01-01 4.387162 NaN NaN
# 1872-02-01 4.002513 NaN NaN
# 1872-03-01 6.105690 NaN NaN
# 1872-04-01 5.604589 NaN NaN
# 1872-05-01 4.860393 NaN NaN
# 1872-06-01 4.776127 NaN NaN
# 1872-07-01 4.280952 NaN NaN
# 1872-08-01 4.886334 NaN NaN
# FIRST NON-NULL VALUES
print(merge_df[~pd.isnull(merge_df['president'])].head(20))
# Value president party_of_president
# Date
# 1921-04-01 5.713479 Warren G. Harding Republican
# 1921-05-01 4.542561 Warren G. Harding Republican
# 1921-06-01 5.148667 Warren G. Harding Republican
# 1921-07-01 4.949704 Warren G. Harding Republican
# 1921-08-01 5.138469 Warren G. Harding Republican
# 1921-09-01 5.797446 Warren G. Harding Republican
# 1921-10-01 4.498131 Warren G. Harding Republican
# 1921-11-01 4.216718 Warren G. Harding Republican
# 1921-12-01 6.110533 Warren G. Harding Republican
# 1922-01-01 5.179318 Warren G. Harding Republican
# 1922-02-01 4.808477 Warren G. Harding Republican
# 1922-03-01 4.466641 Warren G. Harding Republican
# 1922-04-01 4.307025 Warren G. Harding Republican
# 1922-05-01 4.337476 Warren G. Harding Republican
# 1922-06-01 4.396854 Warren G. Harding Republican
# 1922-07-01 4.391316 Warren G. Harding Republican
# 1922-08-01 4.748302 Warren G. Harding Republican
# 1922-09-01 5.468115 Warren G. Harding Republican
# 1922-10-01 4.295268 Warren G. Harding Republican
# 1922-11-01 5.432448 Warren G. Harding Republican