Python nested loop with a match/update condition

Posted: 2018-07-01 00:14:47

Tags: python pandas loops

I'm writing a Python script that compares two tables against each other. If a condition is met, the script updates the master table.

My Python code so far:

def updatedata():
    for y in range(updatetable.shape[0]):
        for x in range(mastertable.shape[0]):
            if updatetable[y].s_date <= mastertable[x].index <= updatetable[y].e_date:
                mastertable[x].field2 = updatetable[y].field2
                mastertable[x].field3 = updatetable[y].field3

I also know this way of iterating over the rows:

for index, row in mastertable.iterrows():
    print(row['Value'], index)

for index, row in updatetable.iterrows():
    print(row['field1'], row['field2'])

This is how I would write it in VBA:

For x = 1 To lastrow_update
    For y = 1 To lastrow_master
        If update(x, 1) <= master(y, 1) And master(y, 1) <= update(x, 2) Then
            master(y, 2) = update(x, 3)
        End If
    Next y
Next x
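A minimal sketch of the same VBA logic in plain Python, assuming both tables are held as lists (the names update and master and all sample values below are hypothetical). Each nested for loop gets its own control variable, and break leaves only the inner loop. The loop order is swapped relative to the VBA: breaking after the first match is only safe when the inner loop searches the intervals for a single master row, and it assumes the intervals do not overlap (if they do, the first match wins).

```python
from datetime import date

# Hypothetical stand-ins for the VBA arrays:
# update rows are (start, end, new_value); master rows are [key, value].
update = [
    (date(1921, 3, 4), date(1923, 8, 2), "Warren G. Harding"),
    (date(1929, 3, 4), date(1933, 3, 4), "Herbert Hoover"),
]
master = [
    [date(1922, 1, 1), None],
    [date(1930, 6, 1), None],
    [date(1950, 1, 1), None],
]

for row in master:                          # outer loop: one pass per master row (VBA's y)
    for start, end, new_value in update:    # inner loop: search the intervals (VBA's x)
        if start <= row[0] <= end:          # chained comparison, like "a <= b And b <= c"
            row[1] = new_value              # like master(y, 2) = update(x, 3)
            break                           # exits only the inner loop
```

Rows that fall in no interval (1950 here) keep their original value of None.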

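I'm getting an error in my Python code. My questions:
1) How do I set up two control variables for the nested for loops?
2) How do I exit the inner loop after a match, to reduce running time?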

Error raised by updatedata():

Traceback (most recent call last):
  File "/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2525, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/user1/Desktop/project4.py", line 41, in <module>
    updatedata()
  File "/Users/user1/Desktop/project4.py", line 20, in updatedata
    if presidents_data[y].tookoffice <= sp500[x].index <= presidents_data[y].leftoffice:
  File "/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2139, in __getitem__
    return self._getitem_column(key)
  File "/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2146, in _getitem_column
    return self._get_item_cache(key)
  File "/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 1842, in _get_item_cache
    values = self._data.get(item)
  File "/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 3843, in get
    loc = self.items.get_loc(item)
  File "/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2527, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0

Data sample:

               president tookoffice leftoffice       party
0      Lyndon B. Johnson 1963-11-22 1969-01-20  Democratic
1  Franklin D. Roosevelt 1933-03-04 1945-04-12  Democratic
2         Herbert Hoover 1929-03-04 1933-03-04  Republican
3      Warren G. Harding 1921-03-04 1923-08-02  Republican
4           Barack Obama 2009-01-20 2017-01-20  Democratic

            Value  president  party_of_president
Date                                            
1871-01-01   4.44  president  party_of_president
1871-02-01   4.50  president  party_of_president
1871-03-01   4.61  president  party_of_president
1871-04-01   4.74  president  party_of_president
1871-05-01   4.86  president  party_of_president

2 Answers:

Answer 0 (score: 0)

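You need .iloc to select rows by integer position:

if updatetable.iloc[y].s_date <= mastertable.iloc[x].name <= updatetable.iloc[y].e_date:

The syntax updatetable[y] means "get the column named y"; in that case y would have to be 'president' or another string naming one of your columns. Note also that a row's index label (here the Date) is available as mastertable.iloc[x].name; .index on a row returns the column labels instead.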
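Putting the .iloc fix together with the break from the question, a minimal sketch of a working updatedata() might look like the following. The tables are small hypothetical examples that reuse the question's placeholder column names (s_date, e_date, field2), and passing the frames in as arguments is an assumption. Rows are read with .iloc, the row's date comes from the index, and values are written by position with .iat so nothing is assigned to a temporary copy.

```python
import pandas as pd

# Hypothetical tables using the question's placeholder column names.
updatetable = pd.DataFrame({
    "s_date": pd.to_datetime(["1921-03-04", "1929-03-04"]),
    "e_date": pd.to_datetime(["1923-08-02", "1933-03-04"]),
    "field2": ["Harding", "Hoover"],
})
mastertable = pd.DataFrame(
    {"Value": [4.44, 4.50, 4.61], "field2": [None, None, None]},
    index=pd.to_datetime(["1922-01-01", "1930-06-01", "1950-01-01"]),
)

def updatedata(mastertable, updatetable):
    col = mastertable.columns.get_loc("field2")      # integer position of the target column
    for x in range(mastertable.shape[0]):            # outer: each master row
        date = mastertable.index[x]                  # same as mastertable.iloc[x].name
        for y in range(updatetable.shape[0]):        # inner: search the intervals
            row = updatetable.iloc[y]                # .iloc selects the y-th row by position
            if row.s_date <= date <= row.e_date:
                mastertable.iat[x, col] = row.field2 # write by (row, column) position
                break                                # stop searching once a match is found
    return mastertable

updatedata(mastertable, updatetable)
```

This is still O(rows × intervals); for large tables the vectorized merge in the next answer scales better.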

Answer 1 (score: 0)

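Consider pandas' merge_asof (an "as-of" merge, which here serves as an interval merge) with direction='backward' on tookoffice, or, equivalently, with direction='forward' on leftoffice: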

merge_df = pd.merge_asof(value_df, pres_df, left_on='Date', right_on='tookoffice', 
                         suffixes=['','_'], direction='backward')


merge_df = pd.merge_asof(value_df, pres_df, left_on='Date', right_on='leftoffice', 
                         suffixes=['','_'], direction='forward')
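To see what the two directions actually match, here is a small sketch on hypothetical data (two terms and three dates, one of which falls between terms). Backward on tookoffice picks the latest term that started on or before each date; forward on leftoffice picks the earliest term that ends on or after it. Both agree for dates inside a term, but for a date in a gap each returns a wrong president, which is why a between() check on the term boundaries is still needed.

```python
import pandas as pd

# Hypothetical presidents table, sorted on both key columns.
pres_df = pd.DataFrame({
    "president": ["Warren G. Harding", "Herbert Hoover"],
    "tookoffice": pd.to_datetime(["1921-03-04", "1929-03-04"]),
    "leftoffice": pd.to_datetime(["1923-08-02", "1933-03-04"]),
})
# 1925-01-01 falls between the two terms.
value_df = pd.DataFrame({
    "Date": pd.to_datetime(["1922-01-01", "1925-01-01", "1930-06-01"]),
    "Value": [4.5, 4.6, 4.7],
})

back = pd.merge_asof(value_df, pres_df, left_on="Date", right_on="tookoffice",
                     direction="backward")
fwd = pd.merge_asof(value_df, pres_df, left_on="Date", right_on="leftoffice",
                    direction="forward")

# True only where the date really lies inside the matched term.
in_term = back["Date"].between(back["tookoffice"], back["leftoffice"])
```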

Below is a demonstration using random data that mirrors the posted data. This solution requires two things:

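  1. The presidents data frame must be sorted by tookoffice and leftoffice (merge_asof requires both frames to be sorted on their join keys);
  2. The index of the value data frame must be reset so that Date becomes a regular column (it is set back as the index at the end).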
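Both requirements can be checked on a tiny hypothetical example. In the sketch below, merge_asof raises a ValueError when the right frame's key is not sorted, and succeeds once sort_values is applied; reset_index turns a Date index into a column and set_index restores it afterwards.

```python
import pandas as pd

# Hypothetical value table with Date as its index, as in the question.
left = pd.DataFrame({"Value": [4.5, 4.7]},
                    index=pd.to_datetime(["1922-01-01", "1930-06-01"])).rename_axis("Date")
left = left.reset_index()                      # Date becomes an ordinary column

right = pd.DataFrame({
    "tookoffice": pd.to_datetime(["1929-03-04", "1921-03-04"]),  # deliberately unsorted
    "president": ["Herbert Hoover", "Warren G. Harding"],
})

try:
    pd.merge_asof(left, right, left_on="Date", right_on="tookoffice")
    raised = False
except ValueError:                             # pandas rejects unsorted join keys
    raised = True

ok = pd.merge_asof(left, right.sort_values("tookoffice"),
                   left_on="Date", right_on="tookoffice")
ok = ok.set_index("Date")                      # restore Date as the index
```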

Data

from io import StringIO
import numpy as np
import pandas as pd

txt = '''
               president tookoffice leftoffice       party
0      "Lyndon B. Johnson" "1963-11-22" "1969-01-20"  Democratic
1  "Franklin D. Roosevelt" "1933-03-04" "1945-04-12"  Democratic
2         "Herbert Hoover" "1929-03-04" "1933-03-04"  Republican
3      "Warren G. Harding" "1921-03-04" "1923-08-02"  Republican
4           "Barack Obama" "2009-01-20" "2017-01-20"  Democratic'''


pres_df = pd.read_table(StringIO(txt), sep=r"\s+", index_col=[0],
                        parse_dates=['tookoffice', 'leftoffice'])

pres_df = pres_df.sort_values(['tookoffice', 'leftoffice'])


np.random.seed(7012018)   # SEEDED FOR REPRODUCIBILITY
value_df = pd.DataFrame({'Value': 4 + abs(np.random.randn(1765)),
                         'president': 'president',
                         'party_of_president': 'party_of_president'},
                        columns=['Value', 'president', 'party_of_president'],
                        index=pd.date_range('1871-01-01', '2018-01-01', freq='MS'))\
                       .rename_axis('Date')

value_df = value_df.reset_index()

Merge

merge_df = pd.merge_asof(value_df, pres_df, left_on='Date', right_on='tookoffice', 
                         suffixes=['','_'], direction='backward')

# COPY THE MATCHED PRESIDENT AND PARTY INTO THE ORIGINAL COLUMNS
merge_df['president'] = merge_df['president_']
merge_df['party_of_president'] = merge_df['party']

# CLEAN UP (IN CASE PRESIDENT DF IS NOT EXHAUSTIVE BETWEEN 1871-2018)
mask = ~merge_df['Date'].between(merge_df['tookoffice'], merge_df['leftoffice'])

merge_df.loc[mask, 'president'] = np.nan
merge_df.loc[mask, 'party_of_president'] = np.nan

# SUBSET FIRST 4 COLUMNS AND SET INDEX
merge_df = merge_df[merge_df.columns[:4]].set_index('Date')
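As an alternative to merge_asof plus the between() clean-up (a different technique from this answer), pandas' IntervalIndex can look up the containing term directly. A sketch under hypothetical sample data: get_indexer returns the position of the interval containing each date, or -1 if none does. It assumes the terms do not overlap; closed="left" is used because Hoover's last day and Roosevelt's first day are both 1933-03-04, which would otherwise count as an overlap and make get_indexer raise.

```python
import numpy as np
import pandas as pd

pres_df = pd.DataFrame({
    "president": ["Warren G. Harding", "Herbert Hoover", "Franklin D. Roosevelt"],
    "tookoffice": pd.to_datetime(["1921-03-04", "1929-03-04", "1933-03-04"]),
    "leftoffice": pd.to_datetime(["1923-08-02", "1933-03-04", "1945-04-12"]),
})
dates = pd.to_datetime(["1922-01-01", "1925-01-01", "1933-03-04", "1940-01-01"])

# Each term covers [tookoffice, leftoffice), so shared endpoints do not overlap.
terms = pd.IntervalIndex.from_arrays(pres_df["tookoffice"], pres_df["leftoffice"],
                                     closed="left")
pos = terms.get_indexer(dates)          # containing term's position, or -1 if none

found = pos >= 0
# pos == -1 would index the last row, so mask those entries with None.
president = np.where(found, pres_df["president"].to_numpy()[pos], None)
```

Unlike the answer's approach, no sorting or clean-up mask is needed, but a date falling in two overlapping terms would raise instead of picking one.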

Output

print(merge_df.shape)    # SAME SHAPE AS ORIGINAL value_df
# (1765, 3)


# FIRST 20 RECORDS
print(merge_df.head(20))   
#                Value president party_of_president
# Date                                             
# 1871-01-01  4.859688       NaN                NaN
# 1871-02-01  4.309355       NaN                NaN
# 1871-03-01  5.003074       NaN                NaN
# 1871-04-01  4.769772       NaN                NaN
# 1871-05-01  5.765133       NaN                NaN
# 1871-06-01  5.408663       NaN                NaN
# 1871-07-01  4.177684       NaN                NaN
# 1871-08-01  5.980318       NaN                NaN
# 1871-09-01  5.029296       NaN                NaN
# 1871-10-01  4.604133       NaN                NaN
# 1871-11-01  4.691276       NaN                NaN
# 1871-12-01  5.387712       NaN                NaN
# 1872-01-01  4.387162       NaN                NaN
# 1872-02-01  4.002513       NaN                NaN
# 1872-03-01  6.105690       NaN                NaN
# 1872-04-01  5.604589       NaN                NaN
# 1872-05-01  4.860393       NaN                NaN
# 1872-06-01  4.776127       NaN                NaN
# 1872-07-01  4.280952       NaN                NaN
# 1872-08-01  4.886334       NaN                NaN


# FIRST NON-NULL VALUES
print(merge_df[~pd.isnull(merge_df['president'])].head(20))
#                Value          president party_of_president
# Date                                                      
# 1921-04-01  5.713479  Warren G. Harding         Republican
# 1921-05-01  4.542561  Warren G. Harding         Republican
# 1921-06-01  5.148667  Warren G. Harding         Republican
# 1921-07-01  4.949704  Warren G. Harding         Republican
# 1921-08-01  5.138469  Warren G. Harding         Republican
# 1921-09-01  5.797446  Warren G. Harding         Republican
# 1921-10-01  4.498131  Warren G. Harding         Republican
# 1921-11-01  4.216718  Warren G. Harding         Republican
# 1921-12-01  6.110533  Warren G. Harding         Republican
# 1922-01-01  5.179318  Warren G. Harding         Republican
# 1922-02-01  4.808477  Warren G. Harding         Republican
# 1922-03-01  4.466641  Warren G. Harding         Republican
# 1922-04-01  4.307025  Warren G. Harding         Republican
# 1922-05-01  4.337476  Warren G. Harding         Republican
# 1922-06-01  4.396854  Warren G. Harding         Republican
# 1922-07-01  4.391316  Warren G. Harding         Republican
# 1922-08-01  4.748302  Warren G. Harding         Republican
# 1922-09-01  5.468115  Warren G. Harding         Republican
# 1922-10-01  4.295268  Warren G. Harding         Republican
# 1922-11-01  5.432448  Warren G. Harding         Republican