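I am trying to iterate over a pandas DataFrame column and, whenever the next row does not contain "Property Address", add that next row's information to the row above it. For example, if I have a column that reads, top to bottom, ["Property Address", "Alternate Address", "Property Address"], I want to take the information from the "Alternate Address" row and add it to the "Property Address" row above it. I have double-checked that there is no leading or trailing whitespace and that everything is lowercase, so that all comparisons work. However, I still get an error when I run the code below: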
import pandas as pd
import time

df = pd.read_excel('BRH.xls')  # Reads the Excel file and creates a dataframe

# Column headers
df = df[['street', 'state', 'zip', 'Address Type', 'mStreet', 'mState', 'mZip']]
propertyAddress = "Property Address"

# Iterates thru column and replaces the current row with info from next row down
for i in df['Address Type']:
    if i == "Property Address" and df.loc[i+1, :] != "Property Address":
        df['mStreet'] == df.loc[i + 1, 'street']
        df['mState'] == df.loc[i + 1, 'state']
        df['mZip'] = df.loc[i + 1, 'zip']

df.to_excel('BRHOut.xls')
print('operation complete in:', time.process_time(), 'ms')
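Does anyone have ideas on what I can do to make this work? I am new to Python and really lost. Please let me know if there is any other information that would make this question easier to answer. Thanks.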
Answer 0 (score: 0)
You can use pd.Series.shift to build an appropriate mask. Here is some untested pseudocode:
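# True where the previous row is a "Property Address" row
m1 = df['Address Type'].shift() == 'Property Address'
# True where the current row is not a "Property Address" row
m2 = df['Address Type'] != 'Property Address'
mask = m1 & m2
for col in ['Street', 'State', 'Zip']:
    df.loc[mask, 'm'+col] = df.loc[mask, col.lower()].shift(-1)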
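For reference, here is a small sketch of what shift does to a column. The toy Series and the variable names are made up for illustration and are not the asker's data:

```python
import pandas as pd

# Hypothetical toy column standing in for df['Address Type']
address_type = pd.Series(
    ["Property Address", "Alternate Address", "Property Address"]
)

# shift() moves values DOWN one row, so each row sees the value from the row above
previous_row = address_type.shift()
# shift(-1) moves values UP one row, so each row sees the value from the row below
next_row = address_type.shift(-1)

print(pd.DataFrame({
    "current": address_type,
    "previous": previous_row,
    "next": next_row,
}))
```

The first row has no previous value and the last row has no next value, so those positions come back as missing.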
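Answer 1 (score: 0)
The TypeError occurs because i is a string. When you call df.loc[i+1, :], you are trying to do something like "Property Address" + 1. Once that is fixed, there will still be some indexing problems in the body of the for loop.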
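A minimal sketch of why this happens, using a made-up two-row Series rather than the asker's spreadsheet: iterating over a Series yields the cell values, not the row labels.

```python
import pandas as pd

# Hypothetical stand-in for df['Address Type']
address_type = pd.Series(["Property Address", "Alternate Address"])

# Iterating over a Series yields its values, so each item is a str
for value in address_type:
    assert isinstance(value, str)

# Adding an int to a str is what raises the TypeError
try:
    "Property Address" + 1
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Iterating over the index yields row labels, which can be offset by 1
# (assuming the default integer index)
labels = list(address_type.index)
print(labels)
```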
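@jpp gave a very concise answer, but I believe it pulls information from the intended destination and writes it to the intended source. In other words, the roles of "Property Address" and "Alternate Address" are reversed. I believe this will give the correct result: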
import pandas as pd

df = pd.DataFrame(
    data={
        'street': [
            '123 Main Street',
            '1600 Pennsylvania Ave',
            '567 Fake Ave',
            '1 University Ave'
        ],
        'state': ['CA', 'DC', 'DC', 'CA'],
        'zip': ['95126', '20500', '20500', '94301'],
        'Address Type': [
            'Property Address',
            'Alternate Address',
            'Property Address',
            'Alternate Address'
        ],
        'mStreet': [None, None, None, None],
        'mState': [None, None, None, None],
        'mZip': [None, None, None, None],
    },
    columns=[
        'street',
        'state',
        'zip',
        'Address Type',
        'mStreet',
        'mState',
        'mZip'
    ])
# Create a new dataframe with all address attributes shifted UP one row
next_address_attributes = df[['Address Type', 'street', 'state', 'zip']].shift(-1)

# Create a series to indicate whether information should be drawn from next row
# All the decision-making is right here
get_attributes_from_next_address = (
    (df['Address Type'] == 'Property Address')
    & (next_address_attributes['Address Type'] != 'Property Address')
)

# Explicit loop; Series.items() replaces iteritems(), which was removed in pandas 2.0
for i, getting_attributes_is_necessary in get_attributes_from_next_address.items():
    if getting_attributes_is_necessary:
        df.at[i, 'mStreet'] = next_address_attributes.at[i, 'street']
        df.at[i, 'mState'] = next_address_attributes.at[i, 'state']
        df.at[i, 'mZip'] = next_address_attributes.at[i, 'zip']

# Equivalent vectorized form of the loop above, using boolean indexing
df.loc[get_attributes_from_next_address, 'mStreet'] = next_address_attributes.loc[get_attributes_from_next_address, 'street']
df.loc[get_attributes_from_next_address, 'mState'] = next_address_attributes.loc[get_attributes_from_next_address, 'state']
df.loc[get_attributes_from_next_address, 'mZip'] = next_address_attributes.loc[get_attributes_from_next_address, 'zip']
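As a rough check of the logic, here is a condensed variant of the same idea on a three-row toy frame. The data and the names nxt and needs_fill are made up for this sketch, and Series.where is swapped in for the loop and the .loc assignments:

```python
import pandas as pd

# Hypothetical toy frame with only the columns needed for the check
df = pd.DataFrame({
    "street": ["123 Main Street", "1600 Pennsylvania Ave", "567 Fake Ave"],
    "Address Type": ["Property Address", "Alternate Address", "Property Address"],
})

# Every column shifted UP one row, so row i holds the values of row i + 1
nxt = df.shift(-1)

# True where this row is a property address and the next row is not
needs_fill = (
    (df["Address Type"] == "Property Address")
    & (nxt["Address Type"] != "Property Address")
)

# Series.where keeps values where the condition is True and leaves the rest missing
df["mStreet"] = nxt["street"].where(needs_fill)
print(df)
```

Note the last row: it is a property address with no row after it, so the shifted values are missing and nothing useful is filled in there.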