Question

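I'm trying to iterate through a pandas dataframe column and, whenever the next row does not contain "Property Address", add that next row's information to the row above it. For example, if a column reads from top to bottom ["Property Address", "Alternate Address", "Property Address"], I want to take the information from the "Alternate Address" row and add it to the row above it (the "Property Address" row). I have double-checked that there is no leading or trailing whitespace and that everything is lowercase, so all comparisons should work. However, I still get an error with the following code: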

import pandas as pd
import time

df = pd.read_excel('BRH.xls')  # Reads the Excel file and creates a dataframe
# Column Headers
df = df[['street', 'state', 'zip', 'Address Type', 'mStreet', 'mState', 'mZip']]

propertyAddress = "Property Address"  # iterates through the column and replaces the current row with info from the next row down

for i in df['Address Type']:
  if i == "Property Address" and df.loc[i+1, :] != "Property Address":
      df['mStreet'] == df.loc[i + 1, 'street']
      df['mState'] == df.loc[i + 1, 'state']
      df['mZip'] = df.loc[i + 1, 'zip']

df.to_excel('BRHOut.xls')
print('operation complete in:', time.process_time(), 'ms')

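Does anyone have an idea of what I can do to make this work? I'm new to Python and I'm really lost. Please let me know if there is any other information that would make this question easier to answer. Thanks.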

Answer 1

You can use pd.Series.shift to build an appropriate mask.

Here is some untested pseudocode:

# The column is named 'Address Type' (with a space) in the question's dataframe
m1 = df['Address Type'].shift() == 'Property Address'
m2 = df['Address Type'] != 'Property Address'
mask = m1 & m2

for col in ['Street', 'State', 'Zip']:
    df.loc[mask, 'm'+col] = df.loc[mask, col.lower()].shift(-1)
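
For readers unfamiliar with shift, here is a minimal sketch of what it does, using a made-up three-row Series rather than the asker's data. The names prev_row, next_row and prev_is_property are only illustrative.

```python
import pandas as pd

# Hypothetical column, mirroring the example in the question
s = pd.Series(['Property Address', 'Alternate Address', 'Property Address'])

# shift() moves values DOWN one row: each row sees the value from the row above.
# The first row has no predecessor, so it becomes missing.
prev_row = s.shift()

# shift(-1) moves values UP one row: each row sees the value from the row below.
# The last row has no successor, so it becomes missing.
next_row = s.shift(-1)

# Boolean mask: True where the row above is a "Property Address"
prev_is_property = prev_row == 'Property Address'

print(prev_row.tolist())
print(next_row.tolist())
print(prev_is_property.tolist())
```

Comparing a shifted column against the unshifted one is what lets a single vectorized expression look at "this row" and "the neighbouring row" at once.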

Answer 2

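The TypeError occurs because i is a string. When you call df.loc[i+1, :], you are trying to do something like "Property Address" + 1. Once that is fixed, there will still be some indexing problems in the body of the for loop.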
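
To make the cause concrete, here is a small sketch on a made-up three-row frame (not the asker's spreadsheet). It shows that iterating over a column yields the cell values, while iterating over df.index yields row labels that can be passed to df.loc.

```python
import pandas as pd

# Hypothetical three-row frame, only to illustrate the iteration behaviour
df = pd.DataFrame({'Address Type': ['Property Address',
                                    'Alternate Address',
                                    'Property Address']})

# Iterating over a column yields the cell values (strings here), not row labels
values = [i for i in df['Address Type']]

# Adding 1 to one of those strings is what raises the TypeError
try:
    values[0] + 1
    raised = False
except TypeError:
    raised = True

# Iterating over the index yields row labels, which are valid keys for df.loc
labels = [i for i in df.index]

print(values)
print(raised)
print(labels)
```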

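@jpp gave a very concise answer, but I believe it pulls information from the intended destination and writes it to the intended source. In other words, the roles of "Property Address" and "Alternate Address" are reversed. I believe the following gives the correct result: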

Setup

import pandas as pd

df = pd.DataFrame(data={
        'street': [
            '123 Main Street',
            '1600 Pennsylvania Ave',
            '567 Fake Ave',
            '1 University Ave'
        ],
        'state': ['CA', 'DC', 'DC', 'CA'],
        'zip': ['95126', '20500', '20500', '94301'],
        'Address Type': [
            'Property Address',
            'Alternate Address',
            'Property Address',
            'Alternate Address'
        ],
        'mStreet': [None, None, None, None],
        'mState': [None, None, None, None],
        'mZip': [None, None, None, None],
    },
    columns=[
        'street',
        'state',
        'zip',
        'Address Type',
        'mStreet',
        'mState',
        'mZip'
    ])

# Create a new dataframe with all address attributes shifted UP one row
next_address_attributes = df[['Address Type', 'street', 'state', 'zip']].shift(-1)

# Create a series to indicate whether information should be drawn from next row
# All the decision-making is right here
get_attributes_from_next_address = ((df['Address Type'] == 'Property Address')
    & (next_address_attributes['Address Type'] != 'Property Address'))

With a loop

# Series.iteritems() was removed in pandas 2.0; Series.items() is the equivalent
for i, getting_attributes_is_necessary in get_attributes_from_next_address.items():
    if getting_attributes_is_necessary:
        df.at[i, 'mStreet'] = next_address_attributes.at[i, 'street']
        df.at[i, 'mState'] = next_address_attributes.at[i, 'state']
        df.at[i, 'mZip'] = next_address_attributes.at[i, 'zip']

Without a loop

df.loc[get_attributes_from_next_address, 'mStreet'] = next_address_attributes.loc[get_attributes_from_next_address, 'street']
df.loc[get_attributes_from_next_address, 'mState'] = next_address_attributes.loc[get_attributes_from_next_address, 'state']
df.loc[get_attributes_from_next_address, 'mZip'] = next_address_attributes.loc[get_attributes_from_next_address, 'zip']
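
As a possible condensed variant of the loop-free version, the three assignments can be combined into one. This is only a sketch on the same sample data; the short names nxt and mask are introduced here for brevity, and .to_numpy() is used on the assumption that the values should be copied by position, since the source columns (street, state, zip) and target columns (mStreet, mState, mZip) have different labels.

```python
import pandas as pd

# Same sample data as in the setup above
df = pd.DataFrame({
    'street': ['123 Main Street', '1600 Pennsylvania Ave',
               '567 Fake Ave', '1 University Ave'],
    'state': ['CA', 'DC', 'DC', 'CA'],
    'zip': ['95126', '20500', '20500', '94301'],
    'Address Type': ['Property Address', 'Alternate Address',
                     'Property Address', 'Alternate Address'],
    'mStreet': [None] * 4,
    'mState': [None] * 4,
    'mZip': [None] * 4,
})

# Address attributes shifted UP one row, so each row sees the row below it
nxt = df[['Address Type', 'street', 'state', 'zip']].shift(-1)

# True where this row is a property address and the next row is not
mask = ((df['Address Type'] == 'Property Address')
        & (nxt['Address Type'] != 'Property Address'))

# Assign all three columns at once; to_numpy() drops the column labels so the
# values are matched by position rather than by (mismatched) column name
df.loc[mask, ['mStreet', 'mState', 'mZip']] = (
    nxt.loc[mask, ['street', 'state', 'zip']].to_numpy()
)

print(df[['Address Type', 'mStreet', 'mState', 'mZip']])
```

Rows 0 and 2 (the property addresses) receive the street, state and zip of the alternate address below them, while the alternate-address rows are left untouched.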

How to iterate through a pandas column and replace cells with information from the next row down
