I'm trying to compute a new column with string methods, based on conditions on three other columns.
Sample data:
import numpy as np
import pandas as pd
d = pd.DataFrame({'street1': ['1000 foo dr', '1001 bar dr', '1002 foo dr suite101', '1003 bar dr'],
                  'street2': ['city_a', np.nan, 'suite 101', 'suite 102'],
                  'city': ['city_a', 'city_b', np.nan, 'city_c']})
street1               street2    city
1000 foo dr           city_a     city_a
1001 bar dr           NaN        city_b
1002 foo dr suite101  suite 101  NaN
1003 bar dr           suite 102  city_c
Desired output:
Address
1000 foo dr
1001 bar dr
1002 foo dr suite 101
1003 bar dr suite 102
The idea here is: if street2 matches city, ignore street2; if street2 matches the end of street1, ignore street2; otherwise, concatenate street1 and street2.
What I've tried:
def address_clean(row):
    if not row['street2']:
        return row['street1']
    if row['street2'] == row['city']:
        return row['street1']
    elif row['street1'].str.replace(' ', '').find(row['street2'].str.replace(' ', '')) != -1:
        return row['street1']
    else:
        return row['street1'] + row['street2']

d.apply(lambda row: address_clean(row), axis=1).head()
This raises the following error:
AttributeError: ("'str' object has no attribute 'str'", 'occurred at index 1')
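A quick check confirms where `.str` goes wrong. This is a sketch that reuses the sample data; `cell_types` is a name introduced only for illustration. With `axis=1`, `apply` passes each row to the function as a Series, so `row['street1']` is a single cell value, a plain Python `str`. The `.str` accessor exists only on pandas Series/Index objects, not on `str`.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'street1': ['1000 foo dr', '1001 bar dr', '1002 foo dr suite101', '1003 bar dr'],
                  'street2': ['city_a', np.nan, 'suite 101', 'suite 102'],
                  'city': ['city_a', 'city_b', np.nan, 'city_c']})

# Each `row` is a Series indexed by column name; indexing it returns the scalar cell,
# so we record the Python type name of the street1 cell in every row.
cell_types = d.apply(lambda row: type(row['street1']).__name__, axis=1)
print(cell_types.tolist())  # ['str', 'str', 'str', 'str']
```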
It seems that row['street1'] is a str rather than a pd.Series. However, even after I removed the .str parts from the original function, so that it became:
def address_clean(row):
    if not row['street2']:
        return row['street1']
    if row['street2'] == row['city']:
        return row['street1']
    elif row['street1'].replace(' ', '').find(row['street2'].replace(' ', '')) != -1:
        return row['street1']
    else:
        return row['street1'] + row['street2']

d.apply(lambda row: address_clean(row), axis=1).head()
the code throws the following error:
AttributeError: ("'float' object has no attribute 'replace'", 'occurred at index 1')
I'd like to know which part of the function is used incorrectly and how to fix this error.
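Both errors occur at row 1, where street2 is missing. The missing value is `np.nan`, which is a float, and `bool(np.nan)` is `True`. So `not row['street2']` does not catch it, and execution falls through to the `.replace` call on a float. Below is a minimal corrected sketch, not a definitive implementation. It assumes the rules stated above. It also assumes that spaces are ignored when checking the end of street1 (as the original `replace(' ', '')` suggests) and that the two parts are joined with a single space (as in the desired output).

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'street1': ['1000 foo dr', '1001 bar dr', '1002 foo dr suite101', '1003 bar dr'],
                  'street2': ['city_a', np.nan, 'suite 101', 'suite 102'],
                  'city': ['city_a', 'city_b', np.nan, 'city_c']})

def address_clean(row):
    street1, street2, city = row['street1'], row['street2'], row['city']
    # np.nan is truthy, so `not street2` misses it; pd.isna detects the missing value.
    if pd.isna(street2):
        return street1
    # street2 duplicates the city: ignore street2 (NaN in city never compares equal).
    if street2 == city:
        return street1
    # Assumption: spaces are ignored when testing whether street2 is the tail of street1.
    if street1.replace(' ', '').endswith(street2.replace(' ', '')):
        return street1
    # Otherwise join the two parts with a single space.
    return street1 + ' ' + street2

d['Address'] = d.apply(address_clean, axis=1)
print(d['Address'].tolist())
```

With the original sample data, row 2 comes out as `1002 foo dr suite101` (street1 unchanged) rather than `1002 foo dr suite 101`. Producing the desired spelling would need an extra normalization step that the stated rules don't cover. Note also that the original `find(...) != -1` matches street2 anywhere inside street1, whereas `endswith` follows the "matches the end" rule.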
Answer (score: 1)
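Searching a Series for a pattern is easy, but I had to use apply to check whether one column ends with the contents of another. By the way, I had to change your data slightly, because '...suite101' does not end with 'suite 101' unless spaces are ignored. So I used: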
import numpy as np
import pandas as pd

d = pd.DataFrame({'street1': ['1000 foo dr', '1001 bar dr', '1002 foo dr suite 101', '1003 bar dr'],
                  'street2': ['city_a', np.nan, 'suite 101', 'suite 102'],
                  'city': ['city_a', 'city_b', np.nan, 'city_c']})

# Keep street1 alone when street2 contains 'city' (or is NaN, via na=True)
# or when street1 already ends with street2; otherwise join them with a space.
print(pd.DataFrame({'Address': np.where(d.street2.str.contains('city', na=True)
                                        | d.apply(lambda x: x.street1.endswith(str(x.street2)), axis=1),
                                        d.street1,
                                        d.street1.str.cat(d.street2, sep=' '))}))
This gives the expected result:
Address
0 1000 foo dr
1 1001 bar dr
2 1002 foo dr suite 101
3 1003 bar dr suite 102
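`str.contains('city', na=True)` tests for the literal substring 'city'. That works here only because every city name in the sample starts with `city_`. A hypothetical variant follows the question's rule more literally by comparing street2 with the city column itself. The per-row "ends with" check is still needed because `Series.str.endswith` takes a string (or tuple of strings) pattern rather than another column. This is a sketch on the modified data above; `ends` and `keep_street1` are names introduced only for illustration.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'street1': ['1000 foo dr', '1001 bar dr', '1002 foo dr suite 101', '1003 bar dr'],
                  'street2': ['city_a', np.nan, 'suite 101', 'suite 102'],
                  'city': ['city_a', 'city_b', np.nan, 'city_c']})

# Row-by-row check: does street1 end with street2? A missing street2 counts as False.
ends = pd.Series([isinstance(s2, str) and s1.endswith(s2)
                  for s1, s2 in zip(d.street1, d.street2)],
                 index=d.index)

# Keep street1 alone when street2 is missing, equals the city, or is already its tail.
keep_street1 = d.street2.isna() | d.street2.eq(d.city) | ends

out = pd.DataFrame({'Address': np.where(keep_street1,
                                        d.street1,
                                        d.street1.str.cat(d.street2, sep=' '))})
print(out)
```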