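I have a list of city names and a df with city, state, and zip code columns. Some of the zip codes are missing. When a zip code is missing, I want to use a generic zip code based on the city. For example, if the city is San Jose, the zip code should be the generic "SJ_zipcode".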
pattern_city = '|'.join(cities) #works
foundit = ( (df['cty_nm'].str.contains(pattern_city, flags=re.IGNORECASE)) & (df['zip_cd']==0) & (df['st_cd'].str.match('CA') ) ) #works--is this foundit a df?
df['zip_cd'] = foundit.replace( 'SJ_zipcode' ) #nope, error
Error: "Invalid dtype for pad_1d [bool]"
Implemented with where:
df['zip_cd'].where( (df['cty_nm'].str.contains(pattern_city, flags=re.IGNORECASE)) & (df['zip_cd']==0) & (df['st_cd'].str.match('CA') ), "SJ_Zipcode", inplace = True) #nope, empty set; all set to nan?
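For reference, Series.where keeps the values where the condition is True and replaces the rest, so the call above overwrites every row that does not match. Series.mask is the counterpart that replaces the matching rows. A minimal sketch, assuming a small made-up frame that reuses the question's column names, with 0 standing in for a missing zip code:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "cty_nm": ["San Jose", "san jose", "Fresno"],
    "st_cd": ["CA", "CA", "CA"],
    # object dtype so the column can hold both numbers and the placeholder text
    "zip_cd": pd.Series([0, 95112, 0], dtype=object),
})

cond = df["cty_nm"].str.contains("San Jose", case=False) & (df["zip_cd"] == 0)

# where: keep values where cond is True, replace everything else.
kept_where_true = df["zip_cd"].where(cond, "SJ_zipcode")

# mask: replace values where cond is True, keep everything else.
replaced_where_true = df["zip_cd"].mask(cond, "SJ_zipcode")
```

Both calls return a new Series here; neither changes df until the result is assigned back to the column.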
Implemented with loc:
df['zip_cd'].loc[ (df['cty_nm'].str.contains(pattern_city, flags=re.IGNORECASE)) & (df['zip_cd']==0) & (df['st_cd'].str.match('CA') ) ] = "SJ_Zipcode"
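A note on the attempt above: df['zip_cd'].loc[...] = ... is chained indexing. It assigns into the intermediate Series returned by df['zip_cd'], which may be a copy, so the change is not guaranteed to reach df. Passing the row mask and the column label to a single df.loc call avoids that. A minimal sketch, assuming a hypothetical cities list and sample frame (only the column names and the 0 placeholder come from the question):

```python
import re
import pandas as pd

cities = ["San Jose", "Santa Clara"]  # hypothetical list of city names
df = pd.DataFrame({
    "cty_nm": ["SAN JOSE", "San Jose", "Fresno", "San Jose"],
    "st_cd": ["CA", "CA", "CA", "TX"],
    # object dtype so the column can hold both numbers and the placeholder text
    "zip_cd": pd.Series([0, 95112, 0, 0], dtype=object),
})

# Escape the names so regex metacharacters such as "." are matched literally.
pattern_city = "|".join(re.escape(c) for c in cities)

# foundit is a boolean Series aligned with df's index, not a DataFrame.
foundit = (
    df["cty_nm"].str.contains(pattern_city, flags=re.IGNORECASE)
    & (df["zip_cd"] == 0)
    & (df["st_cd"] == "CA")
)

# One .loc call with a row mask and a column label modifies df itself.
df.loc[foundit, "zip_cd"] = "SJ_zipcode"
```

Only the first row matches all three conditions in this sample, so only that row receives the placeholder and no new DataFrame is created.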
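Some possible solutions that did not work:
df.loc[df['First Season'] > 1990, 'First Season'] = 1, which I used as df.loc[foundit, 'zip_cd'] = 'SJ_zipcode'. This is from "Pandas DataFrame: replace all values in a column, based on condition", which is similar to or the same as "Conditional Replace Pandas".
df['c'] = df.apply( lambda row: row['a']*row['b'] if np.isnan(row['c']) else row['c'], axis=1). However, I am not multiplying values. https://datascience.stackexchange.com/questions/17769/how-to-fill-missing-value-based-on-other-columns-in-pandas-dataframe
The where solution. However, it seemed to replace the values where the condition is not met with nan, and the nan values were not helpful. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.where.html
The replace example, which has neither multiple conditions nor a pattern: "Replacing few values in a pandas dataframe column with another value".
Another "want": I want to update the existing dataframe with the values; I do not want to create a new dataframe.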
Answer 0 (score: 0)
Try this:
import pandas as pd

df = pd.DataFrame(data)  # data: the raw records behind the table below (not shown)
df
           city       state    zip
0       Burbank  California  44325
1       Anaheim  California    nan
2    El Cerrito  California  57643
3   Los Angeles  California  56734
4  san Fancisco  California  32819

def generate_placeholder_zip(row):
    # Fill a missing zip with a city-based placeholder such as 'Anaheim_ZIPCODE'.
    if pd.isnull(row['zip']):
        row['zip'] = row['city'] + '_ZIPCODE'
    return row

# apply returns a new DataFrame; assign it back (df = df.apply(...)) to keep the result.
df.apply(generate_placeholder_zip, axis=1)
           city       state              zip
0       Burbank  California            44325
1       Anaheim  California  Anaheim_ZIPCODE
2    El Cerrito  California            57643
3   Los Angeles  California            56734
4  san Fancisco  California            32819
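The row-wise apply above returns a new DataFrame. As an alternative sketch, not part of the original answer, the same placeholder can be written into the existing column with fillna, which avoids calling a Python function for every row. The sample data is reconstructed from the table printed above, and storing the zip codes as strings is an assumption:

```python
import numpy as np
import pandas as pd

# Data reconstructed from the answer's printed table; zip codes assumed to be strings.
df = pd.DataFrame({
    "city": ["Burbank", "Anaheim", "El Cerrito", "Los Angeles", "san Fancisco"],
    "state": ["California"] * 5,
    "zip": ["44325", np.nan, "57643", "56734", "32819"],
})

# Fill only the missing zips, using each row's city to build the placeholder.
df["zip"] = df["zip"].fillna(df["city"] + "_ZIPCODE")
```

fillna aligns the replacement Series with the column by index, so each missing zip picks up the city from its own row.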