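I want to check the columns of a DataFrame and update a column's values if the entire column consists only of zero-length strings or NaN.
I know how to access each row and column and how to iterate over them element by element, but whatever I end up doing should be vectorized (or at least Pythonic).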
import pandas as pd
import numpy as np
# Create a dataframe for example purposes, filled with data to be left alone
np.random.seed(0)
df = pd.DataFrame(np.random.randn(10, 2), columns=['a', 'b'], index=np.random.randint(1,100,10)).sort_index()
# Create an example column that I would modify when encountered in the wild
df['c'] = ''
df['d'] = ''
df.iloc[np.random.randint(low=0,high=(len(df)-1)), df.columns.get_loc('c')] = 'Avoid me'
'''
THIS IS WHERE THE FUN BEGINS :
'''
# If I were to use label-based referencing:
for index, row in df.iterrows():
    if len(row['c']) == 0:
        df.at[index, 'c'] = 'Update Me'
        # df.loc[index]['c'] = 'Update Me'  # chained indexing: may write to a copy, not df
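# or if I were to use position-based referencing
# (the index holds random labels, so df.loc[i, 'c'] would look up labels, not positions):
col_c = df.columns.get_loc('c')
for i in range(len(df)):
    if len(df.iloc[i, col_c]) == 0:
        df.iloc[i, col_c] = 'Update Me'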
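Both loops above replace empty strings one cell at a time. A minimal vectorized sketch of the same per-cell update, assuming column 'c' holds only strings (the small example frame is made up), builds a boolean mask once and assigns through `.loc`:

```python
import pandas as pd

# Made-up example frame with a non-positional index, like the one in the question
df = pd.DataFrame({'c': ['', 'Avoid me', '']}, index=[42, 7, 13])

# Boolean mask: True where the string in 'c' has length 0
mask = df['c'].str.len().eq(0)

# One label-based assignment replaces every masked cell, no Python loop
df.loc[mask, 'c'] = 'Update Me'
print(df['c'].tolist())  # ['Update Me', 'Avoid me', 'Update Me']
```

Note that this is still a per-cell update: a column that mixes '' with real values is partially overwritten, which is not the all-or-nothing behavior asked for.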
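This seems to be the closest I can get to my goal, but I'd like to confirm whether it is the best way to check and update an entire Series at once: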
if (len(df['c'].unique()) == 1) and (df['c'].unique()[0] == ''):
    df['c'] = 'Update Me'
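The `unique()` test works, but a shorter way to ask "is every value in this column an empty string?" is to compare the whole Series against '' and reduce with `.all()`. A sketch, using a hypothetical helper name `fill_if_all_empty`:

```python
import pandas as pd

def fill_if_all_empty(df, col, value='Update Me'):
    """Hypothetical helper: overwrite df[col] with value only if every entry is ''."""
    if df[col].eq('').all():
        df[col] = value  # scalar is broadcast to the whole column
    return df

df = pd.DataFrame({'c': ['', '', ''], 'd': ['', 'Avoid me', '']})
fill_if_all_empty(df, 'c')
fill_if_all_empty(df, 'd')
print(df['c'].tolist())  # ['Update Me', 'Update Me', 'Update Me']
print(df['d'].tolist())  # ['', 'Avoid me', '']
```

One design point: `.all()` on an empty Series returns True, so a zero-row frame would also be "updated"; that is harmless here but worth knowing.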
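The following fills in the values of the specified column, but I want to avoid touching a column that contains any value other than the empty string (i.e., only change columns filled entirely with blank strings). Thanks to Erfan for the input.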
df['c'] = np.where(df['c'].str.len().eq(0), 'Update Me', df['c'])
Answer 0 (score: 3)
If you only want to check for empty strings, I think this one-liner does what you need:
df.loc[:, (df == '').all()] = 'Update me'
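To see which columns the one-liner touches, here is a small made-up frame with a numeric column, a mixed string column, and an all-empty column; only the last one should change:

```python
import pandas as pd

df = pd.DataFrame({
    'a': [0.5, -1.2, 0.3],       # numeric data, left alone
    'c': ['', 'Avoid me', ''],   # mixed column, left alone
    'd': ['', '', ''],           # all empty strings, gets replaced
})

# Boolean Series indexed by column name: True only where every cell == ''
all_empty = (df == '').all()
df.loc[:, all_empty] = 'Update me'
print(df['d'].tolist())  # ['Update me', 'Update me', 'Update me']
```

Comparing the float column to '' simply yields False for every cell, so numeric columns are never selected.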
If you also need to handle NaN, just fill them in before checking:
df.loc[:, (df.fillna('') == '').all()] = 'Update me'
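An equivalent NaN-aware mask can be built without filling the frame first: treat a cell as blank when it is missing or equal to ''. A sketch under that assumption, on made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'c': ['', np.nan, ''],          # only blanks and NaN, gets replaced
    'd': ['', 'Avoid me', np.nan],  # contains a real value, left alone
})

# A cell counts as "blank" if it is missing or an empty string
blank = df.isna() | df.eq('')
df.loc[:, blank.all()] = 'Update me'
print(df['c'].tolist())  # ['Update me', 'Update me', 'Update me']
```

Under this definition an entirely-NaN numeric column also counts as blank, which matches the question's "length 0 or NaN" wording.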
Answer 1 (score: 2)
If you want to set any empty string to 'Update me', you can do the following:
idx_empty_strings = df['SECTION'].str.len() == 0
df.loc[idx_empty_strings, 'SECTION'] = 'Update me'
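For this per-cell case, `Series.replace` is another option that avoids building the mask by hand. 'SECTION' is the answerer's example column name; the data here is made up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'SECTION': ['', 'Intro', '', np.nan]})

# Replace exact empty-string matches only; NaN and non-empty strings are untouched
df['SECTION'] = df['SECTION'].replace('', 'Update me')
print(df['SECTION'].tolist())  # ['Update me', 'Intro', 'Update me', nan]
```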
If you only want to do this when the entire column consists of empty strings or np.NaN, then:
col = 'SECTION'
idx_empty_strings = df[col].str.len() == 0
idx_nan = df[col].isna()
if (idx_empty_strings | idx_nan).all():
    df[col] = 'Update me'
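One caveat with the code above: the `.str` accessor raises an `AttributeError` when the column is not string-like, for example a column pandas stores as `float64` because it is entirely NaN. A sketch of the same all-or-nothing check without `.str`, applied to every column (the helper name `update_if_blank` and the data are hypothetical):

```python
import numpy as np
import pandas as pd

def update_if_blank(df, col, value='Update me'):
    """Hypothetical helper: overwrite df[col] only if every cell is NaN or ''."""
    s = df[col]
    # isna() and eq('') work for any dtype, unlike the .str accessor
    if (s.isna() | s.eq('')).all():
        df[col] = value  # replaces the whole column, so the dtype becomes object
    return df

df = pd.DataFrame({
    'SECTION': ['', np.nan, ''],    # blanks and NaN only
    'empty_float': [np.nan] * 3,    # float64: .str would raise here
    'kept': ['', 'text', np.nan],   # has a real value, left alone
})
for col in df.columns:
    update_if_blank(df, col)
print(df['empty_float'].tolist())  # ['Update me', 'Update me', 'Update me']
```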