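I have a df with two columns and I want to combine both columns, ignoring NaN values. The catch is that sometimes both columns have NaN values, in which case I want the new column to also have NaN. Here is an example: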
df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, None, None], 'type':[None, None, 'strawberry-tart', 'dessert', None]})
df
Out[10]:
       foodstuff             type
0  apple-martini             None
1      apple-pie             None
2           None  strawberry-tart
3           None          dessert
4           None             None
I tried to solve this with fillna:
df['foodstuff'].fillna('') + df['type'].fillna('')
and I got:
0 apple-martini
1 apple-pie
2 strawberry-tart
3 dessert
4
dtype: object
Row 4 has become an empty string. What I want in this case is a NaN value, since both of the combined columns are NaN:
0 apple-martini
1 apple-pie
2 strawberry-tart
3 dessert
4 None
dtype: object
Answer 0: (score: 29)
Answer 1: (score: 2)
Fill the missing values with fillna('')
Add the two columns together row-wise with sum(1)
Then apply replace('', np.nan)
df.fillna('').sum(1).replace('', np.nan)
0 apple-martini
1 apple-pie
2 strawberry-tart
3 dessert
4 NaN
dtype: object
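The same idea can also be written without using the empty string as a sentinel. The following is only a minimal sketch, assuming the two-column df from the question; the names joined, all_missing and combined are illustrative and not part of the original answer. It concatenates as before, then uses mask() to restore a missing value in rows where every source column was missing.

```python
import pandas as pd

df = pd.DataFrame({
    'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
    'type': [None, None, 'strawberry-tart', 'dessert', None],
})

# Concatenate, treating missing values as empty strings.
joined = df['foodstuff'].fillna('') + df['type'].fillna('')

# True only for rows where both source columns are missing.
all_missing = df[['foodstuff', 'type']].isna().all(axis=1)

# mask() replaces the flagged rows with a missing value and keeps the rest.
combined = joined.mask(all_missing)
```

Compared with replace('', np.nan), this leaves a row untouched if a source column legitimately contained an empty string alongside a real value in the other column.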
Answer 2: (score: 2)
You can use the combine method with a lambda:
df['foodstuff'].combine(df['type'], lambda a, b: ((a or "") + (b or "")) or None, None)
(a or "") returns "" if a is None, and the same logic is then applied to the concatenation (if the concatenation is an empty string, the result is None).
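One caveat with the `or` trick is that it relies on the missing values being None: a float NaN is truthy, so (a or "") would return NaN itself rather than "". The sketch below is one possible way around that, assuming the df from the question; join_or_missing is a hypothetical helper name, not something from the original answer. It tests for missing values with pd.isna instead.

```python
import pandas as pd

def join_or_missing(a, b):
    """Hypothetical helper: concatenate two values, treating None/NaN as ''."""
    left = '' if pd.isna(a) else a
    right = '' if pd.isna(b) else b
    # An empty concatenation means both inputs were missing (or empty).
    return (left + right) or None

df = pd.DataFrame({
    'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
    'type': [None, None, 'strawberry-tart', 'dessert', None],
})

# combine() calls the helper once per pair of aligned elements.
combined = df['foodstuff'].combine(df['type'], join_or_missing)
```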
Answer 3: (score: 1)
You can always replace the empty strings in the new column with NaN:
import numpy as np
df['new_col'] = df['new_col'].replace(r'^\s*$', np.nan, regex=True)
Full code:
import pandas as pd
import numpy as np
df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, None, None], 'type':[None, None, 'strawberry-tart', 'dessert', None]})
df['new_col'] = df['foodstuff'].fillna('') + df['type'].fillna('')
df['new_col'] = df['new_col'].replace(r'^\s*$', np.nan, regex=True)
df
Output:
       foodstuff             type          new_col
0  apple-martini             None    apple-martini
1      apple-pie             None        apple-pie
2           None  strawberry-tart  strawberry-tart
3           None          dessert          dessert
4           None             None              NaN
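Answer 4: (score: 0)
You can replace the non-zero values with the column names, for example:
df1 = df.replace(1, pd.Series(df.columns, df.columns))
Then replace the 0s with empty strings and combine the columns as shown below:
f = f.replace(0, '')
f['new'] = f.First + f.Second + f.Three + f.Four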
See the full code below.
import pandas as pd
df = pd.DataFrame({'Second':[0,1,0,0],'First':[1,0,0,0],'Three':[0,0,1,0],'Four':[0,0,0,1], 'cl': ['3D', 'Wireless','Accounting','cisco']})
df2=pd.DataFrame({'pi':['Accounting','cisco','3D','Wireless']})
df1= df.replace(1, pd.Series(df.columns, df.columns))
f = pd.merge(df1,df2,how='right',left_on=['cl'],right_on=['pi'])
f = f.replace(0, '')
f['new'] = f.First+f.Second+f.Three+f.Four
df1:
In [3]: df1
Out[3]:
   Second  First  Three  Four          cl
0       0  First      0     0          3D
1  Second      0      0     0    Wireless
2       0      0  Three     0  Accounting
3       0      0      0  Four       cisco
df2:
In [4]: df2
Out[4]:
pi
0 Accounting
1 cisco
2 3D
3 Wireless
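When each row has exactly one flag set to 1, the column name can also be looked up directly instead of replacing values and concatenating strings. This is a different technique from the one in the answer, shown here only as a sketch under that assumption; flag_cols and labels are illustrative names. idxmax(axis=1) returns, for each row, the label of the column holding the largest value.

```python
import pandas as pd

df = pd.DataFrame({
    'Second': [0, 1, 0, 0],
    'First': [1, 0, 0, 0],
    'Three': [0, 0, 1, 0],
    'Four': [0, 0, 0, 1],
    'cl': ['3D', 'Wireless', 'Accounting', 'cisco'],
})

# Only the 0/1 indicator columns take part in the lookup.
flag_cols = ['Second', 'First', 'Three', 'Four']

# For each row, pick the name of the column that contains the 1.
labels = df[flag_cols].idxmax(axis=1)
```

A row with no flag set would still get the first column name back, so this shortcut is only appropriate when every row is known to have exactly one 1.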
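Answer 5: (score: 0)
If the columns you are dealing with each contain values where the other does not, and vice versa, a one-liner that does the job nicely is: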
>>> df.rename(columns={'type': 'foodstuff'}).stack().unstack()
foodstuff
0 apple-martini
1 apple-pie
2 strawberry-tart
3 dessert
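...This solution also generalizes well if you have several "complex" columns, as long as you can define the ~.rename mapping. The purpose of the renaming is to create duplicate column names, which ~.stack().unstack() then takes care of for you.
As noted above, this solution only works for configurations with orthogonal columns, i.e. columns that are never both assigned a value at the same time.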
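Under the same orthogonality assumption, a shorter route is Series.combine_first, which takes each value from the first column and falls back to the second where the first is missing. This is a sketch of an alternative, not the method described in the answer above, and combined is an illustrative name. Note that it selects rather than concatenates, so it only agrees with the concatenation approaches when the two columns are never both set.

```python
import pandas as pd

df = pd.DataFrame({
    'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
    'type': [None, None, 'strawberry-tart', 'dessert', None],
})

# Take 'foodstuff' where present, otherwise fall back to 'type'.
# Rows where both are missing stay missing, and no rows are dropped.
combined = df['foodstuff'].combine_first(df['type'])
```

Unlike the stack().unstack() output shown above, which has no row 4, this keeps all five rows, with row 4 left as a missing value as the question asks.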