I have a DataFrame that looks like the example below:
col_a col_b col_c col_d extra1 extra2 extra3
a a a a b c d
a a a a b c d
a a a b c d Nan
a a a b c d Nan
a a b c d Nan Nan
a a b c d Nan Nan
a b c d Nan Nan Nan
a b c d Nan Nan Nan
I need to convert it to the following form:
col_a    col_b  col_c  col_d
a a a a  b      c      d
a a a a  b      c      d
a a a    b      c      d
a a a    b      c      d
a a      b      c      d
a a      b      c      d
a        b      c      d
a        b      c      d
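So, depending on where the NaN values start (extra1, extra2, or extra3), I always need to move the last three non-NaN values of each row into col_b, col_c, and col_d, and concatenate everything before them into col_a.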
Answer 0 (score: 3)
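Use:
import numpy as np
# If necessary, convert the string 'Nan' to real missing values
df = df.replace('Nan', np.nan)
# Shift each row to the right by its number of missing values,
# so that the real data ends up right-aligned
df = df.apply(lambda x: x.shift(x.isnull().sum()), axis=1)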
print (df)
  col_a col_b col_c col_d extra1 extra2 extra3
0     a     a     a     a      b      c      d
1     a     a     a     a      b      c      d
2   NaN     a     a     a      b      c      d
3   NaN     a     a     a      b      c      d
4   NaN   NaN     a     a      b      c      d
5   NaN   NaN     a     a      b      c      d
6   NaN   NaN   NaN     a      b      c      d
7   NaN   NaN   NaN     a      b      c      d
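# Keep the last three columns; .copy() avoids inserting into a slice of df
df1 = df.iloc[:, -3:].copy()
# Join the leading columns into one string per row and trim the trailing space
df1.insert(0, 'a', df.iloc[:, :-3].add(' ').fillna('').sum(axis=1).str.strip())
df1.columns = df.columns[:4]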
print (df1)
     col_a col_b col_c col_d
0  a a a a     b     c     d
1  a a a a     b     c     d
2    a a a     b     c     d
3    a a a     b     c     d
4      a a     b     c     d
5      a a     b     c     d
6        a     b     c     d
7        a     b     c     d
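To see why the shift step right-aligns the data, here is a minimal single-row sketch. The row values are taken from the question's sample; the variable names (`n_missing`, `shifted`, `merged`, `tail`) are only illustrative and not part of the answer.

```python
import numpy as np
import pandas as pd

# One row from the sample: two trailing missing values.
row = pd.Series(
    ['a', 'a', 'b', 'c', 'd', np.nan, np.nan],
    index=['col_a', 'col_b', 'col_c', 'col_d', 'extra1', 'extra2', 'extra3'],
)

# Number of missing cells in the row (2 here).
n_missing = int(row.isnull().sum())

# Shifting right by that amount pushes the NaNs to the front,
# so the last three positions always hold the values for col_b..col_d.
shifted = row.shift(n_missing)

# Everything before the last three positions is joined into one string.
merged = ' '.join(shifted.iloc[:-3].dropna())
tail = shifted.iloc[-3:].tolist()

print(merged)  # a a
print(tail)    # ['b', 'c', 'd']
```

Applying the same shift to every row with `axis=1` is what produces the intermediate frame printed in the answer.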
Answer 1 (score: 2)
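You can use itertools.groupby, which is a common tool for grouping tasks like this one. Note, however, that it relies on Python-level loops (comprehensions), which may hurt performance on large frames.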
df = pd.DataFrame(
data = [[' '.join(g) for k,g in groupby(row) if k] for row in df.fillna('').values],
columns = df.columns[:4]
)
Full example:
import pandas as pd
from io import StringIO
from itertools import groupby
data = '''\
col_a col_b col_c col_d extra1 extra2 extra3
a a a a b c d
a a a a b c d
a a a b c d Nan
a a a b c d Nan
a a b c d Nan Nan
a a b c d Nan Nan
a b c d Nan Nan Nan
a b c d Nan Nan Nan'''
# pd.compat.StringIO was removed in pandas 1.0; use io.StringIO instead
fileobj = StringIO(data)
# Raw string avoids an invalid-escape warning for the regex separator
df = pd.read_csv(fileobj, sep=r'\s+', na_values=['Nan'])
df = pd.DataFrame(
data = [[' '.join(g) for k,g in groupby(row) if k] for row in df.fillna('').values],
columns = df.columns[:4]
)
print(df)
Returns:
     col_a col_b col_c col_d
0  a a a a     b     c     d
1  a a a a     b     c     d
2    a a a     b     c     d
3    a a a     b     c     d
4      a a     b     c     d
5      a a     b     c     d
6        a     b     c     d
7        a     b     c     d
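As a small illustration of what the comprehension does per row, the sketch below runs `itertools.groupby` on plain lists. It also shows an assumption the approach depends on: the merge works because the repeated values in the sample are consecutive and identical, and the other columns differ from their neighbours. The `tricky` row is made up for illustration and does not come from the question.

```python
from itertools import groupby

# A row after fillna(''): the empty string stands in for a missing value.
row = ['a', 'a', 'a', 'b', 'c', 'd', '']

# groupby collapses runs of consecutive equal values; `if key` drops the '' run.
merged = [' '.join(group) for key, group in groupby(row) if key]
print(merged)  # ['a a a', 'b', 'c', 'd']

# Hypothetical row where two adjacent non-leading cells are equal:
# they get merged as well, leaving only three cells instead of four.
tricky_row = ['a', 'a', 'c', 'c', 'd', '', '']
tricky = [' '.join(group) for key, group in groupby(tricky_row) if key]
print(tricky)  # ['a a', 'c c', 'd']
```

If rows like the second one can occur in the real data, the number of cells per row will no longer match the four target columns, so a position-based approach may be safer.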
Answer 2 (score: 0)
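You need:
# Count how many of the first four columns hold "a" in each row
temp = df[['col_a','col_b','col_c','col_d']].eq("a").sum(axis=1)
print(temp)
# Build col_a by repeating "a" that many times (gives "aaaa", not "a a a a")
v = []
for i in temp:
    a_col = "a"*i
    v.append(a_col)
df['col_a'] = v
# The remaining columns are hard-coded for this sample data
df['col_b'] = 'b'
df['col_c'] = 'c'
df['col_d'] = 'd'
df.drop(columns=['extra1','extra2','extra3'], inplace=True)
print(df)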
Output:
col_a col_b col_c col_d
0 aaaa b c d
1 aaaa b c d
2 aaa b c d
3 aaa b c d
4 aa b c d
5 aa b c d
6 a b c d
7 a b c d
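The loop above depends on the literal values in the sample. As a sketch of a more general variant (not taken from the original answer), and assuming the last three non-missing values of every row always belong in col_b to col_d, the hypothetical helper `right_align_and_merge` below joins whatever precedes them into col_a with a space separator:

```python
import numpy as np
import pandas as pd

# Sample data from the question, with np.nan for the missing cells.
NA = np.nan
rows = [
    ['a', 'a', 'a', 'a', 'b', 'c', 'd'],
    ['a', 'a', 'a', 'b', 'c', 'd', NA],
    ['a', 'a', 'b', 'c', 'd', NA, NA],
    ['a', 'b', 'c', 'd', NA, NA, NA],
]
cols = ['col_a', 'col_b', 'col_c', 'col_d', 'extra1', 'extra2', 'extra3']
df = pd.DataFrame(rows, columns=cols)


def right_align_and_merge(frame, keep=3, sep=' '):
    """Hypothetical helper: keep the last `keep` non-missing values of each
    row and join everything before them into the first output column."""
    out = []
    for row in frame.to_numpy():
        values = [v for v in row if pd.notna(v)]
        out.append([sep.join(values[:-keep])] + values[-keep:])
    # Reuse the first keep + 1 column names of the input frame.
    return pd.DataFrame(out, columns=frame.columns[:keep + 1], index=frame.index)


result = right_align_and_merge(df)
print(result)
```

Like the groupby answer, this loops in Python, so it trades speed for not having to hard-code any cell values.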