My data:
df
Out[79]:
INC Theme Theme_Hat TRAIN_TEST
0 123 A NaN TRAIN
1 124 A NaN TRAIN
2 125 A NaN TRAIN
3 126 A NaN TRAIN
4 127 A NaN TRAIN
5 128 A NaN TRAIN
6 129 A NaN TRAIN
7 130 A NaN TRAIN
8 131 B NaN TRAIN
9 132 B B TEST
10 133 B A TEST
11 134 B A TEST
12 135 B A TEST
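I am trying to collapse the Theme_Hat column into the Theme column while preserving the TRAIN_TEST indicator. I used a for loop below, but my gut tells me there must be a more pandas-esque solution. The attempt below does not produce the output I want, because TEST keeps being repeated in df instead of the TRAIN information being preserved. This is the output I want: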
Out[81]:
INC Theme TRAIN_TEST
0 123 A TRAIN
1 124 A TRAIN
2 125 A TRAIN
3 126 A TRAIN
4 127 A TRAIN
5 128 A TRAIN
6 129 A TRAIN
7 130 A TRAIN
8 131 B TRAIN
9 132 B TRAIN
10 132 B TEST
11 133 B TRAIN
12 133 A TEST
13 134 B TRAIN
14 134 A TEST
15 135 B TRAIN
16 135 A TEST
Here is what I have done so far:
import pandas as pd

# copy so we can reference the original dataframe as rows are inserted into df
df2 = df.copy(deep=True)
no_nulls = df2[df2['Theme_Hat'].notnull()]
# get rid of the Theme_Hat column for final dataframe (since we're migrating that info into Theme)
df.drop('Theme_Hat', inplace=True, axis=1)
# I'm sure there's some pandas built-in functionality that
# can handle this better than a for loop
for idx in no_nulls.index:
    # reference the unchanged df2 for INC, Theme_Hat, and TRAIN_TEST info
    new_row = pd.DataFrame({"INC": df2.loc[idx, 'INC'],
                            "Theme": df2.loc[idx, 'Theme_Hat'],
                            "TRAIN_TEST": df2.loc[idx, 'TRAIN_TEST']}, index=[idx+1])
    print(new_row, '\n\n')
    # insert the new row right after the row at the current index
    # (.loc stands in for the removed .ix indexer; label slices are inclusive)
    df = pd.concat([df.loc[:idx], new_row, df.loc[idx+1:]]).reset_index(drop=True)
Answer 0: (score: 2)
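Use pd.lreshape, which drops NaNs automatically by default. It lets you group the two columns in question so that their values are combined into a single column. Finally, sort the result by the values in the INC column.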
pd.lreshape(df, {'Theme': ['Theme','Theme_Hat']}).sort_values('INC').reset_index(drop=True)
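To try the one-liner above end to end, here is a minimal sketch. It assumes pd.lreshape is available in your pandas version, and it rebuilds the question's frame by hand from the values shown. The name long_df and the explicit stable sort are additions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (values copied from the data shown in the question).
df = pd.DataFrame({
    "INC": range(123, 136),
    "Theme": ["A"] * 8 + ["B"] * 5,
    "Theme_Hat": [np.nan] * 9 + ["B", "A", "A", "A"],
    "TRAIN_TEST": ["TRAIN"] * 9 + ["TEST"] * 4,
})

# Combine Theme and Theme_Hat into a single 'Theme' column. The remaining
# columns (INC, TRAIN_TEST) are repeated as identifiers, and rows whose
# combined value is NaN are dropped (dropna=True is the default).
long_df = pd.lreshape(df, {"Theme": ["Theme", "Theme_Hat"]}, dropna=True)

# A stable sort keeps each original Theme row ahead of its Theme_Hat row.
long_df = long_df.sort_values("INC", kind="stable").reset_index(drop=True)
print(long_df)
```

Note that TRAIN_TEST is carried over unchanged for both copies of a row, so INC 132 to 135 appear twice with TEST.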
Answer 1: (score: 1)
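One solution uses melt, removes the extra column with drop, and removes the NaN rows with dropna. An equivalent approach with set_index and stack:
print(df.set_index(['INC', 'TRAIN_TEST'])
        .stack()
        .reset_index(level=2, drop=True)
        .reset_index(name='Theme'))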
INC TRAIN_TEST Theme
0 123 TRAIN A
1 124 TRAIN A
2 125 TRAIN A
3 126 TRAIN A
4 127 TRAIN A
5 128 TRAIN A
6 129 TRAIN A
7 130 TRAIN A
8 131 TRAIN B
9 132 TEST B
10 132 TEST B
11 133 TEST B
12 133 TEST A
13 134 TEST B
14 134 TEST A
15 135 TEST B
16 135 TEST A
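The answer names melt, drop and dropna as one route. A rough sketch of that route follows, under a few assumptions: the column names source and value are hypothetical helpers (recent pandas versions reject a value_name that matches an existing column, so 'Theme' cannot be used directly), and the final relabeling step is only one reading of the question's desired output, in which rows that came from Theme are marked TRAIN and rows from Theme_Hat keep TEST.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (values copied from the data shown in the question).
df = pd.DataFrame({
    "INC": range(123, 136),
    "Theme": ["A"] * 8 + ["B"] * 5,
    "Theme_Hat": [np.nan] * 9 + ["B", "A", "A", "A"],
    "TRAIN_TEST": ["TRAIN"] * 9 + ["TEST"] * 4,
})

melted = (
    df.melt(
        id_vars=["INC", "TRAIN_TEST"],
        value_vars=["Theme", "Theme_Hat"],
        var_name="source",   # hypothetical name: which column the value came from
        value_name="value",  # temporary name, renamed to 'Theme' below
    )
    .dropna(subset=["value"])           # drop rows where Theme_Hat was NaN
    .sort_values("INC", kind="stable")  # Theme row stays ahead of its Theme_Hat row
    .reset_index(drop=True)
)

# Same content as the stack output: TRAIN_TEST is simply repeated.
long_df = melted.drop(columns="source").rename(columns={"value": "Theme"})

# Assumption: in the desired output, every row that came from 'Theme' is TRAIN.
relabeled = melted.copy()
relabeled.loc[relabeled["source"] == "Theme", "TRAIN_TEST"] = "TRAIN"
relabeled = relabeled.drop(columns="source").rename(columns={"value": "Theme"})
relabeled = relabeled[["INC", "Theme", "TRAIN_TEST"]]
print(relabeled)
```

Keeping the source column until the end is what makes the relabeling possible; once it is dropped, the two copies of a row can no longer be told apart.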