Automating the removal of similar-looking columns and then transposing the data in Python

Date: 2019-02-27 10:58:03

Tags: python pandas list dataframe

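My code produces a DataFrame with many columns. I want to drop some of those columns and then transpose the remaining data. I used to do this by hand, but my dataset is now too large for that. Here is the data, with the columns I want to drop highlighted: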

[Image: the data, with the columns to be dropped highlighted]

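After dropping the highlighted columns, I want to transpose the data so that rule_id (which I have already made the index) and the 'comp' columns swap places, and keep the result as a DataFrame. Can this be automated, and if so, how? Here is the code I am using: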

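import numpy as np
import pandas as pd

# dframe2 (from the previous question) holds the tx_id values; there is one CSV per tx_id
dfs = []
for tx in dframe2['tx_id']:
    df = pd.read_csv('%s.csv' % tx)
    # count each request_id per rule_id: rows = rule_id, columns = request_id
    df1 = df.groupby('rule_id')['request_id'].value_counts().unstack().fillna(0)
    # True where a count equals the count in the next column
    m = df1.eq(df1.shift(-1, axis=1))
    # 0 -> NaN, equal to the next column -> keep the count, otherwise count * 100
    arr1 = np.select([df1 == 0, m], [np.nan, df1], df1 * 100)
    dft4 = pd.DataFrame(arr1, index=df1.index).rename(columns=lambda x: 'comp{}'.format(x + 1))
    dft5 = df1.join(dft4)
    # drop the request_id columns (their names contain '-') and transpose
    cols = [c for c in dft5.columns if '-' in c]
    df8 = dft5.drop(cols, axis=1)
    df9 = df8.transpose()
    dfs.append(df9)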
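For reference, the `np.select` call in the loop above checks its conditions in order for every cell: a zero count becomes NaN, a count equal to its right-hand neighbour (`m`) is kept, and anything else falls through to the default `df1 * 100`. The following is a minimal sketch on a made-up 3×2 count table. The request_id column names `a-1`/`b-2` and the counts are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical count table: rows are rule_ids, columns are request_ids
df1 = pd.DataFrame({'a-1': [1, 0, 2], 'b-2': [1, 3, 0]},
                   index=pd.Index([50014, 50238, 53139], name='rule_id'))

# True where a cell equals the cell to its right; the last column is
# compared against NaN after the shift, so it is always False
m = df1.eq(df1.shift(-1, axis=1))

# The first matching condition wins: 0 -> NaN, equal to neighbour -> keep, else * 100
arr1 = np.select([df1 == 0, m], [np.nan, df1], df1 * 100)

comp = pd.DataFrame(arr1, index=df1.index).rename(columns=lambda x: 'comp{}'.format(x + 1))
print(comp)
```

Here `comp1` is derived from the first request_id column and `comp2` from the second, so the number of comp columns always equals the number of request_id columns in that file.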

The final result should look like this:

[Image: the expected final result]

This is a follow-up to the question "Automate the process of comparing the values of 2 csv files if value matches read the second csv into the DataFrame".

After applying your code, @Frenchy, I get the following result:

[Image: the result obtained with @Frenchy's code]

However, I want all the rule_ids in a single header row at the top, followed by the comp values, like this:

[Image: the desired layout, with all rule_ids in one header row followed by the comp values]

1 Answer

Answer 0 (score: 1)

Example solution:

import numpy as np
import pandas as pd

df = pd.DataFrame({'rule_id': [50014, 50238, 53139],
                   'comp1': [100, np.nan, 100],
                   '0f1410-0440-0123': [0, 1, 2],
                   'comp2': [np.nan, np.nan, np.nan],
                   'd10-0440-0123': [0, 1, 2],
                   'comp3': [np.nan, 100, np.nan]})

print(df)

# delete the columns whose name contains '-'
cols = [c for c in df.columns if '-' in c]
df.drop(cols, axis=1, inplace=True)

# make rule_id the index so that it becomes the header row after transposing
df.set_index('rule_id', inplace=True)
df = df.transpose()
print(df)

Initial DF:

   rule_id  comp1  0f1410-0440-0123  comp2  d10-0440-0123  comp3
0    50014  100.0                 0    NaN              0    NaN
1    50238    NaN                 1    NaN              1  100.0
2    53139  100.0                 2    NaN              2    NaN

Final DF:

rule_id  50014  50238  53139
comp1    100.0    NaN  100.0
comp2      NaN    NaN    NaN
comp3      NaN  100.0    NaN
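If every column you want to keep is named `comp<number>`, you can select those columns directly with a regular expression in `DataFrame.filter` instead of building a drop list. `.T` is shorthand for `transpose()`. Below is a sketch on the same sample data. It assumes the naming pattern holds in your real files and that no request_id column name matches it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'rule_id': [50014, 50238, 53139],
                   'comp1': [100, np.nan, 100],
                   '0f1410-0440-0123': [0, 1, 2],
                   'comp2': [np.nan, np.nan, np.nan],
                   'd10-0440-0123': [0, 1, 2],
                   'comp3': [np.nan, 100, np.nan]})

# Index by rule_id, keep only columns named comp<digits>, then transpose
result = df.set_index('rule_id').filter(regex=r'^comp\d+$').T
print(result)
```

Because this version chains the calls without `inplace=True`, the original `df` is left untouched, which is convenient inside the per-file loop.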

Hope this helps!

For your second question, combine all the transposed DataFrames with pd.concat:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'rule_id': [50014, 50238, 53139],
                    'comp1': [100, 100, 100],
                    'comp2': [100, 100, 100],
                    'comp3': [1.0, 1.0, 1.0]})

df2 = pd.DataFrame({'rule_id': [50028, 50258, 53339],
                    'comp1': [1.0, 1.0, 100],
                    'comp2': [100, np.nan, 100]})

df3 = pd.DataFrame({'rule_id': [50030, 50259, 53340, 53342],
                    'comp1': [1.0, 1.0, 100, 200],
                    'comp2': [100, 100, 100, 200],
                    'comp3': [100, 100, 100, 200],
                    'comp4': [1.0, np.nan, 1.0, np.nan]})

df1.set_index('rule_id', inplace=True)
df1 = df1.transpose()
df2.set_index('rule_id', inplace=True)
df2 = df2.transpose()
df3.set_index('rule_id', inplace=True)
df3 = df3.transpose()

listofdftransposed = [df1, df2, df3]  # list of transposed DFs, like the dfs list built in your loop
# sort=True sorts the combined rule_id columns; without it, pandas >= 1.0 keeps them in order of first appearance
df_result = pd.concat(listofdftransposed, sort=True)
print(df_result)

Output:

rule_id  50014  50028  50030  50238  50258  50259  53139  53339  53340  53342
comp1    100.0    NaN    NaN  100.0    NaN    NaN  100.0    NaN    NaN    NaN
comp2    100.0    NaN    NaN  100.0    NaN    NaN  100.0    NaN    NaN    NaN
comp3      1.0    NaN    NaN    1.0    NaN    NaN    1.0    NaN    NaN    NaN
comp1      NaN    1.0    NaN    NaN    1.0    NaN    NaN  100.0    NaN    NaN
comp2      NaN  100.0    NaN    NaN    NaN    NaN    NaN  100.0    NaN    NaN
comp1      NaN    NaN    1.0    NaN    NaN    1.0    NaN    NaN  100.0  200.0
comp2      NaN    NaN  100.0    NaN    NaN  100.0    NaN    NaN  100.0  200.0
comp3      NaN    NaN  100.0    NaN    NaN  100.0    NaN    NaN  100.0  200.0
comp4      NaN    NaN    1.0    NaN    NaN    NaN    NaN    NaN    1.0    NaN
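In the concatenated output the comp1/comp2 labels repeat, so nothing shows which file a row came from. The sketch below assumes the per-file frames are collected in a dict keyed by tx_id, with rule_id already as the index (as it is for `dft5` in the question's loop). It uses `pd.concat(..., keys=...)` to record the source file as an outer index level. The `frames` dict, the helper `drop_and_transpose` and the sample values are all hypothetical.

```python
import numpy as np
import pandas as pd

def drop_and_transpose(df):
    """Drop every column whose name contains '-' and transpose.

    Hypothetical helper; expects rule_id to already be the index,
    as it is for dft5 in the question's loop.
    """
    cols = [c for c in df.columns if '-' in str(c)]
    return df.drop(columns=cols).T

# Hypothetical stand-ins for two per-file frames (dft5 in the real loop)
frames = {
    'tx1': pd.DataFrame({'ab-01': [0, 1], 'comp1': [100, np.nan]},
                        index=pd.Index([50014, 50238], name='rule_id')),
    'tx2': pd.DataFrame({'comp1': [1.0, 100], 'comp2': [np.nan, 100]},
                        index=pd.Index([50028, 50238], name='rule_id')),
}

# keys= adds an outer index level naming the file each block came from;
# a rule_id present in several files (50238 here) shares one column
df_result = pd.concat([drop_and_transpose(f) for f in frames.values()],
                      keys=list(frames), sort=True)
print(df_result)
```

In the original loop, `frames[tx] = dft5` would replace the hard-coded dict, and the rows could then be selected per file with `df_result.loc['tx1']`.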