Grouping strings in a pandas DataFrame

Time: 2020-05-22 12:28:56

Tags: python string pandas group-by

I have the following DataFrame, which contains information from weather stations:

      import pandas as pd
      import numpy as np

      df = pd.DataFrame({'Code Weather Station': ['1024', '1024', '1024', '2089', 
                                                  '2089', '2089', '8974'], 
                         'Instrumentation': ['Pluviometer-Analog', 'speedometer', 'incidence-sun',
                                             'speedometer', 'Pluviometer', 'speedometer', 
                                             'Pluviometer']})

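I want to group the instruments by weather station, so that each station has a single row listing all of its instruments.

I tried using groupby together with the sum() function, as follows: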

      df_New = df.groupby('Code Weather Station', as_index=False)['Instrumentation'].sum()

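The grouping works as expected. However, sum() concatenates the strings with no separator, and I would like a space between the instrument names.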

      print(df_New)

      Code Weather Station  Instrumentation
            1024             Pluviometer-Analogspeedometerincidence-sun
            2089             speedometerPluviometerspeedometer
            8974             Pluviometer

I would like the output to be:

      Code Weather Station  Instrumentation
            1024             Pluviometer-Analog speedometer incidence-sun
            2089             speedometer Pluviometer speedometer
            8974             Pluviometer

Thanks.

2 Answers:

Answer 0: (score: 1)

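Oh! Like this, joining the strings in each group with a space and then calling reset_index() to turn the station code back into a column:

      df.groupby('Code Weather Station')['Instrumentation'].apply(lambda x: ' '.join(x)).reset_index()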
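As a variation on the line above, here is a minimal sketch that passes `' '.join` directly to `agg` instead of wrapping it in a lambda. The variable name `joined` is only illustrative, and the data is copied from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Code Weather Station': ['1024', '1024', '1024', '2089', '2089', '2089', '8974'],
    'Instrumentation': ['Pluviometer-Analog', 'speedometer', 'incidence-sun',
                        'speedometer', 'Pluviometer', 'speedometer', 'Pluviometer'],
})

# Join each station's instrument names with a single space.
# reset_index() moves the station code from the index back into a column.
joined = (
    df.groupby('Code Weather Station')['Instrumentation']
      .agg(' '.join)
      .reset_index()
)
print(joined)
```

Because `' '.join` is a bound method that accepts any iterable of strings, it can be handed to `agg` as-is. If the column could contain missing values or non-string entries, the join would raise a TypeError, so those would need to be handled first.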

Answer 1: (score: 0)

You should avoid using apply, because it is inefficient. You can try the following instead:

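      import pandas as pd
      import numpy as np

      df = pd.DataFrame({'Code Weather Station': ['1024', '1024', '1024', '2089', 
                                                  '2089', '2089', '8974'], 
                         'Instrumentation': ['Pluviometer-Analog', 'speedometer', 'incidence-sun',
                                             'speedometer', 'Pluviometer', 'speedometer', 
                                             'Pluviometer']})

      def process(x):
          # Join all instrument names in one group with a single space.
          return " ".join(x)

      # A list of (name, function) tuples yields two-level column labels.
      df_new = df.groupby('Code Weather Station').agg({
              'Instrumentation': [('Instrumentation', process)]
          })
      # Drop the outer level so that only 'Instrumentation' remains.
      df_new.columns = df_new.columns.droplevel()
      # The station code is the index here; call reset_index() to make it a column.
      df_new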
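The same result can probably be reached without the droplevel step by using pandas named aggregation (available since pandas 0.25). The sketch below is an alternative to the code above, not the answerer's method, and the variable name `named` is illustrative. The join is still a Python-level function called once per group, so any speed difference compared with apply is an assumption that has not been measured here.

```python
import pandas as pd

df = pd.DataFrame({
    'Code Weather Station': ['1024', '1024', '1024', '2089', '2089', '2089', '8974'],
    'Instrumentation': ['Pluviometer-Analog', 'speedometer', 'incidence-sun',
                        'speedometer', 'Pluviometer', 'speedometer', 'Pluviometer'],
})

# Named aggregation: output_column=(input_column, function).
# This produces flat column labels, so no droplevel() is needed.
named = (
    df.groupby('Code Weather Station')
      .agg(Instrumentation=('Instrumentation', ' '.join))
      .reset_index()
)
print(named)
```

The keyword on the left of the `=` sets the output column name, which makes it easy to rename the joined column (for example to a hypothetical `Instruments`) in the same step.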