How do I move data in a CSV by header name so that repeated blocks end up in the same columns?

Date: 2019-07-08 03:06:54

Tags: python pandas

I have a CSV file with airport data that looks like this:

Name     | State | Runway | data1 | data2 | data3 | Runway | data1 | data2 | data3 | etc
------------------------------------------------------------------------------------
Abu Dabi | UAE   | 01     | 9292  | 2229  | 8282  | 02     | 9929  | 9922  | 2828  | etc

How can I reshape it to look like this:

Name     | State | Runway | data1 | data2 | data3 |
---------------------------------------------------
Abu Dabi | UAE   | 01     | 9292  | 2229  | 8282  |
                 | 02     | 9929  | 9922  | 2828  |
                 | etc    | etc   | etc   | etc   |

Thanks.

3 answers:

Answer 0 (score: 2)

Here is an approach that groups by the column names and then stacks the groups with concat:

import pandas as pd

# if you start from your csv,
# pandas will rename repeated columns,
# e.g. you would get Runway, Runway.1, ...
df = pd.read_csv('data.csv')

# fix repeated column names:
df.columns = [col.split('.')[0] for col in df.columns]

new_df = df.set_index(['Name','State'])
# every 'Runway' column starts a new block (cumsum -> 1,1,1,1,2,2,2,2,...);
# note: groupby(..., axis=1) is deprecated as of pandas 2.1
pd.concat(g for x,g in new_df.groupby((new_df.columns =='Runway').cumsum(),
                                      axis=1))

Output:

                Runway  data1  data2  data3
Name     State                             
Abu Dabi UAE         1   9292   2229   8282
         UAE         2   9929   9922   2828
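A self-contained sketch of the same block-stacking idea, assuming the file is comma-separated and using an in-memory sample instead of data.csv. Instead of groupby(axis=1), which recent pandas deprecates, it slices each block with a boolean column mask; dtype=str is an added assumption that keeps runway numbers such as "01" from becoming 1.

```python
import io
import pandas as pd

# In-memory sample mirroring the question's layout (assumed comma-separated)
raw = io.StringIO(
    "Name,State,Runway,data1,data2,data3,Runway,data1,data2,data3\n"
    "Abu Dabi,UAE,01,9292,2229,8282,02,9929,9922,2828\n"
)
# dtype=str keeps leading zeros in runway numbers such as "01"
df = pd.read_csv(raw, dtype=str)
# read_csv renames duplicates to Runway.1, data1.1, ...; strip the suffix
df.columns = [col.split('.')[0] for col in df.columns]

new_df = df.set_index(['Name', 'State'])
# block id increments at every 'Runway' column -> [1, 1, 1, 1, 2, 2, 2, 2]
block_id = (new_df.columns == 'Runway').cumsum()

# take each block of columns with a boolean mask and stack the blocks vertically
blocks = [new_df.loc[:, block_id == b] for b in sorted(set(block_id))]
result = pd.concat(blocks)
print(result)
```

Because every block ends up with the same column names after the suffix is stripped, concat lines them up column by column and only adds rows.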

Answer 1 (score: 1)

Here is one way, using an unnesting helper function:
import numpy as np
import pandas as pd

def unnesting(df, explode, axis):
    # axis=1: one new row per list element; otherwise one new column per element
    if axis == 1:
        idx = df.index.repeat(df[explode[0]].str.len())
        df1 = pd.concat([
            pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
        df1.index = idx
        return df1.join(df.drop(columns=explode), how='left')
    else:
        df1 = pd.concat([
            pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(x) for x in explode], axis=1)
        return df1.join(df.drop(columns=explode), how='left')

# df must have the literally repeated column names (Runway, data1, ...);
# groupby(..., axis=1) is deprecated as of pandas 2.1
x = df.set_index(['Name','State']).groupby(level=0, axis=1).agg(lambda x: x.tolist())
df = unnesting(x, x.columns.tolist(), axis=1)
df
Out[281]:
                   Runway  data1  data2  data3
Name      State
Abu Dabi   UAE          1   9292   2229   8282
           UAE          2   9929   9922   2828
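The same "collect into lists, then unnest" idea can be sketched with pandas' built-in DataFrame.explode in place of the custom helper (exploding several columns at once needs pandas 1.3 or later). The sample row is built in memory with genuinely repeated column names, which is an assumption: read_csv would rename them to Runway.1, data1.1, and so on.

```python
import pandas as pd

# Row with literally repeated column names (assumed, as in the answer above)
cols = ['Name', 'State', 'Runway', 'data1', 'data2', 'data3',
        'Runway', 'data1', 'data2', 'data3']
df = pd.DataFrame([['Abu Dabi', 'UAE', '01', 9292, 2229, 8282,
                    '02', 9929, 9922, 2828]], columns=cols)

wide = df.set_index(['Name', 'State'])
# collect every same-named column into one list-valued column
lists = pd.DataFrame(
    {name: wide.loc[:, wide.columns == name].values.tolist()
     for name in wide.columns.unique()},
    index=wide.index,
)
# explode all list columns together; each list element becomes its own row
result = lists.explode(list(lists.columns))
print(result)
```

explode requires the lists in one row to have matching lengths across the exploded columns, which holds here because every block has one Runway, data1, data2 and data3.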

Answer 2 (score: 0)

Here is another approach that uses the standard csv module instead of pandas groupby:

import csv

header = []
data = []
# read the input csv file
with open('input.csv', newline='') as csvfile:
    rows = csv.reader(csvfile)
    # header: Name, State and the first Runway block (6 columns)
    header = next(rows)[:6]
    for r in rows:
        # the first block stays on the airport's own row
        data.append(r[:6])
        # each further 4-column block (Runway, data1..data3) becomes a new row
        # with Name and State left blank
        data.extend([["", ""] + r[i:i+4] for i in range(6, len(r), 4)])

print(header)
for row in data:
    print(row)

# write the output csv file (newline='' avoids blank lines on Windows)
with open('output.csv', 'w', newline='') as outfile:
    csvwriter = csv.writer(outfile)
    csvwriter.writerow(header)
    csvwriter.writerows(data)
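A sketch of the same csv-module approach that finds the block boundaries from the header instead of hard-coding the widths 6 and 4. The helper name stack_blocks and its key parameter are hypothetical; it assumes every block starts with a column literally named Runway and that data rows are as long as the header.

```python
import csv
import io

def stack_blocks(reader, key='Runway'):
    """Hypothetical helper: stack the repeated blocks that start at each `key` column."""
    header = next(reader)
    starts = [i for i, name in enumerate(header) if name == key]
    ends = starts[1:] + [len(header)]
    lead = starts[0]                 # number of leading columns (Name, State)
    out = [header[:ends[0]]]         # leading columns + first block's headers
    for row in reader:
        for n, (s, e) in enumerate(zip(starts, ends)):
            # the first block keeps Name/State; later blocks leave them blank
            prefix = row[:lead] if n == 0 else [''] * lead
            out.append(prefix + row[s:e])
    return out

sample = io.StringIO(
    "Name,State,Runway,data1,data2,data3,Runway,data1,data2,data3\n"
    "Abu Dabi,UAE,01,9292,2229,8282,02,9929,9922,2828\n"
)
for line in stack_blocks(csv.reader(sample)):
    print(line)
```

The returned rows can be passed straight to csv.writer(...).writerows(...) to produce the output file.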