I have a CSV file containing airport data that looks like this:
Name | State | Runway | data1 | data2 | data3 | Runway | data1 | data2 | data3 | etc
------------------------------------------------------------------------------------
Abu Dabi | UAE | 01 | 9292 | 2229 | 8282 | 02 | 9929 | 9922 | 2828 | etc
How can I change it to look like this:
Name | State | Runway | data1 | data2 | data3 |
---------------------------------------------------
Abu Dabi | UAE | 01 | 9292 | 2229 | 8282 |
         |     | 02  | 9929 | 9922 | 2828 |
         |     | etc | etc  | etc  | etc  |
Thanks.
Answer 0 (score: 2)
Here is a way to do it with groupby on the column names, followed by concat:
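import pandas as pd

# if you start from your csv,
# pandas will rename repeated columns,
# e.g. you would have Runway, Runway.1, ...
df = pd.read_csv('data.csv')

# fix repeated column names:
df.columns = [col.split('.')[0] for col in df.columns]
new_df = df.set_index(['Name', 'State'])

# number the column blocks (each 'Runway' column starts a new one),
# group the columns by block and stack the groups vertically
# (note: groupby(..., axis=1) is deprecated since pandas 2.1)
pd.concat(g for x, g in new_df.groupby((new_df.columns == 'Runway').cumsum(),
                                       axis=1))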
Output:
Runway data1 data2 data3
Name State
Abu Dabi UAE 1 9292 2229 8282
UAE 2 9929 9922 2828
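A minimal sketch of the same block-numbering idea that avoids the deprecated groupby(..., axis=1): instead of grouping, it selects each block with a boolean column mask and stacks the slices with pd.concat. This swaps groupby for plain column slicing; the function name stack_runway_blocks and the inline sample (copied from the question, assumed comma-separated) are hypothetical.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample input: comma-separated, two runway blocks per airport.
CSV_TEXT = """Name,State,Runway,data1,data2,data3,Runway,data1,data2,data3
Abu Dabi,UAE,01,9292,2229,8282,02,9929,9922,2828
"""


def stack_runway_blocks(df):
    """Stack repeated (Runway, data1, data2, data3) column blocks into rows."""
    # read_csv renames duplicates to Runway.1, data1.1, ...; strip the suffix
    df = df.copy()
    df.columns = [col.split('.')[0] for col in df.columns]
    wide = df.set_index(['Name', 'State'])
    # block number for every column: 1, 1, 1, 1, 2, 2, 2, 2, ...
    block = (wide.columns == 'Runway').cumsum()
    # slice each block with a boolean column mask and stack the slices vertically
    return pd.concat(wide.loc[:, block == b] for b in np.unique(block))


result = stack_runway_blocks(pd.read_csv(io.StringIO(CSV_TEXT)))
print(result)
```

As in the answer above, "01" is parsed as the integer 1; pass dtype={'Runway': str} to read_csv if the leading zero matters.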
Answer 1 (score: 1)
Here is one way, using an unnesting helper function (defined below):
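import numpy as np
import pandas as pd

# assumes df has repeated column names (Runway, data1, ..., Runway, data1, ...),
# e.g. after stripping the ".1" suffixes that read_csv adds;
# collect all same-named columns into one list per row
# (note: groupby(..., axis=1) is deprecated since pandas 2.1)
x = df.set_index(['Name', 'State']).groupby(level=0, axis=1).agg(lambda x: x.tolist())
df = unnesting(x, x.columns.tolist(), axis=1)
print(df)
Output: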
Runway data1 data2 data3
Name State
Abu Dabi UAE 1 9292 2229 8282
UAE 2 9929 9922 2828
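def unnesting(df, explode, axis):
    """Expand the list-valued cells in the columns named in `explode`."""
    if axis == 1:
        # one output row per list element: repeat each index entry len(list) times
        idx = df.index.repeat(df[explode[0]].str.len())
        df1 = pd.concat([
            pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
        df1.index = idx
        # drop(columns=...) instead of drop(explode, 1): positional axis fails on pandas >= 2.0
        return df1.join(df.drop(columns=explode), how='left')
    else:
        # spread each list across new columns prefixed with the column name
        df1 = pd.concat([
            pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(x) for x in explode], axis=1)
        return df1.join(df.drop(columns=explode), how='left')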
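On pandas 1.3 or later, the row-wise (axis=1) branch of the helper can likely be replaced by the built-in DataFrame.explode, which accepts a list of columns as long as the lists in each row have equal lengths. A minimal sketch; the hand-built frame x is a hypothetical stand-in for the list-valued result of the groupby/agg step.

```python
import pandas as pd

# Hypothetical stand-in for the groupby/agg result: one list per cell,
# one list element per runway block
x = pd.DataFrame(
    {
        'Runway': [[1, 2]],
        'data1': [[9292, 9929]],
        'data2': [[2229, 9922]],
        'data3': [[8282, 2828]],
    },
    index=pd.MultiIndex.from_tuples([('Abu Dabi', 'UAE')], names=['Name', 'State']),
)

# explode all list columns together (pandas >= 1.3); the index is repeated per element
out = x.explode(x.columns.tolist())
# explode leaves object dtype; convert back to integers
out = out.astype(int)
print(out)
```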
Answer 2 (score: 0)
Here is another approach, without pandas or groupby:
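import csv

header = []
data = []

# reading input csv file
with open('input.csv', newline='') as csvfile:
    rows = csv.reader(csvfile)
    # first row is the header: keep Name, State and the first runway block only
    header = next(rows, [])[:6]
    for r in rows:
        # Name, State and the first runway block
        data.append(r[:6])
        # every further block of 4 fields (Runway, data1, data2, data3)
        # becomes its own row with empty Name and State cells
        data.extend([["", ""] + r[i:i+4] for i in range(6, len(r), 4)])

print(header)
for row in data:
    print(row)

# writing to output csv file (newline='' avoids blank lines on Windows)
with open('output.csv', 'w', newline='') as outfile:
    csvwriter = csv.writer(outfile)
    csvwriter.writerow(header)
    csvwriter.writerows(data)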
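The same chunking logic can be wrapped in a function that works on file-like objects, which makes it easy to check with in-memory data. A sketch assuming every runway block is exactly four fields wide and the file is comma-separated; the function name split_runway_rows, the BLOCK constant and the sample text are hypothetical.

```python
import csv
import io

BLOCK = 4  # assumed width of one runway block: Runway, data1, data2, data3


def split_runway_rows(infile, outfile):
    """Write one row per runway block; later blocks get blank Name/State cells."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    header = next(reader, [])
    # keep Name, State and the column names of the first block only
    writer.writerow(header[:2 + BLOCK])
    for r in reader:
        writer.writerow(r[:2 + BLOCK])
        # each remaining block starts at offset 2 + BLOCK, 2 + 2*BLOCK, ...
        for i in range(2 + BLOCK, len(r), BLOCK):
            writer.writerow(["", ""] + r[i:i + BLOCK])


src = io.StringIO(
    "Name,State,Runway,data1,data2,data3,Runway,data1,data2,data3\n"
    "Abu Dabi,UAE,01,9292,2229,8282,02,9929,9922,2828\n"
)
dst = io.StringIO()
split_runway_rows(src, dst)
print(dst.getvalue())
```

Because the csv module never converts types, the leading zero in "01" is preserved here, unlike the pandas answers.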