Question

I have 900 files, all in one folder. The file names look like "0_dcef_abcd_cdef" and "1_dcef_cdef_abcd", and the columns inside the files look like this:

File 1:

col1   col2       
1      2    
3      4

File 2:

col1   col2
5      6
7      8

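I want to create a new CSV file in which the headers of the original files are dropped and the data is transposed, so that each input file becomes one row and the new CSV file has these columns: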

col1 col2 col3 col4 col5 col6
0    dcef abcd cdef 1,3  2,4
1    dcef cdef abcd 5,7  6,8
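The transformation works one file at a time: the four underscore-separated parts of the file name become col1 to col4, and each data column is flattened top to bottom into one comma-separated string. Below is a minimal sketch that assumes the files are comma-separated with a header row (as `delimiter=','` in the attempt suggests) and that every name has exactly four parts. `file_to_row` is a hypothetical helper name.

```python
import io

import pandas as pd


def file_to_row(file_name, source=None):
    """Build one output row from one input file (hypothetical helper).

    file_name: e.g. "0_dcef_abcd_cdef.csv"; its four "_"-separated parts
    become the first four fields.
    source: path or file-like object with the CSV data; defaults to file_name.
    """
    stem = file_name.rsplit(".", 1)[0]      # drop the ".csv" extension
    name_parts = stem.split("_")            # ["0", "dcef", "abcd", "cdef"]
    data = pd.read_csv(source if source is not None else file_name)  # header row is consumed here
    # Flatten each column (top to bottom) into one comma-separated string
    joined = [",".join(data[col].astype(str)) for col in data.columns]
    return name_parts + joined


sample = io.StringIO("col1,col2\n1,2\n3,4\n")
print(file_to_row("0_dcef_abcd_cdef.csv", sample))
# ['0', 'dcef', 'abcd', 'cdef', '1,3', '2,4']
```

Splitting on "_" also handles multi-digit prefixes such as "123_...". Fixed slices like `file[0]` or `file[2:6]` only work for single-digit prefixes, and 900 files need up to three digits.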

I tried this:

import os
import csv
import pandas as pd

path = r'c:\path'
for root, dirs, files in os.walk(path):
    for file in files:
        print(file)
        if file.endswith(".csv"):
            # join with root so the file is found regardless of the working directory
            data = pd.read_csv(os.path.join(root, file), delimiter=',', encoding='latin-1')

            st = file[0]
            st1 = file[2:6]
            st2 = file[7:11]
            st3 = file[12:16]
            print(st, st1, st2, st3)

            #  perform calculation
            with open(r'c:\path\filename.csv', 'a', newline='') as csvfile:    # saving into the csv file
                saes = csv.writer(csvfile)
                saes.writerow(['col1']+["col2"]+["col3"]+["col4"]+ ['col5']+["col6"])
                saes.writerow([st]+ [st1]+[st2]+[st3]+ +data["col1"]+data["col2"])

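But it doesn't work. I don't know how to transpose the columns, or how to convert the other columns from hexadecimal to decimal and then save the result to the new CSV.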
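The question does not say which columns hold hexadecimal values or what those values look like. The sketch below assumes plain hex strings such as `ff`, `1F` or `0x10`. Python's `int(text, 16)` accepts all three forms. The column names and the `hex_columns_to_decimal` helper are hypothetical.

```python
import io

import pandas as pd


def hex_columns_to_decimal(df, columns):
    """Return a copy of df with the given columns parsed from hex strings to ints."""
    out = df.copy()
    for col in columns:
        # int(text, 16) accepts "ff", "FF" and "0xff"; .strip() removes stray spaces
        out[col] = out[col].map(lambda v: int(str(v).strip(), 16))
    return out


# dtype=str stops pandas from parsing values itself before the hex conversion
df = pd.read_csv(io.StringIO("col1,col2\n0a,ff\n1F,0x10\n"), dtype=str)
converted = hex_columns_to_decimal(df, ["col1", "col2"])
print(converted["col1"].tolist(), converted["col2"].tolist())  # [10, 31] [255, 16]
```

Reading with `dtype=str` matters. Without it, a hex value such as `1e5` would be read as the float 100000.0, and the conversion would then fail.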

Can someone help me with this code?

Answer 1

If I understand correctly, this approach might help:

import pandas as pd
import glob

csv_files = glob.glob('*.csv') # get a list of all csv files in the current folder
df = pd.DataFrame(columns=['col1','col2','col3','col4','col5','col6']) 
counter = 0

for csv_file in csv_files:
    df_file = pd.read_csv(csv_file,delimiter=',',encoding='latin-1')
    file_name_parts = csv_file.split('.')[0]
    file_name_parts = file_name_parts.split('_')  
    columns_list = []
    for column in df_file.columns:  # join each column's values into one comma-separated string
        columns_list.append(','.join(df_file[column].astype(str)))
    df.loc[counter] = file_name_parts + columns_list  # add the new row to the dataframe
    counter += 1
df.to_csv(r'C:\path\filename.csv',index=False)
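Here is a variant of the same approach. It collects every row in a list and builds the DataFrame once. It also skips the output file if that file sits in the input folder, so a re-run does not read it back in, and it joins column values directly with `','.join`. The sketch keeps the answer's assumptions: four name parts and two data columns per file. `combine_folder` is a hypothetical name.

```python
import glob
import os
import tempfile

import pandas as pd


def combine_folder(folder, output_path):
    """Write one row per input CSV in `folder` to `output_path` (hypothetical helper).

    Assumes names like "0_dcef_abcd_cdef.csv" (four "_"-separated parts) and
    exactly two data columns per file, giving six output columns.
    """
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        if os.path.abspath(path) == os.path.abspath(output_path):
            continue  # don't read the output file back in on a re-run
        stem = os.path.splitext(os.path.basename(path))[0]
        data = pd.read_csv(path, encoding="latin-1")  # header row is dropped here
        joined = [",".join(data[col].astype(str)) for col in data.columns]
        rows.append(stem.split("_") + joined)
    result = pd.DataFrame(rows, columns=[f"col{i}" for i in range(1, 7)])
    result.to_csv(output_path, index=False)  # "1,3" is quoted, so it stays one field
    return result


# Demo with the two example files from the question in a temporary folder
with tempfile.TemporaryDirectory() as folder:
    samples = {
        "0_dcef_abcd_cdef.csv": "col1,col2\n1,2\n3,4\n",
        "1_dcef_cdef_abcd.csv": "col1,col2\n5,6\n7,8\n",
    }
    for name, text in samples.items():
        with open(os.path.join(folder, name), "w") as fh:
            fh.write(text)
    print(combine_folder(folder, os.path.join(folder, "filename.csv")))
```

`sorted` orders the file names as text, so `10_...` comes before `2_...`. If numeric order matters, sort with a key such as `int(os.path.basename(p).split('_')[0])`, applied after the output file has been filtered out.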

How can I remove the headers, transpose the data, and combine 100 files into a new CSV file?
