Python + Pandas: Copy specific columns from the first row of multiple CSVs and store those rows in a single CSV

Time: 2019-03-28 13:08:59

Tags: python pandas dataframe

I have about 190 CSV files. Each has the same column names. A sample CSV is shared below:

[image: sample CSV]

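From each CSV, I need to select only the columns Item, Predicted_BelRd(D2), Predicted_Ulsoor(D2), Predicted_ChrchStrt(D2), Predicted_BlrClub(D2), Predicted_Indrangr(D1), Predicted_Krmngl(D1), Predicted_KrmnglBkry(D1) and Predicted_HSR(D1), and only from the first row. I then need to store all of these rows in a separate CSV, so the final CSV should have 190 rows.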

How can I do this?

Edit: The code so far, as suggested by DavidDR:

path = '/home/hp/products1'
all_files = glob.glob(path + "/*.csv")
#print(all_files)

columns = ['Item', 'Predicted_BelRd(D2)', 'Predicted_Ulsoor(D2)', 'Predicted_ChrchStrt(D2)', 'Predicted_BlrClub(D2)', 'Predicted_Indrangr(D1)', 'Predicted_Krmngl(D1)', 'Predicted_KrmnglBkry(D1)', 'Predicted_HSR(D1)']

rows_list = []
for filename in all_files:
    origin_data = pd.read_csv(filename)
    my_data = origin_data[columns]
    rows_list.append(my_data.head(1))

output = pd.DataFrame(rows_list)
#output.to_csv(file_name, sep='\t', encoding='utf-8')
output.to_csv('smallys_final.csv', encoding='utf-8', index=False) 

Edit2: The original DataFrame:

prod = pd.read_csv('/home/hp/products1/' + 'prod[' + str(0) + '].csv', engine='python')
print(prod)

Output:

      Category                         Item  UOM  BelRd(D2)  Ulsoor(D2)  \
0  Food/Bakery  BAKING POWDER SPARSH (1KGS)  PKT          0           0   
1  Food/Bakery  BAKING POWDER SPARSH (1KGS)  PKT          0           0   
2  Food/Bakery  BAKING POWDER SPARSH (1KGS)  PKT          0           0   
3  Food/Bakery  BAKING POWDER SPARSH (1KGS)  PKT          0           0   
4  Food/Bakery  BAKING POWDER SPARSH (1KGS)  PKT          0           0   

   ChrchStrt(D2)  BlrClub(D2)  Indrangr(D1)  Krmngl(D1)  KrmnglBkry(D1)  \
0              0            0             0           0               1   
1              0            0             0           0               0   
2              0            0             0           0               0   
3              0            0             0           0               0   
4              0            0             0           0               1   

   HSR(D1)         date  Predicted_BelRd(D2)  Predicted_Ulsoor(D2)  \
0        0    10 FEB 19                  0.0                   0.0   
1        0    17 FEB 19                  NaN                   NaN   
2        0    24 FEB 19                  NaN                   NaN   
3        0   4 MARCH 19                  NaN                   NaN   
4        0  11 MARCH 19                  NaN                   NaN   

   Predicted_ChrchStrt(D2)  Predicted_BlrClub(D2)  Predicted_Indrangr(D1)  \
0                      0.0                    0.0                     0.0   
1                      NaN                    NaN                     NaN   
2                      NaN                    NaN                     NaN   
3                      NaN                    NaN                     NaN   
4                      NaN                    NaN                     NaN   

   Predicted_Krmngl(D1)  Predicted_KrmnglBkry(D1)  Predicted_HSR(D1)  
0                   0.0                       0.0                0.0  
1                   NaN                       NaN                NaN  
2                   NaN                       NaN                NaN  
3                   NaN                       NaN                NaN  
4                   NaN                       NaN                NaN  

2 Answers:

Answer 0 (score: 1)

Here you go:

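import pandas as pd

def function(csvnames):
    # csvnames: list of paths to the CSV files
    firstrows = []  # collects 190 DataFrames, each with only 1 row
    for filename in csvnames:
        # read the CSV, keep a subset of columns, take only the first row
        # ("..." stands for the remaining column names)
        df = pd.read_csv(filename) \
             .filter(["Item", "Predicted_BelRd(D2)", ...]) \
             .iloc[:1]
        firstrows.append(df)
    # stack the single-row DataFrames into one DataFrame
    return pd.concat(firstrows)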
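
To make the call site explicit, here is a minimal sketch of a variant of the function above that takes the file list and the column list as arguments. The names `collect_first_rows` and `list_csvs` are made up for this example and are not part of the original answer. Note that `DataFrame.filter` silently skips column names that are missing from a file, so a misspelled column would not raise an error.

```python
import glob
import os

import pandas as pd


def list_csvs(folder):
    """Return the CSV paths in `folder`, sorted so the row order is reproducible."""
    return sorted(glob.glob(os.path.join(folder, "*.csv")))


def collect_first_rows(csvnames, columns):
    """Return one DataFrame holding the first row of each CSV, limited to `columns`."""
    firstrows = []
    for filename in csvnames:
        # read the CSV, keep only the requested columns, take the first row
        df = pd.read_csv(filename).filter(columns).iloc[:1]
        firstrows.append(df)
    # ignore_index=True renumbers the rows 0..n-1 instead of repeating index 0
    return pd.concat(firstrows, ignore_index=True)
```

With the folder from the question, the call would look something like `collect_first_rows(list_csvs('/home/hp/products1'), columns).to_csv('smallys_final.csv', index=False)`, where `columns` is the list defined in the question.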

Answer 1 (score: 1)

I have not tested this, but it should work.

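Basically, you read all the CSV files from the same location and select only the relevant columns. Then you take the first row of each file and append it to a list of first rows. Finally, you build a new DataFrame from that list and save it to a CSV file.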

import glob
import pandas as pd

path = '/path/to/csv/folder'  # placeholder: use your path
all_files = glob.glob(path + "/*.csv")

columns = ['Item', 'Predicted_BelRd(D2)', 'Predicted_Ulsoor(D2)', 'Predicted_ChrchStrt(D2)', 'Predicted_BlrClub(D2)', 'Predicted_Indrangr(D1)', 'Predicted_Krmngl(D1)', 'Predicted_KrmnglBkry(D1)', 'Predicted_HSR(D1)']
rows_list = []
for filename in all_files:
    origin_data = pd.read_csv(filename)
    my_data = origin_data[columns]
    rows_list.append(my_data.head(1))

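# rows_list holds one-row DataFrames, so combine them with pd.concat
output = pd.concat(rows_list, ignore_index=True)
file_name = 'first_rows.csv'  # placeholder name: use your output file
output.to_csv(file_name, sep='\t', encoding='utf-8')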
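
Since only the first row of each file is needed, one possible refinement is to let `read_csv` do the selection itself through its `usecols` and `nrows` parameters, so that the rest of each file is never loaded. This is a sketch rather than a tested drop-in: the function name `first_rows_to_csv` and its arguments are invented for illustration, and it assumes every file contains all of the listed columns (`usecols` raises an error otherwise).

```python
import glob
import os

import pandas as pd

# column names taken from the question
COLUMNS = ['Item', 'Predicted_BelRd(D2)', 'Predicted_Ulsoor(D2)',
           'Predicted_ChrchStrt(D2)', 'Predicted_BlrClub(D2)',
           'Predicted_Indrangr(D1)', 'Predicted_Krmngl(D1)',
           'Predicted_KrmnglBkry(D1)', 'Predicted_HSR(D1)']


def first_rows_to_csv(folder, out_path, columns=COLUMNS):
    """Write the first row of every CSV in `folder` (only `columns`) to `out_path`."""
    files = sorted(glob.glob(os.path.join(folder, "*.csv")))
    # nrows=1 reads just the first data row; usecols drops the other columns
    rows = [pd.read_csv(f, usecols=columns, nrows=1) for f in files]
    # usecols keeps the file's own column order, so reorder to match `columns`
    output = pd.concat(rows, ignore_index=True)[list(columns)]
    output.to_csv(out_path, index=False, encoding='utf-8')
    return output
```

It is probably safest to write the result outside the input folder; otherwise a second run would pick up the output file as one more input.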