Looping over multiple CSV files with glob and np.arange

Time: 2019-06-22 07:20:02

Tags: python pandas

I am a Python beginner. I am having trouble with a loop that combines glob.glob and np.arange.

I have a hundred CSV files, named like this:

13oct_speed_1kmh.csv
13oct_speed_2kmh.csv
and so on

All files have the same data structure:

Distance ID
2.14     A
82.12    B
12.45    A
21.07    B
11.42    A

I want to filter out distances based on a set of buffer thresholds:

np.arange(10,100,30)
array([10, 40, 70])

I used the following code:

import glob

import numpy as np
import pandas as pd

def buffer(value, threshold):
    return value < threshold

files = glob.glob("13oct_speed_*.csv")
for f in files:
    df = pd.read_csv(f)
    for i in np.arange(10, 100, 30):
        threshold = i
        result_df = df[buffer(df["Distance"], threshold)]
        csvFileName = f + 'Buffer_' + str(threshold) + ".csv"
        result_df.to_csv(csvFileName, sep=",")

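But the result is very strange: the loop never seems to stop (it keeps saving new files).

What I want is to filter the Distance column of every file by each buffer threshold.

My expected output is as follows: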

13oct_speed_1kmh_buffer10.csv
13oct_speed_1kmh_buffer40.csv
13oct_speed_1kmh_buffer70.csv
13oct_speed_2kmh_buffer10.csv
13oct_speed_2kmh_buffer40.csv
13oct_speed_2kmh_buffer70.csv

How can I solve this? Thanks.

1 answer:

Answer 0: (score: 2)

You can omit the helper function and build csvFileName with format to get the expected output. os.path.splitext splits a path into the file name and its extension:

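import glob
import os

import numpy as np
import pandas as pd

files = glob.glob("csv/13oct_speed_*.csv")
for f in files:
    df = pd.read_csv(f)
    # Split e.g. "csv/13oct_speed_1kmh.csv" into ("csv/13oct_speed_1kmh", ".csv")
    name, extension = os.path.splitext(f)
    for threshold in np.arange(10, 100, 30):
        # Keep only the rows whose Distance is below the threshold
        result_df = df[df["Distance"] < threshold]
        csvFileName = "{}_buffer{}{}".format(name, threshold, extension)
        print(csvFileName)
        result_df.to_csv(csvFileName)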
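
One thing to watch for: the generated names (for example 13oct_speed_1kmh_buffer10.csv) still match the 13oct_speed_*.csv pattern, so if the results are saved next to the inputs and the script is run a second time, the earlier outputs would be picked up as inputs. A minimal sketch of one way around this is to write the results into a separate directory. The helper name filter_by_thresholds, the out_dir parameter, the "buffered" directory, and the use of index=False are assumptions made for illustration, not part of the original answer.

```python
import glob
import os

import numpy as np
import pandas as pd


def filter_by_thresholds(pattern, out_dir, thresholds):
    """Write one filtered CSV per (input file, threshold) pair into out_dir.

    Hypothetical helper for illustration; returns the list of written paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    # sorted() makes the processing order deterministic
    for path in sorted(glob.glob(pattern)):
        df = pd.read_csv(path)
        # "csv/13oct_speed_1kmh.csv" -> ("13oct_speed_1kmh", ".csv")
        name, extension = os.path.splitext(os.path.basename(path))
        for threshold in thresholds:
            # Keep only the rows whose Distance is below the threshold
            result_df = df[df["Distance"] < threshold]
            out_path = os.path.join(
                out_dir, "{}_buffer{}{}".format(name, threshold, extension)
            )
            # index=False leaves the pandas row index out of the file
            result_df.to_csv(out_path, index=False)
            written.append(out_path)
    return written


if __name__ == "__main__":
    # Assumed layout: inputs in "csv/", outputs in a separate "buffered/" folder
    for out_path in filter_by_thresholds(
        "csv/13oct_speed_*.csv", "buffered", np.arange(10, 100, 30)
    ):
        print(out_path)
```

Because the outputs live in their own directory, the glob pattern only ever sees the original files, and rerunning the script simply overwrites the same output files.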