Analyzing CSV files with Django, pandas, and pyplot

Time: 2014-12-11 20:12:57

Tags: python django csv matplotlib pandas

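I am currently developing a Django web application that lets users upload CSV files, analyzes those files, and then renders graphs back to the client. The incoming CSVs are generated from Matlab and all describe the same general type of data, but the format of each file varies depending on how the user exported the data from Matlab. My problem is that the file format is not standardized, so I have to inspect each CSV dynamically on upload and then generate the correct graph accordingly. I think this is best shown with an example.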

Example CSV data for velocity and force

Shock Name, 
Shock ID, 
Vehicle, 
Location, 
Compression Valving, 
Rebound Valving, 
Piston Valving, 
Other Valving, 
Compression Setting, 
Rebound Setting, 
Preload Setting, 
Notes, 
, 
Measured_Stroke, 2.00 in
Seal_Drag, 7.77 lbs
Test_Temperature, 73.63 F
Peak_Velocity, 12.47 in/sec
, 
Amplitude, 1.00 in
Test_Period, 0.00 sec
Gas_Force, 34.78 lbs
Test_Speed, 12.21 in/sec


Velocity, CO, RC, CC, RO, CA, RA
in/sec, lbs, lbs, lbs, lbs, lbs, lbs
0, -139.3172, -138.4583, 33.49831, 34.24039, -52.90947, -52.10897
1, 2.637415, -353.36, 119.1066, -98.40744, 60.87201, -225.8837
2, 92.96767, -423.1163, 136.1344, -293.0744, 114.551, -358.0953
3, 117.664, -445.5688, 144.661, -417.9908, 131.1625, -431.7798
4, 126.363, -460.8381, 151.5483, -456.5551, 138.9557, -458.6966
5, 133.3087, -474.8662, 158.4935, -473.8318, 145.9011, -474.349
6, 139.7847, -487.5624, 163.9969, -486.3072, 151.8908, -486.9348
7, 146.0275, -500.0915, 168.9006, -497.6936, 157.464, -498.8926
8, 152.5096, -512.0554, 174.573, -508.9675, 163.5413, -510.5115
9, 160.0202, -524.4933, 178.737, -519.4616, 169.3786, -521.9774
10, 166.6279, -534.5439, 182.7012, -529.475, 174.6645, -532.0095
11, 174.6142, -545.5678, 186.8209, -541.7671, 180.7175, -543.6675
12, 183.1358, -556.0939, 188.4442, -553.749, 185.79, -554.9215

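Everything before the Velocity row is just one large settings header, and it can vary from file to file depending on the user's settings in Matlab. Velocity should be the index column, since each row is one velocity step. Every column after Velocity is labeled with an acronym (e.g. CO, RC, CC), and all of them need to be plotted against the velocity step.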

My attempted implementation is as follows:

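import pandas as pd
import matplotlib.pyplot as plt
from django.http import HttpResponse
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure

# graph input file
def graph(request):
    # graph style
    # (the display.mpl_style option was removed in later pandas releases)
    pd.set_option('display.mpl_style', 'default')
    plt.rcParams['figure.figsize'] = (15,5)

    new_file = request.session.get('docFile')
    fig = Figure()
    ax = fig.add_subplot(111)
    ax.set_xlabel("Time")
    ax.set_ylabel("Velocity")
    # header row index is hardcoded here
    data_df = pd.read_csv(new_file, header=28)
    data_df = pd.DataFrame(data_df)
    data_df.plot(ax=ax, title="Roehrig Shock Data", style="-o")
    canvas = FigureCanvas(fig)
    response = HttpResponse( content_type = 'image/png')
    canvas.print_png(response)
    return response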

This displays a graph correctly, but I have hardcoded header=28 as the row where Velocity is located.

My questions are:

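  1. Is there a way to scan the CSV for Velocity dynamically and start the header there?
  2. How can I label each line in the plot with the name of its column acronym?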

1 Answer:

Answer 0: (score: 2)

You could walk through the whole file with a regular open statement and parse the header dynamically before handing the file to pandas.

For example:

import pandas as pd

# Read the file as text so that str.split also works on Python 3.
with open('your_file.csv') as f:
    rows = f.read().split('\n')

for idx, row in enumerate(rows):
    cells = [cell.strip() for cell in row.split(',')]
    if 'Velocity' in cells:
        header_names = cells    # this will be something like ['Velocity', 'CO', 'RC', ...]
        header_row   = idx
        break

# Now you have the header line as well as the custom header names.
# You can start using pandas.read_csv. skiprows counts physical lines,
# whereas header= ignores blank lines by default, so skiprows is safer
# when the settings block contains empty lines.
pd.read_csv('your_file.csv', skiprows=header_row, skipinitialspace=True)

# ...

# and use `header_names` for your plots.
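
For the second question, a minimal sketch of how the located header could feed the plot, so that each line is labeled with its column acronym. The sample text is a shortened stand-in for a real export, and `find_header_row`, `load_table` and `render_png` are hypothetical helper names, not pandas or Django APIs. It also assumes the row right after the header always holds units and can be dropped.

```python
import io

import pandas as pd
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

# Shortened stand-in for an uploaded export (assumption, not a real file).
SAMPLE = """Shock Name, 
, 
Measured_Stroke, 2.00 in

Velocity, CO, RC
in/sec, lbs, lbs
0, -139.3172, -138.4583
1, 2.637415, -353.36
2, 92.96767, -423.1163
"""


def find_header_row(text, marker="Velocity"):
    """Return the index of the first line whose first cell equals marker."""
    for idx, line in enumerate(text.splitlines()):
        if line.split(",")[0].strip() == marker:
            return idx
    raise ValueError("no %r header row found" % marker)


def load_table(text, marker="Velocity"):
    """Load the data table, using the marker column as the index."""
    lines = text.splitlines()
    header_row = find_header_row(text, marker)
    # Keep the header line, drop the units line right below it (assumed
    # to be present), and pass only the table part on to pandas.
    table = "\n".join([lines[header_row]] + lines[header_row + 2:])
    return pd.read_csv(io.StringIO(table), skipinitialspace=True,
                       index_col=marker)


def render_png(df, title="Roehrig Shock Data"):
    """Plot every column against the index and return (axes, PNG bytes)."""
    fig = Figure(figsize=(15, 5))
    FigureCanvasAgg(fig)  # attach a non-interactive canvas to the figure
    ax = fig.add_subplot(111)
    for name in df.columns:
        # label= is what the legend displays for this line
        ax.plot(df.index, df[name], "-o", label=name)
    ax.set_xlabel(df.index.name)
    ax.set_ylabel("Force (lbs)")  # unit taken from the sample's units row
    ax.set_title(title)
    ax.legend()
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    return ax, buf.getvalue()


df = load_table(SAMPLE)
ax, png = render_png(df)
```

In a Django view, the returned bytes could be sent back with `HttpResponse(png, content_type='image/png')`. Calling `df.plot(ax=ax)` instead of the explicit loop should also label the lines by column name, since pandas uses the column names for the legend.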