Building a .csv in Python

Time: 2014-09-16 15:24:45

Tags: python csv

I would like to know how to build a properly structured .csv file. For example, my data has the format:

  

(index, latitude, longitude, value)

- 0 - lat=-51.490000 lon=264.313000 value=7.270077
- 1 - lat=-51.490000 lon=264.504000 value=7.231014
- 2 - lat=-51.490000 lon=264.695000 value=21.199764
- 3 - lat=-51.490000 lon=264.886000 value=49.176327
- 4 - lat=-51.490000 lon=265.077000 value=91.160702
- 5 - lat=-51.490000 lon=265.268000 value=147.152889
- 6 - lat=-51.490000 lon=265.459000 value=217.152889
- 7 - lat=-51.490000 lon=265.650000 value=301.160702
- 8 - lat=-51.490000 lon=265.841000 value=399.176327
- 9 - lat=-51.490000 lon=266.032000 value=511.199764
- 10 - lat=-51.490000 lon=266.223000 value=637.231014
- 11 - lat=-51.490000 lon=266.414000 value=777.270077
- 12 - lat=-51.490000 lon=266.605000 value=931.316952
- 13 - lat=-51.490000 lon=266.796000 value=1099.371639
- 14 - lat=-51.490000 lon=266.987000 value=1281.434139
- 15 - lat=-51.490000 lon=267.178000 value=1477.504452
- 16 - lat=-51.490000 lon=267.369000 value=1687.582577
- 17 - lat=-51.490000 lon=267.560000 value=1911.668514
- 18 - lat=-51.490000 lon=267.751000 value=2149.762264
- 19 - lat=-51.490000 lon=267.942000 value=2401.863827
- 20 - lat=-51.490000 lon=268.133000 value=2667.973202
- 21 - lat=-51.490000 lon=268.324000 value=2948.090389

I would like to save this data in a .csv file with the layout:

         | longitude | 
latitude |   value   |   

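That is, all values with the same latitude go in the same row, and all values with the same longitude go in the same column. I know how to write a .csv file in Python; what I want to know is how to do this transformation properly.

Thanks in advance.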

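4 Answers:

Answer 0 (score: 1)

I assume you have this data in a text file. Let's use a regular expression to parse it (although if the format never changes, plain string splitting looks like it would work too).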

import re

data = list()

with open('path/to/data/file', 'r') as infile:
    for line in infile:
        matches = re.match(r".*(?<=lat=)(?P<lat>(?:\+|-)?[\d.]+).*(?<=value=)(?P<longvalue>(?:\+|-)?[\d.]+)", line)
        if matches:  # skip blank or malformed lines
            data.append((matches.group('lat'), matches.group('longvalue')))

Expanding the nasty regex:

pat = re.compile(r"""
  .*                         # match anything, any number of times
  (?<=lat=)                  # assert that the last 4 characters are "lat="
  (?P<lat>                   # begin named capturing group "lat"
      (?:\+|-)?              #   allow one or none of either + or -
      [\d.]+                 #   and one or more digits or decimal points
  )                          # end named capturing group "lat"
  .*                         # another wildcard
  (?<=value=)                # assert that the last 6 characters are "value="
  (?P<longvalue>             # begin named capturing group "longvalue"
      (?:\+|-)?              #   allow one or none of either + or -
      [\d.]+                 #   and one or more digits or decimal points
  )                          # end named capturing group "longvalue"
""", re.X)

# and a terser way of writing the code, since we've compiled the pattern above:

with open('path/to/data/file', 'r') as infile:
    data = [(matches.group('lat'), matches.group('longvalue')) for line in infile for
            matches in (pat.match(line),) if matches]
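
As mentioned above, plain string splitting may be enough if the line format never changes. A minimal sketch of that idea follows; `parse_line` is a hypothetical helper name, and it assumes every data token has the form `key=value` with a numeric value:

```python
def parse_line(line):
    """Parse '- 0 - lat=-51.49 lon=264.313 value=7.27' into a dict of floats."""
    fields = {}
    for token in line.split():
        # Only tokens of the form key=value carry data;
        # the '-' separators and the index are skipped.
        if '=' in token:
            key, _, raw = token.partition('=')
            fields[key] = float(raw)
    return fields


sample = "- 0 - lat=-51.490000 lon=264.313000 value=7.270077"
record = parse_line(sample)
print(record['lat'], record['lon'], record['value'])
```

Unlike the regex above, this keeps the longitude as well, which you will need for the column headers.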

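Answer 1 (score: 1)

I wrote a small program for you :) See below.

For now I assume your data is stored as a list of dicts, but if it is a list of lists, the code shouldn't be too hard to adapt.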

#!/usr/bin/env python

import csv

data = [
    dict(lat=1, lon=1, val=10),
    dict(lat=1, lon=2, val=20),
    dict(lat=2, lon=1, val=30),
    dict(lat=2, lon=2, val=40),
    dict(lat=3, lon=1, val=50),
    dict(lat=3, lon=2, val=60),
]

# get a unique list of all longitudes
headers = list({d['lon'] for d in data})
headers.sort()

# make a dict of latitudes
data_as_dict = {}
for item in data:
    # default value: a list of empty strings
    lst = data_as_dict.setdefault(item['lat'], ['']*len(headers))
    # get the longitude for this item
    lon = item['lon']
    # where in the line should it be?
    idx = headers.index(lon)
    # save value in the list
    lst[idx]=item['val']


# in the actual file, we start with an extra header for the latitude
headers.insert(0,'latitude')

# newline='' is what the csv module recommends when opening files in Python 3
with open('latitude.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(headers)
    # sorted() works on dict keys in Python 3, where keys() has no .sort()
    for latitude in sorted(data_as_dict):
        # a line starts with the latitude, followed by list of values
        writer.writerow([latitude] + data_as_dict[latitude])

Output:

latitude 1 2
1 10 20
2 30 40
3 50 60

Of course, this isn't the prettiest code, but I hope you get the idea.
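
If your data is a list of lists instead, one possible adaptation is to convert each row into a dict first and leave the rest of the program unchanged. This is only a sketch: the `rows` sample and the column order `[lat, lon, val]` are assumptions, not something given in the question.

```python
# Hypothetical input: each row is [lat, lon, val], in that order.
rows = [
    [1, 1, 10],
    [1, 2, 20],
    [2, 1, 30],
]

# Key names chosen to match the dicts used in the program above.
keys = ('lat', 'lon', 'val')
data = [dict(zip(keys, row)) for row in rows]
print(data[0])
```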

Answer 2 (score: 1)

Based on your input data, I came up with the following:

from __future__ import print_function


def decode(line):
    line = line.replace('- ', ' ')
    fields = line.split()
    index = fields[0]
    data = dict([_.split('=') for _ in fields[1:]])
    return index, data


def transform(filename):
    transformed = {}
    columns = set()
    with open(filename) as infile:  # make sure the file gets closed
        for line in infile:
            index, data = decode(line.strip())
            element = transformed.setdefault(data['lat'], {})
            element[data['lon']] = data['value']
            columns.add(data['lon'])
    return columns, transformed


def main(filename):
    columns, transformed = transform(filename)
    columns = sorted(columns)
    print(',', ','.join(columns))
    for lat, data in transformed.items():
        print(lat, ',', ', '.join([data.get(_, 'NULL') for _ in columns]))

if __name__ == '__main__':
    main('so.txt')

Since your sample data contains only a single latitude, I added one more line to the example, so my input file (so.txt) contains:

- 0 - lat=-51.490000 lon=264.313000 value=7.270077
- 1 - lat=-51.490000 lon=264.504000 value=7.231014
- 2 - lat=-51.490000 lon=264.695000 value=21.199764
- 3 - lat=-51.490000 lon=264.886000 value=49.176327
- 4 - lat=-51.490000 lon=265.077000 value=91.160702
- 5 - lat=-51.490000 lon=265.268000 value=147.152889
- 6 - lat=-51.490000 lon=265.459000 value=217.152889
- 7 - lat=-51.490000 lon=265.650000 value=301.160702
- 8 - lat=-51.490000 lon=265.841000 value=399.176327
- 9 - lat=-51.490000 lon=266.032000 value=511.199764
- 10 - lat=-51.490000 lon=266.223000 value=637.231014
- 11 - lat=-51.490000 lon=266.414000 value=777.270077
- 12 - lat=-51.490000 lon=266.605000 value=931.316952
- 13 - lat=-51.490000 lon=266.796000 value=1099.371639
- 14 - lat=-51.490000 lon=266.987000 value=1281.434139
- 15 - lat=-51.490000 lon=267.178000 value=1477.504452
- 16 - lat=-51.490000 lon=267.369000 value=1687.582577
- 17 - lat=-51.490000 lon=267.560000 value=1911.668514
- 18 - lat=-51.490000 lon=267.751000 value=2149.762264
- 19 - lat=-51.490000 lon=267.942000 value=2401.863827
- 20 - lat=-51.490000 lon=268.133000 value=2667.973202
- 21 - lat=-51.490000 lon=268.324000 value=2948.090389
- 22 - lat=-52.490000 lon=268.324000 value=2948.090389

(note the last line)

With that input file, the program above produces the following output:

, 264.313000,264.504000,264.695000,264.886000,265.077000,265.268000,265.459000,265.650000,265.841000,266.032000,266.223000,266.414000,266.605000,266.796000,266.987000,267.178000,267.369000,267.560000,267.751000,267.942000,268.133000,268.324000
-51.490000 , 7.270077, 7.231014, 21.199764, 49.176327, 91.160702, 147.152889, 217.152889, 301.160702, 399.176327, 511.199764, 637.231014, 777.270077, 931.316952, 1099.371639, 1281.434139, 1477.504452, 1687.582577, 1911.668514, 2149.762264, 2401.863827, 2667.973202, 2948.090389
-52.490000 , NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 2948.090389
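
The output above has stray spaces around some of the commas, which some CSV readers will treat as part of the field. If that matters, a possible variation is to hand the same `columns` and `transformed` structures to the standard `csv` module. This is a sketch only: `write_grid` is a hypothetical name, and sorting numerically and leaving missing cells empty are my assumptions about what you want.

```python
import csv
import io


def write_grid(columns, transformed, out):
    """Write a latitude-by-longitude grid to the file-like object `out` as CSV."""
    writer = csv.writer(out)
    # The keys are strings, so sort by numeric value rather than as text.
    lons = sorted(columns, key=float)
    writer.writerow(['latitude'] + lons)
    for lat in sorted(transformed, key=float):
        cells = transformed[lat]
        # Missing lat/lon combinations become empty cells.
        writer.writerow([lat] + [cells.get(lon, '') for lon in lons])


# Tiny hand-made example in the same shape that transform() returns.
columns = {'264.313000', '268.324000'}
transformed = {
    '-51.490000': {'264.313000': '7.270077', '268.324000': '2948.090389'},
    '-52.490000': {'268.324000': '2948.090389'},
}
buffer = io.StringIO()
write_grid(columns, transformed, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open('out.csv', 'w', newline='')` in place of the `StringIO` buffer.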

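Answer 3 (score: 1)

You can use a regular expression to pull lat / lon / value out of each line. You will want to look things up by lat and lon later, so keep track of them in a nested dict of the form d[lat][lon]=value. Add a set to track the unique longitudes you have seen, and generating the csv becomes very straightforward.

I sorted the keys in the example, but you may not care about that.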

import re
import collections

data = """- 0 - lat=-51.490000 lon=264.313000 value=7.270077
- 1 - lat=-51.490000 lon=264.504000 value=7.231014
- 2 - lat=-51.490000 lon=264.695000 value=21.199764
- 3 - lat=-51.490000 lon=264.886000 value=49.176327
- 4 - lat=-51.490000 lon=265.077000 value=91.160702"""

regex = re.compile(r'- \d+ - lat=([\+\-]?[\d\.]+) lon=([\+\-]?[\d\.]+) value=([\+\-]?[\d\.]+)')

# lat/lon index will hold lats[latitude][longitude] = value
lats = collections.defaultdict(dict)
# longitude columns
lonset = set()

for line in data.split('\n'):
    match = regex.match(line)
    if match:
        lat, lon, val = match.groups()
        lats[lat][lon] = val
        lonset.add(lon)

# the keys are strings, so sort them by numeric value rather than as text
latkeys = sorted(lats, key=float)
lonkeys = sorted(lonset, key=float)

header = ['latitude'] + lonkeys
print(header)

for lat in latkeys:
    lons = lats[lat]
    row = [lat] + [lons.get(lon, '') for lon in lonkeys]
    print(row)