I'd like to know how to build a .csv file with the proper structure. For example, my data has the form:
(index, latitude, longitude, value)
- 0 - lat=-51.490000 lon=264.313000 value=7.270077
- 1 - lat=-51.490000 lon=264.504000 value=7.231014
- 2 - lat=-51.490000 lon=264.695000 value=21.199764
- 3 - lat=-51.490000 lon=264.886000 value=49.176327
- 4 - lat=-51.490000 lon=265.077000 value=91.160702
- 5 - lat=-51.490000 lon=265.268000 value=147.152889
- 6 - lat=-51.490000 lon=265.459000 value=217.152889
- 7 - lat=-51.490000 lon=265.650000 value=301.160702
- 8 - lat=-51.490000 lon=265.841000 value=399.176327
- 9 - lat=-51.490000 lon=266.032000 value=511.199764
- 10 - lat=-51.490000 lon=266.223000 value=637.231014
- 11 - lat=-51.490000 lon=266.414000 value=777.270077
- 12 - lat=-51.490000 lon=266.605000 value=931.316952
- 13 - lat=-51.490000 lon=266.796000 value=1099.371639
- 14 - lat=-51.490000 lon=266.987000 value=1281.434139
- 15 - lat=-51.490000 lon=267.178000 value=1477.504452
- 16 - lat=-51.490000 lon=267.369000 value=1687.582577
- 17 - lat=-51.490000 lon=267.560000 value=1911.668514
- 18 - lat=-51.490000 lon=267.751000 value=2149.762264
- 19 - lat=-51.490000 lon=267.942000 value=2401.863827
- 20 - lat=-51.490000 lon=268.133000 value=2667.973202
- 21 - lat=-51.490000 lon=268.324000 value=2948.090389
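I would like to save this data in a .csv file laid out as:
         | longitude |
latitude | value     |
That is, all values with the same latitude go in the same row, and all values with the same longitude go in the same column. I already know how to write a .csv file in Python; what I want to know is how to do this transformation properly.
Thanks in advance.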
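Answer 0 (score: 1)
I'm assuming you have this data in a text file. Let's use a regular expression to parse it (although if the format never changes, plain string splitting looks like it would work too).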
import re

data = list()
with open('path/to/data/file', 'r') as infile:
    for line in infile:
        # note: this captures lat and value only; lon is not captured
        matches = re.match(r".*(?<=lat=)(?P<lat>(?:\+|-)?[\d.]+).*(?<=value=)(?P<longvalue>(?:\+|-)?[\d.]+)", line)
        if matches:  # skip lines that do not fit the format
            data.append((matches.group('lat'), matches.group('longvalue')))
Expanding that nasty regex:
pat = re.compile(r"""
    .*                  # Match anything any number of times
    (?<=lat=)           # assert that the last 4 characters are "lat="
    (?P<lat>            # begin named capturing group "lat"
        (?:\+|-)?       # allow one or none of either + or -
        [\d.]+          # and one or more digits or decimal points
    )                   # end named capturing group "lat"
    .*                  # Another wildcard
    (?<=value=)         # assert that the last 6 characters are "value="
    (?P<longvalue>      # begin named capturing group "longvalue"
        (?:\+|-)?       # allow one or none of either + or -
        [\d.]+          # and one or more digits or decimal points
    )                   # end named capturing group "longvalue"
""", re.X)

# and a terser way of writing the code, since we've compiled the pattern above:
with open('path/to/data/file', 'r') as infile:
    data = [(matches.group('lat'), matches.group('longvalue'))
            for line in infile
            for matches in (pat.match(line),)
            if matches]  # skip lines that do not fit the format
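The string-splitting alternative mentioned at the top of this answer might look like the sketch below. It assumes every line keeps the `- N - lat=... lon=... value=...` layout from the question; the helper name `parse_line` is made up for illustration. Unlike the regex above, it also keeps the longitude.

```python
def parse_line(line):
    """Split one '- N - lat=.. lon=.. value=..' line into (lat, lon, value) strings.

    Hypothetical helper; assumes the layout shown in the question.
    """
    # Keep only the whitespace-separated tokens that look like key=value
    pairs = dict(token.split('=', 1) for token in line.split() if '=' in token)
    return pairs['lat'], pairs['lon'], pairs['value']


sample = "- 0 - lat=-51.490000 lon=264.313000 value=7.270077"
print(parse_line(sample))
```

The fields come back as strings, so convert them with `float()` if you need to sort or compare them numerically.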
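Answer 1 (score: 1)
I wrote a small program for you :) See below.
For now I'm assuming your data is stored as a list of dicts, but if it is a list of lists the code shouldn't be too hard to adapt.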
#!/usr/bin/env python
import csv

data = [
    dict(lat=1, lon=1, val=10),
    dict(lat=1, lon=2, val=20),
    dict(lat=2, lon=1, val=30),
    dict(lat=2, lon=2, val=40),
    dict(lat=3, lon=1, val=50),
    dict(lat=3, lon=2, val=60),
]

# get a sorted list of all unique longitudes
headers = sorted({d['lon'] for d in data})

# make a dict of latitudes
data_as_dict = {}
for item in data:
    # default value: a list of empty strings
    lst = data_as_dict.setdefault(item['lat'], [''] * len(headers))
    # get the longitude for this item
    lon = item['lon']
    # where in the line should it be?
    idx = headers.index(lon)
    # save value in the list
    lst[idx] = item['val']

# in the actual file, we start with an extra header for the latitude
headers.insert(0, 'latitude')

# newline='' is what the csv module expects on Python 3
with open('latitude.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ',
                        quotechar='|', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(headers)
    # dict.keys() has no .sort() on Python 3, so use sorted()
    for latitude in sorted(data_as_dict):
        # a line starts with the latitude, followed by the list of values
        writer.writerow([latitude] + data_as_dict[latitude])
Output:
latitude 1 2
1 10 20
2 30 40
3 50 60
Admittedly this isn't the prettiest code, but I hope you get the idea.
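For the list-of-lists case mentioned at the start of this answer, one possible adaptation is sketched below. It assumes each record is a `(lat, lon, val)` tuple; the function names `pivot` and `write_pivot` are invented for this example, and it writes comma-separated output to an in-memory buffer rather than a space-delimited file.

```python
import csv
import io

# Hypothetical input: (lat, lon, val) tuples instead of dicts
rows = [(1, 1, 10), (1, 2, 20), (2, 1, 30), (2, 2, 40), (3, 1, 50), (3, 2, 60)]


def pivot(rows):
    """Return (sorted longitudes, {lat: {lon: val}}) for (lat, lon, val) tuples."""
    table = {}
    for lat, lon, val in rows:
        table.setdefault(lat, {})[lon] = val
    lons = sorted({lon for _, lon, _ in rows})
    return lons, table


def write_pivot(rows, out):
    """Write the pivoted table to the file-like object `out` as CSV."""
    lons, table = pivot(rows)
    writer = csv.writer(out)
    writer.writerow(['latitude'] + lons)
    for lat in sorted(table):
        # Missing (lat, lon) pairs become empty cells
        writer.writerow([lat] + [table[lat].get(lon, '') for lon in lons])


buf = io.StringIO()
write_pivot(rows, buf)
print(buf.getvalue())
```

To write a real file instead, pass `open('latitude.csv', 'w', newline='')` as `out`. Keeping a dict per latitude avoids the `headers.index` lookup for every item.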
Answer 2 (score: 1)
Based on your input data, I came up with the following:
from __future__ import print_function


def decode(line):
    # turn '- 0 - lat=.. lon=.. value=..' into (index, {'lat': .., 'lon': .., 'value': ..})
    line = line.replace('- ', ' ')
    fields = line.split()
    index = fields[0]
    data = dict([field.split('=') for field in fields[1:]])
    return index, data


def transform(filename):
    # build {lat: {lon: value}} plus the set of all longitudes seen
    transformed = {}
    columns = set()
    with open(filename) as infile:
        for line in infile:
            index, data = decode(line.strip())
            element = transformed.setdefault(data['lat'], {})
            element[data['lon']] = data['value']
            columns.add(data['lon'])
    return columns, transformed


def main(filename):
    columns, transformed = transform(filename)
    columns = sorted(columns)
    print(',', ','.join(columns))
    for lat, data in transformed.items():
        # cells with no value for this latitude are written as NULL
        print(lat, ',', ', '.join([data.get(col, 'NULL') for col in columns]))


if __name__ == '__main__':
    main('so.txt')
Because the sample data contains only one latitude, I added one more line to the example, so my input data (so.txt) contains:
- 0 - lat=-51.490000 lon=264.313000 value=7.270077
- 1 - lat=-51.490000 lon=264.504000 value=7.231014
- 2 - lat=-51.490000 lon=264.695000 value=21.199764
- 3 - lat=-51.490000 lon=264.886000 value=49.176327
- 4 - lat=-51.490000 lon=265.077000 value=91.160702
- 5 - lat=-51.490000 lon=265.268000 value=147.152889
- 6 - lat=-51.490000 lon=265.459000 value=217.152889
- 7 - lat=-51.490000 lon=265.650000 value=301.160702
- 8 - lat=-51.490000 lon=265.841000 value=399.176327
- 9 - lat=-51.490000 lon=266.032000 value=511.199764
- 10 - lat=-51.490000 lon=266.223000 value=637.231014
- 11 - lat=-51.490000 lon=266.414000 value=777.270077
- 12 - lat=-51.490000 lon=266.605000 value=931.316952
- 13 - lat=-51.490000 lon=266.796000 value=1099.371639
- 14 - lat=-51.490000 lon=266.987000 value=1281.434139
- 15 - lat=-51.490000 lon=267.178000 value=1477.504452
- 16 - lat=-51.490000 lon=267.369000 value=1687.582577
- 17 - lat=-51.490000 lon=267.560000 value=1911.668514
- 18 - lat=-51.490000 lon=267.751000 value=2149.762264
- 19 - lat=-51.490000 lon=267.942000 value=2401.863827
- 20 - lat=-51.490000 lon=268.133000 value=2667.973202
- 21 - lat=-51.490000 lon=268.324000 value=2948.090389
- 22 - lat=-52.490000 lon=268.324000 value=2948.090389
(Note the last line.)
With that input file, the program above produces the following output:
, 264.313000,264.504000,264.695000,264.886000,265.077000,265.268000,265.459000,265.650000,265.841000,266.032000,266.223000,266.414000,266.605000,266.796000,266.987000,267.178000,267.369000,267.560000,267.751000,267.942000,268.133000,268.324000
-51.490000 , 7.270077, 7.231014, 21.199764, 49.176327, 91.160702, 147.152889, 217.152889, 301.160702, 399.176327, 511.199764, 637.231014, 777.270077, 931.316952, 1099.371639, 1281.434139, 1477.504452, 1687.582577, 1911.668514, 2149.762264, 2401.863827, 2667.973202, 2948.090389
-52.490000 , NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 2948.090389
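If you would rather let the standard `csv` module handle the delimiters and the missing cells, a sketch along these lines may help. It assumes a nested dict in the same shape that `transform()` builds; the two-row `transformed` sample here is a shortened stand-in, not the full data set. `csv.DictWriter` fills absent columns with its `restval` argument.

```python
import csv
import io

# Shortened stand-in for the {lat: {lon: value}} dict that transform() builds
transformed = {
    '-51.490000': {'264.313000': '7.270077', '268.324000': '2948.090389'},
    '-52.490000': {'268.324000': '2948.090389'},
}
columns = sorted({lon for row in transformed.values() for lon in row})

out = io.StringIO()
# restval fills in any column that a row's dict does not contain
writer = csv.DictWriter(out, fieldnames=['latitude'] + columns, restval='NULL')
writer.writeheader()
for lat, row in transformed.items():
    writer.writerow(dict(row, latitude=lat))

print(out.getvalue())
```

This drops the stray spaces around the commas that the `print` version emits, which some CSV readers would otherwise treat as part of the field.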
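Answer 3 (score: 1)
You can use a regular expression to pull lat/lon/value out of each line. You will want to look values up by lat and lon later, so keep track of them in a nested dict of the form d[lat][lon]=value. Add a set to track the unique longitudes you have seen, and generating the csv becomes fairly straightforward.
I sorted the keys in the example, but you may not care about that.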
import re
import collections

data = """- 0 - lat=-51.490000 lon=264.313000 value=7.270077
- 1 - lat=-51.490000 lon=264.504000 value=7.231014
- 2 - lat=-51.490000 lon=264.695000 value=21.199764
- 3 - lat=-51.490000 lon=264.886000 value=49.176327
- 4 - lat=-51.490000 lon=265.077000 value=91.160702"""

regex = re.compile(r'- \d+ - lat=([\+\-]?[\d\.]+) lon=([\+\-]?[\d\.]+) value=([\+\-]?[\d\.]+)')

# lat/lon index will hold lats[latitude][longitude] = value
lats = collections.defaultdict(dict)
# longitude columns
lonset = set()

for line in data.split('\n'):
    match = regex.match(line)
    if match:
        lat, lon, val = match.groups()
        lats[lat][lon] = val
        lonset.add(lon)

# keys are strings, so these sorts are lexicographic, not numeric
latkeys = sorted(lats.keys())
lonkeys = sorted(lonset)

header = ['latitude'] + lonkeys
print(header)

for lat in latkeys:
    lons = lats[lat]
    # longitudes missing for this latitude become empty cells
    row = [lat] + [lons.get(lon, '') for lon in lonkeys]
    print(row)
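One caveat with the sorting: the keys are strings, so a plain `sorted()` orders them character by character, which puts "-51.490000" before "-52.490000" and "264.313000" before "99.500000". If numeric order matters, sorting with `key=float` is one option. The sketch below uses a small made-up `lats` dict (the values are illustrative, not from the question) and writes the rows with `csv.writer` instead of printing lists.

```python
import csv
import io

# Made-up sample in the same lats[lat][lon] = value shape (illustrative values)
lats = {
    '-51.490000': {'264.313000': '7.270077', '99.500000': '1.0'},
    '-52.490000': {'264.313000': '3.5'},
}
lonset = {lon for row in lats.values() for lon in row}

# key=float orders by numeric value while keeping the original strings
latkeys = sorted(lats, key=float)
lonkeys = sorted(lonset, key=float)

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(['latitude'] + lonkeys)
for lat in latkeys:
    writer.writerow([lat] + [lats[lat].get(lon, '') for lon in lonkeys])

print(out.getvalue())
```

Keeping the original strings as cell contents means the file shows the values exactly as they appeared in the input, with no float rounding.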