Extract data from a text file

Time: 2014-05-22 08:39:22

Tags: python extract

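I need to extract specific data (Lat, Long, name and type) from a text file (.txt) and build convex hull data from what I extract. As far as I know, the extracted values need to be floats, not strings.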
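For the convex hull the question is ultimately after, here is a minimal sketch of Andrew's monotone chain algorithm (one standard choice; SciPy's scipy.spatial.ConvexHull is a library alternative). It assumes the coordinates have already been converted to float, and it treats (lon, lat) as flat x/y coordinates, which is only a reasonable approximation over a small area like this one. The names convex_hull and cross are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o); > 0 means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the convex hull of 2-D points in counter-clockwise order
    (Andrew's monotone chain). Collinear points on the hull are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Pop points that would make a clockwise (or straight) turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


# Hypothetical usage: (lon, lat) pairs taken from the first few stops.
stops = [(116.020217222222, -32.14796), (116.018336666667, -32.144985),
         (116.017182777778, -32.1420722222222), (116.017382222222, -32.1391138888889)]
print(convex_hull(stops))
```

Sorting once and sweeping twice keeps this at O(n log n), which is plenty for a stops file of this kind.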

The text file looks something like this (there is more data):

location_type, parent_station, stop_id, stop_code, stop_name, stop_desc, stop_lat, stop_lon, zone_id
0,,10000,10000,"Albany Hwy After Armadale Rd","",-32.14796,116.020217222222,4
0,,10001,10001,"Albany Hwy After Frys L","",-32.144985,116.018336666667,3
0,,10002,10002,"Albany Hwy After Clarence Rd","",-32.1420722222222,116.017182777778,3
0,,10003,10003,"Albany Hwy After Rogers L","",-32.1391138888889,116.017382222222,3
0,,10004,10004,"Albany Hwy After Galliers Av","",-32.1365533333333,116.017569444444,3
0,,10005,10005,"Albany Hwy Armadale Kelmscott Hospital","Armadale Kelmscott Hospital",-32.1348155555556,116.017707222222,3
0,,10006,10006,"Albany Hwy After Lilian Av","",-32.1304322222222,116.018038333333,3

But so far (after trial and error since this morning) I have only managed to extract all of the data, not the specific fields.

def read_data(filename):
    myList = []
    with open(filename) as fp:
        next(fp)  # skip the header line
        for f in fp:
            # each line becomes a list of strings, one per column
            myList.append(f.strip().split(","))
    return myList
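Building on the question's approach of splitting each line on commas, a minimal sketch that picks the wanted columns by position and converts the coordinates to float. The positions come from the header shown above (location_type=0, stop_name=4, stop_lat=6, stop_lon=7); pick_columns is a hypothetical helper, and a plain split(",") still breaks if a quoted field ever contains a comma.

```python
def pick_columns(rows):
    """Hypothetical helper: from rows produced by line.strip().split(","),
    keep (lat, lon, name, type) with the coordinates converted to float."""
    picked = []
    for r in rows:
        lat = float(r[6])            # stop_lat
        lon = float(r[7])            # stop_lon
        name = r[4].strip('"')       # stop_name; split() leaves the CSV quotes in place
        loc_type = r[0]              # location_type, kept as a string
        picked.append((lat, lon, name, loc_type))
    return picked


lines = [
    '0,,10000,10000,"Albany Hwy After Armadale Rd","",-32.14796,116.020217222222,4\n',
    '0,,10001,10001,"Albany Hwy After Frys L","",-32.144985,116.018336666667,3\n',
]
rows = [line.strip().split(",") for line in lines]
for lat, lon, name, loc_type in pick_columns(rows):
    print(lat, lon, name, loc_type)
```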

I need help solving this. Thanks a lot.

3 answers:

Answer 0 (score: 2)

Have a look at this link, which shows how to handle CSV data in Python: http://www.coderholic.com/parsing-csv-data-in-python/

Code from the link above:

import csv

rows = []
with open('data.csv', newline='') as f:
    data = csv.reader(f)
    # Read the column names from the first line of the file
    fields = [name.strip() for name in next(data)]
    for row in data:
        # Zip together the field names and values
        items = zip(fields, row)
        item = {}
        # Add the value to our dictionary
        for (name, value) in items:
            item[name] = value.strip()
        rows.append(item)

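This puts the data into a dictionary, so you can fetch the values you need by name instead of having to remember where each one sits in a list.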

It basically looks like this (example):

{"id": "0", "name": "name", "date": "2009-01-01"},
{"id": "1", "name": "another name", "date": "2009-02-01"}

In your case (csv returns every value as a string):

{"location_type": "0", "parent_station": "", "stop_id": "10000", "stop_code": "10000", "stop_name": "Albany Hwy After Armadale Rd", "stop_desc": "", "stop_lat": "-32.14796", "stop_lon": "116.020217222222", "zone_id": "4"}
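Since csv hands every value back as a string, the numeric columns need converting before they can be used as numbers. A minimal sketch, assuming a hypothetical CONVERTERS mapping chosen for this file; stop_id is deliberately left as a string because stop IDs are identifiers rather than quantities.

```python
# Hypothetical mapping from column name to the type it should have;
# columns not listed (stop_id, stop_name, parent_station, ...) stay strings.
# int('') raises ValueError, so this assumes these columns are never empty.
CONVERTERS = {
    "location_type": int,
    "stop_lat": float,
    "stop_lon": float,
    "zone_id": int,
}


def convert_row(item):
    """Return a copy of a row dictionary with the numeric columns converted."""
    return {key: CONVERTERS.get(key, str)(value) for key, value in item.items()}


item = {"location_type": "0", "parent_station": "", "stop_id": "10000",
        "stop_code": "10000", "stop_name": "Albany Hwy After Armadale Rd",
        "stop_desc": "", "stop_lat": "-32.14796", "stop_lon": "116.020217222222",
        "zone_id": "4"}
print(convert_row(item))
```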

Answer 1 (score: 1)

You can use csv.DictReader from the csv module:

import csv
import pprint
pp = pprint.PrettyPrinter()
with open('filename') as file:
    dialect = csv.Sniffer().sniff(file.read(1024)) # determine the file format
    file.seek(0)                                   # rewind back to start of file
    dialect.skipinitialspace = True                # skip whitespace after delimiter
    dict_reader = csv.DictReader(file, dialect=dialect)
    for row in dict_reader:
        pp.pprint(row)

This prints each row of the .csv file as a dictionary. I use pprint.PrettyPrinter to print the dictionaries more neatly.

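The csv.DictReader object automatically creates the keys from the names in your first row. Setting the skipinitialspace option of the dialect ensures that these names do not start with any whitespace.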
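To see what skipinitialspace changes, here is a minimal sketch that passes the option straight to DictReader instead of going through Sniffer, using io.StringIO in place of the real file and the question's header (which has a space after each comma).

```python
import csv
import io

# The header from the question, with a space after each comma.
sample = (
    "location_type, parent_station, stop_id, stop_code, stop_name, "
    "stop_desc, stop_lat, stop_lon, zone_id\n"
    '0,,10000,10000,"Albany Hwy After Armadale Rd","",-32.14796,116.020217222222,4\n'
)

plain = csv.DictReader(io.StringIO(sample))
print(plain.fieldnames[:3])     # ['location_type', ' parent_station', ' stop_id']

stripped = csv.DictReader(io.StringIO(sample), skipinitialspace=True)
print(stripped.fieldnames[:3])  # ['location_type', 'parent_station', 'stop_id']

row = next(stripped)
print(row["stop_name"], row["stop_lat"])  # Albany Hwy After Armadale Rd -32.14796
```

Without the option, a lookup such as row['stop_lat'] would raise KeyError, because the key is actually ' stop_lat'.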

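Output of the first iteration of the code above:

{'location_type': '0', 'parent_station': '', 'stop_code': '10000', 'stop_desc': '', 'stop_id': '10000', 'stop_lat': '-32.14796', 'stop_lon': '116.020217222222', 'stop_name': 'Albany Hwy After Armadale Rd', 'zone_id': '4'}

Each dictionary holds key: value pairs, so to get a specific value you refer to it by its key. For example, to get the stop_name of a given row you can do name = row['stop_name']. If you want to print the coordinates, name and type from every line of the file, you can change the for loop above to the following:

for row in dict_reader:
    lat = row['stop_lat']
    lon = row['stop_lon']
    name = row['stop_name']
    loc_type = row['location_type']  # avoid shadowing the built-in type()
    print('({},{}): {}, {}'.format(lat, lon, name, loc_type))

You can read about str.format in the Python documentation. It is basically a nicer way of building strings that contain variables.

Output (first line):

(-32.14796,116.020217222222): Albany Hwy After Armadale Rd, 0

Edit

For example, if you want a list of all the latitudes and longitudes as floats, pass each value through float() as you read it, e.g. float(row['stop_lat']).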
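The code that originally followed this edit did not survive the page, so here is a minimal sketch of one way to collect the coordinates as floats, not necessarily the answerer's. read_coordinates is a hypothetical name, and io.StringIO stands in for a file opened with open('filename', newline='').

```python
import csv
import io


def read_coordinates(f):
    """Return a list of (lat, lon) float pairs from an open stops file."""
    reader = csv.DictReader(f, skipinitialspace=True)
    return [(float(row["stop_lat"]), float(row["stop_lon"])) for row in reader]


sample = (
    "location_type, parent_station, stop_id, stop_code, stop_name, "
    "stop_desc, stop_lat, stop_lon, zone_id\n"
    '0,,10000,10000,"Albany Hwy After Armadale Rd","",-32.14796,116.020217222222,4\n'
    '0,,10001,10001,"Albany Hwy After Frys L","",-32.144985,116.018336666667,3\n'
)
coords = read_coordinates(io.StringIO(sample))
print(coords)  # [(-32.14796, 116.020217222222), (-32.144985, 116.018336666667)]
```

A list of float tuples like this is the shape a convex hull routine typically expects (swap to (lon, lat) if it wants x before y).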

Answer 2 (score: 0)

I like doing it without importing a specific library:

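d = {}
with open("file.txt") as f:
    next(f)  # skip the header line
    for line in f:
        # note: a plain split(",") keeps the quotes around stop_name and
        # breaks if a quoted field itself contains a comma
        (location_type, parent_station, stop_id, stop_code, stop_name,
         stop_desc, stop_lat, stop_lon, zone_id) = line.strip().split(",")
        d[stop_id] = (location_type, parent_station, stop_code, stop_name,
                      stop_desc, stop_lat, stop_lon, zone_id)
print(d)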
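For comparison, a minimal sketch of the same stop_id-keyed dictionary built with the standard-library csv module, which, unlike split(","), removes the quotes around stop_name and keeps a comma inside a quoted field intact. stops_by_id is a hypothetical name, and io.StringIO stands in for the real file.

```python
import csv
import io


def stops_by_id(f):
    """Map stop_id -> tuple of the remaining fields, parsed with csv.reader."""
    reader = csv.reader(f, skipinitialspace=True)
    next(reader)  # skip the header line
    d = {}
    for (location_type, parent_station, stop_id, stop_code, stop_name,
         stop_desc, stop_lat, stop_lon, zone_id) in reader:
        d[stop_id] = (location_type, parent_station, stop_code, stop_name,
                      stop_desc, stop_lat, stop_lon, zone_id)
    return d


sample = (
    "location_type, parent_station, stop_id, stop_code, stop_name, "
    "stop_desc, stop_lat, stop_lon, zone_id\n"
    '0,,10005,10005,"Albany Hwy Armadale Kelmscott Hospital",'
    '"Armadale Kelmscott Hospital",-32.1348155555556,116.017707222222,3\n'
)
print(stops_by_id(io.StringIO(sample)))
```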

It's more Pythonic!