Working with string and numeric columns

Time: 2018-02-13 16:45:47

Tags: python string numpy format genfromtxt

I have a tab-delimited file (city-data.txt):

Alabama Montgomery  32.361538   -86.279118
Alaska  Juneau  58.301935   -134.41974

Is it possible to read the first two columns as strings and the last two columns as floats?

My output should look like this:

[(Alabama,Montgomery,32.36,-86.28),
 (Alaska,Juneau,58.30,-134.42)]

I tried:

mylist2 = np.genfromtxt(r'city-data.txt', delimiter='\t', dtype=("<S15", "<S15", float, float)).tolist()

This gives me the first two columns as bytes:

[(b'Alabama', b'Montgomery', 32.361538, -86.279118),
 (b'Alaska', b'Juneau', 58.301935, -134.41974)]

I also tried:

with open('city-data.txt') as f:
    mylist = [tuple(i.strip().split('\t')) for i in f]

This gives me every column as a string:

[('Alabama', 'Montgomery', '32.361538', '-86.279118'),
 ('Alaska', 'Juneau', '58.301935', '-134.41974')]

I can't figure out how to get what I need...

3 answers:

Answer 0: (score: 5)

You can use pandas read_csv to read the file contents into a DataFrame, then use df.values.tolist() to convert the rows to a list.

Example:

import pandas as pd
df = pd.read_csv(filename, sep="\t", header=None)
print(df.values.tolist())
#[['Alabama', 'Montgomery', 32.361538, -86.27911800000001],
# ['Alaska', 'Juneau', 58.301935, -134.41974]]

If you need them as tuples, just use map() (in Python 3, wrap it in list() so that the tuples are printed rather than a map object):

print(list(map(tuple, df.values.tolist())))
#[('Alabama', 'Montgomery', 32.361538, -86.27911800000001),
# ('Alaska', 'Juneau', 58.301935, -134.41974)]

Edit

If you want to use numpy, this slight modification to your existing code should work: change the dtype for the text fields to "O".
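As a possible variation on the pandas approach in this answer, the sketch below builds the tuples with DataFrame.itertuples and rounds the numeric columns to two decimals, which is what the desired output in the question shows. The in-memory string is only a stand-in for city-data.txt, and the rounding step is my assumption about what the asker wants, not something the answer itself proposes.

```python
import io

import pandas as pd

# In-memory stand-in for city-data.txt (tab-separated, no header row).
data = (
    "Alabama\tMontgomery\t32.361538\t-86.279118\n"
    "Alaska\tJuneau\t58.301935\t-134.41974\n"
)

df = pd.read_csv(io.StringIO(data), sep="\t", header=None)

# Assumption: two decimals are wanted, as in the question's sample output.
# round() only affects the numeric columns; the text columns are left alone.
df = df.round(2)

# index=False drops the row index; name=None yields plain tuples
# instead of namedtuples.
rows = list(df.itertuples(index=False, name=None))
print(rows)
```

To read the real file, pass the file name to read_csv in place of the StringIO object.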

Answer 1: (score: 3)

Another option is to use the 'U' dtype, which stands for Unicode.

>>> import numpy as np
>>> mylist = np.genfromtxt('city-data.txt', delimiter='\t', dtype=('U10','U10',float,float)).tolist()
>>> mylist
[('Alabama', 'Montgomery', 32.361538, -86.279118), ('Alaska', 'Juneau', 58.301935, -134.41974)]
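One thing worth checking with this approach is the width in 'U10': it is a fixed maximum length, and 'Montgomery' happens to be exactly ten characters. The sketch below, which is an illustration rather than part of the answer, reads the same two rows with a wide and a deliberately narrow text width so the effect can be compared. The read_cities helper, the widths 32 and 5, and the explicit encoding argument are all my own choices.

```python
import io

import numpy as np

# In-memory stand-in for city-data.txt (tab-separated, no header row).
data = (
    "Alabama\tMontgomery\t32.361538\t-86.279118\n"
    "Alaska\tJuneau\t58.301935\t-134.41974\n"
)


def read_cities(text, width):
    """Hypothetical helper: two 'U<width>' text columns, then two floats."""
    dtype = f"U{width},U{width},f8,f8"
    return np.genfromtxt(
        io.StringIO(text),
        delimiter="\t",
        dtype=dtype,
        encoding="utf-8",  # assumption: the file is UTF-8 text
    ).tolist()


wide = read_cities(data, 32)   # generous width, names should survive intact
narrow = read_cities(data, 5)  # too narrow on purpose, for comparison

print(wide)
print(narrow)
```

If the names in the real file can be longer than the chosen width, picking a larger width is the safer option.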

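Answer 2: (score: 1)

After splitting each line, build a new line by trying to convert each item to a float, then append the new line to the final container.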

import io
from pprint import pprint

s = '''Alabama Montgomery  32.361538   -86.279118
Alaska  Juneau  58.301935   -134.41974'''
f = io.StringIO(s)
stuff = []
for line in f:
    line = line.strip()
    line = line.split()
    new_line = []
    for item in line:
        try:
            item = float(item)
        except ValueError as e:
            pass
        new_line.append(item)
    #print(f'line:{line}, new_line:{new_line}')
    stuff.append(new_line)
pprint(stuff)
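Note that the loop above splits on any whitespace and collects lists, so a multi-word name would be broken into several items. A minimal alternative sketch, assuming every line has exactly four tab-separated fields, uses the standard csv module and converts only the last two columns. The read_rows helper and the field names state, city, lat and lon are hypothetical, chosen for illustration.

```python
import csv
import io

# In-memory stand-in for city-data.txt (tab-separated, no header row).
data = (
    "Alabama\tMontgomery\t32.361538\t-86.279118\n"
    "Alaska\tJuneau\t58.301935\t-134.41974\n"
)


def read_rows(lines):
    """Yield (state, city, lat, lon) tuples from tab-separated lines.

    Assumes exactly four fields per line; a blank or malformed line
    raises ValueError when unpacking or converting.
    """
    for state, city, lat, lon in csv.reader(lines, delimiter="\t"):
        yield state, city, float(lat), float(lon)


rows = list(read_rows(io.StringIO(data)))
print(rows)
```

With the real file, something like open('city-data.txt', newline='') can be passed to read_rows in place of the StringIO object.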