Question

I have a csv file in which some columns look like this:

df = pd.DataFrame({'a':[['ID1','ID2','ID3'],['ID1','ID4'],[]],'b':[[8.6,1.3,2.5],[7.5,1.2],[]],'c':[[12,23,79],[42,10],[]]})

Out[1]:
                 a                b             c
0  [ID1, ID2, ID3]  [8.6, 1.3, 2.5]  [12, 23, 79]
1       [ID1, ID4]       [7.5, 1.2]      [42, 10]
2               []               []            []

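When I read it with pandas.read_csv, Python treats these columns as strings. Is there an option I can pass to say that these columns contain lists of numbers? (Maybe some dtype = something)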
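To reproduce the behaviour, here is a minimal sketch that writes the example DataFrame to an in-memory CSV and reads it back. io.StringIO is only a stand-in for the real file (an assumption for illustration). to_csv stores each list as its text representation, and read_csv gives that text back as a plain string:

```python
import io

import pandas as pd

df = pd.DataFrame({'a': [['ID1', 'ID2', 'ID3'], ['ID1', 'ID4'], []],
                   'b': [[8.6, 1.3, 2.5], [7.5, 1.2], []],
                   'c': [[12, 23, 79], [42, 10], []]})

# to_csv writes each list as its str() form, e.g. "[8.6, 1.3, 2.5]"
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)

# read_csv does not rebuild the lists: the cells come back as strings
df2 = pd.read_csv(buf)
print(type(df2.loc[0, 'b']), df2.loc[0, 'b'])
```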

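PS: I could convert the columns afterwards with a list comprehension using ast.literal_eval, but that takes a while, so I would like them parsed as lists as soon as the csv is read.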
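For comparison, the after-the-fact conversion mentioned in the PS can be sketched like this. The CSV text is a hypothetical sample in the same format as the example, and Series.map is used as the equivalent of the list comprehension:

```python
import ast
import io

import pandas as pd

# Hypothetical sample in the same format as the example above
csv_text = '''a,b,c
"['ID1', 'ID2', 'ID3']","[8.6, 1.3, 2.5]","[12, 23, 79]"
"['ID1', 'ID4']","[7.5, 1.2]","[42, 10]"
[],[],[]
'''

df = pd.read_csv(io.StringIO(csv_text))

# Convert after reading: ast.literal_eval is called once per cell
for col in ['a', 'b', 'c']:
    df[col] = df[col].map(ast.literal_eval)

print(df.loc[0, 'a'], df.loc[1, 'c'])
```

This still calls literal_eval once per cell, so on 600,000 rows the per-cell parsing cost is what dominates.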

PS2: The original csv file is 600,000 rows long (which is why literal_eval takes some time). Its columns are:

'ID of the project'  'postcode'    'city'       'len of the lists in the last 3 columns'  'ids of other projects'   'distance from initial project'  'jetlag from initial project'
 object                int          string       int                                       list of strings           list of floats                   list of ints
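One possible speedup, which is not part of the original question or answer: if the numeric lists are stored exactly as in the example (plain numbers in square brackets), they are also valid JSON, and json.loads is usually faster than ast.literal_eval (worth measuring on the real data). Lists of strings written with single quotes, such as ['ID1', 'ID2'], are not valid JSON and still need literal_eval. The sample rows below are hypothetical:

```python
import ast
import io
import json

import pandas as pd

# Hypothetical sample using the column names from the question
csv_text = '''ids of other projects,distance from initial project,jetlag from initial project
"['ID1', 'ID2']","[8.6, 1.3]","[12, 23]"
[],[],[]
'''

df = pd.read_csv(io.StringIO(csv_text))

# Numeric lists such as "[8.6, 1.3]" are valid JSON
for col in ['distance from initial project', 'jetlag from initial project']:
    df[col] = df[col].map(json.loads)

# Single-quoted strings are not valid JSON, so keep ast.literal_eval here
df['ids of other projects'] = df['ids of other projects'].map(ast.literal_eval)
```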

Answer 1

For this you can use the converters parameter of pd.read_csv (see the Documentation for read_csv):

Using your example,

'ID of the project'  'postcode'    'city'       'len of the lists in the last 3 columns'  'ids of other projects'   'distance from initial project'  'jetlag from initial project'
 object                int          string       int                                       list of strings           list of floats                   list of ints

you can do this:

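import ast

import pandas as pd

# ast.literal_eval safely turns a string such as "[8.6, 1.3, 2.5]" into a Python list
generic = ast.literal_eval
conv = {'ids of other projects': generic,
        'distance from initial project': generic,
        'jetlag from initial project': generic}

# the converters are applied to each cell of these columns while the file is read
df = pd.read_csv('your_file.csv', converters=conv)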
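To check the approach without the real file, the same converters can be run on a small in-memory sample. The rows below are made up and only follow the format described in the question. The converters dict is built with a comprehension over the list columns:

```python
import ast
import io

import pandas as pd

# Hypothetical sample rows in the question's format
csv_text = '''ID of the project,ids of other projects,distance from initial project,jetlag from initial project
P1,"['ID1', 'ID2', 'ID3']","[8.6, 1.3, 2.5]","[12, 23, 79]"
P2,"['ID1', 'ID4']","[7.5, 1.2]","[42, 10]"
P3,[],[],[]
'''

list_cols = ['ids of other projects',
             'distance from initial project',
             'jetlag from initial project']
conv = {col: ast.literal_eval for col in list_cols}

# each converter receives the raw string of a cell while the file is parsed
df = pd.read_csv(io.StringIO(csv_text), converters=conv)
print(df.loc[1, 'jetlag from initial project'])  # [42, 10]
```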

You have to specify which columns the conversion applies to, but that shouldn't be a problem for you.
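The read_csv documentation states that converters keys can be either column labels or integer column positions. A sketch that selects the three list columns by position follows. The sample data (projects, postcodes, cities) is hypothetical:

```python
import ast
import io

import pandas as pd

# Hypothetical sample with all seven columns from the question
csv_text = '''ID of the project,postcode,city,len of the lists in the last 3 columns,ids of other projects,distance from initial project,jetlag from initial project
P1,75001,Paris,2,"['ID2', 'ID3']","[1.3, 2.5]","[23, 79]"
P2,69001,Lyon,0,[],[],[]
'''

# keys 4, 5 and 6 are the positions of the three list columns
conv = {i: ast.literal_eval for i in (4, 5, 6)}

df = pd.read_csv(io.StringIO(csv_text), converters=conv)
```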

The converter functions are applied during the csv import, and if the file is too large you can always read the csv in chunks.
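A sketch of reading in chunks with the same converters. Passing chunksize makes read_csv return an iterator of DataFrames, which pd.concat can join back together. The file contents and the column names id and dist are hypothetical:

```python
import ast
import io

import pandas as pd

# Hypothetical 10-row file with one list column
csv_text = 'id,dist\n' + ''.join(f'P{i},"[{i}.5, {i}.25]"\n' for i in range(10))

conv = {'dist': ast.literal_eval}

# with chunksize, read_csv yields DataFrames of up to 4 rows each;
# the converters are applied to every chunk
chunks = pd.read_csv(io.StringIO(csv_text), converters=conv, chunksize=4)
df = pd.concat(chunks, ignore_index=True)
```

Chunking bounds how much raw text is held at once while parsing. It does not make literal_eval itself any faster.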

How to read a csv file with lists in its columns in pandas?
