Question

I have a CSV file in which every cell value is a two-element list (a pair).

    |   0       |   1        |    2    | 
----------------------------------------
0   |[87, 1.03] | [30, 4.05] |   NaN   |
1   |[34, 2.01] |   NaN      |   NaN   |
2   |[83, 0.2]  | [18, 3.4]  |   NaN   |

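How can I access these elements separately? The first element of each pair serves as an index into another CSV table. I have done something similar before, but one thing always trips me up.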

links = pd.read_csv('buslinks.csv', header = None)
a_list = []
for i in range(0, 100):
    l = []
    a_list.append(l)
for j in range(0, 100):
    a = busStops.iloc[j]
    df = pd.DataFrame(columns = ['id', 'Distance'])
    l = links.iloc[j]
    for i in l:
        if(pd.isnull(i)):
            continue
        else:
            x = int(i[0])
            d = busStops.iloc[x-1]
            id = d['id']
            dist = distance(d['xCoordinate'], a['xCoordinate'], d['yCoordinate'], a['yCoordinate'])
            df.loc[i] = [id, dist]
    # DataFrame.sort was removed from pandas; sort_values is the current equivalent
    a_list[j] = df.sort_values('Distance', ascending=True).values.tolist()

This approach works when each cell contains only a single element. In that case, np.isnan() is used instead of pd.isnull().

The CSV file being read was created as follows:

a_list = []
for i in range(0, 100):
    l = []
    a_list.append(l)
for i in range(0, 100):
    while(len(a_list[i])<3):
        x = random.randint(1, 100)
        if(x-1 == i):
             continue
        a = busStops.iloc[i]
        b = busStops.iloc[x-1]
        dist = distance(a['xCoordinate'], b['xCoordinate'], a['yCoordinate'], b['yCoordinate'])
        if dist>3:
            continue
        if x in a_list[i]:
            continue
        a_list[i].append([b['id'], dist])
        a_list[x-1].append([a['id'], dist])
    for j in range(0, 3):
        y = random.randint(0, 1)
        while (y == 0):
            x = random.randint(1, 100)
            if(x-1 == i):
                 continue
            a = busStops.iloc[i]
            b = busStops.iloc[x-1]
            dist = distance(a['xCoordinate'], b['xCoordinate'], a['yCoordinate'], b['yCoordinate'])
            if dist>3:
                continue
            if x in a_list[i]:
                continue
            a_list[i].append([b['id'], dist])
            a_list[x-1].append([a['id'], dist])
            y = 1
dfLinks = pd.DataFrame(a_list)
dfLinks
dfLinks.to_csv('buslinks.csv', index = False, header = False)

BusStops is another CSV file with the columns ID, xCoordinate, yCoordinate, Population, and Priority.

Answer 1

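First, note that storing lists in DataFrames condemns you to Python-speed loops. To take advantage of the fast Pandas / NumPy routines, you need to use native NumPy dtypes such as np.float64 (whereas lists require the "object" dtype).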
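To make the dtype point concrete, here is a minimal sketch comparing a column of lists with the same data split into numeric columns. The column names stop_id and distance are illustrative and do not come from the question.

```python
import numpy as np
import pandas as pd

# A column holding Python lists can only be stored with the generic "object" dtype.
pairs = pd.Series([[87, 1.03], [34, 2.01], [83, 0.2]])

# The same data split into plain numeric columns uses native NumPy dtypes.
# (The column names are illustrative, not taken from the question.)
flat = pd.DataFrame(pairs.tolist(), columns=["stop_id", "distance"])

print(pairs.dtype)   # object
print(flat.dtypes)   # an integer dtype for stop_id, float64 for distance

# Vectorized arithmetic works directly on the numeric column...
doubled = flat["distance"] * 2
# ...while the list column needs an element-by-element Python loop.
doubled_slow = [pair[1] * 2 for pair in pairs]
```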

That said, here is some code I wrote to demonstrate how to do it, so that you can use something similar in your own code:

import pandas as pd

table = pd.DataFrame(columns=['col1', 'col2', 'col3'])
table.loc[0] = [1, 2,3]
table.loc[1] = [1, [2,3], 4]

table.loc[1].iloc[1]        # returns [2, 3]
table.loc[1].iloc[1][0]     # returns 2
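Note that this kind of indexing assumes the cell holds an actual Python list. After a round trip through to_csv and read_csv, a list cell comes back as text such as '[87, 1.03]', so it may need to be parsed first. The following is a rough sketch only: it uses an in-memory stand-in for buslinks.csv, and parse_pair is a hypothetical helper name.

```python
import ast
import io

import pandas as pd

# In-memory stand-in for 'buslinks.csv': list cells are written out as quoted text.
csv_text = (
    '"[87, 1.03]","[30, 4.05]",\n'
    '"[34, 2.01]",,\n'
    '"[83, 0.2]","[18, 3.4]",\n'
)
links = pd.read_csv(io.StringIO(csv_text), header=None)

def parse_pair(cell):
    """Turn the text '[87, 1.03]' into the list [87, 1.03]; leave NaN untouched."""
    if pd.isnull(cell):
        return cell
    return ast.literal_eval(cell)

# Parse every cell, column by column.
parsed = links.apply(lambda col: col.map(parse_pair))

first_pair = parsed.iloc[0, 0]   # the list [87, 1.03]
first_id = first_pair[0]         # 87
```

ast.literal_eval only evaluates Python literals, which makes it a safer choice than eval for text read from a file.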

Answer 2

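You should not put lists in pd.Series objects. It is inefficient, and you lose all vectorized functionality. However, if you are certain you must use this as your starting point, you can split the lists into multiple columns in a few steps.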

Setup

import numpy as np
import pandas as pd

df = pd.DataFrame({0: [[87, 1.03], [34, 2.01], [83, 0.2]],
                   1: [[30, 4.05], np.nan, [18, 3.4]],
                   2: [np.nan, np.nan, np.nan]})

Step 1: Make sure the lists are all the same size

# messy way to ensure all values have length 2
df[1] = np.where(df[1].isnull(), pd.Series([[np.nan, np.nan]]*len(df[1])), df[1])

print(df)

            0           1   2
0  [87, 1.03]  [30, 4.05] NaN
1  [34, 2.01]  [nan, nan] NaN
2   [83, 0.2]   [18, 3.4] NaN
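As a possibly more readable alternative to the np.where line, the padding could be done with a small helper applied cell by cell. This is a sketch rather than the method used in this answer, and pad_missing is a hypothetical name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [[87, 1.03], [34, 2.01], [83, 0.2]],
                   1: [[30, 4.05], np.nan, [18, 3.4]],
                   2: [np.nan, np.nan, np.nan]})

def pad_missing(cell):
    # Keep real pairs as they are; replace a scalar NaN with a pair of NaNs.
    return cell if isinstance(cell, list) else [np.nan, np.nan]

# Apply the helper to every cell of column 1.
df[1] = df[1].map(pad_missing)
```

This still runs at Python speed, like any operation on object columns, but it avoids building a temporary Series of placeholder lists.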

Step 2: Concatenate the DataFrames built from the split series

# create list of dataframes
L = [pd.DataFrame(df[col].values.tolist()) for col in df]

# concatenate dataframes in list
df_new = pd.concat(L, axis=1, ignore_index=True)

print(df_new)

    0     1     2     3   4
0  87  1.03  30.0  4.05 NaN
1  34  2.01   NaN   NaN NaN
2  83  0.20  18.0  3.40 NaN

You can then access the values as usual, e.g. df_new[2].
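Since the question uses the first element of each pair as an index into the bus stops table, the flattened columns can feed a lookup directly. The sketch below assumes a bus stops table indexed by id; the bus_stops frame and its coordinates are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Flattened links table in the same shape as df_new.
df_new = pd.DataFrame([[87, 1.03, 30.0, 4.05, np.nan],
                       [34, 2.01, np.nan, np.nan, np.nan],
                       [83, 0.20, 18.0, 3.40, np.nan]])

# Hypothetical stand-in for the bus stops table (ids 1..100, made-up coordinates).
bus_stops = pd.DataFrame({"id": range(1, 101),
                          "xCoordinate": np.arange(100) * 0.5,
                          "yCoordinate": np.arange(100) * 0.25}).set_index("id")

# Column 2 holds the first element of the second pair; drop NaN and cast to int.
neighbour_ids = df_new[2].dropna().astype(int)

# Look up the matching stops by id, then relabel rows with the originating link row.
neighbours = bus_stops.loc[neighbour_ids.to_numpy()].set_index(neighbour_ids.index)
```

Dropping the NaN entries before casting matters because a float column containing NaN cannot be converted to a plain integer dtype.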

Using the values of lists stored in pandas DataFrame cells
