How do I get information out of a DataFrame when some columns contain dicts or lists?

Asked: 2019-08-21 18:42:41

Tags: python pandas dataframe

I have this data, but I can't get at the values in the serviceTypes and crowding columns:

   id          name        modeName  disruptions  lineStatuses  serviceTypes                                        crowding
0  piccadilly  Piccadilly  tube      []           []            [{'$type': 'Tfl.Api.Presentation.Entities.Line...  {'$type': 'Tfl.Api.Presentation.Entities.Crowd...
1  victoria    Victoria    tube      []           []            [{'$type': 'Tfl.Api.Presentation.Entities.Line...  {'$type': 'Tfl.Api.Presentation.Entities.Crowd...
2  bakerloo    Bakerloo    tube      []           []            [{'$type': 'Tfl.Api.Presentation.Entities.Line...  {'$type': 'Tfl.Api.Presentation.Entities.Crowd...
3  central     Central     tube      []           []            [{'$type': 'Tfl.Api.Presentation.Entities.Line...  {'$type': 'Tfl.Api.Presentation.Entities.Crowd.

I tried the following code:

def split(x, index):
    try:
        return x[index]
    except:
        return None
dflines['serviceTypes'] = dflines.serviceTypes.apply(lambda x:split(x,0))
dflines['crowding'] = dflines.crowding.apply(lambda x:split(x,1))

def values(x):
    try:
        return ';'.join('{}'.format(val) for  val in x.values())
    except:
        return None
m = dflines['serviceTypes'].apply(lambda x:values(x))
dflines1 = m.str.split(';', expand=True)
dflines1.columns = dflines['serviceTypes'][0].keys()
dflines2 = dflines1[['name']]
dflines2

But I got this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-108-8f4bb6ac731a> in <module>
     14 m = dflines['serviceTypes'].apply(lambda x:values(x))
     15 dflines1 = m.str.split(';', expand=True)
---> 16 dflines1.columns = dflines['serviceTypes'][0].keys()
     17 dflines2 = dflines1[['name']]
     18 dflines2

AttributeError: 'str' object has no attribute 'keys'

Can anyone help me?

1 Answer:

Answer 0: (score: 0)

You can pull a pandas column out on its own like this (the result is a Series, which you can index much like a list):

service_types = dflines['serviceTypes']

The first value of the column is now the first value in service_types.

first_value = service_types[0]

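pandas works differently from a dictionary. I think you may be trying to treat the DataFrame as if it were a dictionary. Apologies if I have misunderstood or oversimplified.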
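Before indexing into a column, it can help to check what Python type its cells actually hold. The following is a minimal sketch, not the asker's real data: the small `dflines` frame and its `'$type'` and `'name'` values are made up for illustration, and `cell_types` and `nested_cols` are hypothetical names.

```python
import pandas as pd

# Made-up stand-in for the real frame; the values are placeholders.
dflines = pd.DataFrame({
    "id": ["piccadilly", "victoria"],
    "serviceTypes": [
        [{"$type": "ExampleServiceType", "name": "Regular"}],
        [{"$type": "ExampleServiceType", "name": "Regular"}],
    ],
    "crowding": [{"$type": "ExampleCrowding"}, {"$type": "ExampleCrowding"}],
})

# For every column, collect the names of the Python types found in its cells.
cell_types = {
    col: {type(value).__name__ for value in dflines[col]}
    for col in dflines.columns
}

# Columns whose cells are lists or dicts need an extra indexing step.
nested_cols = [
    col for col, names in cell_types.items() if names & {"list", "dict"}
]

print(cell_types)
print(nested_cols)
```

If a column you expected to hold lists or dicts reports `str` instead, its cells may be text representations rather than real containers, which would be consistent with the `'str' object has no attribute 'keys'` error in the question.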

Edit:

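OK, it looks like each value in service_types (above) is a list of dictionaries. To build a column that contains only the type, you need to index into the list first and then into the dictionary.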

service_types = dflines['serviceTypes']
types_alone = []
for i in service_types:
    # Each cell is a list of dicts: take the first dict, then its '$type' value.
    types_alone.append(i[0]['$type'])
dflines['new_column'] = types_alone
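The loop above assumes every cell is a non-empty list. A slightly more defensive variant is sketched below, under assumptions: the sample data is invented, the `'name'` key is inferred from the question's `dflines1[['name']]` rather than confirmed, and `first_item_field` is a hypothetical helper, not a pandas function.

```python
import pandas as pd

# Invented sample data; one row has an empty list to show the fallback.
dflines = pd.DataFrame({
    "id": ["piccadilly", "victoria", "bakerloo"],
    "serviceTypes": [
        [{"$type": "ExampleServiceType", "name": "Regular"}],
        [{"$type": "ExampleServiceType", "name": "Night"}],
        [],
    ],
    "crowding": [{"$type": "ExampleCrowding"}] * 3,
})


def first_item_field(cell, key):
    """Return cell[0][key] for a non-empty list of dicts, else None."""
    if isinstance(cell, list) and cell and isinstance(cell[0], dict):
        return cell[0].get(key)
    return None


# serviceTypes cells are lists of dicts: index the list, then the dict.
dflines["serviceTypeName"] = dflines["serviceTypes"].apply(
    lambda cell: first_item_field(cell, "name")
)

# crowding cells are plain dicts, so a single lookup is enough.
dflines["crowdingType"] = dflines["crowding"].apply(
    lambda cell: cell.get("$type") if isinstance(cell, dict) else None
)

print(dflines[["id", "serviceTypeName", "crowdingType"]])
```

Writing the results to new columns, rather than overwriting `serviceTypes` and `crowding`, keeps the original nested values available if the cell is re-run in a notebook.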