I'm having trouble accessing data inside a dictionary.
Sys: MacBook 2012
Python: Python 3.5.1 :: Continuum Analytics, Inc.
I am working with a dask.dataframe created from a CSV.
Suppose I start with a pandas Series:
df.Coordinates
130 {u'type': u'Point', u'coordinates': [-43.30175...
278 {u'type': u'Point', u'coordinates': [-51.17913...
425 {u'type': u'Point', u'coordinates': [-43.17986...
440 {u'type': u'Point', u'coordinates': [-51.16376...
877 {u'type': u'Point', u'coordinates': [-43.17986...
1313 {u'type': u'Point', u'coordinates': [-49.72688...
1734 {u'type': u'Point', u'coordinates': [-43.57405...
1817 {u'type': u'Point', u'coordinates': [-43.77649...
1835 {u'type': u'Point', u'coordinates': [-43.17132...
2739 {u'type': u'Point', u'coordinates': [-43.19583...
2915 {u'type': u'Point', u'coordinates': [-43.17986...
3035 {u'type': u'Point', u'coordinates': [-51.01583...
3097 {u'type': u'Point', u'coordinates': [-43.17891...
3974 {u'type': u'Point', u'coordinates': [-8.633880...
3983 {u'type': u'Point', u'coordinates': [-46.64960...
4424 {u'type': u'Point', u'coordinates': [-43.17986...
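The problem is that this is not really a Series of dictionaries. Instead, it is a column full of strings that look like dictionaries. Running this shows it: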
df.Coordinates.apply(type)
130 <class 'str'>
278 <class 'str'>
425 <class 'str'>
440 <class 'str'>
877 <class 'str'>
1313 <class 'str'>
1734 <class 'str'>
1817 <class 'str'>
1835 <class 'str'>
2739 <class 'str'>
2915 <class 'str'>
3035 <class 'str'>
3097 <class 'str'>
3974 <class 'str'>
3983 <class 'str'>
4424 <class 'str'>
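My goal: access the coordinates key and its value in each dictionary. That's it. But each value is a str, so I used eval to convert the strings to dictionaries.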
new = df.Coordinates.apply(eval)
130 {'coordinates': [-43.301755, -22.990065], 'typ...
278 {'coordinates': [-51.17913026, -30.01201896], ...
425 {'coordinates': [-43.17986794, -22.91000096], ...
440 {'coordinates': [-51.16376782, -29.95488677], ...
877 {'coordinates': [-43.17986794, -22.91000096], ...
1313 {'coordinates': [-49.72688407, -29.33757253], ...
1734 {'coordinates': [-43.574057, -22.928059], 'typ...
1817 {'coordinates': [-43.77649254, -22.86940539], ...
1835 {'coordinates': [-43.17132318, -22.90895217], ...
2739 {'coordinates': [-43.1958313, -22.98755333], '...
2915 {'coordinates': [-43.17986794, -22.91000096], ...
3035 {'coordinates': [-51.01583481, -29.63593292], ...
3097 {'coordinates': [-43.17891379, -22.96476163], ...
3974 {'coordinates': [-8.63388008, 41.14594453], 't...
3983 {'coordinates': [-46.64960938, -23.55902666], ...
4424 {'coordinates': [-43.17986794, -22.91000096], ...
Next I check the type of the objects and get:
130 <class 'dict'>
278 <class 'dict'>
425 <class 'dict'>
440 <class 'dict'>
877 <class 'dict'>
1313 <class 'dict'>
1734 <class 'dict'>
1817 <class 'dict'>
1835 <class 'dict'>
2739 <class 'dict'>
2915 <class 'dict'>
3035 <class 'dict'>
3097 <class 'dict'>
3974 <class 'dict'>
3983 <class 'dict'>
4424 <class 'dict'>
If I try to access my dictionaries with new.apply(lambda x: x['coordinates']), I get:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-71-c0ad459ed1cc> in <module>()
----> 1 dfCombined.Coordinates.apply(coord_getter)
/Users/linwood/anaconda/envs/dataAnalysisWithPython/lib/python3.5/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
2218 else:
2219 values = self.asobject
-> 2220 mapped = lib.map_infer(values, f, convert=convert_dtype)
2221
2222 if len(mapped) and isinstance(mapped[0], Series):
pandas/src/inference.pyx in pandas.lib.map_infer (pandas/lib.c:62658)()
<ipython-input-68-748ce2d8529e> in coord_getter(row)
1 import ast
2 def coord_getter(row):
----> 3 return (ast.literal_eval(row))['coordinates']
TypeError: 'bool' object is not subscriptable
It is some kind of class, because when I run dir I get an object:
new.apply(lambda x: dir(x))[130]
130 __class__
130 __contains__
130 __delattr__
130 __delitem__
130 __dir__
130 __doc__
130 __eq__
130 __format__
130 __ge__
130 __getattribute__
130 __getitem__
130 __gt__
130 __hash__
130 __init__
130 __iter__
130 __le__
130 __len__
130 __lt__
130 __ne__
130 __new__
130 __reduce__
130 __reduce_ex__
130 __repr__
130 __setattr__
130 __setitem__
130 __sizeof__
130 __str__
130 __subclasshook__
130 clear
130 copy
130 fromkeys
130 get
130 items
130 keys
130 pop
130 popitem
130 setdefault
130 update
130 values
Name: Coordinates, dtype: object
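My question: I just want to access the dictionary. But the object is <class 'dict'>. How do I convert it to a regular dictionary, or just access the key:value pairs?
Any ideas?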
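Answer 0: (score: 4)
My first instinct was to use json.loads to convert the strings into dicts. But the example you posted does not follow the JSON standard, because it uses single quotes instead of double quotes. So you have to convert the strings first.
The second option is to use a regular expression to parse the strings. If the dict strings in your actual DataFrame do not exactly match my example, I would expect the regex approach to be more robust, since lat/long coords are fairly standard.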
import json
import re
import pandas as pd
df = pd.DataFrame(data={'Coordinates': ["{u'type': u'Point', u'coordinates': [-43.30175, 123.45]}",
                                        "{u'type': u'Point', u'coordinates': [-51.17913, 123.45]}"],
                        'idx': [130, 278]})
##
# Solution 1 - use json.loads
##
def string_to_dict(dict_string):
    # Convert to proper JSON format: double quotes and no u prefix.
    # Note: this simple replace would also alter string values that end in "u".
    dict_string = dict_string.replace("'", '"').replace('u"', '"')
    return json.loads(dict_string)
# Use item assignment; attribute assignment would not create a column
df['CoordDicts'] = df.Coordinates.apply(string_to_dict)
df.CoordDicts[0]['coordinates']
#>>> [-43.30175, 123.45]
##
# Solution 2 - use regex
##
def get_lat_lon(dict_string):
    # Get the coordinates string with regex (first "number, number" pair)
    rs = re.search(r"(-?\d+(\.\d+)?),\s*(-?\d+(\.\d+)?)", dict_string).group()
    # Cast to floats
    coords = [float(x) for x in rs.split(',')]
    return coords
df['Coords'] = df.Coordinates.apply(get_lat_lon)
df.Coords[0]
#>>> [-43.30175, 123.45]
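The `TypeError: 'bool' object is not subscriptable` in the question's traceback is raised inside coord_getter, which suggests that at least one row parses to a bool (for example the string 'False') rather than to a dict. Below is a minimal sketch of a more defensive getter. The name safe_coords and the sample rows are hypothetical, and it assumes that rows which cannot be parsed into a dict should simply yield None.

```python
import ast

def safe_coords(value):
    """Return the 'coordinates' entry of a dict or dict-like string, else None."""
    # Already a dict (e.g. after an earlier conversion step)
    if isinstance(value, dict):
        return value.get('coordinates')
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError, TypeError):
        # Not a valid Python literal (free text, NaN, etc.)
        return None
    # literal_eval may also return a bool, number, list, ...
    if isinstance(parsed, dict):
        return parsed.get('coordinates')
    return None

# Hypothetical sample rows: one good record and two bad ones
rows = [
    "{u'type': u'Point', u'coordinates': [-43.30175, -22.990065]}",
    "False",
    "not a dict",
]
results = [safe_coords(r) for r in rows]
print(results)  # [[-43.30175, -22.990065], None, None]
```

With pandas, the same function could be passed to the column, e.g. df.Coordinates.apply(safe_coords), and the None rows inspected afterwards to see what the raw strings contain.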
Answer 1: (score: 0)
It looks like you will end up with something like this
import pandas as pd
s = pd.Series([
dict(type='Point', coordinates=[1, 1]),
dict(type='Point', coordinates=[1, 2]),
dict(type='Point', coordinates=[1, 3]),
dict(type='Point', coordinates=[1, 4]),
dict(type='Point', coordinates=[1, 5]),
dict(type='Point', coordinates=[2, 1]),
dict(type='Point', coordinates=[2, 2]),
dict(type='Point', coordinates=[2, 3]),
])
s
0 {u'type': u'Point', u'coordinates': [1, 1]}
1 {u'type': u'Point', u'coordinates': [1, 2]}
2 {u'type': u'Point', u'coordinates': [1, 3]}
3 {u'type': u'Point', u'coordinates': [1, 4]}
4 {u'type': u'Point', u'coordinates': [1, 5]}
5 {u'type': u'Point', u'coordinates': [2, 1]}
6 {u'type': u'Point', u'coordinates': [2, 2]}
7 {u'type': u'Point', u'coordinates': [2, 3]}
dtype: object
df = s.apply(pd.Series)
df
Then access the coordinates
df.coordinates
0 [1, 1]
1 [1, 2]
2 [1, 3]
3 [1, 4]
4 [1, 5]
5 [2, 1]
6 [2, 2]
7 [2, 3]
Name: coordinates, dtype: object
or even
df.coordinates.apply(pd.Series)
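If separate numeric columns are more convenient than a column of lists, that last step can also be given column names. This is only a sketch: it assumes the points follow the GeoJSON convention of [longitude, latitude], and the column names lon and lat are illustrative rather than part of the answer above.

```python
import pandas as pd

s = pd.Series([
    dict(type='Point', coordinates=[-43.301755, -22.990065]),
    dict(type='Point', coordinates=[-51.17913026, -30.01201896]),
])

# Build a two-column frame from the coordinate pairs,
# keeping the original index so rows still line up with s.
coords = pd.DataFrame(
    [d['coordinates'] for d in s],
    columns=['lon', 'lat'],
    index=s.index,
)
print(coords)
```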
Answer 2: (score: 0)
Just ran into this problem. My solution:
import ast
import pandas as pd
df = pd.DataFrame(["{u'type': u'Point', u'coordinates': [-43,144]}","{u'type': u'Point', u'coordinates': [-34,34]}","{u'type': u'Point', u'coordinates': [-102,344]}"],columns=["Coordinates"])
df = df["Coordinates"].astype('str')
df = df.apply(lambda x: ast.literal_eval(x))
df = df.apply(pd.Series)
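One reason to prefer ast.literal_eval over the eval used in the question is that it only accepts Python literals (strings, numbers, lists, dicts, booleans, None and so on) and refuses anything else. The sketch below illustrates this; suspicious is a made-up input, not data from the question.

```python
import ast

good = "{u'type': u'Point', u'coordinates': [-43, 144]}"
parsed = ast.literal_eval(good)
print(parsed['coordinates'])  # [-43, 144]

# A string containing a function call is not a literal,
# so it is rejected instead of being executed.
suspicious = "__import__('os').getcwd()"
try:
    ast.literal_eval(suspicious)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```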
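Answer 3: (score: 0)
Assuming you start from a Series of dictionaries, you can use the .tolist() method to create a list of dictionaries and use that as the input to a DataFrame. This approach maps each distinct key to a column.
You can filter by key at creation time by setting the columns argument of pd.DataFrame(), which gives you the neat one-liner below. Hope that helps.
import pandas as pd
# Starting assumption: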
data = ["{'coordinates': [-43.301755, -22.990065], 'type': 'Point', 'elevation': 1000}",
"{'coordinates': [-51.17913026, -30.01201896], 'type': 'Point'}"]
s = pd.Series(data).apply(eval)
# Create a DataFrame with a list of dicts with a selection of columns
pd.DataFrame(s.tolist(), columns=['coordinates'])
Out[1]:
coordinates
0 [-43.301755, -22.990065]
1 [-51.17913026, -30.01201896]
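To illustrate the point that every distinct key becomes a column, here is a sketch that builds the frame from the same sample strings without the columns argument. A key that is missing from a record (elevation in the second one) shows up as NaN. It parses with ast.literal_eval rather than eval, which is a substitution relative to the code above, and the names records, full and only_coords are illustrative.

```python
import ast
import pandas as pd

data = ["{'coordinates': [-43.301755, -22.990065], 'type': 'Point', 'elevation': 1000}",
        "{'coordinates': [-51.17913026, -30.01201896], 'type': 'Point'}"]

# Parse each string into a dict
records = [ast.literal_eval(item) for item in data]

# No columns argument: one column per distinct key across all records
full = pd.DataFrame(records)
print(sorted(full.columns))               # ['coordinates', 'elevation', 'type']
print(full['elevation'].isna().tolist())  # [False, True]

# With columns: keep only the keys of interest
only_coords = pd.DataFrame(records, columns=['coordinates'])
print(list(only_coords.columns))          # ['coordinates']
```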