I have a DataFrame named information, obtained as follows:
information = pd.DataFrame.from_dict(docs.json()["hits"]["hits"])
information contains objects of type news. For each news item, I only want _source:
_id _index _score _source _type
0 c0b0773f94fc91938709edccf1ec4e3039e7576b luxurynsight_v2 6.023481 {'importer': 'APItay', 'releasedAt': 147621242... news
1 9ce6d7e015dc28497ff8ccd4915cf4104188107d luxurynsight_v2 6.015883 {'importer': 'APItay', 'releasedAt': 152717820... news
...
Within each _source, I only want name and createdAt. For example, here is one of the news items:
_index _type _id _score _source
_headers luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 [{'header': 'date', 'value': 'Fri, 23 Feb 2018...
_opengraph luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 [{'header': 'og_locale', 'value': 'en_US'}, {'...
_sums luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 [{'sum': 'decfedbfae938da88e93e75c7ebb4dc9', '...
_tags luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 [{'visible': True, 'name': 'Gucci', 'count': 3...
_users luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 [{'permission': 'public', 'id': 0}]
archive luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 True
authors luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 []
catalogs luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 [Luxurynsight]
cleanUrl luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 http://www.cpp-luxury.com/gucci-debuts-art-ins...
contentType luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 text/html
createdAt luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 1508510973592
domain luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 www.cpp-luxury.com
excerpt luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 Gucci debuts art installation at its Ginza sto...
foundOn luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 [excerpt, name]
iframe luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 True
importer luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 APItay
language luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 en-US
name luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 Gucci debuts art installation at its Ginza sto...
plainCategories luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 [AutomaticBrands, Market, AutomaticPeople, Tag]
plainTags luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 [Gucci, Market_Japan, Alessandro Michele, Tag_...
previewImage luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 http://www.cpp-luxury.com/wp-content/uploads/2...
publishedAt luxurynsight_v2 news c0b0773f94fc91938709edccf1ec4e3039e7576b 6.023481 1476212420000
The expected result is:
createAt names
2007-01-01 What Sticks from '06. Somalia Orders Islamist...
2007-01-02 Heart Health: Vitamin Does Not Prevent Death ...
2007-01-03 Google Answer to Filling Jobs Is an Algorithm...
>>> information._source
0 {'importer': 'APItay', 'releasedAt': 147621242...
1 {'importer': 'APItay', 'releasedAt': 152717820...
2 {'importer': 'APItay', 'releasedAt': 152418240...
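The problem is that this gives a column of dictionaries. How can I convert it into a DataFrame? Or is there perhaps another approach? I tried the following: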
import ast
information._source = information._source.apply(lambda x: ast.literal_eval(x))
# Store in a new column
df['name'] = information._source.apply(lambda x: x['name'])
# Store in a new column
df['createAt'] = information._source.apply(lambda x: x['createAt'])
But it gives me a ValueError:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-194-968302937df5> in <module>
1 import ast
----> 2 information._source = information._source.apply(lambda x: ast.literal_eval(x))
3
4 # Store in a new column
5 df['name'] = information._source.apply(lambda x: x['name'])
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
3192 else:
3193 values = self.astype(object).values
-> 3194 mapped = lib.map_infer(values, f, convert=convert_dtype)
3195
3196 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/src\inference.pyx in pandas._libs.lib.map_infer()
<ipython-input-194-968302937df5> in <lambda>(x)
1 import ast
----> 2 information._source = information._source.apply(lambda x: ast.literal_eval(x))
3
4 # Store in a new column
5 df['name'] = information._source.apply(lambda x: x['name'])
C:\ProgramData\Anaconda3\lib\ast.py in literal_eval(node_or_string)
83 return left - right
84 raise ValueError('malformed node or string: ' + repr(node))
---> 85 return _convert(node_or_string)
86
87
C:\ProgramData\Anaconda3\lib\ast.py in _convert(node)
82 else:
83 return left - right
---> 84 raise ValueError('malformed node or string: ' + repr(node))
85 return _convert(node_or_string)
86
ValueError: malformed node or string: {'importer': 'APItay', 'releasedAt': 1476212420000, '_tags': [{'visible': True, 'name': 'Gucci', 'count': 39, 'id': 'Gucci', 'category': ['AutomaticBrands']}, {'visible': False, 'name': 'MLI1', 'count': 39, 'id': 'staffTagging_MLI1', 'category': ['staffTagging']}, {'visible': True, 'name': 'Japan', 'count': 19, 'id': 'Market_Japan', 'category': ['Market']}, {'visible': False, 'name': 'KBN', 'count': 4, 'id': 'staffTagging_KBN', 'category': ['staffTagging']}, {'visible': False, 'name': 'JLE',
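The message above prints a dict rather than a quoted string, which suggests the values in _source are already Python dictionaries. ast.literal_eval parses text, so handing it a dict fails. The sketch below reproduces that behaviour on a shortened, hypothetical stand-in for one _source value; the variable names are illustrative and not part of the original code.

```python
import ast

# Hypothetical, shortened stand-in for one _source value.
source_as_dict = {"importer": "APItay", "name": "Gucci debuts art installation"}
source_as_text = repr(source_as_dict)

# literal_eval parses text, so the string form round-trips to a dict.
parsed = ast.literal_eval(source_as_text)

# Passing an object that is already a dict raises ValueError.
try:
    ast.literal_eval(source_as_dict)
    raised = False
except ValueError:
    raised = True

# When the value is already a dict, its keys can be read directly.
name = source_as_dict["name"]
```

If that is the case here, the literal_eval step can probably be dropped and the keys read straight from each dictionary.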
import json
import requests

def create_doc(uri, doc_data=None):
    """Create new document."""
    # Avoid a mutable default argument; fall back to an empty query
    query = json.dumps(doc_data or {})
    response = requests.post(uri, data=query)
    print(type(response))
    return response
doc_data = {
    "size": 10,
    "query": {
        "bool": {
            "must": [
                {"term": {"text": "gucci"}}
            ]
        }
    }
}
docs = create_doc("https://elastic:rKzWu2WbXI@db.luxurynsight.com/luxurynsight_v2/news/_search",doc_data)
Answer 0 (score: 2)
Verified answer to the question:
import pandas as pd

# Read the JSON file
df = pd.read_json('file.json')
# Convert each _source value to a dictionary
df['_source'] = df['_source'].apply(dict)
# Create the name column
df['name'] = df['_source'].apply(lambda x: x['name'])
# Create the createdAt column
df['createdAt'] = df['_source'].apply(lambda x: x['createdAt'])
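As a possible alternative, the two fields can be pulled out of the raw hits before building the DataFrame, which avoids the intermediate column of dictionaries. This is a minimal sketch, not the answer's method: the hits list is a hypothetical stand-in for docs.json()["hits"]["hits"] with shortened ids and titles, the second record is made up, and createdAt is assumed to be epoch milliseconds because the sample value has 13 digits.

```python
import pandas as pd

# Hypothetical stand-in for docs.json()["hits"]["hits"]; values are shortened.
hits = [
    {"_id": "c0b0773f", "_type": "news",
     "_source": {"name": "Gucci debuts art installation",
                 "createdAt": 1508510973592}},
    {"_id": "9ce6d7e0", "_type": "news",
     "_source": {"name": "Another headline",
                 "createdAt": 1527178200000}},
]

# Build one row per hit directly from the nested _source dict.
result = pd.DataFrame(
    [
        {"createdAt": hit["_source"]["createdAt"], "name": hit["_source"]["name"]}
        for hit in hits
    ]
)

# Assumption: createdAt is epoch milliseconds, so convert it to a timestamp.
result["createdAt"] = pd.to_datetime(result["createdAt"], unit="ms")

print(result)
```

With the real response, hits would be replaced by docs.json()["hits"]["hits"]. A hit whose _source lacks either key would raise a KeyError here, so hit["_source"].get(...) may be preferable if the fields are optional.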