I have this dataset:
{
"date": "2018-01-01",
"body": "some txt",
"id": 111,
"sentiment": null
},
{
"date": "2018-01-02",
"body": "some txt",
"id": 112,
"sentiment": {
"basic": "Bearish"
}
}
I want to read this with pandas and change the sentiment column for every row where it is not null.
When I do this:
pd.read_json(path)
this is the result I get:
body ... sentiment
0 None
1 {u'basic': u'Bullish'}
I don't want {u'basic': u'Bullish'}; I only want the value of basic.
So to select the right rows I use:
df.loc[self.df['sentiment'].isnull() != True, 'sentiment'] = (?)
The selection works, but I don't know what to put in place of (?).
I have tried this, but it doesn't work:
df.loc[self.df['sentiment'].isnull() != True, 'sentiment'] = df['sentiment']['basic']
Any ideas? Thanks.
Answer 0: (score: 3)
You can try:
mask = df['sentiment'].notnull()
df.loc[mask, 'sentiment'] = df.loc[mask, 'sentiment'].apply(lambda x: x['basic'])
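The attempt in the question fails because df['sentiment']['basic'] looks up an index label named 'basic' on the Series, not a key inside each dict. Below is a minimal, self-contained sketch of the mask-and-apply approach above. The in-memory DataFrame is a hypothetical stand-in for the result of pd.read_json(path), and the `result` variable is only there for inspection.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_json(path): one null row, one dict row.
df = pd.DataFrame({
    "id": [111, 112],
    "sentiment": [None, {"basic": "Bearish"}],
})

# Select only the rows whose sentiment is not null.
mask = df["sentiment"].notnull()

# Replace each dict with the value stored under its 'basic' key.
df.loc[mask, "sentiment"] = df.loc[mask, "sentiment"].apply(lambda d: d["basic"])

result = df["sentiment"].tolist()
print(result)
```

Rows where sentiment was null are left untouched, since the lambda only runs on the masked rows.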
Answer 1: (score: 2)
You can do this:
df = pd.read_json(path) # creates the dataframe with dict objects in sentiment column
df = pd.concat([df.drop(['sentiment'], axis=1), df['sentiment'].apply(pd.Series)], axis=1) # create new columns for each sentiment type; assign the result back to df
For example, if your JSON is:
[{
"date": "2018-01-01",
"body": "some txt",
"id": 111,
"sentiment": null
},
{
"date": "2018-01-02",
"body": "some txt",
"id": 112,
"sentiment": {
"basic": "Bearish"
}
},
{
"date": "2018-01-03",
"body": "some other txt",
"id": 113,
"sentiment": {
"basic" : "Bullish",
"non_basic" : "Bearish"
}
}]
df after line 1:
body date id sentiment
0 some txt 2018-01-01 111 None
1 some txt 2018-01-02 112 {'basic': 'Bearish'}
2 some other txt 2018-01-03 113 {'basic': 'Bullish', 'non_basic': 'Bearish'}
df after line 2:
body date id basic non_basic
0 some txt 2018-01-01 111 NaN NaN
1 some txt 2018-01-02 112 Bearish NaN
2 some other txt 2018-01-03 113 Bullish Bearish
HTH。
Answer 2: (score: 0)
fillna + pop + join
This is a scalable solution that avoids a row-wise apply and converts an arbitrary number of keys into series:
df = pd.DataFrame({'body': [0, 1],
'sentiment': [None, {u'basic': u'Bullish'}]})
df['sentiment'] = df['sentiment'].fillna(pd.Series([{}]*len(df.index), index=df.index))
df = df.join(pd.DataFrame(df.pop('sentiment').values.tolist()))
print(df)
body basic
0 0 NaN
1 1 Bullish
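One caveat with the join above: pd.DataFrame(list_of_dicts) gets a fresh 0..n-1 index, so the join only lines up when df also has the default index. The following is a rough sketch of an index-safe variant, not part of the original answer. The helper name expand_dict_column and the sample index [10, 20] are hypothetical, chosen only for illustration.

```python
import pandas as pd

def expand_dict_column(df, column):
    """Hypothetical helper: turn a column of dicts (or nulls) into one column per key."""
    out = df.copy()            # leave the caller's frame unchanged
    values = out.pop(column)   # remove the dict column from the copy
    # Treat anything that is not a dict (None, NaN) as an empty dict.
    dicts = [v if isinstance(v, dict) else {} for v in values]
    # Reuse the original index so the join aligns row for row.
    expanded = pd.DataFrame(dicts, index=out.index)
    return out.join(expanded)

# Sample data with a non-default index (assumption, for illustration only).
df = pd.DataFrame(
    {"body": [0, 1], "sentiment": [None, {"basic": "Bullish"}]},
    index=[10, 20],
)
result = expand_dict_column(df, "sentiment")
print(result)
```

Passing index=out.index when building the expanded frame is the design choice that matters here; the list comprehension plays the same role as the fillna step above.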