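My dataframe looks like the one below.
I want to remove the text and keep only the numbers in each column of this dataframe.
The expected output looks something like this.
So far, I have tried: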
import json
import re
import requests
import pandas as pd
URL = 'https://xxxxx.com'
req = requests.get(URL,auth=('xxx', 'xxx') )
text_data= req.text
json_dict= json.loads(text_data)
df = pd.DataFrame.from_dict(json_dict["measurements"])
cols_to_keep =['source','battery','c8y_TemperatureMeasurement','time','c8y_DistanceMeasurement']
df_final = df[cols_to_keep]
df_final = df_final.rename(columns={'c8y_TemperatureMeasurement': 'Temperature Or T','c8y_DistanceMeasurement':'Distance'})
for col in df_final:
    df_final[col] = [''.join(re.findall(r"\d*\.?\d+", item)) for item in df_final[col]]
Answer 0 (score: 1)
Your code is missing the import of pandas as pd, and the data cannot be accessed because it requires credentials.
You can use pandas.DataFrame.replace:
Sample data:
df = pd.DataFrame({'a':['abc123abc', 'def456678'], 'b':['123a', 'b456']})
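DataFrame:
           a     b
0  abc123abc  123a
1  def456678  b456
The pattern [^0-9.] matches every character that is not a digit or a period, so replacing each match with an empty string leaves only the numbers.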
df.replace('[^0-9.]', '', regex=True)
Output:
        a    b
0     123  123
1  456678  456
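Note that the cells are still strings after the replacement. If real numbers are needed, one possible follow-up is a conversion with pd.to_numeric. This is only a sketch on the sample data above; the names cleaned and numeric are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': ['abc123abc', 'def456678'], 'b': ['123a', 'b456']})

# Strip everything that is not a digit or a period; the cells remain strings.
cleaned = df.replace('[^0-9.]', '', regex=True)

# Convert column by column; errors='coerce' turns anything that still
# cannot be parsed (for example an empty string) into NaN.
numeric = cleaned.apply(pd.to_numeric, errors='coerce')

print(numeric.dtypes)
```

With errors='coerce', a cell that contained no digits at all ends up as NaN instead of raising an error.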
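Edit: The problem here is actually about nested JSON, not about replacing values in a DataFrame. The statement above does not work because the data is stored in the DataFrame as dicts rather than strings. However, since the solution above is correct in the general case, I have left it unedited.
Revised answer: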
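To see the problem without access to the API, here is a small sketch with one made-up record. The nesting (battery -> percent -> value) is an assumption inferred from the column names in this thread, not the real response; pd.json_normalize needs pandas 1.0 or later.

```python
import re
import pandas as pd

# Hypothetical record; the real API response may be shaped differently.
records = [
    {'source': {'id': '83512'},
     'time': '2019-10-26T00:00:06.494Z',
     'battery': {'percent': {'value': 98.0, 'unit': '%'}}},
]

nested = pd.DataFrame(records)
cell = nested.loc[0, 'battery']
print(type(cell))  # the cell holds a dict, not a string

# The regex from the question expects a string, so a dict raises TypeError.
try:
    re.findall(r"\d*\.?\d+", cell)
    regex_failed = False
except TypeError:
    regex_failed = True

# Flattening turns each nested key path into its own dotted column.
flat = pd.json_normalize(records)
print(flat.columns.tolist())
```

Once the values sit in their own columns, there is no text left to strip and the numbers can be selected directly.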
import json
import requests
import pandas as pd
URL = 'https://wastemanagement.post-iot.lu/measurement/measurements?source=83512&pageSize=1000000000&dateFrom=2019-10-26&dateTo=2019-10-28'
req = requests.get(URL, auth=('xxxx', 'xxxx'))
text_data = req.text
json_dict = json.loads(text_data)
# Flatten the nested dicts into dotted column names.
# pd.json_normalize needs pandas >= 1.0; older versions use
# `from pandas.io.json import json_normalize` instead.
df = pd.json_normalize(json_dict['measurements'])
# Rename the flattened columns to short names.
df = df.rename(columns={'source.id': 'source', 'battery.percent.value': 'battery', 'c8y_TemperatureMeasurement.T.value': 'Temperature Or T', 'c8y_DistanceMeasurement.distance.value': 'Distance'})
cols_to_keep = ['source', 'battery', 'Temperature Or T', 'time', 'Distance']
df_final = df[cols_to_keep]
Output:
  source  battery  Temperature Or T                      time  Distance
0  83512     98.0               NaN  2019-10-26T00:00:06.494Z       NaN
1  83512      NaN              23.0  2019-10-26T00:00:06.538Z       NaN
2  83512      NaN               NaN  2019-10-26T00:00:06.577Z      21.0
3  83512     98.0               NaN  2019-10-26T00:30:06.702Z       NaN
4  83512      NaN              23.0  2019-10-26T00:30:06.743Z       NaN
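The NaN values appear because each measurement record seems to carry only one kind of reading, and the flattening fills the columns a record lacks with NaN. The sketch below reproduces the first three rows offline; the three records are hypothetical and their nesting is guessed from the flattened column names, so treat it as an illustration rather than the actual API payload.

```python
import pandas as pd

# Hypothetical records, one reading type each (shape assumed, not confirmed).
records = [
    {'source': {'id': '83512'}, 'time': '2019-10-26T00:00:06.494Z',
     'battery': {'percent': {'value': 98.0}}},
    {'source': {'id': '83512'}, 'time': '2019-10-26T00:00:06.538Z',
     'c8y_TemperatureMeasurement': {'T': {'value': 23.0}}},
    {'source': {'id': '83512'}, 'time': '2019-10-26T00:00:06.577Z',
     'c8y_DistanceMeasurement': {'distance': {'value': 21.0}}},
]

# Flatten, then rename the dotted columns exactly as in the answer.
df = pd.json_normalize(records).rename(columns={
    'source.id': 'source',
    'battery.percent.value': 'battery',
    'c8y_TemperatureMeasurement.T.value': 'Temperature Or T',
    'c8y_DistanceMeasurement.distance.value': 'Distance',
})

cols_to_keep = ['source', 'battery', 'Temperature Or T', 'time', 'Distance']
df_final = df[cols_to_keep]
print(df_final)
```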