删除文本并在熊猫中保留数字

时间:2019-11-18 10:49:08

标签: python pandas dataframe

我的数据框类似于一个波纹管

enter image description here

我要删除文本并仅从该数据框中的每个列中保留数字

预期的输出类似这样

enter image description here

到目前为止,我已经尝试过

import json
import requests
import pandas as pd
URL = 'https://xxxxx.com'
req = requests.get(URL,auth=('xxx', 'xxx') )
text_data= req.text
json_dict= json.loads(text_data)
df = pd.DataFrame.from_dict(json_dict["measurements"])
cols_to_keep =['source','battery','c8y_TemperatureMeasurement','time','c8y_DistanceMeasurement']
df_final = df[cols_to_keep]
df_final = df_final.rename(columns={'c8y_TemperatureMeasurement': 'Temperature Or T','c8y_DistanceMeasurement':'Distance'})
for col in df_final:
 df_final[col] = [''.join(re.findall("\d*\.?\d+", item)) for item in df_final[col]]

1 个答案:

答案 0 :(得分:1)

您的代码丢失了以pd格式导入大熊猫,并且数据无法访问,因为它需要凭据。

您可以使用pandas.DataFrame.replace

示例数据:

df = pd.DataFrame({'a':['abc123abc', 'def456678'], 'b':['123a', 'b456']})

数据框:

    a           b
0   abc123abc   123a
1   def456678   b456

[^ 0-9。]替换所有非数字字符。

df.replace('[^0-9.]', '', regex=True)

输出:

    a       b
0   123     123
1   456678  456

编辑: 这里的问题实际上是关于嵌套JSON的,而不是关于替换数据帧中的值的。上面的语句不起作用的原因是因为数据已作为dict保存在数据框中。但是,由于上述解决方案通常是正确的,因此不会对其进行编辑。

修订后的答案:

from pandas.io.json import json_normalize
import requests
import pandas as pd

URL = 'https://wastemanagement.post-iot.lu/measurement/measurements?source=83512& pageSize=1000000000&dateFrom=2019-10-26&dateTo=2019-10-28'
req = requests.get(URL,auth=('xxxx', 'xxxx') )
text_data= req.text
json_dict= json.loads(text_data)
df= json_normalize(json_dict['measurements'])
df = df_final.rename(columns={'source.id': 'source', 'battery.percent.value': 'battery', 'c8y_TemperatureMeasurement.T.value': 'Temperature Or T','c8y_DistanceMeasurement.distance.value':'Distance'})
cols_to_keep =['source' ,'battery', 'Temperature Or T', 'time', 'Distance']
df_final = df[cols_to_keep] 

输出:

    source  battery Temperature Or T    time                        Distance
0   83512   98.0    NaN                 2019-10-26T00:00:06.494Z    NaN
1   83512   NaN     23.0                2019-10-26T00:00:06.538Z    NaN
2   83512   NaN     NaN                 2019-10-26T00:00:06.577Z    21.0
3   83512   98.0    NaN                 2019-10-26T00:30:06.702Z    NaN
4   83512   NaN     23.0                2019-10-26T00:30:06.743Z    NaN