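I have a dataset loaded into a pandas dataframe. One of its columns appears to be in JSON format (I am not sure), and I would like to extract the information in that column and put it into other columns of the same dataframe.
I have already tried read_json, normalize and other Python functions, but I could not get the result I want...
Here is what I tried: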
x = {'latitude': '47.61219025', 'needs_recoding': False, 'human_address': '{""address"":""405 OLIVE WAY"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33799744'}
print(x.get('latitude'))
print(x.get('longitude'))
This works for one row only.
I also tried:
s = data2015.groupby('OSEBuildingID')['Location'].apply(lambda x: x.tolist())
print(s)
pd.read_json(s,typ='series',orient='records')
But I get this error:
ValueError: Invalid file path or buffer object type
Loading the dataframe:
data2015 = pd.read_csv(filepath_or_buffer=r'C:\Users\mehdi\OneDrive\Documents\OpenClassRooms\Projet 3\2015-building-energy-benchmarking\2015-building-energy-benchmarking.csv', delimiter=",",low_memory=False)
Sample of the file contents:
OSEBuildingID,DataYear,BuildingType,PrimaryPropertyType,PropertyName,TaxParcelIdentificationNumber,Location,CouncilDistrictCode,Neighborhood,YearBuilt,NumberofBuildings,NumberofFloors,PropertyGFATotal,PropertyGFAParking,PropertyGFABuilding(s),ListOfAllPropertyUseTypes,LargestPropertyUseType,LargestPropertyUseTypeGFA,SecondLargestPropertyUseType,SecondLargestPropertyUseTypeGFA,ThirdLargestPropertyUseType,ThirdLargestPropertyUseTypeGFA,YearsENERGYSTARCertified,ENERGYSTARScore,SiteEUI(kBtu/sf),SiteEUIWN(kBtu/sf),SourceEUI(kBtu/sf),SourceEUIWN(kBtu/sf),SiteEnergyUse(kBtu),SiteEnergyUseWN(kBtu),SteamUse(kBtu),Electricity(kWh),Electricity(kBtu),NaturalGas(therms),NaturalGas(kBtu),OtherFuelUse(kBtu),GHGEmissions(MetricTonsCO2e),GHGEmissionsIntensity(kgCO2e/ft2),DefaultData,Comment,ComplianceStatus,Outlier
1,2015,NonResidential,Hotel,MAYFLOWER PARK HOTEL,659000030,"{'latitude': '47.61219025', 'needs_recoding': False, 'human_address': '{""address"":""405 OLIVE WAY"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33799744'}",7,DOWNTOWN,1927,1,12,88434,0,88434,Hotel,Hotel,88434,,,,,,65,78.90,80.30,173.50,175.10,6981428,7097539,2023032,1080307,3686160,12724,1272388,0,249.43,2.64,No,,Compliant,
2,2015,NonResidential,Hotel,PARAMOUNT HOTEL,659000220,"{'latitude': '47.61310583', 'needs_recoding': False, 'human_address': '{""address"":""724 PINE ST"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33335756'}",7,DOWNTOWN,1996,1,11,103566,15064,88502,"Hotel, Parking, Restaurant",Hotel,83880,Parking,15064,Restaurant,4622,,51,94.40,99.00,191.30,195.20,8354235,8765788,0,1144563,3905411,44490,4448985,0,263.51,2.38,No,,Compliant,
3,2015,NonResidential,Hotel,WESTIN HOTEL,659000475,"{'latitude': '47.61334897', 'needs_recoding': False, 'human_address': '{""address"":""1900 5TH AVE"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33769944'}",7,DOWNTOWN,1969,1,41,961990,0,961990,"Hotel, Parking, Swimming Pool",Hotel,757243,Parking,100000,Swimming Pool,0,,18,96.60,99.70,242.70,246.50,73130656,75506272,19660404,14583930,49762435,37099,3709900,0,2061.48,1.92,Yes,,Compliant,
5,2015,NonResidential,Hotel,HOTEL MAX,659000640,"{'latitude': '47.61421585', 'needs_recoding': False, 'human_address': '{""address"":""620 STEWART ST"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33660889'}",7,DOWNTOWN,1926,1,10,61320,0,61320,Hotel,Hotel,61320,,,,,,1,460.40,462.50,636.30,643.20,28229320,28363444,23458518,811521,2769023,20019,2001894,0,1936.34,31.38,No,,Compliant,High Outlier
8,2015,NonResidential,Hotel,WARWICK SEATTLE HOTEL,659000970,"{'latitude': '47.6137544', 'needs_recoding': False, 'human_address': '{""address"":""401 LENORA ST"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98121""}', 'longitude': '-122.3409238'}",7,DOWNTOWN,1980,1,18,119890,12460,107430,"Hotel, Parking, Swimming Pool",Hotel,123445,Parking,68009,Swimming Pool,0,,67,120.10,122.10,228.80,227.10,14829099,15078243,0,1777841,6066245,87631,8763105,0,507.7,4.02,No,,Compliant,
9,2015,Nonresidential COS,Other,WEST PRECINCT (SEATTLE POLICE),660000560,"{'latitude': '47.6164389', 'needs_recoding': False, 'human_address': '{""address"":""810 VIRGINIA ST"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33676431'}",7,DOWNTOWN,1999,1,2,97288,37198,60090,Police Station,Police Station,88830,,,,,,,135.70,146.90,313.50,321.60,12051984,13045258,0,2130921,7271004,47813,4781283,0,304.62,2.81,No,,Compliant,
10,2015,NonResidential,Hotel,CAMLIN WORLDMARK HOTEL,660000825,"{'latitude': '47.6141141', 'needs_recoding': False, 'human_address': '{""address"":""1619 9TH AVE"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33274086'}",7,DOWNTOWN,1926,1,11,83008,0,83008,Hotel,Hotel,81352,,,,,,25,76.90,79.60,149.50,158.20,6252842,6477493,0,785342,2679698,35733,3573255,0,208.46,2.37,No,,Compliant,
11,2015,NonResidential,Other,PARAMOUNT THEATER,660000955,"{'latitude': '47.61290234', 'needs_recoding': False, 'human_address': '{""address"":""901 PINE ST"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33130949'}",7,DOWNTOWN,1926,1,8,102761,0,102761,Other - Entertainment/Public Assembly,Other - Entertainment/Public Assembly,102761,,,,,,,62.50,71.80,152.20,160.40,6426022,7380086,2003108,1203937,4108004,3151,315079,0,199.99,1.77,No,,Compliant,
I would like to end up with at least another dataframe containing the following columns: latitude, needs_recoding, human_address and longitude.
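Answer 0 (score: 0)
There may be a better way, but I simply iterate over the rows, parse each Location string into its individual pieces of data, and put them back into a dataframe. Note that the Location value is a Python dict literal (single quotes, False) rather than JSON, so it is parsed with ast.literal_eval; only the nested human_address value is real JSON and goes through json.loads. You can then save the result with .to_csv():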
import ast
import json
import pandas as pd
data2015 = pd.read_csv('C:/test.csv', delimiter=",", low_memory=False)
frames = []  # one single-row dataframe per input row
for idx, row in data2015.iterrows():
    # 'Location' holds a Python dict literal, not JSON
    data_dict = ast.literal_eval(row['Location'])
    lat = data_dict['latitude']
    lon = data_dict['longitude']
    need_recode = data_dict['needs_recoding']
    # 'human_address' is a JSON string nested inside that dict
    normalize = pd.Series(json.loads(data_dict['human_address']))
    row = row.drop('Location')
    cols = list(row.index) + ['latitude', 'longitude', 'need_recoding'] + list(normalize.index)
    temp_df = pd.DataFrame([list(row) + [lat, lon, need_recode] + list(normalize)], columns=cols)
    frames.append(temp_df)
# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
results = pd.concat(frames, ignore_index=True)
Output:
print (results.to_string())
OSEBuildingID DataYear BuildingType PrimaryPropertyType PropertyName TaxParcelIdentificationNumber CouncilDistrictCode Neighborhood YearBuilt NumberofBuildings NumberofFloors PropertyGFATotal PropertyGFAParking PropertyGFABuilding(s) ListOfAllPropertyUseTypes LargestPropertyUseType LargestPropertyUseTypeGFA SecondLargestPropertyUseType SecondLargestPropertyUseTypeGFA ThirdLargestPropertyUseType ThirdLargestPropertyUseTypeGFA YearsENERGYSTARCertified ENERGYSTARScore SiteEUI(kBtu/sf) SiteEUIWN(kBtu/sf) SourceEUI(kBtu/sf) SourceEUIWN(kBtu/sf) SiteEnergyUse(kBtu) SiteEnergyUseWN(kBtu) SteamUse(kBtu) Electricity(kWh) Electricity(kBtu) NaturalGas(therms) NaturalGas(kBtu) OtherFuelUse(kBtu) GHGEmissions(MetricTonsCO2e) GHGEmissionsIntensity(kgCO2e/ft2) DefaultData Comment ComplianceStatus Outlier latitude longitude need_recoding address city state zip
0 1 2015 NonResidential Hotel MAYFLOWER PARK HOTEL 659000030 7 DOWNTOWN 1927 1 12 88434 0 88434 Hotel Hotel 88434 NaN NaN NaN NaN NaN 65.0 78.9 80.3 173.5 175.1 6981428 7097539 2023032 1080307 3686160 12724 1272388 0 249.43 2.64 No NaN Compliant NaN 47.61219025 -122.33799744 False 405 OLIVE WAY SEATTLE WA 98101
1 2 2015 NonResidential Hotel PARAMOUNT HOTEL 659000220 7 DOWNTOWN 1996 1 11 103566 15064 88502 Hotel, Parking, Restaurant Hotel 83880 Parking 15064.0 Restaurant 4622.0 NaN 51.0 94.4 99.0 191.3 195.2 8354235 8765788 0 1144563 3905411 44490 4448985 0 263.51 2.38 No NaN Compliant NaN 47.61310583 -122.33335756 False 724 PINE ST SEATTLE WA 98101
2 3 2015 NonResidential Hotel WESTIN HOTEL 659000475 7 DOWNTOWN 1969 1 41 961990 0 961990 Hotel, Parking, Swimming Pool Hotel 757243 Parking 100000.0 Swimming Pool 0.0 NaN 18.0 96.6 99.7 242.7 246.5 73130656 75506272 19660404 14583930 49762435 37099 3709900 0 2061.48 1.92 Yes NaN Compliant NaN 47.61334897 -122.33769944 False 1900 5TH AVE SEATTLE WA 98101
3 5 2015 NonResidential Hotel HOTEL MAX 659000640 7 DOWNTOWN 1926 1 10 61320 0 61320 Hotel Hotel 61320 NaN NaN NaN NaN NaN 1.0 460.4 462.5 636.3 643.2 28229320 28363444 23458518 811521 2769023 20019 2001894 0 1936.34 31.38 No NaN Compliant High Outlier 47.61421585 -122.33660889 False 620 STEWART ST SEATTLE WA 98101
4 8 2015 NonResidential Hotel WARWICK SEATTLE HOTEL 659000970 7 DOWNTOWN 1980 1 18 119890 12460 107430 Hotel, Parking, Swimming Pool Hotel 123445 Parking 68009.0 Swimming Pool 0.0 NaN 67.0 120.1 122.1 228.8 227.1 14829099 15078243 0 1777841 6066245 87631 8763105 0 507.70 4.02 No NaN Compliant NaN 47.6137544 -122.3409238 False 401 LENORA ST SEATTLE WA 98121
5 9 2015 Nonresidential COS Other WEST PRECINCT (SEATTLE POLICE) 660000560 7 DOWNTOWN 1999 1 2 97288 37198 60090 Police Station Police Station 88830 NaN NaN NaN NaN NaN NaN 135.7 146.9 313.5 321.6 12051984 13045258 0 2130921 7271004 47813 4781283 0 304.62 2.81 No NaN Compliant NaN 47.6164389 -122.33676431 False 810 VIRGINIA ST SEATTLE WA 98101
6 10 2015 NonResidential Hotel CAMLIN WORLDMARK HOTEL 660000825 7 DOWNTOWN 1926 1 11 83008 0 83008 Hotel Hotel 81352 NaN NaN NaN NaN NaN 25.0 76.9 79.6 149.5 158.2 6252842 6477493 0 785342 2679698 35733 3573255 0 208.46 2.37 No NaN Compliant NaN 47.6141141 -122.33274086 False 1619 9TH AVE SEATTLE WA 98101
7 11 2015 NonResidential Other PARAMOUNT THEATER 660000955 7 DOWNTOWN 1926 1 8 102761 0 102761 Other - Entertainment/Public Assembly Other - Entertainment/Public Assembly 102761 NaN NaN NaN NaN NaN NaN 62.5 71.8 152.2 160.4 6426022 7380086 2003108 1203937 4108004 3151 315079 0 199.99 1.77 No NaN Compliant NaN 47.61290234 -122.33130949 False 901 PINE ST SEATTLE WA 98101
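If the row-by-row loop turns out to be slow on the full file, the same two-step parse can probably be applied to the whole column at once. The following is only a sketch under assumptions: `csv_text` is a hypothetical two-row stand-in for the real CSV (reduced to three columns), and every Location value is assumed to be a well-formed dict string with no missing values.

```python
import ast
import io
import json

import pandas as pd

# Hypothetical two-row stand-in for the real file, keeping only the columns needed here.
csv_text = '''OSEBuildingID,PropertyName,Location
1,MAYFLOWER PARK HOTEL,"{'latitude': '47.61219025', 'needs_recoding': False, 'human_address': '{""address"":""405 OLIVE WAY"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33799744'}"
2,PARAMOUNT HOTEL,"{'latitude': '47.61310583', 'needs_recoding': False, 'human_address': '{""address"":""724 PINE ST"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33335756'}"
'''
data = pd.read_csv(io.StringIO(csv_text))

# Step 1: each Location string is a Python dict literal, so parse it with ast.literal_eval
# and expand the resulting dicts into columns.
location = pd.DataFrame(data['Location'].map(ast.literal_eval).tolist(), index=data.index)

# Step 2: human_address is real JSON nested inside that dict; pop it and expand it too.
address = pd.DataFrame(location.pop('human_address').map(json.loads).tolist(), index=data.index)

# Step 3: put everything side by side, dropping the original Location column.
results = pd.concat([data.drop(columns='Location'), location, address], axis=1)
print(results.to_string())
```

In this sketch latitude, longitude and zip stay as strings, exactly as they appear in the file; pd.to_numeric could be applied afterwards if numeric columns are wanted. Rows where Location is missing would make ast.literal_eval fail, so they would need to be filtered or filled first.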