Convert a given pandas DataFrame into another DataFrame

Posted: 2018-11-15 13:26:24

Tags: python pandas dataframe

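I have the pandas DataFrame below. It gives the distances (in degrees) from various points to the cities Fargo, Orange and Jersey City. In each column (e.g. "Fargo"), rows 0 to 3 hold the 4 shortest distances from any point to that city. The remaining 8 rows are only filled because the same table also lists the 4 points closest to "Orange", then the 4 points closest to "Jersey City". The DataFrame is built like this: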

import pandas as pd

Points = ['Point1','Point4','Point5','Point2','Point2','Point5','Point1','Point4','Point3','Point6','Point4','Point1']
Fargo = [2.90300755828,3.91961324034,21.9825588597,24.3141420303,24.3141420303,21.9825588597,2.90300755828,3.91961324034,25.3599772676,25.8509998739,3.91961324034,2.90300755828]
Orange = [25.5464458592,27.1527975618,6.17298387907,4.80214941294,4.80214941294,6.17298387907,25.5464458592,27.1527975618,46.4066249652,45.8853687976,27.1527975618,25.5464458592]
Jersey_City = [21.1030418227,19.6763385681,39.3194029761,41.8121131045,41.8121131045,39.3194029761,21.1030418227,19.6763385681,2.09632277264,2.67885042284,19.6763385681,21.1030418227]
toy_data=pd.DataFrame(index=Points,columns=['Fargo','Orange','Jersey_City'])
toy_data['Fargo']= Fargo
toy_data['Orange']=Orange
toy_data['Jersey_City']=Jersey_City

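Take the Fargo column: rows 0 to 3 are the points with the shortest distances to Fargo. Likewise, rows 4 to 7 of the Orange column are the points closest to Orange. In those same rows 4 to 7, the Fargo column just holds the Fargo distances of the four points nearest Orange. What I want is a single DataFrame that contains, for each city, only its 4 nearest points: rows 0-3 of the Fargo column, rows 4-7 of the Orange column and rows 8-11 of the Jersey_City column. I want to keep these 4 nearest points per city and replace every other value with NaN. This is what I want: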

import numpy as np

Fargo = [2.9030075582789885,3.919613240342197,21.982558859743925,24.314142030334484] + [np.nan]*8
Orange = [np.nan]*4 + [4.802149412942695,6.172983879065276,25.546445859236265,27.15279756182145] + [np.nan]*4
Jersey_City = [np.nan]*8 + [2.096322772642856,2.67885042283533,19.676338568056806,21.10304182269932]
result_wanted_data =pd.DataFrame(index= Points,columns = ['Fargo','Orange','Jersey_City'])
result_wanted_data['Fargo']=Fargo
result_wanted_data['Orange']=Orange
result_wanted_data['Jersey_City']=Jersey_City
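A minimal sketch of one way to build that result directly from toy_data (not taken from any of the answers below). It assumes, as described above, that each city's 4 nearest points occupy one contiguous block of 4 rows, in the same order as the columns. It builds a boolean block mask and uses np.where to keep only the masked values; the names `mask` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

points = ['Point1', 'Point4', 'Point5', 'Point2', 'Point2', 'Point5',
          'Point1', 'Point4', 'Point3', 'Point6', 'Point4', 'Point1']
toy_data = pd.DataFrame({
    'Fargo': [2.90300755828, 3.91961324034, 21.9825588597, 24.3141420303,
              24.3141420303, 21.9825588597, 2.90300755828, 3.91961324034,
              25.3599772676, 25.8509998739, 3.91961324034, 2.90300755828],
    'Orange': [25.5464458592, 27.1527975618, 6.17298387907, 4.80214941294,
               4.80214941294, 6.17298387907, 25.5464458592, 27.1527975618,
               46.4066249652, 45.8853687976, 27.1527975618, 25.5464458592],
    'Jersey_City': [21.1030418227, 19.6763385681, 39.3194029761, 41.8121131045,
                    41.8121131045, 39.3194029761, 21.1030418227, 19.6763385681,
                    2.09632277264, 2.67885042284, 19.6763385681, 21.1030418227],
}, index=points)

# Assumption: city k (column k) owns rows 4*k .. 4*k+3
block = 4
row_block = np.arange(len(toy_data)) // block        # 0,0,0,0,1,1,1,1,2,2,2,2
col_block = np.arange(toy_data.shape[1])             # 0,1,2
mask = row_block[:, None] == col_block[None, :]      # 12x3 bool, one True per row

# Keep the value where the mask is True, NaN everywhere else
result = pd.DataFrame(np.where(mask, toy_data.to_numpy(), np.nan),
                      index=toy_data.index, columns=toy_data.columns)
print(result)
```

Building the result with np.where on the raw array avoids any index alignment, which matters here because the index contains duplicate point names.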

3 answers:

Answer 0 (score: 1)

This does not produce exactly the layout you asked for, but I think it serves the purpose:

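import math
import numpy as np
import pandas as pd

# Assumes `data` is toy_data with the point names moved into column 0
data = toy_data.reset_index()

# Row i takes its distance from column ceil((i+1)/4): 1 (Fargo) for rows 0-3,
# 2 (Orange) for rows 4-7, 3 (Jersey_City) for rows 8-11
newdf = np.empty([12])
for i in range(12):
    newdf[i] = data.iloc[i, math.ceil((i + 1) / 4)]

# City name for each row
newdf1 = []
cities = list(data.columns.values[1:])
for i in range(12):
    newdf1.append(cities[math.ceil((i + 1) / 4) - 1])

# Point name for each row (column 0 after reset_index)
strs = ["" for x in range(12)]
for i in range(12):
    strs[i] = data.iloc[i, 0]

final_data = pd.DataFrame(columns=['city', 'point', 'distance'])
final_data['city'] = newdf1
final_data['distance'] = newdf
final_data['point'] = strs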
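The loops in this answer can be collapsed into NumPy fancy indexing. A minimal sketch, assuming the same row layout as the answer (rows 0-3 belong to the first city, 4-7 to the second, 8-11 to the third) and using toy_data from the question in place of the answer's `data`; it should produce the same city/point/distance columns.

```python
import numpy as np
import pandas as pd

points = ['Point1', 'Point4', 'Point5', 'Point2', 'Point2', 'Point5',
          'Point1', 'Point4', 'Point3', 'Point6', 'Point4', 'Point1']
toy_data = pd.DataFrame({
    'Fargo': [2.90300755828, 3.91961324034, 21.9825588597, 24.3141420303,
              24.3141420303, 21.9825588597, 2.90300755828, 3.91961324034,
              25.3599772676, 25.8509998739, 3.91961324034, 2.90300755828],
    'Orange': [25.5464458592, 27.1527975618, 6.17298387907, 4.80214941294,
               4.80214941294, 6.17298387907, 25.5464458592, 27.1527975618,
               46.4066249652, 45.8853687976, 27.1527975618, 25.5464458592],
    'Jersey_City': [21.1030418227, 19.6763385681, 39.3194029761, 41.8121131045,
                    41.8121131045, 39.3194029761, 21.1030418227, 19.6763385681,
                    2.09632277264, 2.67885042284, 19.6763385681, 21.1030418227],
}, index=points)

rows = np.arange(len(toy_data))
city_pos = rows // 4   # column position that owns each row: 0,0,0,0,1,...,2

final_data = pd.DataFrame({
    'city': toy_data.columns[city_pos],
    'point': toy_data.index,
    # Pick one value per row: (row i, column city_pos[i])
    'distance': toy_data.to_numpy()[rows, city_pos],
})
print(final_data)
```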

Answer 1 (score: 1)

You can use np.split() together with a for loop.
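A minimal sketch of the np.split() plus for-loop approach, written as an illustration rather than as the answerer's code. It assumes the 12 rows split evenly into one block per city (in column order) and copies each city's block into a NaN-filled frame with the same shape as toy_data.

```python
import numpy as np
import pandas as pd

points = ['Point1', 'Point4', 'Point5', 'Point2', 'Point2', 'Point5',
          'Point1', 'Point4', 'Point3', 'Point6', 'Point4', 'Point1']
toy_data = pd.DataFrame({
    'Fargo': [2.90300755828, 3.91961324034, 21.9825588597, 24.3141420303,
              24.3141420303, 21.9825588597, 2.90300755828, 3.91961324034,
              25.3599772676, 25.8509998739, 3.91961324034, 2.90300755828],
    'Orange': [25.5464458592, 27.1527975618, 6.17298387907, 4.80214941294,
               4.80214941294, 6.17298387907, 25.5464458592, 27.1527975618,
               46.4066249652, 45.8853687976, 27.1527975618, 25.5464458592],
    'Jersey_City': [21.1030418227, 19.6763385681, 39.3194029761, 41.8121131045,
                    41.8121131045, 39.3194029761, 21.1030418227, 19.6763385681,
                    2.09632277264, 2.67885042284, 19.6763385681, 21.1030418227],
}, index=points)

# One chunk of row positions per city: [0..3], [4..7], [8..11]
chunks = np.split(np.arange(len(toy_data)), toy_data.shape[1])

# Start from all-NaN and copy in only each city's own block
result = pd.DataFrame(np.nan, index=toy_data.index, columns=toy_data.columns)
for col, rows in enumerate(chunks):
    # .to_numpy() avoids label alignment on the duplicated point names
    result.iloc[rows, col] = toy_data.iloc[rows, col].to_numpy()
print(result)
```

np.split raises an error if the rows cannot be divided evenly; np.array_split (used in the next answer) tolerates uneven chunks.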

Answer 2 (score: 1)

You can use the following:

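import numpy as np
import pandas as pd

# Split the 12 row positions into 3 consecutive chunks: [0..3], [4..7], [8..11]
intervals = np.array_split(np.arange(toy_data.shape[0]), 3)
df = pd.DataFrame(columns=['Distances'], index=toy_data.reset_index().index)
# Chunk k takes its values from column k (Fargo, Orange, Jersey_City)
for i, j in zip(range(toy_data.shape[1]), intervals):
    df.loc[j, 'Distances'] = toy_data.reset_index(drop=True).iloc[j, i]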

print(df)

    Distances
0    2.90301
1    3.91961
2    21.9826
3    24.3141
4    4.80215
5    6.17298
6    25.5464
7    27.1528
8    2.09632
9    2.67885
10   19.6763
11    21.103