From df

Time: 2018-03-06 12:11:55

Tags: python pandas numpy matplotlib scipy

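I am new to programming in general, so please explain things.

Overall goal: I am working with x, y, z data. I want to reduce the number of points in each cell (the cell size can vary with the project), let's say to 50, without affecting the mean.

Problem: I have a df with x, y, z, binnumber, and I want to generate either a dictionary (e.g. binnumber: [x, y, z], [x, y, z], ... for the points inside that bin) or some way to use the df as sub-datasets that I can work with.

What I have done: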

    # import the data
    import pandas as pd
    import numpy as np
    from scipy.stats import binned_statistic_2d

    inputpath = input("write the file path:")
    # index_col=False means without index; names are the column names
    Data = pd.read_csv(inputpath, index_col=False, header=None,
                       names=['X', 'Y', 'Z'], skip_blank_lines=True)
    Data = pd.DataFrame(Data)

    # creating the grid cells
    min_x = int(min(Data['X']))
    max_x = int(max(Data['X']) + 1)
    min_y = int(min(Data['Y']))
    max_y = int(max(Data['Y']) + 1)
    bin_size = float(input('write the cell size:'))
    bx = int(((max_x - min_x) // bin_size) + 1)
    by = int(((max_y - min_y) // bin_size) + 1)
    xedges = np.linspace(min_x, max_x, bx, dtype=int)
    yedges = np.linspace(min_y, max_y, by, dtype=int)

    # assign the data to the cells
    count, x_edge, y_edge, binnumber = binned_statistic_2d(
        Data['X'], Data['Y'], Data['Z'], bins=(xedges, yedges))
    Data['binnumber'] = binnumber

    # sub sets
    subsets = dict(Data.groupby('binnumber'))
    print(subsets)

This does not work. Another approach was to work with the cells themselves, but that does not work either.

    cells = {}
    for i in xedges:
        for j in yedges:
            cells[str(i), str(j)] = []
    print(cells.keys())
    for x in Data.X:
        for y in Data.Y:
            for z in Data.Z:
                for k, v in cells.keys():
                    if (x >= int(k[0]) and x < int(k[0]) + 1
                            and y >= int(k[1]) and y < int(k[1]) + 1):
                        k = (x, y, z)
                    else:
                        cells = ('0')

    print(cells)

Thanks for any help.

1 answer:

Answer 0 (score: 0)

    # import the data
    from collections import defaultdict

    import pandas as pd
    import numpy as np
    from scipy.stats import binned_statistic_2d

    inputpath = input("write the file path:")
    # index_col=False means without index; names are the column names
    Data = pd.read_csv(inputpath, index_col=False, header=None,
                       names=['X', 'Y', 'Z'], skip_blank_lines=True)
    Data = pd.DataFrame(Data)

    # creating the grid cells
    min_x = int(min(Data['X']))
    max_x = int(max(Data['X']) + 1)
    min_y = int(min(Data['Y']))
    max_y = int(max(Data['Y']) + 1)
    bin_size = float(input('write the cell size:'))
    bx = int(((max_x - min_x) // bin_size) + 1)
    by = int(((max_y - min_y) // bin_size) + 1)
    xedges = np.linspace(min_x, max_x, bx, dtype=int)
    yedges = np.linspace(min_y, max_y, by, dtype=int)

    # assign the data to the cells
    count, x_edge, y_edge, binnumber = binned_statistic_2d(
        Data['X'], Data['Y'], Data['Z'], bins=(xedges, yedges))
    Data['binnumber'] = binnumber

    # making dictionary with >>> binnumber: all associated points
    Data['value'] = list(zip(Data['X'], Data['Y'], Data['Z']))
    d = defaultdict(list)
    for idx, row in Data.iterrows():
        d[row['binnumber']].append(row['value'])
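
The `iterrows` loop above can also be written with `groupby`, and the same grouping can be used to cap the number of points per cell. The following is only a minimal sketch, not part of the original answer: the helper names `points_by_bin` and `thin_bins`, the `max_points` and `seed` parameters, and the small `demo` frame are made up for illustration, and the code assumes the frame already carries the `binnumber` column. Random thinning keeps each cell's mean only approximately unchanged, not exactly.

```python
import pandas as pd


def points_by_bin(df, bin_col="binnumber", coord_cols=("X", "Y", "Z")):
    """Return {bin number: [(x, y, z), ...]} built from a groupby."""
    cols = list(coord_cols)
    return {
        int(bin_id): list(group[cols].itertuples(index=False, name=None))
        for bin_id, group in df.groupby(bin_col)
    }


def thin_bins(df, max_points=50, bin_col="binnumber", seed=0):
    """Keep at most max_points randomly chosen rows in every bin."""
    parts = [
        group if len(group) <= max_points
        else group.sample(n=max_points, random_state=seed)
        for _, group in df.groupby(bin_col)
    ]
    # Restore the original row order after concatenating the groups.
    return pd.concat(parts).sort_index()


# Hypothetical stand-in for the binned Data frame (values are invented).
demo = pd.DataFrame({
    "X": [0.1, 0.2, 0.3, 1.5],
    "Y": [0.4, 0.5, 0.6, 1.7],
    "Z": [10.0, 11.0, 12.0, 20.0],
    "binnumber": [5, 5, 5, 9],
})

cells = points_by_bin(demo)
thinned = thin_bins(demo, max_points=2)
print(cells[9])
print(thinned.groupby("binnumber").size().to_dict())
```

Iterating over a `groupby` object yields (bin number, sub-DataFrame) pairs, which is what both helpers rely on. With the real data, `points_by_bin(Data)` would stand in for the dictionary `d`, and `thin_bins(Data, max_points=50)` would return a reduced frame that can be compared against the per-cell means of the full data.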