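I am fairly new to programming, so please explain things.

Overall goal: I am working with x, y, z data. I want to reduce the number of points in each cell (the cell size can vary from project to project) to, let's say, 50, without affecting the mean.

Problem: I have a df with x, y, z, binnumber, and I want to generate either a dictionary (e.g. binnumber: [x,y,z],[x,y,z] ..... for the points inside that bin), or some way to use the df as sub-datasets that I can work with.

What I did: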
# import the data
import pandas as pd
import numpy as np
from scipy.stats import binned_statistic_2d
inputpath=input("write the file path:")
# file name , index =False means without index , names are the columns names
Data = pd.read_csv(inputpath, index_col=False, header= None, names = ['X','Y', 'Z'],skip_blank_lines=True)
Data = pd.DataFrame(Data)
# creating the grid cells
min_x = int(min(Data['X']))
max_x = int(max(Data['X'])+1)
min_y = int(min(Data['Y']))
max_y = int(max(Data['Y'])+1)
bin_size = float(input('write the cell size:'))
bx= int(((max_x-min_x)//bin_size)+1)
by=int(((max_y-min_y)//bin_size)+1)
xedges = np.linspace(min_x, max_x, bx, dtype=int)
yedges = np.linspace(min_y, max_y, by, dtype=int)
# assign the data to the cells
count, x_edge,y_edge,binnumber= binned_statistic_2d(Data['X'], Data['Y'], Data['Z'],bins=(xedges, yedges))
Data['binnumber']= binnumber
# sub sets
subsets = dict(Data.groupby('binnumber'))
print (subsets)
This does not work... Another solution would be to work with the cells themselves, but that does not work either.
cells= {}
for i in xedges:
    for j in yedges:
        cells[str(i),str(j)]=[]
print(cells.keys())
for x in Data.X:
    for y in Data.Y:
        for z in Data.Z:
            for k,v in cells.keys():
                if x>= int(k[0]) and x < int(k[0]) +1 and y>= int(k[1]) and y < int(k[1]) +1:
                    k=(x,y,z)
                else:
                    cells=('0')
print(cells)
Thanks for any help.
Answer 0 (score: 0)
# import the data
from collections import defaultdict
import pandas as pd
import numpy as np
from scipy.stats import binned_statistic_2d
inputpath = input("write the file path:")
# index_col=False: no column is used as the index; names: the column names
Data = pd.read_csv(inputpath, index_col=False, header=None, names=['X', 'Y', 'Z'], skip_blank_lines=True)
Data = pd.DataFrame(Data)
# creating the grid cells
min_x = int(min(Data['X']))
max_x = int(max(Data['X'])+1)
min_y = int(min(Data['Y']))
max_y = int(max(Data['Y'])+1)
bin_size = float(input('write the cell size:'))
bx = int(((max_x-min_x)//bin_size)+1)
by = int(((max_y-min_y)//bin_size)+1)
xedges = np.linspace(min_x, max_x, bx, dtype=int)
yedges = np.linspace(min_y, max_y, by, dtype=int)
# assign the data to the cells
# (with the default statistic, the first return value holds the per-cell mean of Z, not a count)
count, x_edge, y_edge, binnumber = binned_statistic_2d(Data['X'], Data['Y'], Data['Z'], bins=(xedges, yedges))
Data['binnumber'] = binnumber
# making a dictionary that maps binnumber -> all associated points
Data['value'] = list(zip(Data['X'], Data['Y'], Data['Z']))
d = defaultdict(list)
for idx, row in Data.iterrows():
    d[row['binnumber']].append(row['value'])
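
As a possible alternative to the row-by-row loop, the same dictionary can probably be built with `groupby`, which also gives a natural place to cap the number of points per cell. The sketch below is only an illustration: the helper name `points_by_bin`, its `max_points` and `seed` parameters, and the tiny example frame are made up here and are not part of the original post. Note that random thinning only keeps the per-cell mean approximately; it does not guarantee the mean stays unchanged.

```python
import pandas as pd


def points_by_bin(data, max_points=None, seed=0):
    """Map each binnumber to a list of (X, Y, Z) tuples.

    Hypothetical helper: if max_points is given, any bin holding more
    rows than that is randomly thinned down to max_points rows.
    """
    result = {}
    for binnumber, group in data.groupby('binnumber'):
        if max_points is not None and len(group) > max_points:
            # Random thinning; the bin mean is preserved only approximately.
            group = group.sample(n=max_points, random_state=seed)
        result[binnumber] = list(
            group[['X', 'Y', 'Z']].itertuples(index=False, name=None)
        )
    return result


# Small made-up data set standing in for the CSV file and its bin numbers.
example = pd.DataFrame({
    'X': [0.1, 0.2, 1.5, 1.6, 1.7],
    'Y': [0.1, 0.3, 1.1, 1.2, 1.3],
    'Z': [10.0, 12.0, 20.0, 22.0, 24.0],
    'binnumber': [5, 5, 9, 9, 9],
})

full = points_by_bin(example)                   # every point, grouped by bin
thinned = points_by_bin(example, max_points=2)  # at most 2 points per bin
```

With the real data, `points_by_bin(Data, max_points=50)` would be the corresponding call once `Data['binnumber']` has been filled in as above.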