Question

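I have a dataframe with roughly 20,000 rows. The column headings are X, Y, Z, I, R, G, B. (Yes, it is a point cloud.)

I want to create a number of sub-dataframes by sorting the data on the X column and then grouping it into chunks of 100 rows. Next, I want to sort every sub-dataframe on the Y column and split each one further into chunks of 50 rows. The end result should be a set of sub-dataframes of 50 rows each. From each of those I want to pick out all the rows with the highest Z value and write them to a CSV file.

My code has gotten as far as the following, but I am not sure how to continue.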

import pandas as pd
headings = ['x', 'y', 'z']
data = pd.read_table('file.csv', sep=',', skiprows=[0], names=headings)

points = data.sort_values(by=['x'])

Answer 1

Consider a dummy dataframe with 1000 rows.
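The construction of the dummy frame is not shown here. A minimal sketch of one way to build it, assuming integer values from 0 to 9 in each of the seven columns (this matches the sample rows, but the random generator and the seed are assumptions):

```python
import numpy as np
import pandas as pd

# Assumption: random integers in [0, 9]; the seed is arbitrary and only
# makes the example reproducible.
rng = np.random.default_rng(seed=0)

# 1000 rows, one column per point-cloud field.
df = pd.DataFrame(
    rng.integers(0, 10, size=(1000, 7)),
    columns=list("XYZIRGB"),
)
```

Because the values are random, the rows printed by df.head() will differ from any particular sample output.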


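Its first five rows look like this:

df.head()  # first 5 rows
   X  Y  Z  I  R  G  B
0  6  6  0  3  7  0  2
1  0  8  3  6  5  9  7
2  8  9  7  3  0  4  5
3  9  6  8  5  1  0  0
4  9  0  3  0  9  2  9

First, extract the maximum value of Z from the dataframe, then sort and split:

z_max = df['Z'].max()
df = df.sort_values('X')

# list of dataframes: consecutive chunks of 100 rows, sorted by X
dfs_X = [df.iloc[i:i + 100] for i in range(0, len(df), 100)]

matches = []
for df_x in dfs_X:
    df_x = df_x.sort_values('Y')
    # split each chunk further into sub-chunks of 50 rows, sorted by Y
    dfs_Y = [df_x.iloc[j:j + 50] for j in range(0, len(df_x), 50)]
    for df_y in dfs_Y:
        # keep the rows whose Z equals the overall maximum
        matches.append(df_y[df_y['Z'] == z_max])

# DataFrame.append was removed in pandas 2.0, so collect and concatenate
results = pd.concat(matches)
results.head()

results will contain the rows, from all the sub-dataframes, that have the highest Z value.

Output (first 5 rows):

     X  Y  Z  I  R  G  B
541  0  0  9  0  3  6  2
610  0  2  9  3  0  7  6
133  0  4  9  3  3  9  9
731  0  5  9  5  1  0  2
629  0  5  9  0  9  7  7

Now write this dataframe to a CSV file, for example with results.to_csv.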
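Note that the question asks for the highest-Z rows within each 50-row sub-dataframe, whereas comparing against a single df['Z'].max() keeps only the rows that match the overall maximum. A minimal sketch of the per-chunk variant follows. The function name top_z_per_chunk, the helper columns x_bin and y_bin, and the tiny demo frame are all hypothetical and only serve as illustration:

```python
import numpy as np
import pandas as pd


def top_z_per_chunk(df, x_chunk=100, y_chunk=50):
    """Return the rows holding the largest Z of every Y-sorted sub-chunk."""
    # Sort by X and label consecutive blocks of x_chunk rows.
    out = df.sort_values("X")
    out["x_bin"] = np.arange(len(out)) // x_chunk

    # Within each X block, sort by Y and label blocks of y_chunk rows.
    out = out.sort_values(["x_bin", "Y"])
    out["y_bin"] = out.groupby("x_bin").cumcount().to_numpy() // y_chunk

    # Broadcast each sub-chunk's maximum Z back onto its rows.
    z_max = out.groupby(["x_bin", "y_bin"])["Z"].transform("max")
    mask = out["Z"].to_numpy() == z_max.to_numpy()
    return out[mask].drop(columns=["x_bin", "y_bin"])


# Tiny demo: chunks of 4 rows along X, sub-chunks of 2 rows along Y.
demo = pd.DataFrame({
    "X": [6, 1, 4, 3, 0, 5, 2, 7],
    "Y": [3, 1, 0, 0, 3, 1, 2, 2],
    "Z": [9, 5, 4, 7, 1, 4, 2, 0],
})
top = top_z_per_chunk(demo, x_chunk=4, y_chunk=2)
```

Ties are kept, so a sub-chunk can contribute more than one row. For the real data, something like top_z_per_chunk(points).to_csv('top_z.csv', index=False) would write the result, where the output filename is a placeholder.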
