Question

Coming from R to Python, I'm having some trouble using pandas to write multiple CSVs from a list of DataFrames:

import pandas
from dplython import (DplyFrame, X, diamonds, select, sift, sample_n,
                  sample_frac, head, arrange, mutate, group_by, summarize,
                  DelayFunction)

diamonds = [diamonds, diamonds, diamonds]
path = "/user/me/" 

def extractDiomands(path, diamonds):
    for each in diamonds:
    df = DplyFrame(each) >> select(X.carat, X.cut, X.price) >> head(5)
    df = pd.DataFrame(df) # not sure if that is required
    df.to_csv(os.path.join('.csv', each))

extractDiomands(path,diamonds)

However, this produces an error. Any suggestions are appreciated!

Answer 1

Welcome to Python! First, I'll load a couple of libraries and download an example dataset.

import os
import pandas as pd

example_data = pd.read_csv("http://www.ats.ucla.edu/stat/data/binary.csv")
print(example_data.head(5))

The first few rows of our example data:

   admit  gre   gpa  rank
0      0  380  3.61     3
1      1  660  3.67     3
2      1  800  4.00     1
3      1  640  3.19     4
4      0  520  2.93     4
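If the URL above cannot be reached, a small offline stand-in can be typed in from the five rows printed above. This is only a sketch, not the real file: it holds just those five rows, so in the loop below head(20), tail(20) and head(10) each simply return all five.

```python
import pandas as pd

# Offline stand-in built by hand from the five rows printed above.
# It is NOT the full binary.csv file, only these five rows.
example_data = pd.DataFrame({
    'admit': [0, 1, 1, 1, 0],
    'gre':   [380, 660, 800, 640, 520],
    'gpa':   [3.61, 3.67, 4.00, 3.19, 2.93],
    'rank':  [3, 3, 1, 4, 4],
})
print(example_data.head(5))
```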

Now, here is what I think you want to do:

# spawn a few datasets to loop through
df_1, df_2, df_3 = example_data.head(20), example_data.tail(20), example_data.head(10)
list_of_datasets = [df_1, df_2, df_3]

output_path = 'scratch'
# to_csv() does not create missing folders, so make sure the output folder exists
os.makedirs(output_path, exist_ok=True)

# in Python you can loop through collections of items directly, it's pretty cool.
# with enumerate(), you get both the index and the item from the sequence at each step
for index, dataset in enumerate(list_of_datasets):

    # Filter to keep just a couple of columns
    keep_columns = ['gre', 'admit']
    dataset = dataset[keep_columns]

    # Export to CSV: scratch/dataset_0.csv, scratch/dataset_1.csv, ...
    filepath = os.path.join(output_path, 'dataset_' + str(index) + '.csv')
    dataset.to_csv(filepath)

In the end, my folder 'scratch' contains three new CSVs named dataset_0.csv, dataset_1.csv and dataset_2.csv.
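Applied to the question's own function, the same idea might look like the sketch below. It assumes plain pandas instead of dplython (selecting the three columns and calling .head(5) stands in for select(X.carat, X.cut, X.price) >> head(5)), uses the hypothetical name extract_diamonds, and runs on a tiny hand-made frame rather than the real diamonds data. It also sidesteps the problems in the original: pandas was imported as pandas but used as pd, os was never imported, the loop body was not indented, path was never used, and os.path.join('.csv', each) passed the DataFrame itself where a file name was needed (with '.csv' as the folder).

```python
import os
import tempfile

import pandas as pd


def extract_diamonds(path, diamonds):
    """Hypothetical rewrite: write carat/cut/price (first 5 rows) of each DataFrame to its own CSV."""
    # to_csv() does not create missing folders
    os.makedirs(path, exist_ok=True)
    written = []
    for index, each in enumerate(diamonds):
        # plain-pandas equivalent of: select(X.carat, X.cut, X.price) >> head(5)
        df = each[['carat', 'cut', 'price']].head(5)
        # `each` is a DataFrame, not a string, so build the file name from the loop index
        filepath = os.path.join(path, 'diamonds_' + str(index) + '.csv')
        df.to_csv(filepath, index=False)
        written.append(filepath)
    return written


# Usage on a tiny hand-made frame (assumption: same column names as the diamonds data)
toy = pd.DataFrame({
    'carat': [0.23, 0.21, 0.23],
    'cut': ['Ideal', 'Premium', 'Good'],
    'color': ['E', 'E', 'E'],
    'price': [326, 326, 327],
})
out_dir = tempfile.mkdtemp()  # stands in for the question's "/user/me/"
files = extract_diamonds(out_dir, [toy, toy, toy])
print(files)
```

Here index=False leaves out the row-number column that to_csv writes by default; drop that argument if you want the column, as in the loop above.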

How to loop over multiple DataFrames and write multiple CSVs?
