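How do I loop over multiple DataFrames and write multiple CSV files?

Time: 2017-05-03 07:55:47

Tags: python

I'm moving from R to Python and am having some trouble using pandas to write multiple CSV files from a list of DataFrames: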

import pandas
from dplython import (DplyFrame, X, diamonds, select, sift, sample_n,
                  sample_frac, head, arrange, mutate, group_by, summarize,
                  DelayFunction)

diamonds = [diamonds, diamonds, diamonds]
path = "/user/me/" 

def extractDiomands(path, diamonds):
    for each in diamonds:
    df = DplyFrame(each) >> select(X.carat, X.cut, X.price) >> head(5)
    df = pd.DataFrame(df) # not sure if that is required
    df.to_csv(os.path.join('.csv', each))

extractDiomands(path,diamonds)

However, this produces an error. Any suggestions are appreciated!

1 answer:

Answer 0: (score: 1)

Welcome to Python! First, I'll load a few libraries and download an example dataset.

import os
import pandas as pd

example_data =  pd.read_csv("http://www.ats.ucla.edu/stat/data/binary.csv")
print(example_data.head(5))

The first few rows of our example data:

   admit  gre   gpa  rank
0      0  380  3.61     3
1      1  660  3.67     3
2      1  800  4.00     1
3      1  640  3.19     4
4      0  520  2.93     4

Now, here is what I think you want to do:

# spawn a few datasets to loop through
df_1, df_2, df_3 = example_data.head(20), example_data.tail(20), example_data.head(10)
list_of_datasets = [df_1, df_2, df_3]

output_path = 'scratch'
# in Python you can loop through collections of items directly; it's pretty cool.
# with enumerate(), you get both the index and the item on each step through the sequence
for index, dataset in enumerate(list_of_datasets):

    # Filter to keep just a couple columns
    keep_columns =   ['gre', 'admit']
    dataset = dataset[keep_columns]

    # Export to CSV
    filepath = os.path.join(output_path, 'dataset_'+str(index)+'.csv')
    dataset.to_csv(filepath)

In the end, my folder 'scratch' has three new CSV files, named dataset_0.csv, dataset_1.csv, and dataset_2.csv.
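
For reference, the code in the question most likely fails for several reasons: the loop body is not indented under the for statement, pandas is imported without the pd alias, os is never imported, and os.path.join('.csv', each) is given a DataFrame where it expects a string. Below is a minimal sketch of the same loop wrapped in a reusable function. The function name write_datasets, its parameters, and the temporary output folder are hypothetical choices for illustration, and the small frame is typed in from the four sample rows shown earlier rather than downloaded. It also assumes you do not want the row index in the output (index=False) and that the output folder may not exist yet.

```python
import os
import tempfile

import pandas as pd


def write_datasets(datasets, output_dir, columns, prefix="dataset"):
    """Write each DataFrame in `datasets` to its own CSV; return the file paths.

    Hypothetical helper for illustration, not part of pandas.
    """
    # Create the output folder if it is missing (requires Python 3.2+).
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for index, dataset in enumerate(datasets):
        # Build a string filename from the loop index, e.g. dataset_0.csv
        filepath = os.path.join(output_dir, f"{prefix}_{index}.csv")
        # Keep only the requested columns and skip the row index
        dataset[columns].to_csv(filepath, index=False)
        paths.append(filepath)
    return paths


# A few rows copied from the sample output above, standing in for the real data.
example = pd.DataFrame({
    "admit": [0, 1, 1, 1],
    "gre": [380, 660, 800, 640],
    "gpa": [3.61, 3.67, 4.00, 3.19],
})
frames = [example.head(2), example.tail(2), example]

# Write into a throwaway temporary folder so nothing in the working dir is touched.
out_dir = os.path.join(tempfile.mkdtemp(), "scratch")
written = write_datasets(frames, out_dir, ["gre", "admit"])
print([os.path.basename(p) for p in written])
```

Returning the list of paths makes it easy to check afterwards which files were written. If you do want the row index in the CSV, drop index=False.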