Question

I'm trying to export a random subset of a CSV file to a new CSV file using the following code:

with open("DepressionEffexor.csv", "r") as effexor:
    lines = [line for line in effexor]
    random_choice = random.sample(lines, 229)

with open("effexorSample.csv", "w") as sample:
    sample.write("\n".join(random_choice))

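The problem is that the output CSV file is very messy. For example, part of the data in a field gets printed on the next line. How can I fix this? I would also like to know how to solve this using pandas rather than CSV. Thanks!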

Answer 1

Assuming you have read the CSV into pandas:

import pandas

n = 229  # number of rows to sample (the size used in the question)
df = pandas.read_csv("csvfile.csv")
sample = df.sample(n)
sample.to_csv("sample.csv")

This can be shortened to:

df.sample(n).to_csv("sample.csv")

The pandas IO docs give more information and options, as does the documentation for the DataFrame.sample method.
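To make the approach above concrete, here is a minimal sketch rather than a definitive implementation. The helper name export_sample, the demo file names, and the seed value are all made up for illustration. It passes index=False so the row labels are not written as an extra column, and random_state so the draw is repeatable. The demo data includes a quoted field with a line break in it, which is the kind of field that breaks when a CSV is split naively on newlines.

```python
import os
import tempfile

import pandas as pd


def export_sample(src_path, dst_path, n, seed=None):
    """Write n randomly chosen rows of src_path to dst_path (hypothetical helper)."""
    df = pd.read_csv(src_path)
    # random_state makes the draw repeatable; index=False keeps the
    # row labels out of the output file.
    sample = df.sample(n=n, random_state=seed)
    sample.to_csv(dst_path, index=False)
    return sample


# Tiny demo file: the review for id 2 contains a line break inside quotes.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "reviews.csv")
dst = os.path.join(tmp_dir, "reviews_sample.csv")
with open(src, "w", newline="") as f:
    f.write('id,review\n1,"fine"\n2,"line one\nline two"\n3,"ok"\n4,"good"\n')

full = pd.read_csv(src)           # 4 records, even though the file has more lines
export_sample(src, dst, n=2, seed=0)
result = pd.read_csv(dst)         # the sample reads back as 2 intact records
```

Because pandas parses the file as CSV records rather than as raw text lines, a quoted field that spans several lines stays within one row in the sample.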

Answer 2

With pandas, this translates to:

import numpy as np
import pandas as pd

# Read the CSV file and store it as a dataframe
df = pd.read_csv('DepressionEffexor.csv')

# Shuffle the rows using a random permutation of their positions
df_shuffled = df.iloc[np.random.permutation(len(df))]

# Optionally reset the index; reset_index returns a new dataframe, so assign it
df_shuffled = df_shuffled.reset_index(drop=True)

You can then slice the dataframe to select the rows you need.
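As a rough illustration of that slicing step, the sketch below shuffles a small made-up dataframe and takes the first few rows as the sample. The column names, the sample size of 3, and the seed are assumptions for the demo. It uses NumPy's default_rng generator in place of np.random.permutation only so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Stand-in data; in practice this would come from pd.read_csv(...)
df = pd.DataFrame({"id": range(10), "score": range(100, 110)})

# Shuffle the row positions with a seeded generator, then renumber the index
rng = np.random.default_rng(42)
shuffled = df.iloc[rng.permutation(len(df))].reset_index(drop=True)

# After shuffling, the first k rows are a random sample of size k
sample = shuffled.iloc[:3]
rest = shuffled.iloc[3:]
```

From there, sample.to_csv("effexorSample.csv", index=False) would write the selected rows to a new file. For the question's case, the slice would be shuffled.iloc[:229].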

Exporting a random sample from a CSV file to a new CSV file - output is messy
