I have a pandas Series in which each row holds a different piece of survey text. For example:
df = pd.read_csv('survey_data.csv', header=None)
0 a comment
1 another comment
2 this what the person thought
3 what they felt
4 some more
I want to reshape the Series into a DataFrame with three columns and save it as a CSV file.
So the new df would be:
a comment another comment this what the person thought
what they felt some more
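I don't actually care if the order gets mixed up. I will then write the result out to a CSV file.
I have tried many different approaches; the current one is: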
col_cnt = 1
df.dropna(inplace=True)  # removing null values to avoid errors
new_df = pd.DataFrame()
data = []
for index, row in df.iterrows():
    data.append(row)
    if col_cnt == 3:  # we have done the three rows
        new_df.loc[len(new_df)] = list(data[1], data[2], data[3])
        col_cnt = 0
        data = []  # clear the list now that you have written it to the new df
    col_cnt = col_cnt + 1  # increment col counter for next row
# need to write the remainder somehow
I get the error: IndexError: list index out of range
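A note on that error: when the `if` branch runs, `data` holds three items at indices 0 to 2, so `data[3]` does not exist, and `list()` accepts a single iterable rather than three arguments. The following is only a minimal sketch of the same loop idea with those two points addressed. It assumes the text lives in column 0, uses a small hard-coded frame as a stand-in for `survey_data.csv`, and pads the leftover rows with `None`, which is one possible way to handle the remainder.

```python
import pandas as pd

# Hypothetical stand-in for the data read from survey_data.csv
df = pd.DataFrame({0: ["a comment", "another comment",
                       "this what the person thought",
                       "what they felt", "some more"]})

rows = []  # completed groups of three texts
data = []  # the group currently being filled
for text in df[0].dropna():
    data.append(text)
    if len(data) == 3:  # a full row of three columns is ready
        rows.append(data)
        data = []

if data:  # write the remainder, padded with None up to three cells
    rows.append(data + [None] * (3 - len(data)))

new_df = pd.DataFrame(rows)
```

Collecting plain lists and building the DataFrame once at the end avoids assigning into an empty DataFrame with `.loc`, which has no columns to receive the values.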
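The following code, which I found and modified, works! But I only get two columns in the correct order, not the three I want. Changing the 2 in range to 3 returns only one column.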
new_df = pd.DataFrame()
index = 1
for i in range(0, len(df), 2):
    new_df['Column' + str(index)] = df[0].iloc[i:i+3].reset_index(drop=True)
    index += 1
Answer 0: (score: 0)
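If your DataFrame has 6 rows, like this:
0 a comment
1 another comment
2 this what the person thought
3 what they felt
4 some more
5 some more
then you can do the following to get the desired result as a DataFrame (pass columns=... to pd.DataFrame if you want named columns):
pd.DataFrame(np.reshape(df.values, (-1, 3)))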
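Note that a reshape to three columns only works when the number of values is a multiple of 3, and the question's example has 5 rows. Here is a hedged sketch of one way around that: pad the values with `None` before reshaping, then produce the CSV text. The hard-coded frame is a stand-in for the real data, and `n_cols`, `pad`, and `csv_text` are illustrative names, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data read from survey_data.csv
df = pd.DataFrame({0: ["a comment", "another comment",
                       "this what the person thought",
                       "what they felt", "some more"]})

n_cols = 3
values = df[0].dropna().to_numpy(dtype=object)

# Number of empty cells needed so the length is a multiple of n_cols
pad = -len(values) % n_cols
padded = np.concatenate([values, np.full(pad, None, dtype=object)])

new_df = pd.DataFrame(padded.reshape(-1, n_cols))

# With no path, to_csv returns the CSV as a string;
# pass a file name as the first argument to write a file instead.
csv_text = new_df.to_csv(index=False, header=False)
```

The padded cells come out as empty fields in the CSV, so the last line simply has fewer filled columns.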