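I want to use pandas to rearrange data from text files so that it is easy for users to parse. So far I have been able to import several text files, add the data to a DataFrame, and add headers. What I would like to do is move the data into the correct columns, but the problem is that all of the data ends up in a single column.
Here is my data: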
test2218
math-science-physics
00:00:00:00
00:00:30:00
03-21 04:00:00
28
test2228
math
00:00:00:00
00:00:30:00
03-21 04:00:00
26
test2317
reading-comprehension
00:00:00:00
00:00:30:00
03-21 20:02:00
This is how I would like the output to look:
Test ID   Test Info              Duration_A   Duration_B   Next Use        Participants
test2218  math-science-physics   00:00:00:00  00:00:30:00  03-21 14:00:00  28
test2228  math                   00:00:00:00  00:00:30:00  03-21 14:00:00  26
test2317  reading-comprehension  00:00:00:00  00:00:30:00  04-11 13:30:00  2
I have searched everywhere and can't find a clear answer. Can anyone help?
Here is my code so far:
import os, glob, pandas as pd
d_frame = []
c_names = ['Test ID', 'Test Info', 'Duration_A', 'Duration_B', 'Next Use', 'Participants']
files_list = glob.glob(os.path.join('C:\\test', '*.txt'))
for file in files_list:
    # Skip empty files
    if os.stat(file).st_size != 0:
        df = pd.read_csv(file, delimiter='\t', header=None, names=c_names)
Any insight on this would be greatly appreciated. Thanks in advance!
Answer 0 (score: 3)
Assuming your data is a pandas.DataFrame object and those six pieces of information always appear in that specific order, you can try:
df = pd.DataFrame({0: ['test2218', 'math-science-physics', '00:00:00:00', '00:00:30:00', '03-21 04:00:00', '28', 'test2228', 'math', '00:00:00:00', '00:00:30:00', '03-21 04:00:00', '26', 'test2317', 'reading-comprehension', '00:00:00:00', '00:00:30:00', '03-21 20:02:00']})
columns = ['Test ID', 'Test Info', 'Duration_A', 'Duration_B', 'Next Use', 'Participants']
df_new = pd.DataFrame(df.groupby(df.index // len(columns))[0].apply(list).values.tolist(), columns=columns)
print(df_new)
    Test ID              Test Info   Duration_A   Duration_B        Next Use Participants
0  test2218   math-science-physics  00:00:00:00  00:00:30:00  03-21 04:00:00           28
1  test2228                   math  00:00:00:00  00:00:30:00  03-21 04:00:00           26
2  test2317  reading-comprehension  00:00:00:00  00:00:30:00  03-21 20:02:00         None
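Alternatively, if the number of rows is an exact multiple of len(columns) (reshape raises a ValueError otherwise):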
df_new = pd.DataFrame(df.values.reshape(-1, len(columns)), columns=columns)
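The sample column above holds 17 values, so its last record is incomplete and a plain reshape to six columns would fail. One possible workaround, shown here as a minimal sketch, is to pad the values with None up to the next multiple of six before reshaping. The helper name reshape_padded is hypothetical and not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd

columns = ['Test ID', 'Test Info', 'Duration_A', 'Duration_B', 'Next Use', 'Participants']

def reshape_padded(df, columns):
    """Hypothetical helper: turn a one-column frame into len(columns)-wide rows,
    padding an incomplete final record with None."""
    values = df[0].tolist()
    # Number of entries missing from the final record (0 if it is complete).
    missing = -len(values) % len(columns)
    values.extend([None] * missing)
    # dtype=object keeps the strings and None values as they are.
    arr = np.array(values, dtype=object).reshape(-1, len(columns))
    return pd.DataFrame(arr, columns=columns)

# Same 17 values as in the answer above (the last 'Participants' value is missing).
df = pd.DataFrame({0: ['test2218', 'math-science-physics', '00:00:00:00', '00:00:30:00', '03-21 04:00:00', '28',
                       'test2228', 'math', '00:00:00:00', '00:00:30:00', '03-21 04:00:00', '26',
                       'test2317', 'reading-comprehension', '00:00:00:00', '00:00:30:00', '03-21 20:02:00']})
df_new = reshape_padded(df, columns)
print(df_new)
```

Padding silently hides truncated input, so if an incomplete record indicates a corrupt file, raising an error instead may be the better choice.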
Answer 1 (score: 3)
Here is a simple approach using numpy.reshape:
import numpy as np
import pandas as pd
pd.DataFrame(np.reshape(df.values, (len(df) // 6, 6)),
             columns=['Test ID', 'Test Info', 'Duration_A', 'Duration_B', 'Next Use', 'Participants'])
    Test ID              Test Info   Duration_A   Duration_B        Next Use Participants
0  test2218   math-science-physics  00:00:00:00  00:00:30:00  03-21 04:00:00           28
1  test2228                   math  00:00:00:00  00:00:30:00  03-21 04:00:00           26
2  test2317  reading-comprehension  00:00:00:00  00:00:30:00  03-21 20:02:00            2
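To connect this with the file-reading loop in the question, the sketch below reads one value per line and reshapes the lines directly, skipping read_csv altogether. It assumes every file holds complete six-line records. The helper name records_from_lines is hypothetical, and io.StringIO stands in for a real file so the example runs on its own.

```python
import io
import numpy as np
import pandas as pd

COLUMNS = ['Test ID', 'Test Info', 'Duration_A', 'Duration_B', 'Next Use', 'Participants']

def records_from_lines(handle, columns=COLUMNS):
    """Hypothetical helper: build a frame from a file-like object with one value per line."""
    # Keep non-empty lines only, without surrounding whitespace or newlines.
    lines = [line.strip() for line in handle if line.strip()]
    if len(lines) % len(columns) != 0:
        raise ValueError(f"expected a multiple of {len(columns)} lines, got {len(lines)}")
    return pd.DataFrame(np.reshape(lines, (-1, len(columns))), columns=columns)

# Stand-in for an open text file containing two complete records.
sample = io.StringIO(
    "test2218\nmath-science-physics\n00:00:00:00\n00:00:30:00\n03-21 04:00:00\n28\n"
    "test2228\nmath\n00:00:00:00\n00:00:30:00\n03-21 04:00:00\n26\n"
)
result = records_from_lines(sample)
print(result)
```

With real files, each path from glob.glob could be opened with open(path), passed to the helper, and the resulting frames combined with pd.concat(frames, ignore_index=True).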
Answer 2 (score: 1)
import pandas as pd
x= pd.Series(['test2218',
'math-science-physics',
'00:00:00:00',
'00:00:30:00',
'03-21 04:00:00',
'28',
'test2228',
'math',
'00:00:00:00',
'00:00:30:00',
'03-21 04:00:00',
'26',
'test2317',
'reading-comprehension',
'00:00:00:00',
'00:00:30:00',
'03-21 20:02:00',
'55'])
Loop to find the positions that belong to each column (every sixth element, starting at offsets 0 through 5):
indices = []
for i in range(6):
    indices.append(list(range(i, len(x), 6)))
Create a list of column names and an empty DataFrame, then loop over the index subsets and assign each one to the DataFrame as a column.
columns=['Test ID', 'Test Info', 'Duration_A', 'Duration_B', 'Next Use', 'Participants']
df = pd.DataFrame({})
for col, ixs in zip(columns, indices):
    df[col] = x[ixs].reset_index(drop=True)
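The index lists above pick every sixth element starting from a given offset, which can also be expressed with a strided slice. The following is a compact sketch of the same idea, not a change to the answer's method. It assumes, as the answer does, that the Series holds complete six-value records.

```python
import pandas as pd

columns = ['Test ID', 'Test Info', 'Duration_A', 'Duration_B', 'Next Use', 'Participants']

x = pd.Series(['test2218', 'math-science-physics', '00:00:00:00', '00:00:30:00', '03-21 04:00:00', '28',
               'test2228', 'math', '00:00:00:00', '00:00:30:00', '03-21 04:00:00', '26',
               'test2317', 'reading-comprehension', '00:00:00:00', '00:00:30:00', '03-21 20:02:00', '55'])

# Column i is every len(columns)-th element starting at position i.
# reset_index(drop=True) renumbers each slice from 0 so the columns align row by row.
df = pd.DataFrame({
    col: x.iloc[i::len(columns)].reset_index(drop=True)
    for i, col in enumerate(columns)
})
print(df)
```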