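I have a CSV file in which all of the data ends up in a single column, and I want to split the numbers in that column into separate columns. After reading it into a DataFrame, my data looks like this: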
0
0 13:25:09 -> mm [ -5, 4, 15 ] dd [ 4, 77, 8 ]
1 13:25:09 -> mm [ -4, 9, 10 ] dd [ 8, 6, 10 ]
2 13:25:09 -> mm [ 0, -4, 19 ] dd [ 3, 1, 66 ]
How can I do this?
Answer 0 (score: 0)
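I believe you need Series.str.extractall and Series.unstack:
# '-?' keeps the minus sign of negative numbers; the r'' prefix avoids escape warnings
df = df[0].str.extractall(r'(-?\d+)')[0].unstack()
print(df)
match 0 1 2 3 4 5 6 7 8
0 13 25 09 -5 4 15 4 77 8
1 13 25 09 -4 9 10 8 6 10
2 13 25 09 0 -4 19 3 1 66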
Answer 1 (score: 0)
Given this CSV file:
csvfile = '''13:25:09 -> mm [ -5, 4, 15 ] dd [ 4, 77, 8 ]
13:25:09 -> mm [ -4, 9, 10 ] dd [ 8, 6, 10 ]
13:25:09 -> mm [ 0, -4, 19 ] dd [ 3, 1, 66 ]'''
doing this:
import pandas as pd
lines = csvfile.split('\n')
df = pd.DataFrame(lines)
gives the wrong result, because each whole line ends up in a single column:
0
0 13:25:09 -> mm [ -5, 4, 15 ] dd [ 4, 77, 8 ]
1 13:25:09 -> mm [ -4, 9, 10 ] dd [ 8, 6, 10 ]
2 13:25:09 -> mm [ 0, -4, 19 ] dd [ 3, 1, 66 ]
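Instead, you should do this:
import pandas as pd
lines = csvfile.split('\n')
# Split on the literal ' -> ', ' mm ' and ' dd ' markers rather than fixed
# character offsets, so the result does not depend on the exact spacing
df = pd.DataFrame({'id': list(range(1, len(lines) + 1)),
                   'time': [line.split(' -> ')[0] for line in lines],
                   'mm': [line.split(' mm ')[1].split(' dd ')[0] for line in lines],
                   'dd': [line.split(' dd ')[1] for line in lines]})
and you get: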
id time mm dd
0 1 13:25:09 [ -5, 4, 15 ] [ 4, 77, 8 ]
1 2 13:25:09 [ -4, 9, 10 ] [ 8, 6, 10 ]
2 3 13:25:09 [ 0, -4, 19 ] [ 3, 1, 66 ]
Note that mm is still a string:
print(type(df['mm'][0]))
<class 'str'>
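It is better to have a list of integers:
# regex=False treats '[' and ']' as literal characters, not regex syntax
df['mm_list'] = df['mm'].str.replace('[', '', regex=False).str.replace(']', '', regex=False).str.split(',').values.tolist()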
df['mm_list_int'] = [[int(i) for i in x] for x in df['mm_list']]
df
This produces a new column, mm_list_int:
id time mm dd mm_list mm_list_int
0 1 13:25:09 [ -5, 4, 15 ] [ 4, 77, 8 ] [ -5, 4, 15 ] [-5, 4, 15]
1 2 13:25:09 [ -4, 9, 10 ] [ 8, 6, 10 ] [ -4, 9, 10 ] [-4, 9, 10]
2 3 13:25:09 [ 0, -4, 19 ] [ 3, 1, 66 ] [ 0, -4, 19 ] [0, -4, 19]
The types are correct:
print(type(df['mm_list_int'][0]))
<class 'list'>
print(type(df['mm_list_int'][0][0]))
<class 'int'>
so each cell is a list of integers.
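Because each mm string is already a valid Python list literal, the standard library's ast.literal_eval could parse it directly instead of stripping brackets and splitting on commas. A sketch, assuming the strings always look like `[ -5, 4, 15 ]`; literal_eval raises ValueError or SyntaxError on anything that is not a plain literal:

```python
import ast

import pandas as pd

df = pd.DataFrame({'mm': ['[ -5, 4, 15 ]', '[ -4, 9, 10 ]', '[ 0, -4, 19 ]']})

# literal_eval safely evaluates the literal (no arbitrary code) into a list of ints
df['mm_list_int'] = df['mm'].apply(ast.literal_eval)

print(df['mm_list_int'].tolist())
# [[-5, 4, 15], [-4, 9, 10], [0, -4, 19]]
```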
To expand the lists into separate columns, use
objs = [df, pd.DataFrame(df['mm_list_int'].tolist(), columns=['mm_x', 'mm_y', 'mm_z'])]
df_final = pd.concat(objs, axis=1)
df_final = df_final[['id', 'time', 'mm', 'dd', 'mm_x', 'mm_y', 'mm_z']]
which gives:
id time mm dd mm_x mm_y mm_z
0 1 13:25:09 [ -5, 4, 15 ] [ 4, 77, 8 ] -5 4 15
1 2 13:25:09 [ -4, 9, 10 ] [ 8, 6, 10 ] -4 9 10
2 3 13:25:09 [ 0, -4, 19 ] [ 3, 1, 66 ] 0 -4 19
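One caveat with this step: pd.concat(..., axis=1) aligns rows by index label, not by position, and a DataFrame built from tolist() always gets a fresh 0..n-1 index. If df has any other index (for example after filtering rows), the pieces will not line up. A sketch of a safer variant that passes index=df.index explicitly; the filtering step is a made-up example to produce a non-default index:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'mm_list_int': [[-5, 4, 15], [-4, 9, 10], [0, -4, 19]]})

# Simulate a non-default index, e.g. after dropping the first row
df = df[df['id'] > 1]  # index is now [1, 2]

# Reuse df's index so concat pairs each list with its own row
mm = pd.DataFrame(df['mm_list_int'].tolist(),
                  columns=['mm_x', 'mm_y', 'mm_z'],
                  index=df.index)
df_final = pd.concat([df, mm], axis=1)
print(df_final)
```

Without `index=df.index`, the result would have index 0, 1, 2 with NaN-filled rows instead of two aligned rows.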
Do the same for dd:
df['dd_list'] = df['dd'].str.replace('[', '', regex=False).str.replace(']', '', regex=False).str.split(',').values.tolist()
df['dd_list_int'] = [[int(i) for i in x] for x in df['dd_list']]
objs = [df,
pd.DataFrame(df['mm_list_int'].tolist(), columns=['mm_x', 'mm_y', 'mm_z']),
pd.DataFrame(df['dd_list_int'].tolist(), columns=['dd_x', 'dd_y', 'dd_z'])]
df_final = pd.concat(objs, axis=1)
df_final = df_final[['id', 'time', 'mm_x', 'mm_y', 'mm_z', 'dd_x', 'dd_y', 'dd_z']]
Final result:
id time mm_x mm_y mm_z dd_x dd_y dd_z
0 1 13:25:09 -5 4 15 4 77 8
1 2 13:25:09 -4 9 10 8 6 10
2 3 13:25:09 0 -4 19 3 1 66
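The mm and dd steps differ only in the column prefix, so they can be folded into a loop. A minimal sketch; the helper name `split_vector_column` is hypothetical, and it assumes each cell is a bracketed, comma-separated list of exactly three integers:

```python
import pandas as pd


def split_vector_column(df, col, suffixes=('x', 'y', 'z')):
    """Hypothetical helper: turn '[ a, b, c ]' strings in df[col] into int columns."""
    values = (df[col]
              .str.strip('[] ')   # '[ -5, 4, 15 ]' -> '-5, 4, 15'
              .str.split(',')     # '-5, 4, 15' -> ['-5', ' 4', ' 15']
              .apply(lambda parts: [int(p) for p in parts]))  # int() ignores spaces
    names = [f'{col}_{s}' for s in suffixes]
    return pd.DataFrame(values.tolist(), columns=names, index=df.index)


df = pd.DataFrame({'id': [1, 2, 3],
                   'time': ['13:25:09'] * 3,
                   'mm': ['[ -5, 4, 15 ]', '[ -4, 9, 10 ]', '[ 0, -4, 19 ]'],
                   'dd': ['[ 4, 77, 8 ]', '[ 8, 6, 10 ]', '[ 3, 1, 66 ]']})

pieces = [df[['id', 'time']]] + [split_vector_column(df, c) for c in ('mm', 'dd')]
df_final = pd.concat(pieces, axis=1)
print(df_final)
```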