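I'm new to Python and am playing with a dataset. I need help with:
1. Removing the single quotes around the dates inside the parentheses
2. Splitting the parenthesized records into an array (31312 x 4)
Code:
import csv
import numpy as np
import pandas as pd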
text_file = open("Claims1.txt", "r")
dfl = DataFrameList = list(text_file)
text_file.close()
dfl_string = "\n".join(str(e) for e in dfl)
dfl_split = dfl_string.replace('),', ')//').split('//')
my_df = pd.DataFrame(dfl_split)
#Output into CSV file
my_df.to_csv('output.csv')
Current Result:
0
0 (1,'2000-01-04',328647,5000)
1 (2,'2000-01-09',465858,5000)
2 (3,'2000-01-09',378115,5000)
3 (4,'2000-01-14',121895,5000)
4 (5,'2000-01-16',325172,5000)
5 (6,'2000-01-16',156062,5000)
6 (7,'2000-01-17',472142,5000)
...............................
31312 (31312, '2004-05-30',340406, 5000)
Desired Result:
0 1 2 3
0 1 2000-01-04 328647 5000
1 2 2000-01-09 465858 5000
2 3 2000-01-09 378115 5000
3 4 2000-01-14 121895 5000
4 5 2000-01-16 325172 5000
5 6 2000-01-16 156062 5000
6 7 2000-01-17 472142 5000
..............................
31312 31312 2004-05-30 340406 5000
Answer 0 (score: 2)
Assuming you have your data in a DataFrame, you can use pd.Series.apply to split the tuples into columns:
import pandas as pd
df = pd.DataFrame({0:[(1,'2000-01-04',328647,5000),
(2,'2000-01-09',465858,5000),
(3,'2000-01-09',378115,5000),
(4,'2000-01-14',121895,5000),
(5,'2000-01-16',325172,5000),
(6,'2000-01-16',156062,5000),
(7,'2000-01-17',472142,5000)]})
df[[0, 1, 2, 3]] = df[0].apply(pd.Series)
# 0 1 2 3
# 0 1 2000-01-04 328647 5000
# 1 2 2000-01-09 465858 5000
# 2 3 2000-01-09 378115 5000
# 3 4 2000-01-14 121895 5000
# 4 5 2000-01-16 325172 5000
# 5 6 2000-01-16 156062 5000
# 6 7 2000-01-17 472142 5000
Answer 1 (score: 2)
Using the data from jpp's answer, this should be fast:
pd.DataFrame(df[0].tolist())
Out[779]:
0 1 2 3
0 1 2000-01-04 328647 5000
1 2 2000-01-09 465858 5000
2 3 2000-01-09 378115 5000
3 4 2000-01-14 121895 5000
4 5 2000-01-16 325172 5000
5 6 2000-01-16 156062 5000
6 7 2000-01-17 472142 5000
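A minimal, self-contained sketch of the tolist approach, assuming the column labelled 0 already holds real tuples rather than strings. The names records and wide are illustrative only. It avoids building one Series per row, which is what apply(pd.Series) does, and that is the likely reason it tends to be quicker on 31312 rows.

```python
import pandas as pd

# Small stand-in for the real data: one column (labelled 0) holding tuples.
records = [(1, '2000-01-04', 328647, 5000),
           (2, '2000-01-09', 465858, 5000),
           (3, '2000-01-09', 378115, 5000)]
df = pd.DataFrame({0: records})

# Build the wide frame straight from the list of tuples.
# Passing index= keeps the original row labels.
wide = pd.DataFrame(df[0].tolist(), index=df.index)

print(wide.shape)
print(wide)
```

If the column holds strings such as "(1,'2000-01-04',328647,5000)" instead of tuples, tolist() only returns those strings, so they need to be parsed first.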
Answer 2 (score: 1)
You can convert your result to a DataFrame like this (note that data is your current result):
import pandas as pd
data = [(1,'2000-01-04',328647,5000),
(2,'2000-01-09',465858,5000),
(3,'2000-01-09',378115,5000),
(4,'2000-01-14',121895,5000),
(5,'2000-01-16',325172,5000),
(6,'2000-01-16',156062,5000),
(7,'2000-01-17',472142,5000)]
df = pd.DataFrame(data, columns=[0, 1, 2, 3])
print(df)
# 0 1 2 3
# 0 1 2000-01-04 328647 5000
# 1 2 2000-01-09 465858 5000
# 2 3 2000-01-09 378115 5000
# 3 4 2000-01-14 121895 5000
# 4 5 2000-01-16 325172 5000
# 5 6 2000-01-16 156062 5000
# 6 7 2000-01-17 472142 5000
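Answer 3 (score: 0)
The reason you might be getting a ValueError is probably that each of your inputs is a string, so you end up with a flat list of strings instead of a list of lists.
Try: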
import pandas as pd
import ast
y = [ast.literal_eval(i) for i in dfl_split]
df = pd.DataFrame(y, columns=[0, 1, 2, 3])
print(df)
This works.
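For reference, here is a hedged end-to-end sketch that goes from raw text to a four-column frame. It assumes the file contains tuples separated by "),", as the split in the question implies. The raw string is a made-up sample standing in for the contents of Claims1.txt, and chunks, rows and csv_text are illustrative names.

```python
import ast
import pandas as pd

# Hypothetical sample standing in for the text read from Claims1.txt.
raw = ("(1,'2000-01-04',328647,5000),\n"
       "(2,'2000-01-09',465858,5000),\n"
       "(3, '2000-01-09',378115, 5000)")

# Same split as in the question: mark the boundary between tuples, then cut.
chunks = raw.replace('),', ')//').split('//')

# Strip stray whitespace/newlines, skip empty pieces, and parse each
# string into a real tuple.
rows = [ast.literal_eval(c.strip()) for c in chunks if c.strip()]

df = pd.DataFrame(rows, columns=[0, 1, 2, 3])

# With no path, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

The single quotes seen earlier are only how Python displays a string inside a tuple. Once each date sits in its own column, the CSV output contains the bare value, so no separate quote-stripping step should be needed.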