Remove single quotes & split the parentheses

Date: 2018-04-03 15:25:23

Tags: python python-3.x pandas csv dataframe

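I'm new to Python and am playing around with a dataset. I need help with two things:
1. Removing the single quotes around the dates inside the parentheses
2. Splitting the parenthesized tuples into an array (31312 x 4)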

Code:

import csv
import numpy as np
import pandas as pd

text_file = open("Claims1.txt", "r")
dfl = DataFrameList = list(text_file)
text_file.close()


dfl_string = "\n".join(str(e) for e in dfl)
dfl_split = dfl_string.replace('),', ')//').split('//')


my_df = pd.DataFrame(dfl_split)
#Output into CSV file
my_df.to_csv('output.csv')



Current Result:
     0
0   (1,'2000-01-04',328647,5000)
1   (2,'2000-01-09',465858,5000)
2   (3,'2000-01-09',378115,5000)
3   (4,'2000-01-14',121895,5000)
4   (5,'2000-01-16',325172,5000)
5   (6,'2000-01-16',156062,5000)
6   (7,'2000-01-17',472142,5000)
...............................
31312 (31312, '2004-05-30',340406, 5000)
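A minimal sketch of why the current result has only one column, assuming Claims1.txt holds tuples separated by "),", as the split code implies. The sample string is hypothetical; the replace/split line is the question's own. Each piece is still a plain string, so pandas puts it in a single column.

```python
import pandas as pd

# Hypothetical sample, assumed to mimic the layout of Claims1.txt
sample = "(1,'2000-01-04',328647,5000),(2,'2000-01-09',465858,5000)"

# Same trick as the question: mark tuple boundaries with //, then split on it
parts = sample.replace('),', ')//').split('//')

print(parts)
# ["(1,'2000-01-04',328647,5000)", "(2,'2000-01-09',465858,5000)"]

# Each element is a str, not a tuple, so the frame gets one column
frame = pd.DataFrame(parts)
print(frame.shape)
# (2, 1)
```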

Desired Result:
    0       1         2      3
0   1  2000-01-04  328647  5000
1   2  2000-01-09  465858  5000
2   3  2000-01-09  378115  5000
3   4  2000-01-14  121895  5000
4   5  2000-01-16  325172  5000
5   6  2000-01-16  156062  5000
6   7  2000-01-17  472142  5000
..............................
31312 31312  2004-05-30  340406  5000
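One alternative, not taken from the answers below: a sketch assuming the raw text is a run of "(...)" tuples and that no field contains a ")". A regular expression pulls out the contents of each tuple, and pandas.read_csv with quotechar="'" then does both jobs at once: it strips the single quotes and splits each row into 4 columns. The sample text is hypothetical.

```python
import io
import re

import pandas as pd

# Hypothetical raw text, assumed to look like the contents of Claims1.txt
raw = "(1,'2000-01-04',328647,5000),(2,'2000-01-09',465858,5000)"

# Capture what is inside each (...) group, dropping the parentheses
rows = re.findall(r"\(([^)]*)\)", raw)

# Treat ' as the quote character so read_csv removes the quotes around dates
df = pd.read_csv(io.StringIO("\n".join(rows)), header=None, quotechar="'")
print(df)
```

With a real file you would read its text first (e.g. open("Claims1.txt").read()) and pass that in place of raw.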

4 answers:

Answer 0 (score: 2)

Assuming you already have the data in a dataframe, you can use pd.Series.apply to split it into columns:

import pandas as pd

df = pd.DataFrame({0:[(1,'2000-01-04',328647,5000),
                      (2,'2000-01-09',465858,5000),
                      (3,'2000-01-09',378115,5000),
                      (4,'2000-01-14',121895,5000),
                      (5,'2000-01-16',325172,5000),
                      (6,'2000-01-16',156062,5000),
                      (7,'2000-01-17',472142,5000)]})

df[[0, 1, 2, 3]] = df[0].apply(pd.Series)

#    0           1       2     3
# 0  1  2000-01-04  328647  5000
# 1  2  2000-01-09  465858  5000
# 2  3  2000-01-09  378115  5000
# 3  4  2000-01-14  121895  5000
# 4  5  2000-01-16  325172  5000
# 5  6  2000-01-16  156062  5000
# 6  7  2000-01-17  472142  5000
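Note that apply(pd.Series) only expands real tuples. If the column holds strings, as the question's split produces, a minimal sketch is to parse each string with ast.literal_eval first. The sample data is hypothetical.

```python
import ast

import pandas as pd

# Hypothetical: the column holds strings, like the question's dfl_split
df = pd.DataFrame({0: ["(1,'2000-01-04',328647,5000)",
                       "(2,'2000-01-09',465858,5000)"]})

# Turn each string into a tuple, then expand the tuples into 4 columns
expanded = df[0].apply(ast.literal_eval).apply(pd.Series)
print(expanded)
```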

Answer 1 (score: 2)

Using the data from jpp's answer, this should be fast:

pd.DataFrame(df[0].tolist())
Out[779]: 
   0           1       2     3
0  1  2000-01-04  328647  5000
1  2  2000-01-09  465858  5000
2  3  2000-01-09  378115  5000
3  4  2000-01-14  121895  5000
4  5  2000-01-16  325172  5000
5  6  2000-01-16  156062  5000
6  7  2000-01-17  472142  5000
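pd.DataFrame(df[0].tolist()) builds a new frame with a fresh 0..n-1 index. If df has a different index (for example after filtering) and you want to line the result up with it, a small sketch is to pass the index back explicitly. The index values here are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose index is not 0..n-1
df = pd.DataFrame({0: [(1, '2000-01-04', 328647, 5000),
                       (2, '2000-01-09', 465858, 5000)]},
                  index=[10, 20])

# tolist() discards the index, so reattach it to keep rows aligned with df
out = pd.DataFrame(df[0].tolist(), index=df.index)
print(out)
```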

Answer 2 (score: 1)

You can convert the result into a dataframe like this (note that data here is your current result):

import pandas as pd

data = [(1,'2000-01-04',328647,5000),
        (2,'2000-01-09',465858,5000),
        (3,'2000-01-09',378115,5000),
        (4,'2000-01-14',121895,5000),
        (5,'2000-01-16',325172,5000),
        (6,'2000-01-16',156062,5000),
        (7,'2000-01-17',472142,5000)]

df = pd.DataFrame(data, columns=[0, 1, 2, 3])
print(df)

#    0           1       2     3
# 0  1  2000-01-04  328647  5000
# 1  2  2000-01-09  465858  5000
# 2  3  2000-01-09  378115  5000
# 3  4  2000-01-14  121895  5000
# 4  5  2000-01-16  325172  5000
# 5  6  2000-01-16  156062  5000
# 6  7  2000-01-17  472142  5000
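The question writes the frame with to_csv, which by default adds a row-number column. A minimal sketch of writing the 4-column result without it, shown against an in-memory buffer so the output can be inspected:

```python
import io

import pandas as pd

data = [(1, '2000-01-04', 328647, 5000),
        (2, '2000-01-09', 465858, 5000)]
df = pd.DataFrame(data, columns=[0, 1, 2, 3])

# index=False drops the extra row-number column to_csv writes by default
buf = io.StringIO()
df.to_csv(buf, index=False)
print(buf.getvalue())
```

For the real file, the same call would be df.to_csv('output.csv', index=False).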

Answer 3 (score: 0)

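The reason you are probably getting a ValueError is that each of your inputs is a string, so you end up with a list of strings rather than a list of lists.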

Try:

import pandas as pd
import ast
y = [ast.literal_eval(i) for i in dfl_split]
df = pd.DataFrame(y, columns=[0, 1, 2, 3])
print(df)

This works.
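A small sketch reproducing the ValueError and the fix, assuming dfl_split looks like the hypothetical list below. Asking for four columns from a list of plain strings fails because pandas sees only one column. The strip() call is an added defensive step (not part of the answer above) to remove stray newlines left over from the "\n".join in the question.

```python
import ast

import pandas as pd

# Hypothetical split output: strings, one with surrounding newlines
dfl_split = ["(1,'2000-01-04',328647,5000)",
             "\n(2,'2000-01-09',465858,5000)\n"]

# Strings are not tuples, so pandas sees 1 column and 4 names do not fit
raised = False
try:
    pd.DataFrame(dfl_split, columns=[0, 1, 2, 3])
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Parse each string into a tuple; strip() first removes stray whitespace
rows = [ast.literal_eval(s.strip()) for s in dfl_split]
df = pd.DataFrame(rows, columns=[0, 1, 2, 3])
print(df)
```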