My CSV file always has extra blank cells at the end of each column, because the columns have different lengths. For example:
847,73.3,809,74.9,655,80.6,694,45.5,647,47.8
848,24.3,810,23.1,656,18.2,695,48.6,648,47.3
566,26.1,541,7.8,438,19.1,463,45.5,433,18.2
567,0.5,542,0.1,439,0.2,464,53.1,434,0.2
426,0.0,407,0.0,330,0.0,348,98.6,326,0.0
...
339,37.9,324,74.9,,,349,1.4,,
340,62.0,325,25.1,,,,,,
341,0.1,326,0.0,,,,,,
After reading the file with pandas, the blank cells become NaN:
pd.read_csv(ref_file)
Result (last four columns shown):
0 694.0 45.5 647.0 47.8
1 695.0 48.6 648.0 47.3
2 696.0 5.6 649.0 4.8
3 697.0 0.3 650.0 0.2
4 698.0 0.0 432.0 81.6
5 463.0 45.5 433.0 18.2
6 464.0 53.1 434.0 0.2
7 465.0 1.4 324.0 81.6
8 466.0 0.0 325.0 18.4
9 348.0 98.6 326.0 0.0
10 349.0 1.4 NaN NaN
11 NaN NaN NaN NaN
12 NaN NaN NaN NaN
I tried
df.last_valid_index()
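but it only checks the first column. Every column has a different number of NaN values, so how can I remove the NaN values in this case?
Edit: I tried .dropna(). It cuts every row down to match the column with the most NaN values, which is not what I want. I want each column trimmed at its own NaN values, so the columns should end up with different numbers of rows.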
Answer 0 (score: 2)
If you want each column as a list, with those lists collected in a Series:
df.T.stack().groupby(level=0).apply(list)
0 [847.0, 848.0, 566.0, 567.0, 426.0, 339.0, 340...
1 [73.3, 24.3, 26.1, 0.5, 0.0, 37.9, 62.0, 0.1]
2 [809.0, 810.0, 541.0, 542.0, 407.0, 324.0, 325...
3 [74.9, 23.1, 7.8, 0.1, 0.0, 74.9, 25.1, 0.0]
4 [655.0, 656.0, 438.0, 439.0, 330.0]
5 [80.6, 18.2, 19.1, 0.2, 0.0]
6 [694.0, 695.0, 463.0, 464.0, 348.0, 349.0]
7 [45.5, 48.6, 45.5, 53.1, 98.6, 1.4]
8 [647.0, 648.0, 433.0, 434.0, 326.0]
9 [47.8, 47.3, 18.2, 0.2, 0.0]
dtype: object
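The one-liner above relies on stack() discarding NaN values. My understanding is that this default is changing in newer pandas releases (see the future_stack parameter), so a version that calls dropna() on each column explicitly may be safer. The following is only a sketch under assumptions: csv_text is a small made-up sample, not the data from the question, and header=None is assumed because the file appears to have no header row.

```python
import io

import pandas as pd

# Hypothetical sample mimicking the question's layout: columns of unequal
# length, padded with empty cells at the bottom.
csv_text = "1,10.0,5,0.5\n2,20.0,,\n3,,,\n"

# header=None is an assumption: without it, pandas would treat the first
# data row as column names.
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Drop the NaN padding from each column independently, so every column
# keeps its own length.
columns_as_lists = pd.Series(
    {col: df[col].dropna().tolist() for col in df.columns}
)
print(columns_as_lists)
```

Because each column is handled separately, a column with no padding keeps all of its rows while shorter columns lose only their own trailing NaN values.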
Otherwise, if you want each row as a list:
df.stack().groupby(level=0).apply(list)
0 [847.0, 73.3, 809.0, 74.9, 655.0, 80.6, 694.0,...
1 [848.0, 24.3, 810.0, 23.1, 656.0, 18.2, 695.0,...
2 [566.0, 26.1, 541.0, 7.8, 438.0, 19.1, 463.0, ...
3 [567.0, 0.5, 542.0, 0.1, 439.0, 0.2, 464.0, 53...
4 [426.0, 0.0, 407.0, 0.0, 330.0, 0.0, 348.0, 98...
5 [339.0, 37.9, 324.0, 74.9, 349.0, 1.4]
6 [340.0, 62.0, 325.0, 25.1]
7 [341.0, 0.1, 326.0, 0.0]
dtype: object
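The row-wise result can also be built without stack(), by dropping NaN values from each row in turn. Again this is only a sketch: csv_text is a made-up sample rather than the question's data, and header=None is an assumption about the file.

```python
import io

import pandas as pd

# Hypothetical sample: later rows have fewer filled cells.
csv_text = "1,10.0,5,0.5\n2,20.0,,\n3,,,\n"

# header=None is an assumption: the file is taken to have no header row.
df = pd.read_csv(io.StringIO(csv_text), header=None)

# iterrows() yields (index, row) pairs; each row is a Series, so dropna()
# removes only that row's missing cells.
rows_as_lists = [row.dropna().tolist() for _, row in df.iterrows()]
print(rows_as_lists)
```

Note that iterrows() converts each row to a single dtype, so integer columns come back as floats here, and it is slow on large frames. For big files the stack-based one-liner is likely the better choice.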