I'm trying to sort a CSV file that looks like this:
filename,field1,field2
10_somefile,0,0
1_somefile,0,0
2_somefile,0,0
3_somefile,0,0
4_somefile,0,0
5_somefile,0,0
6_somefile,0,0
7_somefile,0,0
8_somefile,0,0
9_somefile,0,0
I've adapted the following code from another thread:
import csv
import operator

# outfname is the path to the CSV file (defined elsewhere)
with open(outfname, newline='') as unsorted_file:
    csv_f = csv.reader(unsorted_file)
    header = next(csv_f, None)
    sorted_data = sorted(csv_f, key=operator.itemgetter(0))

with open(outfname, 'w', newline='') as sorted_file:
    csv_f = csv.writer(sorted_file, quoting=csv.QUOTE_ALL)
    if header:
        csv_f.writerow(header)
    csv_f.writerows(sorted_data)
However, this doesn't move '10_somefile' to the end. How can I sort the file so that it uses the number before the underscore as the sort field?
Answer 0 (score: 1)
This is happening because "10" < "1_": the filenames are compared as strings, character by character, and '0' sorts before '_'. You want to compare integers, not strings. You can do this by building an integer for each line from the characters up to the underscore. Say you can get the filename as a string s (for example with itemgetter, as you are currently doing). Then the following lambda, passed as the key for sorted, will do what you want.
key=lambda s: int(s[:s.index('_')])
This function returns the integer formed by the characters of s up to, but not including, the first underscore.
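In the question's code, sorted receives whole rows from csv.reader rather than bare strings, so the key has to look at the filename column, row[0]. A minimal sketch of that, assuming a few rows of the question's CSV held in an in-memory io.StringIO instead of the real file (csv_text is an illustrative name, not from the question):

```python
import csv
import io

# A few rows from the question, kept in memory instead of a file (illustrative).
csv_text = """filename,field1,field2
10_somefile,0,0
1_somefile,0,0
2_somefile,0,0
3_somefile,0,0
"""

# String comparison is the root cause: '0' (U+0030) sorts before '_' (U+005F).
print("10_somefile" < "1_somefile")  # True

reader = csv.reader(io.StringIO(csv_text))
header = next(reader, None)

# Compare the integer before the first underscore in the filename column,
# rather than the raw string, so 10 sorts after 3.
sorted_data = sorted(reader, key=lambda row: int(row[0][:row[0].index('_')]))

print(header)
for row in sorted_data:
    print(row)
```

Note that str.index raises ValueError if a filename has no underscore, and int raises ValueError if the prefix is not numeric, so this only suits data shaped like the question's.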
Answer 1 (score: 1)
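The key function you pass to sorted returns the first element of each row as a string, so the rows are compared lexicographically and "10..." comes before "1_...". You need a "natural sort" instead, which compares embedded numbers by their numeric value rather than character by character.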
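One common way to get a natural sort in plain Python is a key that splits each string into alternating text and digit runs and converts the digit runs to integers. This is a minimal sketch, not a standard library function; natural_key is a hypothetical helper name:

```python
import re

def natural_key(text):
    """Split text into text/digit runs, turning digit runs into ints."""
    # re.split with a capturing group keeps the digit runs, and they always
    # land at the odd indices, e.g. '10_somefile' -> ['', '10', '_somefile'].
    return [int(part) if i % 2 else part
            for i, part in enumerate(re.split(r'(\d+)', text))]

names = ['10_somefile', '1_somefile', '2_somefile', '9_somefile']
print(sorted(names, key=natural_key))
# ['1_somefile', '2_somefile', '9_somefile', '10_somefile']
```

Because text runs sit at even indices and numbers at odd indices, two keys never compare a str against an int at the same position. For the CSV rows in the question, the key would be lambda row: natural_key(row[0]).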
Answer 2 (score: 1)
Assuming that all your filename fields start with a number, the simplest approach is to sort by the integer parsed out of the filename.
# Assume this is the data of the CSV after reading it in
filenames = ['10_somefile,0,0',
             '1_somefile,0,0',
             '2_somefile,0,0',
             '3_somefile,0,0',
             '4_somefile,0,0',
             '5_somefile,0,0',
             '6_somefile,0,0',
             '7_somefile,0,0',
             '8_somefile,0,0',
             '9_somefile,0,0']
# Here, we treat the first part of the filename (the number before the underscore) as the sort key.
sorted_data = sorted(filenames, key=lambda line: int(line.partition('_')[0]))
If you print sorted_data, it should look like:
['1_somefile,0,0', '2_somefile,0,0', '3_somefile,0,0',
'4_somefile,0,0', '5_somefile,0,0', '6_somefile,0,0',
'7_somefile,0,0', '8_somefile,0,0', '9_somefile,0,0', '10_somefile,0,0']
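The same partition key can be dropped into the question's read/sort/write flow. A minimal sketch in Python 3, assuming in-memory io.StringIO objects stand in for the real files; sort_csv_by_leading_number is a hypothetical helper name:

```python
import csv
import io

def sort_csv_by_leading_number(src, dst):
    """Read CSV from src, sort rows by the number before '_' in column 0, write to dst."""
    reader = csv.reader(src)
    header = next(reader, None)
    # partition('_')[0] is the text before the first underscore, e.g. '10'.
    rows = sorted(reader, key=lambda row: int(row[0].partition('_')[0]))
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
    if header:
        writer.writerow(header)
    writer.writerows(rows)

src = io.StringIO("filename,field1,field2\n10_somefile,0,0\n1_somefile,0,0\n2_somefile,0,0\n")
dst = io.StringIO()
sort_csv_by_leading_number(src, dst)
print(dst.getvalue())
```

With real files, open them with newline='' as the csv module recommends. To sort a file in place, finish reading it before reopening it for writing, as the question's two with blocks do: opening the same path with 'w' truncates it immediately.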