Group & Get the Record Holding the Max Value
【Question】
I have an input file:
1A Traes_1AS_6052071D9.1 99.01 101 99.0
1A Traes_1DS_6BA87D1DA.1 96.04 101 99.0
1A Traes_1BS_480915AD0.1 94.06 101 99.0
1B Traes_1AS_49D585BA6.2 99.01 101 72.0
1B Traes_1BS_47F027BBE.2 98.02 101 89.0
1B Traes_1DS_3F816B920.1 97.03 101 92.0
1C Traes_1AS_3451447E0.1 99.01 101 97.0
1C Traes_1BS_9F243CEA6.2 92.93 99 97.0
1C Traes_1DS_2A6443F45.1 89.90 99 97.0
I need to:
1. Group and iterate inside each line[0];
2. Sort line[4] from lowest to highest value and take the highest value;
3. If they are similar, print the results choosing the one that has highest value in line[2], so that my output file looks like this:
Required output:
1A Traes_1AS_6052071D9.1 99.01 101 99.0
1B Traes_1DS_3F816B920.1 97.03 101 92.0
1C Traes_1AS_3451447E0.1 99.01 101 97.0
This is my try, but it only takes according to the highest line[4]:
import csv
from itertools import groupby
from operator import itemgetter
with open('my_file','rb') as f1:
with open('out_file', 'wb') as f2:
reader = csv.reader(f1, delimiter='\t')
writer1 = csv.writer(f2, delimiter='\t')
for group, rows in groupby(reader, itemgetter(0)):
seen = set()
rows = sorted(rows, key=lambda r: float(r[4]))
for row in rows:
max(rows, key=lambda r: float(r[4]))
writer1.writerow(row)
A solution:
Just have the key
function for max
return a tuple of (r[4], r[2])
Slightly simplified example (without output file)
with open('data.txt','rb') as f1:
reader = csv.reader(f1, delimiter='\t')
for group, rows in groupby(reader, itemgetter(0)):
best = max(rows, key=lambda r: (float(r[4]), float(r[2])))
print best
【Answer】
Try using SPL (Structured Process Language) to do this:
A |
|
1 |
=file("e:\\data.txt").import().group(#1).(~.maxp([#5,#3])) |
A1: #1 represents the first column; the sign ~ represents a group after records are divided by #1; maxp() function returns the record containing the max value in column 5 or, if column 5 has same values, the max value in column 3.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL