Assign Unique Value to Field in Duplicate Records Group during GroupingBy
Question
Following the reply provided by devReddit here, I grouped the CSV records (same client names) from the following test file (fake data):
CSV test file
id,name,mother,birth,center
1,AntonioCarlosdaSilva,AnadaSilva,2008/03/31,1
2,CarlosRobertodeSouza,AmáliaMariadeSouza,2004/12/10,1
3,PedrodeAlbuquerque,MariadeAlbuquerque,2006/04/03,2
4,DanilodaSilvaCardoso,SôniadePaulaCardoso,2002/08/10,3
5,RalfodosSantosFilho,HelenadosSantos,2012/02/21,4
6,PedrodeAlbuquerque,MariadeAlbuquerque,2006/04/03,2
7,AntonioCarlosdaSilva,AnadaSilva,2008/03/31,1
8,RalfodosSantosFilho,HelenadosSantos,2012/02/21,4
9,RosanaPereiradeCampos,IvanaMariadeCampos,2002/07/16,3
10,PaulaCristinadeAbreu,CristinaPereiradeAbreu,2014/10/25,2
11,PedrodeAlbuquerque,MariadeAlbuquerque,2006/04/03,2
12,RalfodosSantosFilho,HelenadosSantos,2012/02/21,4
Client Entity
package entities;
public class Client {
    private String id;
    private String name;
    private String mother;
    private String birth;
    private String center;
    public Client() {
    }
    public Client(String id, String name, String mother, String birth, String center) {
        this.id = id;
        this.name = name;
        this.mother = mother;
        this.birth = birth;
        this.center = center;
    }
    public String getId() {
        return id;
    }
    public void setId(String id) {
        this.id = id;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public String getMother() {
        return mother;
    }
    public void setMother(String mother) {
        this.mother = mother;
    }
    public String getBirth() {
        return birth;
    }
    public void setBirth(String birth) {
        this.birth = birth;
    }
    public String getCenter() {
        return center;
    }
    public void setCenter(String center) {
        this.center = center;
    }
    @Override
    public String toString() {
        return "Client[id=" + id + ",name=" + name + ",mother=" + mother + ",birth=" + birth + ",center=" + center
                + "]";
    }
}
Program
package application;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import entities.Client;
public class Program {
    public static void main(String[] args) throws IOException {
        Pattern pattern = Pattern.compile(",");
        // Read the CSV (closing the file afterwards), skip the header and map each row to a Client
        List<Client> file;
        try (Stream<String> lines = Files.lines(Paths.get("src/Client.csv"))) {
            file = lines
                    .skip(1)
                    .map(line -> {
                        String[] fields = pattern.split(line);
                        return new Client(fields[0], fields[1], fields[2], fields[3], fields[4]);
                    })
                    .collect(Collectors.toList());
        }
        // Keep only clients that have at least one duplicate, then group them by center
        Map<String, List<Client>> grouped = file
                .stream()
                .filter(x -> file.stream().anyMatch(y -> isDuplicate(x, y)))
                .collect(Collectors.groupingBy(Client::getCenter, LinkedHashMap::new, Collectors.toList()));
        grouped.entrySet().forEach(System.out::println);
    }
    // Two records are duplicates when their ids differ but name, mother and birth date match
    private static boolean isDuplicate(Client x, Client y) {
        return !x.getId().equals(y.getId())
                && x.getName().equals(y.getName())
                && x.getMother().equals(y.getMother())
                && x.getBirth().equals(y.getBirth());
    }
}
Final Result (Grouped by Center)
1=[Client[id=1,name=AntonioCarlosdaSilva,mother=AnadaSilva,birth=2008/03/31,center=1],
Client[id=7,name=AntonioCarlosdaSilva,mother=AnadaSilva,birth=2008/03/31,center=1]]
2=[Client[id=3,name=PedrodeAlbuquerque,mother=MariadeAlbuquerque,birth=2006/04/03,center=2],
Client[id=5,name=RalfodosSantosFilho,mother=HelenadosSantos,birth=2012/02/21,center=2],
Client[id=6,name=PedrodeAlbuquerque,mother=MariadeAlbuquerque,birth=2006/04/03,center=2],
Client[id=8,name=RalfodosSantosFilho,mother=HelenadosSantos,birth=2012/02/21,center=2],
Client[id=11,name=PedrodeAlbuquerque,mother=MariadeAlbuquerque,birth=2006/04/03,center=2],
Client[id=12,name=RalfodosSantosFilho,mother=HelenadosSantos,birth=2012/02/21,center=2]]
What I Need
I need to assign a unique number to each group of repeated records, restarting the numbering each time the center value changes, and also keep the repeated records together (the map does not guarantee this), as in the example below:
The numbers on the left show the grouping by center (1 and 2). Repeated names share the same inner group number, starting from "1". When the center number changes, the inner group numbering restarts from "1", and so on.
1=[Client[group=1,id=1,name=AntonioCarlosdaSilva,mother=AnadaSilva,birth=2008/03/31,center=1],
Client[group=1,id=7,name=AntonioCarlosdaSilva,mother=AnadaSilva,birth=2008/03/31,center=1]]
// CENTER CHANGED (2) - Restart inner group number to "1" again.
2=[Client[group=1,id=3,name=PedrodeAlbuquerque,mother=MariadeAlbuquerque,birth=2006/04/03,center=2],
Client[group=1,id=6,name=PedrodeAlbuquerque,mother=MariadeAlbuquerque,birth=2006/04/03,center=2],
Client[group=1,id=11,name=PedrodeAlbuquerque,mother=MariadeAlbuquerque,birth=2006/04/03,center=2],
// NAME CHANGED, BUT STILL THE SAME CENTER - so it increases by "1" (group=2)
Client[group=2,id=5,name=RalfodosSantosFilho,mother=HelenadosSantos,birth=2012/02/21,center=2],
Client[group=2,id=8,name=RalfodosSantosFilho,mother=HelenadosSantos,birth=2012/02/21,center=2],
Client[group=2,id=12,name=RalfodosSantosFilho,mother=HelenadosSantos,birth=2012/02/21,center=2]]
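The numbering described above is a dense rank of name within each center. Sort by center and then by name so that repeated records sit next to each other, then walk the sorted list: reset the counter to 1 when the center changes and add 1 when the name changes. Below is a minimal plain-Java sketch of that idea, assuming Java 16+ records and that a repeated record is identified by its name alone, as in the example. `GroupNumbering`, `Row`, `Ranked` and `assignGroups` are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupNumbering {

    // Hypothetical, trimmed-down stand-in for the Client entity (Java 16+ record)
    record Row(String id, String name, String center) {}

    // A row together with its inner group number
    record Ranked(int group, Row row) {}

    // Sorts rows by center, then by name, and assigns a dense rank of name within
    // each center: restart at 1 on a new center, add 1 on a new name in the same center.
    static List<Ranked> assignGroups(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        // List.sort is stable, so rows with the same name keep their file order
        sorted.sort(Comparator.comparing(Row::center).thenComparing(Row::name));

        List<Ranked> result = new ArrayList<>();
        String prevCenter = null;
        String prevName = null;
        int group = 0;
        for (Row r : sorted) {
            if (!r.center().equals(prevCenter)) {
                group = 1;                  // new center: restart numbering
            } else if (!r.name().equals(prevName)) {
                group++;                    // same center, new name
            }
            result.add(new Ranked(group, r));
            prevCenter = r.center();
            prevName = r.name();
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("3", "PedrodeAlbuquerque", "2"),
                new Row("5", "RalfodosSantosFilho", "2"),
                new Row("1", "AntonioCarlosdaSilva", "1"),
                new Row("6", "PedrodeAlbuquerque", "2"),
                new Row("7", "AntonioCarlosdaSilva", "1"),
                new Row("8", "RalfodosSantosFilho", "2"));

        // Regroup by center (keeping sorted order) to print the same shape as the question
        Map<String, List<Ranked>> byCenter = assignGroups(rows).stream()
                .collect(Collectors.groupingBy(x -> x.row().center(),
                        LinkedHashMap::new, Collectors.toList()));
        byCenter.entrySet().forEach(System.out::println);
    }
}
```

With these rows, the two Antonio records in center 1 get group 1, the Pedro records in center 2 get group 1 and the Ralfo records get group 2. Centers are compared as strings here, so "10" would sort before "2"; if centers are numeric, compare them as integers instead.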
Answer
The task is to group the CSV file by center, sort by name in ascending order within each group, and number each distinct name within its center. The code gets quite long if you write it in plain Java.
It is simple to do with SPL, an open-source Java package. A single line of code is enough:
Cell A1:
=file("client.csv":"UTF-8").import@ct().sort(center,name).derive(ranki(name;center):group)
SPL provides a JDBC driver that Java can call. Store the above SPL script as dense_rank.splx and invoke it from Java as you would call a stored procedure:
…
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
st=con.prepareCall("call dense_rank ()");
st.execute();
…
Alternatively, execute the SPL string directly within a Java program, just as you would execute a SQL statement:
…
st = con.prepareStatement("==file(\"client.csv\":\"UTF-8\").import@ct().sort(center,name).derive(ranki(name;center):group)");
st.execute();
…
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL