Finding Duplicate Values in Multiple Colums in a SQL Table and Count for chars
Question
I have the following table structure:
idname1name2name3...
1HansPeterFrank
2HansFrankPeter
3HansPeterFrank
4PaulPeterHans
I use the following command to display duplicates and counts:
SELECTCOUNT(name1),name1,name2,name3
FROMtable
GROUPBYname1,name2,name3
HAVING(COUNT(name1)>1)AND(COUNT(name2)>1)AND(COUNT(name3)>1)
This command gives me a count of 2. I would like to know how the second line could also be counted as a duplicate.
Unfortunately, the solution to the original question (Finding duplicate values in multiple colums in a SQL table and count) does not work for char.
Answer
You just need to sort rows by all columns, group them, count the duplicates, and find the record with the most duplicates. It is rather a hassle to get this written in SQL, which uses a nested query containing window function. The SQL statement is long and difficult to read. An alternative is to export data out of database and handle it in Python or SPL. SPL, the open-source Java package, is easier to be integrated into a Java program and generate much simpler code. It finishes the task with only three lines of code:
A |
|
1 |
=sqlite.query("select name1,name2,name3 from names") |
2 |
=A1.group([#1,#2,#3].sort();~.len()).maxp(#2) |
3 |
=create(count,name1,name2,name3).record([A2.#2,A2.#1(1),A2.#1(2),A2.#1(3)]) |
View SPL source code.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL
Chinese version