Export MongoDB as a CSV File
MongoDB is used for storing unstructured data. It is especially great at storing JSON data. But users who are accustomed to database tables and who need structured data are probably want to convert the unstructured data to structured data for their computations. However, it’s not very convenient to perform a standard conversion with the mainstream languages. In this article, we’ll illustrate how we can do this efficiently in esProc SPL (Structured Process Language) for the NoSQL database through an example.
Here’s part of collection carInfo:
{
"_id" : ObjectId("5518f6f8a82a704fe4216a43"),
"id" : "No1",
"cars" : {
"name" : "Putin",
"car" : ["porche", "bmw"]
}
}
{
"_id" : ObjectId("5518f745a82a704fe4216a44"),
"id" : "No2",
"cars" : {
"name" : "jack",
"car" : ["Toyota", "Jetta", "Audi"]
}
}
……
Now you need to export the collection as a CSV file in the following format:
Below is the SPL script:
A |
|
1 |
=mongo_open("mongodb://localhost:27017/local?user=test&password=test") |
2 |
=mongo_shell(A1,"carInfo.find(,{_id:0})") |
3 |
=A2.conj((t=~,~.cars.car.new(t.id:id, t.cars.name:name, ~:car))) |
4 |
=file("D:\\data.csv").export@t(A3;",") |
5 |
>mongo_close(A1) |
A1: Connect to MongoDB; the connection string syntax is mongo://ip:port/db?arg=value&….
A2: Use find() function to retrieve all fields except for _id from collection carInfo without any filtering condition and return result as a cursor.
A3: Get desired fields, concatenate into a structured two-dimensional table and return it as a cursor. The “~” represents each of A2’s document; cars.car field is split into two rows of the table sequence; conj() function joins up rows vertically.
A4: Export A3’s table as a CSV file. @t option enables an export with column headers. esProc computing engine manages buffers automatically. It retrieves a batch of records from A3’s cursor into the memory for processing.
A5: Close the MongoDB connection.
If you want to customize the data retrieval buffer, just use the following SPL script:
A |
B |
|
1 |
=mongo_open("mongodb://localhost:27017/local?user=test&password=test") |
|
2 |
=mongo_shell(A1,"carInfo.find(,{_id:0})") |
|
3 |
for A2,1000 |
=A3.cars.car.new(A3.id:id, A3.cars.name:name, ~:car) |
4 |
=file("D:\\data.csv").export@ta(B3;",") |
|
5 |
>mongo_close(A1) |
A3: Loop through A2’s cursor to retrieve 1000 records each time into the memory. The for statement’s working range is the indented B3-B4, within which A3 references the loop variable to do the job. Below is A3’s result:
B3: Convert the current batch of records into a structured two-dimensional table:
B4: Export the current batch processing result set into the specified CSV file. @a option enables appending.
In a nutshell, SPL retrieves fields as a cursor, splits each document into rows by car field and concatenate them into a table sequence and appends each table sequence into a specified file. The whole process is clear, concise and efficient.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL
Chinese Version