This article was originally published on DigitalOcean.

Introduction

A CSV is a plain text file format for storing tabular data. The CSV file uses a comma delimiter to separate values in table cells, and a new line delineates where rows begin and end. Most spreadsheet programs and databases can export and import CSV files. Because CSV is a plain-text file, any programming language can parse and write to a CSV file. Node.js has many modules that can work with CSV files, such as node-csv, fast-csv, and papaparse.

In this tutorial, you will use the node-csv module to read a CSV file using Node.js streams, which lets you read large datasets without consuming a lot of memory. You will modify the program to move data parsed from the CSV file into a SQLite database. You will also retrieve data from the database, parse it with node-csv, and use Node.js streams to write it to a CSV file in chunks.

Prerequisites

To follow this tutorial, you will need:

  • Node.js and npm installed in your development environment.

  • Familiarity with JavaScript and with basic Node.js concepts, such as modules and callbacks.

Step 1 — Setting Up the Project Directory

In this section, you will create the project directory and download packages for your application. You will also download a CSV dataset from Stats NZ, which contains international migration data in New Zealand.

To get started, make a directory called csv_demo and navigate into the directory:

  1. mkdir csv_demo
  2. cd csv_demo

Next, initialize the directory as an npm project using the npm init command:

  1. npm init -y

The -y option notifies npm init to say "yes" to all the prompts. This command creates a package.json with default values that you can change anytime.

With the directory initialized as an npm project, you can now install the necessary dependencies: node-csv and node-sqlite3.

Enter the following command to install node-csv:

  1. npm install csv

node-csv is a collection of modules that allows you to parse and write CSV data. The command installs all four modules that are part of the node-csv package: csv-generate, csv-parse, csv-stringify, and stream-transform. You will use the csv-parse module to parse a CSV file and the csv-stringify module to write data to a CSV file.
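
The four modules can also be required individually, which is what this tutorial does. Requiring csv-parse and csv-stringify directly works because installing csv pulls them in as dependencies. If you want a quick sanity check that the install succeeded, you can run a sketch like this:

const { parse } = require("csv-parse");
const { stringify } = require("csv-stringify");

// Both exports should be functions if the install succeeded.
console.log(typeof parse, typeof stringify); // function function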

Next, install the node-sqlite3 module:

  1. npm install sqlite3

The node-sqlite3 module allows your app to interact with the SQLite database.

After installing the packages in your project, download the New Zealand migration CSV file with the wget command:

  1. wget https://www.stats.govt.nz/assets/Uploads/International-migration/International-migration-September-2021-Infoshare-tables/Download-data/international-migration-September-2021-estimated-migration-by-age-and-sex-csv.csv

The CSV file you downloaded has a long name. To make the file easier to work with, rename it to something shorter using the mv command:

  1. mv international-migration-September-2021-estimated-migration-by-age-and-sex-csv.csv migration_data.csv

The new CSV filename, migration_data.csv, is shorter and easier to work with.

Using nano, or your favorite text editor, open the file:

  1. nano migration_data.csv

Once open, you will see contents similar to this:

demo_csv/migration_data.csv
year_month,month_of_release,passenger_type,direction,sex,age,estimate,standard_error,status
2001-01,2020-09,Long-term migrant,Arrivals,Female,0-4 years,344,0,Final
2001-01,2020-09,Long-term migrant,Arrivals,Male,0-4 years,341,0,Final
...

The first line contains the column names, and all subsequent lines hold the data corresponding to each column. A comma separates each piece of data. This character is known as a delimiter because it delineates the fields. You are not limited to using commas. Other popular delimiters include colons (:), semicolons (;), and tabs (\t). You need to know which delimiter is used in the file, since most modules need it to parse the file correctly.
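
For example, if you later receive a semicolon-separated file, you would pass a different delimiter to the parser. Here is a minimal sketch using csv-parse's callback form with made-up sample data:

const { parse } = require("csv-parse");

// Made-up semicolon-delimited input for illustration.
const input = "2001-01;Arrivals;Female\n2001-01;Arrivals;Male";

parse(input, { delimiter: ";" }, function (error, records) {
  if (error) {
    return console.error(error.message);
  }
  console.log(records);
  // [ [ '2001-01', 'Arrivals', 'Female' ], [ '2001-01', 'Arrivals', 'Male' ] ]
});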

After reviewing the file and identifying the delimiter, exit your migration_data.csv file using CTRL+X.

You have now installed the necessary dependencies for your project. In the next section, you will read a CSV file.

Step 2 — Reading CSV Files

In this section, you will use node-csv to read a CSV file and log its content in the console. You will use the fs module's createReadStream() method to read the data from the CSV file and create a readable stream. You will then pipe the stream to another stream initialized with the csv-parse module to parse the chunks of data. Once the chunks of data have been parsed, you can log them in the console.

Create and open a readCSV.js file in your preferred editor:

  1. nano readCSV.js

In your readCSV.js file, import the fs and csv-parse modules by adding the following lines:

demo_csv/readCSV.js
const fs = require("fs");
const { parse } = require("csv-parse");

In the first line, you define the fs variable and assign it the fs object that the Node.js require() method returns when it imports the module.

In the second line, you extract the parse method from the object returned by the require() method into the parse variable using the destructuring syntax.

Add the following lines to read the CSV file:

demo_csv/readCSV.js
...
fs.createReadStream("./migration_data.csv")
  .pipe(parse({ delimiter: ",", from_line: 2 }))
  .on("data", function (row) {
    console.log(row);
  })

The createReadStream() method from the fs module accepts an argument of the filename you want to read, which is migration_data.csv here. Then, it creates a readable stream, which takes a large file and breaks it into smaller chunks. A readable stream allows you to only read data from it and not write to it.

After creating the readable stream, Node's pipe() method forwards chunks of data from the readable stream to another stream. The second stream is created when the csv-parse module's parse() method is invoked inside the pipe() method. The csv-parse module implements a transform stream (a readable and writable stream), taking a data chunk and transforming it to another form. For example, when it receives a chunk like 2001-01,2020-09,Long-term migrant,Arrivals,Female,0-4 years,344, the parse() method will transform it into an array.

The parse() method takes an options object that configures the parser and provides more information about the data it will parse. The object takes the following properties:

  • delimiter defines the character that separates each field in the row. The value , tells the parser that commas demarcate the fields.

  • from_line defines the line where the parser should start parsing the rows. With the value 2, the parser will skip line 1 and start at line 2. Because you will insert the data in the database later, this property helps you avoid inserting the column names in the first row of the database.
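
These two options are enough for this tutorial, but parse() accepts others as well. For instance, if you kept the header row instead of skipping it, you could set columns: true to receive each row as an object keyed by column name rather than an array. A minimal sketch of that variation:

const fs = require("fs");
const { parse } = require("csv-parse");

fs.createReadStream("./migration_data.csv")
  // columns: true reads the first line as headers and emits objects.
  .pipe(parse({ delimiter: ",", columns: true }))
  .on("data", function (row) {
    // row is now an object, e.g. row.year_month === '2001-01'
    console.log(row.year_month, row.estimate);
  });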

Next, you attach a stream event using the Node.js on() method. A stream event allows a consumer to handle a chunk of data when a certain event is emitted. The data event is triggered when data transformed by the parse() method is ready to be consumed. To access the data, you pass a callback to the on() method, which takes a parameter named row. The row parameter is a data chunk transformed into an array. Within the callback, you log the data in the console using the console.log() method.

Before running the file, you will add more stream events. These stream events handle errors and write a success message to the console when all the data in the CSV file has been consumed.

Still in your readCSV.js file, add the highlighted code:

demo_csv/readCSV.js
...
fs.createReadStream("./migration_data.csv")
  .pipe(parse({ delimiter: ",", from_line: 2 }))
  .on("data", function (row) {
    console.log(row);
  })
  .on("end", function () {
    console.log("finished");
  })
  .on("error", function (error) {
    console.log(error.message);
  });

The end event is emitted when all the data in the CSV file has been read. When that happens, the callback is invoked and logs a message that says it has finished.

If an error occurs anywhere while reading and parsing the CSV data, the error event is emitted, which invokes the callback and logs the error message in the console.

Your complete file should now look like the following:

demo_csv/readCSV.js
const fs = require("fs");
const { parse } = require("csv-parse");

fs.createReadStream("./migration_data.csv")
  .pipe(parse({ delimiter: ",", from_line: 2 }))
  .on("data", function (row) {
    console.log(row);
  })
  .on("end", function () {
    console.log("finished");
  })
  .on("error", function (error) {
    console.log(error.message);
  });

Save and exit out of your readCSV.js file using CTRL+X.

Next, run the file using the node command:

  1. node readCSV.js

The output will look similar to this (edited for brevity):

Output
[ '2001-01', '2020-09', 'Long-term migrant', 'Arrivals', 'Female', '0-4 years', '344', '0', 'Final' ]
...
[ '2021-09', ... '70', 'Provisional' ]
finished

All the rows in the CSV file have been transformed into arrays using the csv-parse transform stream. Because logging happens each time a chunk is received from the stream, the data appears as though it is being downloaded rather than being displayed all at once.
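
As an aside, when a CSV file is small enough to hold comfortably in memory, csv-parse also ships a synchronous API that returns all the records at once (available as csv-parse/sync in recent versions of the package). A minimal sketch:

const fs = require("fs");
const { parse } = require("csv-parse/sync");

const content = fs.readFileSync("./migration_data.csv");
// Returns an array of arrays, one per row, skipping the header line.
const records = parse(content, { delimiter: ",", from_line: 2 });
console.log(records.length);

The streaming approach used above remains the better fit for large datasets, since it never loads the whole file into memory.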

In this step, you read data in a CSV file and transformed it into arrays. Next, you will insert data from a CSV file into the database.

Step 3 — Inserting Data into the Database

Inserting data from a CSV file into the database using Node.js gives you access to a vast library of modules that you can use to process, clean, or enhance the data before inserting it into the database.

In this section, you will establish a connection with the SQLite database using the node-sqlite3 module. You will then create a table in the database, copy the readCSV.js file, and modify it to insert all the data read from the CSV file into the database.

Create and open a db.js file in your editor:

  1. nano db.js

In your db.js file, add the following lines to import the fs and node-sqlite3 modules:

demo_csv/db.js
const fs = require("fs");
const sqlite3 = require("sqlite3").verbose();
const filepath = "./population.db";
...

In the third line, you define the path of the SQLite database and store it in the variable filepath. The database file doesn't exist yet, but it will be needed for node-sqlite3 to establish a connection with the database.

In the same file, add the following lines to connect Node.js to a SQLite database:

demo_csv/db.js
...
function connectToDatabase() {
  if (fs.existsSync(filepath)) {
    return new sqlite3.Database(filepath);
  } else {
    const db = new sqlite3.Database(filepath, (error) => {
      if (error) {
        return console.error(error.message);
      }
      console.log("Connected to the database successfully");
    });
    return db;
  }
}

Here, you define a function named connectToDatabase() to establish a connection to the database. Within the function, you invoke the fs module's existsSync() method in an if statement, which checks whether the database file exists in the project directory. If the if condition evaluates to true, you instantiate the node-sqlite3 module's Database() class with the database file path. Once the connection is established, the function returns the connection object and exits.

However, if the if statement evaluates to false (if the database file doesn't exist), execution will skip to the else block. In the else block, you instantiate the Database() class with two arguments: the database file path and a callback.

The first argument is the path of the SQLite database file, which is ./population.db. The second argument is a callback that will be invoked automatically when the connection with the database has been established successfully or if an error occurred. The callback takes an error object as a parameter, which is null if the connection is successful. Within the callback, the if statement checks if the error object is set. If it evaluates to true, the callback logs an error message and returns. If it evaluates to false, you log a success message confirming that the connection has been established.

Currently, the if and else blocks only establish the connection object. Next, you will extend the callback passed to the Database() class in the else block so that it also creates a table, which happens only when the database file does not exist. If the database file already exists, the function executes the if block, connects to the database, and returns the connection object.

To create a table if the database file doesn't exist, add the highlighted code:

demo_csv/db.js
const fs = require("fs");
const sqlite3 = require("sqlite3").verbose();
const filepath = "./population.db";

function connectToDatabase() {
  if (fs.existsSync(filepath)) {
    return new sqlite3.Database(filepath);
  } else {
    const db = new sqlite3.Database(filepath, (error) => {
      if (error) {
        return console.error(error.message);
      }
      createTable(db);
      console.log("Connected to the database successfully");
    });
    return db;
  }
}

function createTable(db) {
  db.exec(`
  CREATE TABLE migration
  (
    year_month       VARCHAR(10),
    month_of_release VARCHAR(10),
    passenger_type   VARCHAR(50),
    direction        VARCHAR(20),
    sex              VARCHAR(10),
    age              VARCHAR(50),
    estimate         INT
  )
`);
}

module.exports = connectToDatabase();

Now connectToDatabase() invokes the createTable() function, passing the connection object stored in the db variable as an argument.

Outside the connectToDatabase() function, you define the createTable() function, which accepts the connection object db as a parameter. You invoke the exec() method on the db connection object, which takes a SQL statement as an argument. The SQL statement creates a table named migration with seven columns. The column names match the first seven headings in the migration_data.csv file; the standard_error and status columns are not stored.

Finally, you invoke the connectToDatabase() function and export the connection object returned by the function so that it can be reused in other files.

Save and exit your db.js file.

With the database connection established, you will now copy and modify the readCSV.js file to insert the rows that the csv-parse module parsed into the database.

Copy and rename the file to insertData.js with the following command:

  1. cp readCSV.js insertData.js

Open the insertData.js file in your editor:

  1. nano insertData.js

Add the highlighted code:

demo_csv/insertData.js
const fs = require("fs");
const { parse } = require("csv-parse");
const db = require("./db");

fs.createReadStream("./migration_data.csv")
  .pipe(parse({ delimiter: ",", from_line: 2 }))
  .on("data", function (row) {
    db.serialize(function () {
      db.run(
        `INSERT INTO migration VALUES (?, ?, ?, ?, ?, ?, ?)`,
        [row[0], row[1], row[2], row[3], row[4], row[5], row[6]],
        function (error) {
          if (error) {
            return console.log(error.message);
          }
          console.log(`Inserted a row with the id: ${this.lastID}`);
        }
      );
    });
  });

In the third line, you import the connection object from the db.js file and store it in the variable db.

Inside the data event callback attached to the fs module stream, you invoke the serialize() method on the connection object. The method ensures that a SQL statement finishes executing before another one starts executing, which can help prevent database race conditions where the system runs competing operations simultaneously.

The serialize() method takes a callback. Within the callback, you invoke the run method on the db connection object. The method accepts three arguments:

  • The first argument is a SQL statement that will be passed and executed in the SQLite database. The run() method only accepts SQL statements that don't return results. The INSERT INTO migration VALUES (?, ?, ?, ?, ?, ?, ?) statement inserts a row into the migration table, and each ? is a placeholder that is substituted with the corresponding value from the run() method's second argument.

  • The second argument is an array [row[0], ..., row[6]]. In the previous section, the parse() method received each chunk of data from the readable stream and transformed it into an array. Since each row arrives as an array, you use the array indexes row[0] through row[6] to access the individual field values.

  • The third argument is a callback that runs when the data has been inserted or if an error occurred. The callback checks if an error occurred and logs the error message. If there are no errors, the function logs a success message in the console using the console.log() method, letting you know that a row has been inserted along with the id.
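
As a side note, when you insert many rows in a loop like this, node-sqlite3 also lets you prepare the statement once with db.prepare() and reuse it for every row, which avoids re-parsing the SQL on each insert. A sketch of that alternative as a standalone script, not a change you need to make to insertData.js:

const fs = require("fs");
const { parse } = require("csv-parse");
const db = require("./db");

// Compile the INSERT statement once and reuse it for every row.
const stmt = db.prepare(`INSERT INTO migration VALUES (?, ?, ?, ?, ?, ?, ?)`);

fs.createReadStream("./migration_data.csv")
  .pipe(parse({ delimiter: ",", from_line: 2 }))
  .on("data", function (row) {
    // Insert only the first seven fields, matching the table's columns.
    stmt.run(row.slice(0, 7));
  })
  .on("end", function () {
    // Statement operations are queued, so finalize runs after the inserts.
    stmt.finalize();
  });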

Finally, remove the end and error events from your file. Because the node-sqlite3 methods are asynchronous, the end event would fire as soon as the CSV stream finishes reading, before all the rows have been inserted into the database, so these events are no longer useful here.

Save and exit your file.

Run the insertData.js file using node:

  1. node insertData.js

Depending on your system, it may take some time, but node should return the output below:

Output
Connected to the database successfully
Inserted a row with the id: 1
Inserted a row with the id: 2
...
Inserted a row with the id: 44308
Inserted a row with the id: 44309
Inserted a row with the id: 44310

The messages, especially the ids, confirm that the rows from the CSV file have been saved into the database.
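
If you have the sqlite3 command-line shell installed (a separate tool from the node-sqlite3 module), you can also verify the count directly against the database file:

  1. sqlite3 population.db "SELECT COUNT(*) FROM migration;"

The count should match the id of the last inserted row shown above.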

You can now read a CSV file and insert its content into the database. Next, you will write a CSV file.

Step 4 — Writing CSV Files

In this section, you will retrieve data from the database and write it into a CSV file using streams.

Create and open writeCSV.js in your editor:

  1. nano writeCSV.js

In your writeCSV.js file, add the following lines to import the fs and csv-stringify modules and the database connection object from db.js:

demo_csv/writeCSV.js
const fs = require("fs");
const { stringify } = require("csv-stringify");
const db = require("./db");

The csv-stringify module transforms data from an object or array into a CSV text format.
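
To see what csv-stringify does on its own, here is a minimal sketch using its callback form with two made-up records:

const { stringify } = require("csv-stringify");

// Made-up records for illustration.
const records = [
  ["2001-01", "Arrivals", "344"],
  ["2001-01", "Departures", "341"],
];

stringify(
  records,
  { header: true, columns: ["year_month", "direction", "estimate"] },
  function (error, output) {
    if (error) {
      return console.error(error.message);
    }
    console.log(output);
    // year_month,direction,estimate
    // 2001-01,Arrivals,344
    // 2001-01,Departures,341
  }
);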

Next, add the following lines to define a variable that contains the name of the CSV file you want to write data to and a writable stream that you will write data to:

demo_csv/writeCSV.js
...
const filename = "saved_from_db.csv";
const writableStream = fs.createWriteStream(filename);

const columns = [
  "year_month",
  "month_of_release",
  "passenger_type",
  "direction",
  "sex",
  "age",
  "estimate",
];

The createWriteStream method takes an argument of the filename you want to write your stream of data to, which is the saved_from_db.csv file name stored in the filename variable.

In the fourth line, you define a columns variable, which stores an array containing the names of the headers for the CSV data. These headers will be written in the first line of the CSV file when you start writing the data to the file.

Still in your writeCSV.js file, add the following lines to retrieve data from the database and write each row in the CSV file:

demo_csv/writeCSV.js
...
const stringifier = stringify({ header: true, columns: columns });
db.each(`select * from migration`, (error, row) => {
  if (error) {
    return console.log(error.message);
  }
  stringifier.write(row);
});
stringifier.pipe(writableStream);
console.log("Finished writing data");

First, you invoke the stringify method with an object as an argument, which creates a transform stream. The transform stream converts the data from an object into CSV text. The object passed into the stringify() method has two properties:

  • header accepts a boolean value and generates a header if the boolean value is set to true.
  • columns takes an array containing the names of the columns that will be written in the first line of the CSV file if the header option is set to true.

Next, you invoke the each() method from the db connection object with two arguments. The first argument is the SQL statement select * from migration that retrieves the rows one by one in the database. The second argument is a callback invoked each time a row is retrieved from the database. The callback takes two parameters: an error object and a row object containing data retrieved from a single row in the database. Within the callback, you check if the error object is set in the if statement. If the condition evaluates to true, an error message is logged in the console using the console.log() method. If there is no error, you invoke the write() method on stringifier, which writes the data into the stringifier transform stream.

The pipe() method connects the stringifier stream to the writableStream so that, as rows are written to the stringifier, the transformed CSV text is saved to the saved_from_db.csv file in chunks. The final console.log() call then prints a message. Note that because the database reads and stream writes are asynchronous, this message can appear before the last chunks have been flushed to the file; you will see one way to address that at the end of this step.

The complete file will now look like the following:

demo_csv/writeCSV.js
const fs = require("fs");
const { stringify } = require("csv-stringify");
const db = require("./db");
const filename = "saved_from_db.csv";
const writableStream = fs.createWriteStream(filename);

const columns = [
  "year_month",
  "month_of_release",
  "passenger_type",
  "direction",
  "sex",
  "age",
  "estimate",
];

const stringifier = stringify({ header: true, columns: columns });
db.each(`select * from migration`, (error, row) => {
  if (error) {
    return console.log(error.message);
  }
  stringifier.write(row);
});
stringifier.pipe(writableStream);
console.log("Finished writing data");

Save and close your file, then run the writeCSV.js file in the terminal:

  1. node writeCSV.js

You will receive the following output:

Output
Finished writing data

To confirm that the data has been written, inspect the contents in the file using the cat command:

  1. cat saved_from_db.csv

cat will return all the rows written in the file (edited for brevity):

Output
year_month,month_of_release,passenger_type,direction,sex,age,estimate
2001-01,2020-09,Long-term migrant,Arrivals,Female,0-4 years,344
2001-01,2020-09,Long-term migrant,Arrivals,Male,0-4 years,341
2001-01,2020-09,Long-term migrant,Arrivals,Female,10-14 years,
...

You can now retrieve data from the database and write each row in a CSV file using streams.
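
As an optional refinement, if you want the "Finished writing data" message to print only after every row has actually been flushed to disk, you can use each()'s optional third argument, a completion callback, to end the stringifier, and then listen for the writable stream's finish event. A sketch of that variation, replacing the bottom half of writeCSV.js:

...
const stringifier = stringify({ header: true, columns: columns });
stringifier.pipe(writableStream);

db.each(
  `select * from migration`,
  (error, row) => {
    if (error) {
      return console.log(error.message);
    }
    stringifier.write(row);
  },
  // Optional completion callback: runs after the last row has been retrieved.
  () => stringifier.end()
);

// finish fires once all chunks have been flushed to saved_from_db.csv.
writableStream.on("finish", () => {
  console.log("Finished writing data");
});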

Conclusion

In this article, you read a CSV file and inserted its data into a database using the node-csv and node-sqlite3 modules. You then retrieved data from the database and wrote it to another CSV file.

You can now read and write CSV files. As a next step, you can apply the same stream-based, memory-efficient implementation to much larger CSV datasets, or you might look into a package like event-stream that makes working with streams easier.

To explore more about node-csv, visit the CSV Project - Node.js CSV package documentation. To learn more about node-sqlite3, visit its GitHub documentation. To continue growing your Node.js skills, see the How To Code in Node.js series.