Data manipulation in d3.js





This post describes the most common data manipulation tasks you will have to perform using d3.js. It includes sorting, filtering, grouping, nesting and more.

Mathematics


Get the maximum value of a column called value
d3.max(data, function(d) { return +d.value; })

Get the minimum value of a column called value
d3.min(data, function(d) { return +d.value; })
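
The unary + in the accessor matters because values read from a CSV file arrive as strings. The sketch below uses a small made-up data array and plain JavaScript rather than d3; it only illustrates the difference between comparing strings and comparing numbers.

```javascript
// Hypothetical rows, as they would look after reading a CSV file: every value is a string
var data = [{ value: "9" }, { value: "10" }, { value: "2" }]

// Coerce to numbers with + before comparing
var values = data.map(function(d) { return +d.value })
var maxValue = Math.max.apply(null, values)
var minValue = Math.min.apply(null, values)

// Without coercion, strings are compared character by character, so "9" > "10" is true
var stringComparison = "9" > "10"

console.log(maxValue, minValue, stringComparison)
```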

Objects and Arrays


JavaScript objects are containers for named values, called properties (or methods when the value is a function). Any property can be accessed by its name, with either dot or bracket notation:
var myObject = {name:"Nicolas", sex:"Male", age:34}
myObject.name
myObject["name"]
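
The two notations are not fully interchangeable: bracket notation also accepts a property name stored in a variable. A minimal sketch, reusing the object above (the variable names key, age and keys are only for illustration):

```javascript
var myObject = {name: "Nicolas", sex: "Male", age: 34}

// Bracket notation accepts a variable, which dot notation does not
var key = "age"
var age = myObject[key]

// Object.keys lists the property names of the object
var keys = Object.keys(myObject)

console.log(age, keys)
```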

Create an array of 5 numbers

An array is a special variable, which can hold more than one value at a time. Arrays use numbered indexes.

var myArray = [12, 34, 23, 12, 89]

Access the first element in the array.
myArray[0]

Access the last element in the array.
myArray[myArray.length - 1]

Remove the last element of the array. Add a new element at the end.
myArray.pop()
myArray.push(45)

What is the index of the element 34 in the array?
myArray.indexOf(34)

Remove elements that are provided in a second array:
var myArray = ['a', 'b', 'c', 'd', 'e', 'f', 'g'];
var toRemove = ['b', 'c', 'g'];
var filteredArray = myArray.filter( function( el ) {
  return !toRemove.includes( el );
} );

Keep only elements that are provided in a second array:
var myArray = ['a', 'b', 'c', 'd', 'e', 'f', 'g'];
var tokeep = ['b', 'c', 'g', 'o', 'z'];
var filteredArray = myArray.filter( function( el ) {
  return tokeep.includes( el );
} );

Note: an array can be composed of several objects!
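
An array of objects is also the shape d3.csv() produces: one object per row, one property per column. The sketch below uses a made-up dataset to show how the array and object syntaxes combine.

```javascript
// Hypothetical dataset: each element of the array is one row
var data = [
  { name: "toto", value: 12 },
  { name: "tutu", value: 34 },
  { name: "tata", value: 23 }
]

// Array index first, then property name
var firstName = data[0].name

// Extract one column as a simple array
var allValues = data.map(function(d) { return d.value })

console.log(firstName, allValues)
```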

Filtering


Keep rows where the variable name is toto
data.filter(function(d){ return d.name == "toto" })

Keep rows where the variable name is different from toto
data.filter(function(d){ return d.name != "toto" })

Keep rows where the variable name is toto OR tutu
data.filter(function(d){ return  (d.name == "toto" || d.name == "tutu") })

Keep rows where the variable name has a value included in the list tokeep
var tokeep = ["option1", "option2", "option3"]
data.filter(function(d,i){ return tokeep.indexOf(d.name) >= 0 })

Keep the 10 first rows
data.filter(function(d,i){ return i<10 })
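
To see these filters at work, here is a small sketch on a made-up dataset (the rows and the tokeep list are assumptions for the example). It also shows that filter() returns a new array and leaves the original untouched.

```javascript
// Hypothetical rows
var data = [
  { name: "toto", x: 100 },
  { name: "tutu", x: 150 },
  { name: "tata", x: 200 },
  { name: "toto", x: 250 }
]
var tokeep = ["tutu", "tata"]

var onlyToto = data.filter(function(d) { return d.name == "toto" })
var inList = data.filter(function(d) { return tokeep.indexOf(d.name) >= 0 })
var firstTwo = data.filter(function(d, i) { return i < 2 })

// data still holds its 4 rows after the three filters
console.log(onlyToto.length, inList.length, firstTwo.length, data.length)
```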

Color points using an if / else statement (to be chained on a selection)
.style("fill", function(d){ if(d.x<140){return "orange"} else {return "blue"}})

Sorting


Sorting on 1 numeric column called value. Swap a and b in the subtraction for reverse order.
data.sort(function(a,b) { return +a.value - +b.value })

Sorting alphabetically on 1 categoric column called name. Use d3.descending instead of d3.ascending for reverse order.
data.sort(function (a,b) {return d3.ascending(a.name, b.name);});

Sorting alphabetically on 2 categoric columns called name1 and name2.
data.sort(function(a,b) { return d3.ascending(a.name1, b.name1) ||  d3.ascending(a.name2, b.name2) } )

Sorting on 1 categoric column called name1 and then on 1 numeric column called value.
data.sort(function(a,b) { return d3.ascending(a.name1, b.name1) ||  (a.value - b.value) } )

Sorting on 1 categoric column called name1 according to the order provided in the variable targetOrder. The comparator must return a negative, zero or positive number, not a boolean.
data.sort(function(a,b) {
    return targetOrder.indexOf( a.name1 ) - targetOrder.indexOf( b.name1 );
});
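
The sketch below combines a custom order with a numeric tie-breaker on a made-up dataset (the rows and targetOrder are assumptions for the example). Since sort() reorders the array in place, it works on a copy made with slice().

```javascript
// Hypothetical rows and target order
var data = [
  { name1: "low", value: 3 },
  { name1: "high", value: 1 },
  { name1: "medium", value: 2 },
  { name1: "low", value: 1 }
]
var targetOrder = ["low", "medium", "high"]

// slice() copies the array first, because sort() reorders in place
var sorted = data.slice().sort(function(a, b) {
  // Position in targetOrder first; when equal (0), fall back to value
  return targetOrder.indexOf(a.name1) - targetOrder.indexOf(b.name1) || a.value - b.value
})

var result = sorted.map(function(d) { return d.name1 + ":" + d.value })
console.log(result)
```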

Nesting
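
Nesting groups the rows that share a value in one column. In d3 v4 and v5 this is the job of d3.nest(), whose entries() method returns an array of { key, values } objects. The sketch below builds the same shape in plain JavaScript on a made-up dataset; it illustrates the idea and is not the library's implementation.

```javascript
// Hypothetical rows
var data = [
  { name: "a", value: 1 },
  { name: "b", value: 2 },
  { name: "a", value: 3 }
]

// Build one { key, values } entry per distinct name, in order of first appearance
var nested = []
var indexByKey = Object.create(null)
data.forEach(function(d) {
  if (!(d.name in indexByKey)) {
    indexByKey[d.name] = nested.length
    nested.push({ key: d.name, values: [] })
  }
  nested[indexByKey[d.name]].values.push(d)
})

console.log(JSON.stringify(nested))
```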



Grouping


Get a list of unique entries of a column called name
var allGroup = d3.map(data, function(d){return(d.name)}).keys()
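
The d3.map(...).keys() pattern belongs to d3 v4 and v5. As a version-independent alternative, the same list can be built in plain JavaScript. A minimal sketch on a made-up dataset:

```javascript
// Hypothetical rows
var data = [{ name: "a" }, { name: "b" }, { name: "a" }, { name: "c" }]

// Keep a name only the first time it appears
var allGroup = data
  .map(function(d) { return d.name })
  .filter(function(name, i, names) { return names.indexOf(name) === i })

console.log(allGroup)
```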

Loop


A for loop from 0 to 9:
var i
for (i = 0; i < 10; i++) {
  console.log(i)
}

A for loop for all the elements of a list: (Note that it returns 0, 1, 2, not a, b, c)
var allGroup = ["a", "b", "c"]
for (i in allGroup){
  console.log(i)
}
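
To get the values a, b, c rather than the indexes, read the array through the index or use forEach. A short sketch (the seen array is only there to collect the output):

```javascript
var allGroup = ["a", "b", "c"]
var seen = []

// Index-based loop: read the value through the index
for (var i = 0; i < allGroup.length; i++) {
  seen.push(allGroup[i])
}

// forEach hands the value (and the index) to a callback
allGroup.forEach(function(d, i) {
  seen.push(i + "-" + d)
})

console.log(seen)
```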

A while loop to count from 0 to 9
var i = 0
while (i < 10) {
  console.log(i)
  i++;
}

Reshape


It is a common task in data science to switch between wide (or untidy) format and long (or tidy) format. In R, there is a package called tidyr that is entirely dedicated to it. It is also doable in JavaScript using the code snippets below. In case you're not familiar with this concept, here is a description of what these formats are:

Figure: long versus wide data format summary

Note: it is strongly advised to perform these data wrangling steps outside of your JavaScript, before the data reaches the browser, to reduce the loading time of your dataviz.


Going from wide to long format.
d3.csv("https://raw.githubusercontent.com/holtzy/D3-graph-gallery/master/DATA/data_correlogram.csv", function(data) {

  // Going from wide to long format
  var data_long = [];
  data.forEach(function(d) {
    var x = d[""];
    delete d[""];
    for (var prop in d) {
      var y = prop,
        value = d[prop];
      data_long.push({
        x: x,
        y: y,
        value: +value
      });
    }
  });

  // Show result
  console.log(data_long)
})


Going from long to wide format.
d3.csv("https://raw.githubusercontent.com/holtzy/D3-graph-gallery/master/DATA/data.csv", function(data) {
  //todo
})
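
The original leaves this step as a todo. One possible approach, sketched here in plain JavaScript on made-up rows shaped like data_long (columns x, y and value), is to create one output row per distinct x and add one property per y. The names dataLong, dataWide and rowByX are assumptions for the example.

```javascript
// Hypothetical long-format rows: one row per (x, y) pair
var dataLong = [
  { x: "A", y: "v1", value: 1 },
  { x: "A", y: "v2", value: 2 },
  { x: "B", y: "v1", value: 3 },
  { x: "B", y: "v2", value: 4 }
]

// Lookup from x to its wide row, so each x produces exactly one row
var rowByX = Object.create(null)
var dataWide = []
dataLong.forEach(function(d) {
  if (!(d.x in rowByX)) {
    rowByX[d.x] = { x: d.x }
    dataWide.push(rowByX[d.x])
  }
  // Each y becomes a column of the wide row
  rowByX[d.x][d.y] = d.value
})

console.log(JSON.stringify(dataWide))
```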

Stack


Stacking data is a common practice in dataviz, notably for barcharts and area charts. Stacking applies when a dataset is composed of groups (like species) and subgroups (like soil condition). Stacking is possible thanks to the d3.stack() function, which is part of the d3-shape module. Here is an illustration of what happens when you read data from .csv format and stack it.

Figure: stacking data explanation

Stacking from .csv format
d3.csv("https://raw.githubusercontent.com/holtzy/D3-graph-gallery/master/DATA/data_stacked.csv", function(data) {

  // Have a look at the data
  console.log(data)

  // List of subgroups = header of the csv file (minus the first column) = soil condition here
  var subgroups = data.columns.slice(1)

  // List of groups = species here = value of the first column called group
  var groups = d3.map(data, function(d){return(d.group)}).keys()

  // Stack the data: one series per subgroup
  var stackedData = d3.stack()
    .keys(subgroups)
    (data)

  // Have a look at the stacked data
  console.log(stackedData)

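  // Get the stacked data for the second group: the second [lower, upper] pair of every series
  console.log(stackedData.map(function(series){ return series[1] }))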

})

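Get the key of each element of stackedData. Each series returned by d3.stack() carries its subgroup name in a key property:
stackedData.map(function(d){ return d.key })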
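
To make the structure of the stacked data more concrete, here is a plain JavaScript sketch of what stacking computes with a zero baseline: one series per subgroup, and inside each series one [lower, upper] pair per group. The rows and numbers are made up, and the code only mimics the output shape of d3.stack(); it is not the library's implementation.

```javascript
// Hypothetical rows: one per group, one column per subgroup (values already numeric)
var data = [
  { group: "banana", Nitrogen: 12, normal: 1, stress: 13 },
  { group: "poacee", Nitrogen: 6, normal: 6, stress: 33 }
]
var subgroups = ["Nitrogen", "normal", "stress"]

// Running top of the stack for each row, starting at 0
var tops = data.map(function() { return 0 })

var stacked = subgroups.map(function(key) {
  var series = data.map(function(d, i) {
    var y0 = tops[i]        // lower bound = top of the previous subgroup
    var y1 = y0 + d[key]    // upper bound = lower bound + this value
    tops[i] = y1
    return [y0, y1]
  })
  series.key = key          // same property name as in d3.stack() output
  return series
})

var keys = stacked.map(function(s) { return s.key })
console.log(keys, stacked[1][0], stacked[2][1])
```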