RunPage tech overview: Danfo.js integration

In my previous post, RunPage tech overview: JS Sandboxing, I discussed how I handled client-side sandboxing of script-block code. Here I will describe how Danfo.js was integrated into the RunPage API.

Why integrate Danfo.js?

Relational databases are popular because SQL lets us query the same data in many different ways. It is versatile and powerful. However, it requires hosting a relational DB on the server side, which means all data must be imported into that DB and then fetched back to the client side for processing. That would sacrifice the speed and data-security features of RunPage.

Some modern browsers support Web SQL, which stores all data right in the browser and so solves the two fundamental issues above. But we would then need to clear and rebuild the DB on every run; remember, every run of RunPage starts with a clean slate. We would also need to provide an ORM to make Web SQL easier to work with, since SQL is inherently verbose and creating tables and inserting data is cumbersome. What we really need is SQL’s query power without its other pains. The answer to that is virtual tables, or DataFrames.

The DataFrame concept was popularized by Python’s Pandas. Danfo.js aims to provide the power of Pandas in JavaScript.

Using Proxy

ES6 has a neat little API called Proxy, which I used extensively while integrating Danfo.js into RunPage. If you do not know what Proxy is: in short, it lets you wrap any JS object so that operations on it, such as reading a property or invoking a method, can be intercepted by your code, which can then change the outcome completely. It differs from a simple hand-written wrapper in two ways: the instanceof operator still works on the proxy as it would on the wrapped object, and you need not reimplement every method of the wrapped object to intercept it. A single trap in your handler can intercept all property and method lookups.
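As a minimal sketch of the idea (the Table class and the wrap helper are hypothetical names made up for this illustration, not RunPage code), a single get trap can observe every lookup and override just one method, while instanceof keeps working:

```javascript
// Hypothetical class standing in for any object we want to wrap.
class Table {
  constructor(rows) { this.rows = rows; }
  count() { return this.rows.length; }
}

const calls = []; // names of every property looked up through the proxy

function wrap(target) {
  return new Proxy(target, {
    // One `get` trap sees every property and method lookup.
    get(obj, prop, receiver) {
      calls.push(String(prop));
      if (prop === 'count') {
        // Change the outcome of a single method.
        return () => `rows: ${obj.rows.length}`;
      }
      // Everything else behaves exactly like the wrapped object.
      return Reflect.get(obj, prop, receiver);
    },
  });
}

const wrapped = wrap(new Table([1, 2, 3]));
console.log(wrapped instanceof Table); // true
console.log(wrapped.count());          // rows: 3
```

Note that only count is special-cased here; every other member falls through to the wrapped object without being reimplemented.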

The reason I had to wrap Danfo.js’ DataFrame and Series is their plot API (https://danfo.jsdata.org/api-reference/plotting/line-charts). If you look at the API there, plot is a method that takes as input the id of a DOM element in which the graph is to be plotted. However, you do not have DOM access in a script-block, and I do not want you to have that access, since RunPage needs to be able to decide where to render your graph. I use Proxy to replace the plot method of DataFrame and Series with my own version, which eventually returns a JSON object that the main-thread code can interpret as instructions to render the plot.

Conceptually this was simple, but it was quite challenging to actually implement, because almost any operation or chain of method calls can eventually return a DataFrame or Series object. So RunPage recursively wraps every object returned through the wrapper, including Arrays, Functions, etc.
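A rough sketch of that recursive wrapping might look like the following. It is a simplification under stated assumptions, not RunPage’s actual code: FakeFrame is a hypothetical stand-in for a Danfo.js DataFrame, wrapDeep and plotRecorder are names invented here, and edge cases such as class constructors or frozen objects are ignored.

```javascript
// Hypothetical stand-in for a Danfo.js DataFrame; not the real library.
class FakeFrame {
  constructor(values) { this.values = values; }
  head(n) { return new FakeFrame(this.values.slice(0, n)); }
  plot(domId) { throw new Error('no DOM access in a script-block'); }
}

// Replacement for plot(): any chart method called on it returns plain JSON.
function plotRecorder(frame) {
  return new Proxy({}, {
    get: (_, method) => (...args) => ({
      $renderAs: 'plot',
      data: { method, args, dataframeOrSeriesJ: frame.values },
    }),
  });
}

function wrapDeep(value, thisArg) {
  if (typeof value === 'function') {
    // Call the real function on the real object, then wrap whatever it returns.
    return (...args) => wrapDeep(value.apply(thisArg, args));
  }
  // Primitives cannot be proxied and need no wrapping.
  if (value === null || typeof value !== 'object') return value;
  return new Proxy(value, {
    get(target, prop) {
      if (target instanceof FakeFrame && prop === 'plot') {
        return () => plotRecorder(target);
      }
      return wrapDeep(Reflect.get(target, prop, target), target);
    },
  });
}

const df = wrapDeep(new FakeFrame([10, 20, 30]));
// head() returns a new frame, which comes back wrapped as well,
// so its plot() is also the JSON-producing replacement.
const instruction = df.head(2).plot().pie({ title: 'demo' });
console.log(JSON.stringify(instruction));
```

The point of the sketch is that the frame returned by head() never escapes unwrapped, so the original plot method is unreachable no matter how the calls are chained.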

How the main thread renders the graphs

The JSON object returned by the proxied plot looks something like this:

{
   "$renderAs":"plot",
   "data":{
      "method":"pie",
      "args":[
            // Arguments passed to plot.pie()
      ],
      "dataframeOrSeriesJ": // DataFrame or Series data as JSON
   }
}

Using this data, a Danfo DataFrame or Series is recreated on the main thread and the actual plot method is invoked. In the above example, the invocation will be something like dfObject.plot('generatedDomId').pie(...args).

The main thread runs a series of output formatters, each of which renders a particular type of JSON. The DOM returned by an output formatter is then used by another subroutine, which adds it at the appropriate location on the page. So the plot output formatter does not know when its DOM will be added to the page, yet the code above can only run once the DOM has been added. For this, another trick is used.

const id = uuidv4();
const {method, args, dataframeOrSeriesJ} = json.data;

const div = document.createElement('div');
// The plot container comes first, followed by a 1px image that acts as a load trigger.
div.innerHTML = `<div class="outbox plot" id="${id}"></div>
<img data-id="plotLoader" src="${dummyImg}" style="height:1px;width:1px;" />`;
div.querySelector('[data-id="plotLoader"]').addEventListener('load', function onPlotLoaderLoad() {
    // Rebuild the Danfo object from its JSON form, then call the requested plot method.
    const dataframe = DataFrameOrSeriesJsonToDataFrameOrSeries(dataframeOrSeriesJ);
    const plotter = dataframe.plot(id);
    plotter[method](...args);
});

return div;

Here we generate a div with a unique id, which we later pass to the plot method. The neat trick to note here is the use of the img tag. Its src attribute contains the path to an actual one-pixel transparent image. When the image is loaded, the browser invokes its load event handler, which in turn invokes the actual plot function. Since the img tag comes after the plot’s div, we can be sure that by the time the img’s load event fires, the div with the given id is already available.
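For context, the chain of output formatters described earlier could be sketched roughly as below. This is only an illustration under assumptions: the accepts/format shape and the formatOutput name are hypothetical, and the formatters return strings instead of DOM nodes so that the sketch runs outside a browser.

```javascript
// Hypothetical formatter registry; the shape is illustrative, not RunPage's.
const formatters = [
  {
    // Handles the plot instructions produced by the proxied plot().
    accepts: (json) =>
      json !== null && typeof json === 'object' && json.$renderAs === 'plot',
    format: (json) => `[plot:${json.data.method}]`,
  },
  {
    // Fallback: render anything else as text.
    accepts: () => true,
    format: (json) => JSON.stringify(json),
  },
];

// Pick the first formatter that recognises the JSON and let it render it.
function formatOutput(json) {
  const formatter = formatters.find((f) => f.accepts(json));
  return formatter.format(json);
}

console.log(formatOutput({ $renderAs: 'plot', data: { method: 'pie', args: [] } }));
console.log(formatOutput({ a: 1 }));
```

In the real page, the plot formatter would return the div built above rather than a string, and a separate subroutine would attach it to the page.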

Problems with Danfo.js

It claims to be the JavaScript equivalent of Pandas, but in reality it provides only a fraction of the tools that Pandas does. It does not even provide a ‘not’ or ‘invert’ operator for querying data.

As of now I have filed three defects, none of which has been assigned to anyone or seen any activity yet. The first of them was filed 20 days ago. So it looks like activity on this codebase suddenly died out after April 2022.

Judging by the kind of issues I have found, the quality of this library is very poor. For example, it seems to be meant for processing only numbers, strings and boolean data. If you store other JSON objects in a DataFrame, it won’t complain but will silently give you wrong and unexpected results. (ref) There are many more fundamental issues which make it unreliable. In data processing, the one thing that cannot be compromised on is reliability; what is the point of processing data if you cannot be sure you can rely on the output? There is even a defect filed which claims that the package for the current latest version, 1.1.1, on NPM contains old code (https://github.com/javascriptdata/danfojs/issues/462). That defect is more than a month old and still has zero activity on it.

So many issues, and on top of that it has a dependency on @tensorflow/tfjs, which I do not need at all.

Given all these factors, I am considering ripping Danfo.js out of RunPage and replacing it with Data-Forge. However, I will first evaluate it extensively so as not to repeat the mistake I made by integrating Danfo.js.
