New utilities and components in ATLAS

Sigfried_Gold · September 30, 2016, 12:12pm

I’ve recently added a bunch of stuff to ATLAS (some of this should be in libraries also accessible to other OHDSI tools, but it’s not yet) that other developers may want to know about. These are all new and mostly have only been used in code that I’ve written, so they are likely to need some work when used by other developers, but I’m very happy to provide support and assistance to anyone who would like to make use of any of it.

faceted-datatable-cf: this is a replacement for faceted-datatable (which appears throughout ATLAS). It manages facet filtering with crossfilter, so it is much faster on sizable datasets. The ActiveCounts on facet members act better (when you select a member, counts on all the other members of that facet don’t go to zero). It also allows filtering to be shared with components outside the datatable component, and allows column and facet configuration to be managed with a new ohdsiUtil.Field class.
ohdsi.util.Field: performs a bunch of work on fields derived from arrays of objects (like recordsets from ajax calls.) An incomplete rationale for this and description of how it works is here. It helps you consolidate all the behavior of a “field” in (selectively overridable) configuration code and then be able to interact in various ways with that field from different places without spaghetti code:
- load, change, or filter data and field accessors in other components will respond appropriately
- generate axis/scale ranges when chart dimensions are available or changed
- generate axis/scale domains when data is available or changed
- define fields like stddev or confidenceInterval that depend on not just a single data point but the whole dataset
- define labels, titles, captions, tooltips for use in different contexts (facet names, column headings, legend text, etc.) that can optionally be functions with access to any necessary data (e.g., avg of some dataset value)
- make fields that access or act as proxies for other fields (e.g., a BMI category could depend on a BMI field (which depends on weight and height fields), and datapoint color could depend on BMI category)
elementConvert: converts elements of whatever type (d3 selection, jquery selection, dom element, css id) into whatever type you want
D3Element and a bunch of subclasses – does a bunch of cool stuff for managing nested d3 selections that update when their parents update. Still in flux.
SvgLayout: dynamically manages regions of an svg chart (margins, axes, axis labels, etc. on top, bottom, left, right)
setState, getState, hasState, deleteState: saves/retrieves nestable state values to/from url, allowing urls of particular app states to be shared and back/forward browser buttons to work as users might expect. Only used in selected places because it may interfere with the way some parts of the application use routing or knockout observables
onStateChange: register listeners to specific state changes
cachedAjax: replacement for $.ajax that does local caching, BUT DON’T WORRY: if you see a call to cachedAjax, it will not cache and will act exactly like an $.ajax call unless the url matches a path in ALLOW_CACHING. You probably shouldn’t use this (keep ALLOW_CACHING empty), but it has saved me a ton of time during development.
storagePut, storageGet, storageExists: serialize and compress objects and save to/retrieve from local storage (this is what cachedAjax uses
SharedCrossfilter: facilitates sharing crossfilters across components (faceted-datatable-cf uses it), and letting them use Field objects.
new, highly configurable scatterplot

All of this stuff is used in sptest (http://localhost/#/sptest). The sptest stuff needs to be removed from master but I’ll do that after there’s another working example to refer to.

Don’t hesitate to reach out!

Sigfried

Chris_Knoll · September 30, 2016, 7:29pm

I’d like to propose that the util module be partitioned out into caching, svg utils d3 helpers, etc. At this rate, the ohdsi.util module (admittedly very abstractly named) will become a very large unstructured mass of functionality that will be difficult to navigate.

Sigfried_Gold · September 30, 2016, 8:13pm

While we’re at it, how about breaking app.js and main.js into smaller,
logical parts?

Chris_Knoll · September 30, 2016, 8:47pm

Sure that’s a fine discussion. My understanding is main is the main boot-loader for the SPA, defines the require.js config, sets up the baseline dependencies, and instantiates the ‘application’ (similar to what you’d see in a main() method of a C, Java or C# program). Is there something beyond this that’s happening in main.js that should be factored out?

In the context of app.js: app.js is the main application container (it’s instantiated in main.js but the global app handling happens inside app.js). Atlas is big, so app.js is also going to be big, but there is probably places that things were put into this ‘global’ context for atlas that should probably be moved into a separate subsystem. I’d draw the line at what’s ‘in’ app.js vs what should be delegated to a sub-module is: anything that is used to support the ‘container context’ of the atlas application should be found in app.js. Such as the dirty flag maintaining the overall ‘is anything open dirty?’ state of the app. Maybe routes should be defined and handled in this context (or maybe there’s a way to make route handling a sub module). Other things like ‘visualizationPacks’, that feels like it’s a specific reporting function that should probably be delegated to sub modules.

Do you have any specific recommendations about organizing this? I’m not suggesting we break anything up just for the sake of making it smaller, but rather just considering the logical code organization in the codebase.

-Chris

Sigfried_Gold · October 1, 2016, 10:57am

Yeah, I am still just learning how some of this works, but it strikes me that routing issues are spread between index.html, app.js, and main.js – and not just those because each “page” handles its own routing to some degree. But important application logic and data loading are also handled in main.js and app.js – including loading and management of currentCohortDefinition, concept sets, searches and vocabulary results, global variables that need to reach down into individual pages to steer tab navigation, etc. And, of course, a lot of the logic for cohort definitions, search, concepts sets, etc., is also handled in the pages specific to those concerns – and, in those pages (as well as in app/main/index) model, controller and view logic (if that’s the way you like to think of things being divided) is all mixed together.

What I’d like to see would be javascript modules that encapsulate data and “business” logic for a single concern (vocabulary, concept sets, cohort def, etc.) be separated out from view logic as much as possible. As a new developer coming to the code, when I want to write new functionality for dealing with cohort definitions or vocabulary navigation, I should be able to find a library of javascript classes or functions that allow me to communicate with WebAPI. But that mostly doesn’t exist right now. Those functions are scattered across app.js and individual components and “modules” and cannot easily be separated from their specific use on specific pages.

Sigfried_Gold · October 1, 2016, 12:16pm

As a specific example of my frustration: I thought I found what I was asking for (a centralized controller for vocabulary for instance) when I stumbled upon this: https://github.com/OHDSI/Atlas/blob/master/js/modules/WebAPIProvider/VocabularyProvider.js, but:

The function starting on line 14 is not a callable vocabulary service but a call for sources that runs on loading in order to prefetch data so a VocabularyProvider.getDomains function can be used, which only happens once in all of ATLAS, although the VocabularyProvider is loaded from many places.
The search() function on line 33 looks to be a straightforward call to the vocabulary/search api, which would be great (especially if more of the logic related to uses of the search api were also here); but I don’t know if this function ever gets called.
Rather, when I do I vocab search, the api call is invoked through components/search.js:executeSearch(). So is that now the place to go to launch vocab search api calls? I don’t know. It only gets called from the same file here.
But when I put a break point there to see how that gets used, the stack trace uselessly trails off in minified knockout code.
Further poking around tells me that this code gets triggered by app.js setting a global variable when navigating to the search route.

This actually turns out to be a pretty simple example. It took me less than half an hour to figure out how the vocab/search api gets called. But it could have taken me less than a minute if:

I could trust that all api calls to something like a vocabulary search service would be located in a file like VocabularyProvider.
Loading VocabularlyProvider didn’t have side effects (making ajax calls just by being included).
The search module was called directly by the router with explicit parameters rather than being triggered by the setting of global variables as it is (and all pages are).
The search module called the api with an explicit call to a VocaburalyProvider function, and any related logic that might possibly be reusable was also located in the VocabularyProvider, but if the search module needed to do something not reusable with the results, it did so by using the promise returned by VocabularyProvider.search().

s

Chris_Knoll · October 1, 2016, 2:43pm

Yep, any inline ajax call should be wraped inside some module that manages that communcation. This would let us provide mocks to the web service calls just by changing the ‘real’ implementation with a ‘mock’ one.

Few comments on your observations:

Not sure that i’d call a stack trace into a minified javascript library ‘usless’. There’s several minified libs in the app (d3, jquery to give 2 other examples) and it’s defintiely useful to know that a call stack travels through these libraries, even if the minified form doesn’t give much guidance in what is happening there. You can also swap the minified form with the debug’d form simply by channging the mapping in main.js.
You describe in #1 that the ‘VocabularyProvider is loaded from many places’. I want to be very clear about how AMD module loading works: a module can be referenced in many places (as you’ve seen) but it is only loaded once. So it is normal to find the VocabularyProvider referenced in different places (if those different places need the VocabularyProvider to do work) but, depending how the user navigates through the app, it may be loaded first if they jump right into cohort definitions, or it might be loaded first if they navigate into the search screen. Or it might be loaded right during startup (in app.js) but if it’s loaded there, then any subsequent require() call will get the already-loaded module that was required() earlier. So, it’s never the case that a module is loaded ‘all over the place’.

But you raise valid concerns, and it would be worth the time to take a broad view of the different modules in Atlas and determien if there’s redundancy, areas to group functionality together, and places where functionality should be split apart to have the code modules have a more specific purpose (which would aid in reuse down the road).

Sigfried_Gold · October 3, 2016, 11:13am

My point about VocabularyProvider was not that it’s loaded repeatedly, but that it performs an ajax call on loading, and most of the modules that request it to be loaded never make use of the ajaxed data. It’s inappropriate for a library of vocabulary functions to do that kind of work simply as a side-effect of being loaded.

I called the stack trace into the minified code useless, not because it’s useless to know that knockout or some other library was involved there (and I also realize that I can swap in non-minified library modules); but the stack trace trails off in the minified library code. The event I’m trying to track down, the one that led to the breakpoint, may have been a change to some knockout observable, or a side-effect (like the VocabularyProvider one) of require.js loading some module–whatever it was, it’s not easy to track down.

I would love if we could move to a more functional programming paradigm; not that these can’t also be confusing, but it would really be nice if the stack trace to any point in the code led back directly to the router handler or other user-initiated event where it originated.

I think our whole top-level architecture could be greatly simplified. Rather than have the router handlers (in app.js) set self.currentView (and contingently run a number of ajax-calling data setup routines to, possibly, set other observables), which then causes some div in index.html to invoke some knockout component and pass various observables to it–and rather than having multiple route handlers leading to the same component depending on details in the url path what I might like to see is something like:

one router handler per module–pass route parameters down to the module and let it decide how to behave in response to different route parameters;
a single, centralized, persistable repository of app state–like the overly simple url/state manager I recently added; or, better, something like flux where all state changes have to pass through a centralized dispatcher and get passed down to views through an explicit component hierarchy.
let each module decide what data it needs and when and have its requests managed by the ajax wrapper you just mentioned.

Chris – I don’t know what the history is here, so I hope I’m not touching on any sore points, but, would you be willing to come to the architecture calls so we could hash some of this out with Frank, Anthony, Taha, and others involved at this level and figure out whether there can be sufficient agreement and resources to do any of this?

Thanks,
Sigfried

Sigfried_Gold · October 3, 2016, 12:28pm

Sorry to be annoying, but here’s another example.

I’m writing some experimental code (on my own time and with no assurance I’ll have a chance to finish it) working with vocabularies. I’d like to load a concept. The current code for doing that is in app.js and is called by the routing table entry for the concept page. The loadConcept() function not only performs the ajax call, it also sets a number of observables–some having to do with the view, others with the data–and it also triggers further calls to get related data.

I shouldn’t change any of this because I might break something. Maybe what would be best is to make a service in some centralized place, use it myself, and hope that at some point app.js’s loadConcept call will be refactored to also use it. But we need to have these conversations about where such a call should go. So, in the meantime, I’ll make my own redundant code to do the same thing…and then if someone ever does attempt to centralize this function, they may have to deal with retrofitting both the app.js version and my new version to use it. Oy…

Chris_Knoll · October 3, 2016, 1:50pm

Lot of points here, I’ll try to cover them one at a time:

The architecture call: I’ve found the call not effective at discussing and working through complex technical issues. Consider the conversation we’re having right now, we can spend time to think out what issues are raised, find links to supporting information, even create proof-of-concepts to demonstrate solutions. This isn’t something that can be done every other week on an hour call. My experience is that the calls have worked more like a debate team where whoever makes the most arguments in the call that can’t be refuted wins. So, I suggest we bring this into the forums where we can have a more robust conversation about it and have more parties involved (across different time-zones, even).
It might be a style thing, but it is completely appropriate for a module load to kick off an async process (like an ajax get) during the first load. The alternative is to scatter ‘if isLoaded()’ calls throughtout the code and then call the init() methods everywhere because you never know which function of the module is going to be called first.
Stacktraces: if the router is using callbacks to signal processing, your stack trace may never show the path back to the router. This is similar to Java or C# async callbacks that when a async call is made the new stack trace begins at where the async method starts. That’s why many of the stack traces you see begin in knockout or jquery (or possibly d3) because that’s where the event handling started from. You might see a stack trace go back to the router if the specific action that caused an error was directly from a route, but this is an edge case for the typical actions in ATLAS.
Debugging observables: it depends on the nature of the issue you’re having, but you can either set up a subscription to an observable to set a breakpoint when the value changes, you can use the chrome knockout debugger that will let you inspect the view model that’s bound to a DOM element, and there might be other global debug handlers you can attach to knockout (I haven’t dug into this, haven’t needed this sort of thing) to catch any events that knockout throws. In an app like atlas, that would be a lot of noise, so the first two suggestions is what i’d try.
Top level architecture: what you are experiencing is that atlas was rebranded from a prior application that didn’t have any of the AMD/encapsulation discipline that is required for an application of this complexity. So, today we’re left with a bit of a heap that is hard to navigate. I pressed very hard for individually packaged modules that could be independently minified and versioned, but was vetoed for a more ‘all piled together’ approach that we have today. It means it’s simpler to add things to atlas (just throw it in there) but for people to jump into the application they have to try to digest many different aspects of the application at once to begin having just a basic understanding of how it all works.
App state encapsulation: I agree this would be better, and would give us a way to ‘formalize’ what is considered application state instead of passing ‘pagemodel’ around to every component.
Routing: related to app sate encapsulation, I think it’s worth looking into how modules that are loaded could be fed a routing manager that they could manipulate to add the routes that only they want to handle. Currently, it’s a bit of a challenge to add new routes in an organized manner (only way we can organize them is just the order they appear in app.js (or main.js?). I’m not sure if the current routing library we use today lets you add new routes at a later time. but that would be a very nice to have if we could have each page module define their own routes. We’d have to handle collisions but I don’t think that would be too hard to manage.

So, there’s a lot here to talk about, each of these points could be their own forum topic. As far as ‘sore points’, the only sore point i have is that I did a lot of groundwork towards trying to get an organized, modularized codebase in place, but it didn’t get any traction, but now issues are being raised that we should have been more organized from the start. I agree, but it will take a general consensus to get everyone to get on the same page (such as not using non-amd modules in the codebase, remove items from global scope, etc).

Sigfried_Gold · October 3, 2016, 11:21pm

Just about the routing for now: flatiron does have ad hoc routing. I tested some things with it and it’s pretty easy to use.

Chris_Knoll · October 4, 2016, 2:16am

Good find, i’ve seen it before. One thing that I’d love to see out of the box (but probably could be implemented with crafty use of before() events) is being able to ‘cancel’ a route action. For example, if you are currently looking at cohort definition id #2 at:#/cohortdefinition/2 and you make some edits and then you just want to flip over to 3 at #/cohortdefinition/3, the switch to a new cohort will cause the ‘Unsaved changes will be lost. Continue?’ dialog, but if you hit cancel, it will leave the url as the last invoked #/cohortdefinition/3 (but you canceled that, and should return to #2). So, I haven’t dug into this particular problem myself, but it looks like there’s enough hooks in the router you found that it might be something we can figure out.

-Chris

Sigfried_Gold · October 4, 2016, 9:21am

Actually, I don’t think you need router hooks for that. Something like this should work:

window.addEventListener('hashchange', function(evt) {
    if (evt.newURL === "something I want to cancel") {
        document.location = evt.oldURL;
    }
}

Sigfried_Gold · October 4, 2016, 9:23am

Oh, and, in case it wasn’t clear, that’s not a router I found, it’s the one we’re already using.