The Data Science Is the Easy Part!

Blog entry

Lessons learned from a recent business intelligence project

By: Marc Alvarez

One of the biggest motivations for working in a chief data officer (CDO) role is the opportunity to spend time and thought on applying state-of-the-art business intelligence (BI) to the firm’s own data. Unfortunately, given the pressing need to support regulatory reporting and simply keep data operations humming, this opportunity always seems to get pushed onto the back burner and tends to stay there as the next crisis comes around the corner.

That’s too bad. I recently proposed to management an effort to take a look under the covers at the details, and it’s clear that there is tremendous value in applying today’s BI methods (including artificial intelligence (AI), machine learning, predictive analytics and the like) to drive the business. This observation likely comes as no surprise to the many technology and analytics vendors out there (a bit of an ‘aha!’ moment), but there is one lesson learned in my case that isn’t immediately obvious: in putting together an initial working model, the ‘data science’ components proved to be the least challenging part of the exercise.

The concept

Here’s a quick breakdown of the concept – the idea was to take the trading activity carried out by the broker/dealer and compare it to broader market activity. In particular, the goal was to look for patterns over time and then generate and send notifications of anomalous market activity to sales and trading desks so they could get a heads-up on potential activity affecting their clients.

The premise is built on the hypothesis that, by generating and recording statistical coefficients over time, today’s BI capabilities should be able to identify patterns and, more importantly, outliers, providing insight into the dynamics of the market. The most important word there is ‘statistical’ – the goal is to produce a model that is entirely empirical and hopefully free from bias. Anybody familiar with market surveillance applications will recognise the premise.
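To make the premise concrete, here is a minimal sketch of the kind of calculation involved. The column names, the choice of coefficient (the desk’s share of overall market volume) and the three-sigma threshold are illustrative assumptions, not the actual model built in the PoC.

```python
# Minimal sketch: track a statistical coefficient over time and flag outliers.
# Assumes a DataFrame with columns trade_date, desk_volume and market_volume;
# the coefficient and threshold below are illustrative, not the firm's model.
import pandas as pd

def flag_anomalies(activity: pd.DataFrame, window: int = 20, threshold: float = 3.0) -> pd.DataFrame:
    df = activity.sort_values("trade_date").copy()

    # The coefficient: the desk's share of overall market volume each day.
    df["share"] = df["desk_volume"] / df["market_volume"]

    # A purely empirical baseline: rolling mean and standard deviation.
    rolling = df["share"].rolling(window)
    df["zscore"] = (df["share"] - rolling.mean()) / rolling.std()

    # Days beyond the threshold are the outliers worth notifying the desks about.
    return df[df["zscore"].abs() > threshold]
```

In practice, output along these lines is what would feed the notifications to the sales and trading desks described above.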

So far, so good. Getting this onto a whiteboard was the first step. The next was to figure out how to make it a reality – in particular, a proof of concept (PoC) was called for in order to validate that the application could actually work with OTC instruments. Breaking it down, this entailed:

  • First, decide how to execute – in a busy firm treading water with demands on the IT department, this would likely require partnering with a solution provider. No big deal – write up a brief, send it around and get some dialogue going. Very interesting dialogue, by the way – the tools and resources available from solution vendors today are truly impressive.
  • Next, to avoid the exercise becoming a ‘neverendum’, define and document the scope of the PoC – it’s only fair to the solution vendor to avoid scope creep.
  • Select a partner (or partners – it’s worth taking multiple kicks at the can if possible to validate as many different options as you can; there are certainly plenty out there today) and define a plan (do not try to do this without one – a basic plan keeps scope creep and tangents at bay).
  • Assemble the data content and make it available to your solution partner.
  • Integrate the data and perform the analysis on the data.
  • Present the exciting new tool to your management and win plaudits all round.

All sounds pretty straightforward, doesn’t it? Here’s a look at the realities.

The realities

First, there’s the question of selecting a partner. Given that this is largely an experiment, there is, understandably, some reluctance to spend any significant sum of money on something that may or may not actually work. From the solution partners’ perspective, this looks like an unbudgeted initiative – something they could easily be drawn into that ends up costing them a lot of money. From the firm’s perspective, the need for non-disclosure and protection of intellectual property is top of mind.

Lesson learned – getting even to within sight of the starting line is a path full of obstacles and dependencies. It takes a lot longer than anybody could ever imagine going into the exercise.

Second is the need to define the scope of the project to a sufficient level of detail that a meaningful conversation can actually take place. All too often, what happens here is that a concept is sketched out on a whiteboard and then a talented technology person runs with it to produce ‘something’. The temptation is to get activity going as quickly as possible, without thinking through the whole approach.

The biggest issue that came up in this experience was clarifying precisely what would be demonstrated and how success (or failure) would be judged. Fortunately, I’ve been down this path before and learned firsthand the costs of rushing into things. So, time and effort were spent describing the scope and functionality in writing and, most important of all, the criteria by which success was to be judged. As this was intended to be a PoC, this involved a lot of back and forth with prospective business users.

Lesson learned – engaging prospective business users at this stage proved a time-consuming distraction. The whole concept proved to be so far outside their comfort zone, and so novel (good things in my opinion), that it was very difficult to get much meaningful feedback beyond ‘we’ll look at it when it’s ready’. Seriously, this was the experience: plenty of interest, but a user community that found it difficult, if not impossible, to express its business requirements. No amount of discovery or requirements definition effort is going to help you here – at the cutting edge, an operating, demonstrable application is called for. And good luck with getting anybody to read a document!

Data content

Okay, so at this point you have the general concept defined, buy-in from your business users, requirements defined, and solution partner(s) selected. You’re ready to go…or are you? The missing piece of the puzzle is the data content, which almost always seems to get overlooked.

In this case, most of the content needed to kick-start the project was readily available from in-house databases (note: you need a handy SQL person to get at it, since the IT team is pretty much fully booked on mission-critical projects) and from vendors. The latter is another hurdle that needs to be considered. In this case, the request was for a custom extract of data on a one-off basis to prove a concept. Finding a vendor willing to work on a custom request and put together a contract for the content is a task in itself, along with getting budget approval for the spend. And it all takes time and a lot of effort to specify the data requirement and how it will be delivered.
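For illustration only, the in-house side of the extract might look something like the snippet below; the connection string, schema, table and column names are placeholders rather than any real system.

```python
# Hypothetical one-off extract of the desk's daily traded volume from an
# in-house database, to be lined up against the vendor's market-wide data.
# The connection string, table and column names are placeholders.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@internal-host/trading")  # placeholder

query = """
    SELECT trade_date,
           instrument_id,
           SUM(quantity) AS desk_volume
    FROM executions
    WHERE trade_date >= '2018-01-01'
    GROUP BY trade_date, instrument_id
"""

desk_activity = pd.read_sql(query, engine)
desk_activity.to_csv("desk_activity_extract.csv", index=False)  # hand-off to the solution partner
```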

Meanwhile, the clock is ticking and your prospective business users are reading more and more about AI and the like in the press and are starting to ask questions about when there will be something to see. On top of it all, there’s a day job to do, so things are really not getting as much focus as they should.

The magic

Finally, all the inputs are delivered to your solution partner(s) and their data scientists. Then the magic happens – with today’s capabilities and the expertise they had available for the project, the solution partners easily turned around initial results and sample applications in as little as a week. More importantly, as more data became available, the application became increasingly functional and impressive. This also starts to surface all the issues in the data provided as input. In many cases these proved to be showstoppers, requiring corrections and/or additional data to resolve. So this turns out to be a highly iterative activity, subject to the same complexities of acquiring and delivering data content.
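The checks that surfaced those issues were nothing exotic. A rough sketch, assuming the same hypothetical extract as above, might look like this:

```python
# Rough data-quality checks of the kind that surface showstoppers before an
# extract has to be re-cut; columns and rules are illustrative assumptions.
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    dates = pd.to_datetime(df["trade_date"])
    expected_days = pd.bdate_range(dates.min(), dates.max())
    return {
        "rows": len(df),
        "duplicate_keys": int(df.duplicated(subset=["trade_date", "instrument_id"]).sum()),
        "missing_values": int(df.isna().sum().sum()),
        "nonpositive_volumes": int((df["desk_volume"] <= 0).sum()),
        # Gaps in the business-day calendar are a common reason to go back to the vendor.
        "missing_business_days": int(len(expected_days.difference(pd.DatetimeIndex(dates.unique())))),
    }
```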

In the end, the concept becomes demonstrable and quite impressive. Be warned, however: at the first presentation to business users you should expect a firehose of questions and requirements – now they provide the input! And of course, as this was a PoC exercise (success being measured by getting the business users to buy in), little thought had been put into what a production platform is going to look like. It may sound like a nice problem to have, but as the logical next step is to secure budget to produce an operational service, well, you know how that discussion goes.

Here are the key lessons from an ahead-of-the-curve effort to monetise the firm’s data assets:

  • The data science really was the easy part. Selected solution partners had the skills and technology to deliver statistical and quantitative analysis in a surprisingly timely manner. That was great to see.
  • It takes far, far longer than anyone could anticipate to define and agree the scope of the PoC. The temptation to go beyond the initial scope is very strong and can easily become a major distraction.
  • Getting early input from the prospective business user community is very difficult. The technologies and analytics involved are simply too new and unproven to them. Thought leadership is required as well as a good dose of salesmanship to develop a compelling vision.
  • Plan for success – once it’s all put together and demonstrable, don’t underestimate the demand that will be generated. It’s absolutely essential to communicate how the new analytics are going to get into production and become a resource to help drive the business. In particular, this is where the loop needs to be closed with the IT team as getting something into production is almost certainly going to require the team’s assistance and effort.

All told, it’s clear that the options available today to apply data science to help drive the business are here and real. However, successful adoption demands a significant increase in effort in areas that may not at first glance seem obvious. As firms increasingly adopt digital methods of operating and deploy increasingly sophisticated quantitative and statistical techniques, there is a corresponding increase in dependency on the firm’s ability to acquire, collate and manage data content. Welcome to the new normal!