Citizen Data Scientists – Are we there yet?


Artificial intelligence is a field that captures the imagination of the press and the public, with its vision of machines supporting us in everyday life. But how close are we to that vision in everyday business life? The answer is very close, but with one key intermediary step.

Back in 2016, Gartner coined the term “Citizen Data Scientist” (CDS), meaning a person “who creates or generates models that use advanced diagnostic analytics or predictive and prescriptive capabilities, but whose primary job function is outside the field of statistics and analytics.” Gartner also predicted that, by 2019, citizen data scientists would surpass data scientists in the amount of advanced analysis they produce. Until recently, that prediction looked set to miss, if only on the date rather than the outcome.

However, Google’s recent announcement of the beta release of its AutoML platform, and the publicity that followed, have reignited discussion about the shift towards a CDS environment. In essence, AutoML automatically builds a machine learning model suited to a particular business use. All the user needs to provide is labelled data: data with a known outcome that the algorithm can learn from. No data science knowledge is required. AutoML currently focusses on image classification and natural language translation, but plans to extend it to other types of business problems are well underway.

There is no doubt that Gartner’s prediction will eventually be realized, and some readers of this article may already be excited about using AI to support their work, bringing into their everyday business lives capabilities that currently reside only with data scientists.

Therein lies a problem with the CDS concept: expectation. We are still in the early days of progress towards the CDS, and the current focus is very much on fixed, repetitive tasks. User expectations, however, are likely to outstrip capability quickly and by a wide margin, at least in the short term.

Since 2013, my team and I have been designing and implementing self-service machine learning solutions that enable the CDS. Self-service is in our DNA, and our stated objective is to deliver products that can be used by non-data scientists, including our CEO, Dr Mark Goldspink, who does not have a statistical background. We have had great success with these tools, enabling fraud managers and other business managers to make better decisions. However, we have also seen that even the most self-service solution can quickly come under pressure when it is put in the hands of inquisitive business domain experts.

Business users often want to stretch solutions by supplying different data or by applying them to different types of decisions. This is well within the capabilities of both the solutions and the underlying technologies, but it may require support from expert data scientists as boundaries are pushed and new data needs configuring.

Machine learning has many different facets, and what works for one type of business problem may not work for another. For example, the volume of data needed to ‘learn’ a problem, and the time period that data must cover, can differ between problems that appear broadly similar at first glance. A business user could discover this through trial and error, but it is far more productive, for both the user and their business, to enlist a data scientist to define the correct training strategy.

Ideation, the formation of new ideas or concepts, is the biggest aim and, paradoxically, the biggest challenge for the CDS concept. Imagine giving someone a shiny new sports car and then telling them they may only drive it on the commute to work, never to explore the country. The driver may look for the most interesting and varied route to work, but they still arrive in the same place.

How will the journey towards the CDS unfold? In the long term, solutions will undoubtedly become more flexible and dynamic, realizing the full definition of the CDS. In the short term, however, we need data scientists to actively engage with and support the budding citizen data scientists within the business: the driving instructor who helps the driver take the sports car onto country roads.

So, at least in the short term, we need to broaden our definition of data science to include data science as a service, essentially ‘Knowledge as a Service’, as a critical next step on the path towards the true Citizen Data Scientist.

Matthew Attwell
Risk & Client Services Director at The ai Corporation (ai)