The theme for today seems to be Data Analysis/Data Science. Three separate times today the topic of Data Analysis has come up. As I sat down to write my blog post tonight, it struck me as interesting that the topic had woven itself through my day so thoroughly. I took it as an opportunity to introduce this facet of my interests to the readers of my blog. If this isn’t your cup of tea, check back next post. My interests are many and varied and I intend to write on all of them. In other words, if you don’t like this post, come back for the next one. It’s bound to be different.
As may already have become obvious, I’ve recently embarked on a journey of exploration of Data Analysis and Data Science. If you read the Wikipedia articles that I linked to in the first paragraph above, you will see that the field of Data Analysis is very broad and the field of Data Science is somewhat controversial. While initially they seem to be different but related fields, the more I try to characterize the difference between them, the more I realize that they have a lot in common.
I think the problem with trying to differentiate between them is that they both appeal to the naive interpretation of their names, which in both cases is incredibly broad. I am reminded of the problem that the field of Artificial Intelligence has struggled with for its entire existence, namely that there isn’t an unambiguous definition of Artificial or Intelligence that rigorously captures what the practitioners of the field intuitively understand it to mean.
Getting back to the inspiration for this post, the first time the subject came up today, I was getting a demonstration of the work that some of my colleagues were doing with Oculus Rift and the Unity development environment. We ended up discussing the fact that the customers, for whom they were developing applications, had started by capturing their working data using Microsoft Office applications like Excel, Access, and PowerPoint. Over time, their data had become so large that these applications had become unwieldy. My colleagues had taken the data that had been captured with these legacy processes and imported it into a new application and had thus been able to provide a more unified way to manage the data.
One of the things that was learned along the way is the customer had learned to love their existing processes. Consequently, the application that was being developed to supersede those older tools had to retain much of the look and feel of them in order to gain acceptance from the customer. This was a very important realization. Earlier in my career I have had personal experiences where customer acceptance was never achieved because of an aversion to the perceived difficulty in learning a new interface. Thus, the first observation that I gleaned about large collections of data is that the owners of the data have to be accommodated when you are trying to evolve their use of their ever growing collection of relevant data.
A little bit later I had a conversation with a colleague about my understanding, naive as it is at this stage, of what Data Analytics is and how it is relevant to the large aerospace company for which we both work. Strangely enough, the conversation soon turned to the fact that the first thing that we, as would-be practitioners of Data Analysis, would have to do is to educate the engineering and business staff about the benefits that could be accrued from using the data that is already being collected in this way while, at the same time, being careful to respect their perspective on the data and the ways that they are currently using it.
Then, when I got home and was reading my personal email, I came across a link to Big Smart Data, the blog of Max Goff. I met Max a number of years ago in Seattle while he was working for Sun Microsystems. He was a Java Evangelist and at the time I was struggling to persuade the software developers where I worked of the benefits of write once, run everywhere, the battle cry of Java at the time. I followed his career as he left Sun and started a consultancy in Tennessee. Somewhere along the line, I lost track of where he was blogging. I was thrilled to find his latest blog and also excited about the fact that he was exploring the same aspects of big data that form the core inspiration of my interest in Data Analysis.
A former boss of mine once said something to the effect that you could tell when an AI application was going mainstream when it had a database behind it. I think there is a lot of wisdom in that observation. Access to a large data store is necessary but not sufficient for emergent AI. I believe we are on the cusp of the emergence of Artificial Intelligence, ambiguous as the definition of it may be. I believe that Big Data, Data Analysis, and Data Science are going to be instrumental in this emergence.
When I first came to work at the aforementioned big aerospace company, it was because I was given the opportunity to work in their AI laboratory. AI winter soon set in and I spent the intervening years doing what I could to help the company solve hard problems with computers. Along the way, I have worked on some amazing projects but I have always longed to pursue the goal of emergent artificial intelligence. I am beginning to see the potential for pursuing that goal arising again. Things just naturally come around in cycles. And so that was my day from the perspective of Data Analysis and Data Science.