Excel is ubiquitous in the business world and has enabled millions of business folks, from C-suite executives to entry-level employees, to view, edit, and summarize data. However, because it is built to appeal to the masses, it lacks functionality that is critical for an analyst: it offers little to no support for data management, preparation, or munging. Unfortunately, it is precisely these tasks that take up most of an analyst’s time.
It is often said that 80% of an analyst’s work consists of data acquisition and cleaning, while a mere 20% goes to the actual analysis, modeling, visualizing, and complaining about the format of the data. Big Data Borat summed this up in fewer than 140 characters when they tweeted:
Putting Big Data Into Perspective
Data isn’t ‘Big Data’ just because it doesn’t fit in Excel; it is likely just a couple hundred megabytes. Here, software developer Chris Stucchio discusses the tools needed to handle large data files, those too large for Excel to open. There are surprisingly few cases where you actually need a ‘Big Data’ stack, but there are numerous cases where you don’t need to ‘see’ your data in order to acquire, manipulate, and analyze it. There are even types of data and analysis that might be too big for your MacBook Pro but still don’t require a ‘Big Data’ stack.
For only a few dollars an hour, you can rent a computer on Amazon Web Services that can easily analyze multiple gigabytes of data in memory, without even thinking about Hadoop and MapReduce jobs.
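To make this concrete, here is a minimal Python sketch of analyzing a file that Excel cannot open (a worksheet tops out at 1,048,576 rows). It streams a CSV one row at a time with the standard library’s `csv` module, so memory use depends on the number of distinct groups rather than on the size of the file. The function name `sum_by_group` and the column names are hypothetical, chosen only for illustration.

```python
import csv
import tempfile
from collections import defaultdict


def sum_by_group(path, group_col, value_col):
    """Stream a CSV one row at a time and total value_col for each group_col.

    Only one row is held in memory at once; the running totals grow with the
    number of distinct groups, not with the number of rows in the file.
    """
    totals = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row[group_col]] += float(row[value_col])
    return dict(totals)


# Usage: a tiny hypothetical sample standing in for a multi-gigabyte export.
with tempfile.NamedTemporaryFile("w", suffix=".csv", newline="", delete=False) as tmp:
    tmp.write("region,sales\neast,10\nwest,5\neast,2.5\n")

print(sum_by_group(tmp.name, "region", "sales"))  # {'east': 12.5, 'west': 5.0}
```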
Using the Right Tool For the Job
Now, let’s address the big question about ‘Big Data’. If Excel isn’t the right tool for an analyst’s everyday tasks, what tools must an analyst obtain to produce insightful analysis? The answer lies simply in the plural: tools. Unfortunately, techies have yet to develop a one-stop shop where an analyst can complete every data analysis task. Until they do, analysts must use a combination of tools to get the work done, while patiently waiting for the magical unicorn device that will single-handedly make all of their data analysis dreams come true.
In my experience, the most effective tools for data acquisition, preparation, and analysis are R and Python. Each has its strengths (both are open source) and weaknesses (both are open source). If you’re an analyst without much of a technical background, they may present a steep learning curve. But rest assured, it is well worth riding out this curve for the wealth of data these tools put within reach.
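As an illustration of the preparation and munging work that is tedious in a spreadsheet, the following minimal Python sketch cleans a handful of messy records using only built-in types. It trims whitespace, standardizes capitalization, parses amounts written with thousands separators, skips incomplete rows, and drops duplicates that only become visible after normalization. The field names and cleaning rules are assumptions made for the example; real data will need its own rules.

```python
def clean_records(rows):
    """Normalize messy records and return a new list of clean dicts.

    Assumed rules (hypothetical): trim and title-case customer names, strip
    thousands separators from amounts, skip rows missing either field, and
    keep only the first copy of records that are identical after cleanup.
    """
    seen = set()
    cleaned = []
    for row in rows:
        customer = (row.get("customer") or "").strip().title()
        raw_amount = (row.get("amount") or "").strip().replace(",", "")
        if not customer or not raw_amount:
            continue  # skip incomplete rows rather than guessing values
        # float() raises ValueError on text like "N/A"; real data may need a guard.
        record = (customer, float(raw_amount))
        if record in seen:
            continue  # duplicate once case, spacing, and separators are normalized
        seen.add(record)
        cleaned.append({"customer": record[0], "amount": record[1]})
    return cleaned


messy = [
    {"customer": "  acme corp ", "amount": "1,200"},
    {"customer": "ACME CORP", "amount": "1200"},
    {"customer": "Globex", "amount": ""},
    {"customer": "initech", "amount": "75.5"},
]
print(clean_records(messy))
# [{'customer': 'Acme Corp', 'amount': 1200.0}, {'customer': 'Initech', 'amount': 75.5}]
```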
Evolving Analysts Into Engineers
The combination of ever more data and business stakeholders asking ever harder questions requires analysts to think more like engineers and less like bean counters. Besides, who wants the bean counter label? It is the engineer who innovates, solves challenging problems, and creates elegant solutions. This is precisely what an analyst can and should be doing with data.
I haven’t heard of predictive modeling or machine learning being done in Excel, and if it has been attempted, I pity that poor soul. That said, the most productive and insightful analyst is armed with multiple tools, from SQL to R to Python, and, most importantly, knows which tool is best for the job.
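Knowing which tool fits the job often means mixing them within one workflow. Here is a hedged sketch, assuming Python’s built-in `sqlite3` module as a stand-in for a real database: SQL does the set-based aggregation, and Python handles the follow-up arithmetic. The table, columns, and figures are made up for illustration.

```python
import sqlite3

# Hypothetical order data; in practice this would already live in a database.
orders = [
    ("east", "widget", 3, 2.50),
    ("west", "widget", 1, 2.50),
    ("east", "gadget", 2, 10.00),
    ("west", "gadget", 4, 10.00),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, product TEXT, qty INTEGER, price REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", orders)

# SQL does the set-based work: total revenue per region.
revenue = dict(conn.execute(
    "SELECT region, SUM(qty * price) FROM orders GROUP BY region ORDER BY region"
).fetchall())
conn.close()

# Python takes over for follow-up logic: each region's share of total revenue.
total = sum(revenue.values())
shares = {region: round(amount / total, 3) for region, amount in revenue.items()}

print(revenue)  # {'east': 27.5, 'west': 42.5}
print(shares)   # {'east': 0.393, 'west': 0.607}
```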
Collaboration Across Multiple Teams
Beyond the tools an analyst has at his or her disposal to wrangle data, it is equally critical to be able to collaborate across all levels of analysis. To assist in collaboration, an analyst should strive to make their work reproducible. Reproducible work is helpful not only for collaboration, but also for validating results and efficiently updating the analysis when new data is available.
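One common ingredient of reproducibility is making every step deterministic. The sketch below, a hypothetical `run_analysis` function in Python, seeds its own random number generator for a holdout split. Rerunning it on the same data gives identical results to a colleague validating the work, and updating the analysis when new data arrives is just another call. The function, the data, and the 25% holdout are assumptions for illustration, not a prescribed method.

```python
import random
import statistics


def run_analysis(values, seed=42, holdout_fraction=0.25):
    """A deterministic analysis step: the same inputs and seed always give
    the same output, so anyone can rerun it and check the result."""
    rng = random.Random(seed)  # private RNG; does not touch global random state
    shuffled = list(values)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * holdout_fraction)
    holdout, train = shuffled[:cut], shuffled[cut:]
    return {
        "n": len(shuffled),
        "train_mean": statistics.mean(train),
        "holdout_mean": statistics.mean(holdout) if holdout else None,
    }


data = [12, 15, 9, 22, 18, 11, 14, 20]
first = run_analysis(data)
second = run_analysis(data)
print(first == second)  # True: same inputs and seed give identical results

# When new data arrives, rerun the same code instead of redoing steps by hand.
updated = run_analysis(data + [17, 13])
print(first["n"], updated["n"])  # 8 10
```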
In addition to collaborating with other analysts, it is just as important to collaborate with both technical teams and functional business teams to have the greatest impact. Predictive models aren’t beneficial if they can’t be deployed into a production system, and reports will only collect dust if business users can’t digest the information and make sense of it. In both cases, analysts are most effective when they collaborate with peers, technical teams, and subject-matter experts.
I believe that one day ‘Big Data’ will deliver on much of what it promises, but the full value of data cannot be realized without cross-functional collaboration and the right tools for the job.
By Andrew Harris