Defining Data Buzzwords

by Marie Hoie, Data Analyst

Posted on April 4, 2018

In this new era of data-driven business decisions, a substantial glossary of formerly industry-specific jargon is now used by laypeople to discuss potential applications for big data in corporate, government, and non-profit settings. Many public and private entities have adopted certain elements of data analytics, but some leaders may not take full advantage of the available opportunities simply because the terms used to describe them are often applied interchangeably or incorrectly. Without a sufficient understanding of these fundamental terms, decision-makers cannot gauge which opportunities should or should not be pursued. Nor can they determine the true extent of efficiency gains or the return on investment of the decisions they do make. With this in mind, HeinfeldMeech presents this partial list of commonly used data-related terminology as a starting point for enhancing client understanding of the analytics services currently available, as well as where we may be headed in the future.

Concepts –

Big Data:

datasets that grow so large and complex that they are difficult to capture, store, manage, share, analyze, and visualize within the current computational architecture

Business Intelligence:

any technology, application, or practice used to collect, analyze, and present business information; the results provide actionable information to support business decisions

Cloud Computing:

virtual storage of, and access to, data and programs offsite through a third party, where computing demands are handled by pooling computing resources and applying prioritization algorithms

Data Mining:

umbrella term for exploration and analysis, by automated or semi-automated means, of large quantities of data in order to discover meaningful patterns; extraction of implicit, previously unknown, and potentially useful information from raw data

Closely associated with, though more comprehensive than, the term “Machine Learning”

IoT: 

an acronym for the “Internet of Things”

a further development of the Internet in which everyday objects have network connectivity, allowing them to send and receive data (mostly streaming); because these objects monitor their surroundings continuously, they can provide real-time data that increases efficiency and helps predict faults and failures, but they can also expose users to hacking risks

Examples – “smart” security cameras, telepresence equipment, Amazon Echo

Supervised Learning:

the machine learning model-building procedure of using labeled datasets, in which the outcome of interest is already known, to train a model that makes predictions about new data
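To make the idea concrete, here is a minimal Python sketch of one simple supervised technique, a 1-nearest-neighbor classifier (chosen here for illustration; it is not the only option). The invoice data, the labels, and the function name nearest_neighbor_predict are hypothetical. The labeled examples are what make this supervised: a new item receives the label of the most similar known example.

```python
import math

def nearest_neighbor_predict(training_data, new_point):
    """Predict a label for new_point using 1-nearest-neighbor.

    training_data is a list of (features, label) pairs; the known labels
    are what make this supervised learning.
    """
    best_label = None
    best_distance = math.inf
    for features, label in training_data:
        distance = math.dist(features, new_point)  # Euclidean distance
        if distance < best_distance:
            best_distance = distance
            best_label = label
    return best_label

# Hypothetical labeled history: (invoice amount in $1,000s, days overdue) -> outcome
labeled = [
    ((1.0, 5), "paid"),
    ((2.0, 10), "paid"),
    ((8.0, 90), "default"),
    ((9.5, 120), "default"),
]

print(nearest_neighbor_predict(labeled, (1.5, 7)))    # paid
print(nearest_neighbor_predict(labeled, (9.0, 100)))  # default
```

In practice the features would usually be rescaled first; in this toy data the days-overdue column dominates the distance because its values are larger.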

Unsupervised Learning:

the machine learning model-building procedure of drawing inferences from datasets for which an outcome of interest is unknown
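A common unsupervised technique is k-means clustering. The Python sketch below runs a bare-bones version on hypothetical customer data; the function name k_means, the data, the choice of two clusters, and the fixed starting centroids (used so the result is repeatable) are all our assumptions. Unlike the supervised case, no labels are supplied. The algorithm only groups points that sit close together, and interpreting what each group means is left to the analyst.

```python
import math

def k_means(points, initial_centroids, iterations=10):
    """Group unlabeled points into clusters with a basic k-means loop."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters

# Hypothetical unlabeled data: (purchases per month, average basket in $10s)
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = k_means(points, initial_centroids=[(0, 0), (10, 10)])
print(clusters)
```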

Testwork –

Benford’s Law:

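analysis of large set(s) of numerical data spanning several orders of magnitude against the expected frequency distribution of leading digits (under Benford’s Law, the digit 1 leads roughly 30% of the time and the digit 9 less than 5% of the time); used to help detect financial misstatements and fraud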
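Under Benford’s Law, the expected share of numbers whose leading digit is d is log10(1 + 1/d). The Python sketch below, a hypothetical illustration rather than a production fraud test, tallies leading digits, compares them with that expectation, and summarizes the gap as a mean absolute deviation (MAD). The function names are our own, and what MAD counts as “too high” is a judgment call we leave open.

```python
import math
from collections import Counter

def benford_expected():
    """Expected proportion of each leading digit 1-9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(x):
    """First nonzero digit of a nonzero number (e.g. 0.05 -> 5, 1234.5 -> 1)."""
    # str() yields forms like '1234.5', '0.05', or '1e-05'; stripping leading
    # zeros and the decimal point leaves the first significant digit in front.
    return int(str(abs(x)).lstrip("0.")[0])

def benford_mad(values):
    """Mean absolute deviation between observed and expected digit shares."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    expected = benford_expected()
    observed = {d: counts[d] / len(digits) for d in range(1, 10)}
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9

# Powers of 2 are a well-known Benford-conforming sequence.
print(round(benford_mad([2 ** k for k in range(1, 1000)]), 4))
```

A small MAD means the digits look Benford-like; a ledger with a noticeably larger MAD would merit a closer look, not an automatic conclusion of fraud.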

Correlation:

a statistical measure describing the degree to which two or more variables move together; not to be confused with causation, in which one event is the result of the occurrence of another. Correlation does not imply causation.
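The most common correlation measure for two numeric variables is Pearson’s r, which ranges from -1 (perfect negative) to +1 (perfect positive). A minimal Python sketch follows; the function name pearson_r is our own, and the monthly ice cream and sunburn figures are invented to illustrate the causation point, since both tend to rise with summer temperatures without one causing the other.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical monthly figures: both series rise with summer heat.
ice_cream_sales = [20, 35, 50, 80, 95]
sunburn_cases = [2, 4, 5, 9, 10]
print(round(pearson_r(ice_cream_sales, sunburn_cases), 3))
```

Here r comes out above 0.99, yet selling ice cream does not cause sunburn; a third factor (temperature) drives both.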

Data Visualization:

any effort to aid in the comprehension of patterns, trends, and correlations in data by placing it in a visual context; may help detect and expose insights that could otherwise remain undiscovered

Fuzzy Logic:

a form of logic that handles the concept of partial truth, in which the value of a variable may range anywhere between completely false and completely true; e.g., when comparing two addresses that are fundamentally the same but are not exact matches due to different typing conventions, fuzzy logic provides a similarity index between a 0% and a 100% match
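The address example can be sketched with Python’s standard-library difflib module, whose SequenceMatcher.ratio() returns a similarity score between 0 and 1. Strictly speaking this is a string-similarity measure that produces the kind of partial-match score described above, not a full fuzzy-logic system; the normalization step and the function name address_similarity are our own assumptions.

```python
from difflib import SequenceMatcher

def address_similarity(a, b):
    """Similarity of two addresses as a percentage (0 = no match, 100 = identical)."""
    # Normalize case and whitespace so trivial typing differences don't count.
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return round(SequenceMatcher(None, norm_a, norm_b).ratio() * 100, 1)

print(address_similarity("123 Main Street, Suite 4", "123 main st.,  Ste 4"))
print(address_similarity("123 Main Street, Suite 4", "987 Elm Avenue"))
```

A reviewer might then treat pairs above some chosen cutoff as probable matches; the cutoff itself is an assumption to tune against known examples.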

Outliers:

data objects with characteristics that are considerably different from those of most other data objects in the dataset; a common rule of thumb flags values more than 3 standard deviations above or below the mean
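The three-standard-deviation rule translates directly into code. This Python sketch uses the standard-library statistics module; the payment amounts and the function name three_sigma_outliers are hypothetical.

```python
import statistics

def three_sigma_outliers(values, threshold=3.0):
    """Return values more than `threshold` sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]

# Hypothetical payment amounts: nineteen routine payments and one unusual one.
payments = [98, 102, 100, 97, 103, 101, 99, 100, 102, 98,
            101, 99, 100, 103, 97, 100, 101, 99, 102, 500]
print(three_sigma_outliers(payments))  # [500]
```

One caveat: the extreme value itself inflates the mean and standard deviation, so in small samples an outlier can mask itself. Using the sample standard deviation, as this sketch does, no value among n numbers can lie more than (n - 1)/sqrt(n) standard deviations from the mean, which is under 3 whenever n is 10 or fewer.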

Regression:

a set of statistical processes for estimating the relationship between a dependent variable and one or more explanatory/predictor/independent variables
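For the simplest case, one predictor and a straight line, ordinary least squares gives closed-form estimates: slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², and intercept = ȳ - slope · x̄. The Python sketch below implements only that case; the function name fit_line is our own, and the data (say, advertising spend against sales, both in $1,000s) are made up so the exact answer is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(slope, intercept)       # 2.0 1.0
print(intercept + slope * 6)  # 13.0 (predicted y when x = 6)
```

Regression with several predictors follows the same least-squares idea but is normally solved with matrix methods or a statistics library.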

Text Mining:

the gathering, format conversion, and analysis of free-form/unstructured text in order to discover meaningful patterns or generate previously unknown insights
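One of the simplest text-mining steps is counting which terms appear most often once free-form text has been normalized. The Python sketch below lowercases hypothetical customer comments, splits them into words with a regular expression, drops a small illustrative stop-word list (our assumption, and far shorter than real ones), and reports the most frequent terms. The function name top_terms is also our own.

```python
import re
from collections import Counter

# Illustrative stop words; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "to", "of", "my", "it", "i"}

def top_terms(documents, n=3):
    """Return the n most frequent non-stop-word terms across all documents."""
    counts = Counter()
    for doc in documents:
        words = re.findall(r"[a-z']+", doc.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

comments = [
    "The invoice was late again.",
    "Late delivery and a wrong invoice.",
    "Delivery was late; invoice missing.",
]
print(top_terms(comments))  # [('invoice', 3), ('late', 3), ('delivery', 2)]
```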