What is Big Data

...

Video: https://www.youtube.com/watch?v=56s6E69u2vs

...

Video: https://www.youtube.com/watch?v=lknC2rR_ikY


In this lesson, we provide an overview of Big Data and show how to get value from it. We cover the terms, the concepts, and the technologies, as well as what has led to the big data era.

Many of us generate and use big data without being aware of it.

How is big data impacting business and people? Have you ever searched for or bought a product on Amazon?

Did you notice that Amazon started making recommendations related to the product you searched for?

Recommendation engines are a common application of big data. Companies like Amazon, Netflix and Spotify use algorithms based on big data to make specific recommendations based on customer preferences and historical behavior.

Personal assistants like Siri on Apple devices use big data to devise answers to the infinite number of questions end users may ask.

Google Now makes recommendations based on the big data on a user's device.

Now that we have an idea of how consumers are using big data, let's take a look at how big data is impacting business.

In 2011, McKinsey & Company said that big data was going to become the key basis of competition, supporting new waves of productivity growth and innovation.

In 2013, UPS announced that it was using data from customers, drivers, and vehicles in a new route guidance system intended to save time, money, and fuel. Initiatives like this one support the statement that big data will fundamentally change the way businesses compete and operate.

How does a firm gain a competitive advantage?

Have you ever heard of the Netflix show called House of Cards?

The first season of the show was released in 2013 and it was an immediate hit. At the time, the New York Times reported that Netflix executives knew that House of Cards would be a hit before they even filmed it. But how did they know that?

Big data.

Netflix has a lot of data. Netflix knows the time of day when movies are watched. It logs when users pause, rewind and fast forward. It has ratings from millions of users as well as the information on searches they make.

By looking at all this data, Netflix knew that many of its users had streamed the work of David Fincher and that films featuring Kevin Spacey had always done well. It knew that the British version of House of Cards had also done well, and that people who liked Fincher also liked Spacey.

All this information suggested that buying the series would be a good bet for the company, and in fact it was. In other words, thanks to big data, Netflix knows what people want before they do.

Now let's review another example. Market saturation and selective customers will require Chinese e-commerce companies to make better use of big data in order to gain market share.

Companies will have to persuade customers to shop more frequently, to make larger purchases and to buy from a broader array of online shopping categories.

E-commerce players already have the tools to do this as digital shopping grows. Leading players are already using data to build models aimed at boosting retention rates and spending per customer based on e-commerce data.

They have also started to adopt analytics-backed pricing and promotional activities.

The Internet of Things (IoT) refers to the exponential rise of connected devices. IoT suggests that many different types of today's products will be connected to a network or to the internet, for example refrigerators, coffee machines, or pillows. Another category of IoT is called wearables, and it refers to items of clothing or things we wear that are now connected.

These items include Fitbits, Apple Watches or the new Nike running shoes that tie their own shoelaces.

You have seen some of the characteristics of big data and you have seen some of the applications.

Beyond the Hype

...

Video: https://www.youtube.com/watch?v=PXGQhPlWhvs

In this lesson, we will look at some examples of Big Data and how it is being generated. We will discuss sources of Big Data and the different types of Big Data. So why is everyone talking about Big Data?

More data has been created in the past two years than in the entire history of humankind. By 2020, about 1.7 megabytes of new information will be created every second for every human being in the world.

By 2020, the data we create and copy will reach around 35 zettabytes, up from only 7.9 zettabytes today. The chart on the right shows the growth in global data in zettabytes. Note the jump from 2015 to 2020 of 343%.
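The 343% figure can be checked with a quick calculation. The snippet below is only a sketch: it assumes the 7.9 zettabyte figure is the 2015 value and the 35 zettabyte figure is the 2020 projection, and the function name `percent_increase` is made up for illustration.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the percentage increase going from old to new."""
    return (new - old) / old * 100.0


# Figures quoted in the lesson, in zettabytes (assumed to be 2015 and 2020).
start_zb = 7.9
end_zb = 35.0

growth = percent_increase(start_zb, end_zb)
print(f"{start_zb} ZB -> {end_zb} ZB is a {growth:.0f}% increase")
```

Under these assumptions the increase comes out at roughly 343%, which matches the jump noted for the chart.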

How big is a zettabyte? One bit is binary. It's either a one or a zero. Eight bits make up one byte, and 1024 bytes make up one kilobyte. 1024 kilobytes make up one megabyte. Large videos and DVDs will be in gigabytes, where 1024 megabytes make up one gigabyte of storage space. These days we have USB drives or memory sticks that can store a few dozen gigabytes of information, while computers and hard drives now store terabytes of information.

One terabyte is 1024 gigabytes. 1024 terabytes make up one petabyte, 1024 petabytes make up an exabyte, and 1024 exabytes make up a zettabyte.

Think of a big urban city or a busy international airport like Heathrow, JFK, O'Hare, Dubai, or O. R. Tambo in Johannesburg. And now we're talking petabytes and exabytes. All those airplanes are capturing and transmitting data. All the people in those airports have mobile devices. Also consider the security cameras and all the staff in and around the airport.
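The ladder of units described above can be sketched in a few lines of Python. This is an illustrative sketch only: it follows the lesson's convention of 1024 per step (storage vendors and SI prefixes often use 1000 instead), and the names `UNITS`, `bytes_in`, and `convert` are invented for this example.

```python
# Units in ascending order; each step is 1024 times the previous one,
# following the convention used in this lesson.
UNITS = [
    "byte", "kilobyte", "megabyte", "gigabyte",
    "terabyte", "petabyte", "exabyte", "zettabyte",
]


def bytes_in(unit: str) -> int:
    """Return how many bytes make up one of the given unit."""
    return 1024 ** UNITS.index(unit)


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a quantity from one unit to another."""
    return value * bytes_in(from_unit) / bytes_in(to_unit)


print(f"1 zettabyte = {bytes_in('zettabyte'):,} bytes")
print(f"1 zettabyte = {convert(1, 'zettabyte', 'gigabyte'):,.0f} gigabytes")
```

A call such as `convert(1, "terabyte", "gigabyte")` returns 1024, which is one step down the ladder.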

A digital universe study conducted by IDC claimed digital information reached 0.8 zettabytes last year and predicted this number would grow to 35 zettabytes by 2020. It is predicted that by 2020, one tenth of the world's data will be produced by machines, and most of the world's data will be produced in emerging markets. It is also predicted that the amount of data produced will increasingly outpace available storage.

Advances in cloud computing have contributed to the increasing potential of Big Data. According to McKinsey in 2013, the emergence of cloud computing has contributed significantly to the launch of the Big Data era.

Cloud computing allows users to access highly scalable computing and storage resources through the internet. By using cloud computing, companies can use server capacity as needed and expand it rapidly to the large scale required to process big data sets and run complicated mathematical models.

Cloud computing lowers the cost of analyzing big data because the resources are shared across many users, who pay only for the capacity they actually use.

A survey by IBM and Saïd Business School identified three major sources of Big Data: people-generated data, machine-generated data, and business-generated data, which is the data that organizations generate within their own operations.

The chart on the right shows the different responses; respondents were allowed to select multiple answers.

Big Data will require analysts to have Big Data skills. These skills include discovering and analyzing trends that occur in Big Data.

Big Data comes in three forms: structured, unstructured, and semi-structured.

  • Structured data is data that is organized, labelled, and has a strict model that it follows.
  • Unstructured data is said to make up about 80% of the world's data. It is usually in text form and has no predefined model or organization.
  • Semi-structured data is a combination of the two. Like structured data, it may have an organized structure, but it lacks a strictly defined model.

Some sources of structured Big Data are relational databases and spreadsheets. With this type of structure, we know how data is related to other data and what the data means, and the data is easy to query using a query language like SQL.

Some sources of semi-structured Big Data are XML and JSON files. These sources use tags or other markers to enforce hierarchies of records and fields within data.

A large multi-radio telescope project called the Square Kilometer Array, or SKA, produced about 1000 petabytes of raw data a day, at least as of 2011. It is projected that it will produce about 20,000 petabytes or 20 billion gigabytes of data each day in 2020.

Currently, there is an explosion of data coming from internet activity and in particular, video production and consumption as well as social media activities. These numbers will just keep growing as internet speeds increase and as more and more people all over the world have access to the internet.

Structured data refers to any data that resides in a fixed field within a record or file. It has the advantage of being easily entered, stored, queried, and analyzed.
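To make the contrast between structured and semi-structured data more concrete, here is a minimal sketch using Python's standard `sqlite3` and `json` modules. The table name, column names, and sample records are all invented for illustration and do not come from the lesson.

```python
import json
import sqlite3

# Structured data: every row has the same fixed fields, so it can be
# queried directly with SQL. An in-memory database keeps this self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("ana", "book", 12.5), ("ben", "lamp", 30.0), ("ana", "pen", 2.5)],
)
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()
print(totals)

# Semi-structured data: keys act as markers that give each record a
# hierarchy, but records need not share the same fields.
raw = (
    '[{"customer": "ana", "tags": ["gift"]},'
    ' {"customer": "ben", "address": {"city": "Lagos"}}]'
)
records = json.loads(raw)
# Not every record has an address, so missing fields must be handled.
cities = [r.get("address", {}).get("city") for r in records]
print(cities)
```

The SQL query can rely on every row having an `amount` column, whereas the JSON records have to be checked field by field because nothing guarantees that a given key is present.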

In today's business setting, most Big Data generated by organizations is structured and stored in data warehouses.

Highly structured business-generated data is considered a valuable source of information and is thus as important as machine-generated and people-generated data.


...

Professor Norman White (@normwhite) is the Faculty Director at the Stern Center for Research Computing at New York University. The following readings are from his blog, researchcomputing.blogspot.ca, where he comments on prevailing Big Data related events. The posts are much like diary entries and reflect what was happening at different points from 2011 to 2015.

  1. http://researchcomputing.blogspot.com/2011/10/big-data-and-business-analytics-comes.html
  2. http://researchcomputing.blogspot.com/2011/04/facebook-joins-google-in-hpc-computing.html