What is data?
- Data is recorded information: facts, measurements, or observations
- It may come in many forms: tables (as given below), images, text, audio, video, etc.
- Data size may vary from a few kilobytes to petabytes or more
The following is an example of a simple dataset:
What is data analysis?
- Extracting useful information from data
- Learning patterns and structures in existing data, and using them to answer questions about new data
Examples of data analysis:
- Predict which team will win a given match
- Estimate the rent for a given house
- Recognize objects or handwriting in images
- And much more.
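As a minimal sketch of "learning a pattern and answering questions about new data", the rent example can be framed as fitting a straight line (ordinary least-squares linear regression) to past houses and using it to predict the rent of a new one. The data points, the assumption that rent depends only on floor area, and the names `fit_line` and `predict_rent` are all hypothetical; real rent estimation would use many more features and a dedicated library.

```python
# Hypothetical toy data: (floor area in square metres, monthly rent).
# These numbers are made up for illustration only.
houses = [(50, 1050.0), (70, 1380.0), (90, 1820.0), (110, 2190.0)]


def fit_line(points):
    """Fit rent = slope * area + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    slope = cov / var
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Learn" the pattern from the existing data.
slope, intercept = fit_line(houses)


def predict_rent(area):
    """Answer a question about new data: the estimated rent for an unseen house."""
    return slope * area + intercept


print(round(predict_rent(100), 2))
```

With these toy numbers the fitted slope is about 19.3 per square metre and the intercept about 66, so `predict_rent(100)` returns roughly 1996.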
Preparing data for analysis
- Data should be understood, just as requirements should be understood before developing software
- The meaning of each column should be known for efficient analysis
- Access permissions, schema, and location of the data should be known
  - For example, we should know enough to join all the required related tables into a single table for analysis
- Data should be cleaned by removing:
  - duplicate rows
  - inconsistent or impossible values, such as a negative tip in restaurant bill data
  - invalid data, such as readings recorded while a sensor was being moved to the test location
  - redundant data, such as a duration stored both in hours and in minutes; both give the same information, so only one is needed
- Data might need to be reformatted, for example by converting names into identifying numbers
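The preparation steps above can be sketched in plain Python on two small hypothetical tables, `bills` and `visits`, that share a `bill_id` key. The table and column names, the values, and the helper functions `inner_join`, `drop_duplicates`, and `encode` are illustrative assumptions, not part of any library. Removing invalid data (the sensor example) depends on domain knowledge about how the data was collected, so it is not shown.

```python
# Hypothetical tables from a small restaurant dataset; all names and values
# are made up for illustration. Both tables share the key "bill_id".
bills = [
    {"bill_id": 1, "total": 25.0, "tip": 3.0},
    {"bill_id": 2, "total": 40.0, "tip": -5.0},  # impossible value: negative tip
    {"bill_id": 3, "total": 18.5, "tip": 2.0},
    {"bill_id": 3, "total": 18.5, "tip": 2.0},   # duplicate row
]
visits = [
    {"bill_id": 1, "day": "Sat", "hours": 1.0, "minutes": 60.0},
    {"bill_id": 2, "day": "Sun", "hours": 2.0, "minutes": 120.0},
    {"bill_id": 3, "day": "Sun", "hours": 0.5, "minutes": 30.0},
]


def inner_join(left, right, key):
    """Merge each left row with the right row sharing `key` (assumes keys in `right` are unique)."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def drop_duplicates(rows):
    """Keep the first occurrence of each identical row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        signature = tuple(sorted(row.items()))
        if signature not in seen:
            seen.add(signature)
            unique.append(row)
    return unique


def encode(rows, column):
    """Replace names in `column` with integer codes, in order of first appearance."""
    codes = {}
    for row in rows:
        row[column] = codes.setdefault(row[column], len(codes))
    return codes


# 1. Join the related tables into a single table.
table = inner_join(bills, visits, "bill_id")
# 2. Remove duplicate rows.
table = drop_duplicates(table)
# 3. Remove impossible values (a tip cannot be negative).
table = [row for row in table if row["tip"] >= 0]
# 4. Drop the redundant column: "hours" carries the same information as "minutes".
table = [{k: v for k, v in row.items() if k != "hours"} for row in table]
# 5. Reformat day names as identifying numbers.
day_codes = encode(table, "day")

for row in table:
    print(row)
print(day_codes)
```

With this toy data the final table keeps only bills 1 and 3, and `day_codes` maps `"Sat"` to 0 and `"Sun"` to 1. In pandas, roughly the same steps correspond to `DataFrame.merge`, `DataFrame.drop_duplicates`, boolean filtering, `DataFrame.drop(columns=...)`, and `pandas.factorize`.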