AWS Analytics offerings

These are my preparation notes from when I sat the AWS Solutions Architect – Associate exam. I cleared it on the first attempt with a good margin. I am sharing them here in the hope that they help beginners and aspirants.

AWS Data Pipeline

AWS Data Pipeline is a web service that you can use to automate the movement and transformation of data.
With AWS Data Pipeline, you can define data-driven workflows, so that tasks can be dependent on the successful completion of previous tasks.
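To make the "dependent on successful completion" idea concrete, here is a minimal sketch in plain Python. It does not use the AWS Data Pipeline API. The task names and the `run_pipeline` helper are made up for illustration, and every task named in `depends_on` is assumed to exist in `tasks`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, depends_on):
    """Run tasks in dependency order; skip a task if any upstream task did not succeed.

    tasks: dict of task name -> zero-argument callable returning True on success.
    depends_on: dict of task name -> set of task names it depends on.
    Returns a dict of task name -> "succeeded", "failed" or "skipped".
    """
    status = {}
    graph = {name: depends_on.get(name, set()) for name in tasks}
    # static_order() yields each task only after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        upstream = depends_on.get(name, set())
        if any(status[dep] != "succeeded" for dep in upstream):
            status[name] = "skipped"
            continue
        status[name] = "succeeded" if tasks[name]() else "failed"
    return status


# Hypothetical three-step workflow: copy -> transform -> load.
tasks = {
    "copy_logs": lambda: True,
    "transform": lambda: True,
    "load_warehouse": lambda: True,
}
depends_on = {"transform": {"copy_logs"}, "load_warehouse": {"transform"}}
result = run_pipeline(tasks, depends_on)
```

If `copy_logs` returned False instead, both downstream tasks would be marked as skipped rather than run.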

Redshift

Large-scale data warehousing system.
Based on PostgreSQL, but customized.
AWS data warehousing solution for business intelligence.
Supports
- Single node (160 GB)
- Multi node
  - Leader node
  - Compute nodes – up to 128.
Redshift is a column-based database.
Supports compression.
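Column storage and compression go together: values in one column tend to repeat, so they compress well. The sketch below is only an illustration in plain Python, not Redshift's internal format. The sample rows and both helper functions are made up, and run-length encoding is used here simply as an easy-to-read example of a column encoding.

```python
from itertools import groupby

# Hypothetical order rows: (order_id, country, status).
rows = [
    (1, "IN", "shipped"),
    (2, "IN", "shipped"),
    (3, "IN", "pending"),
    (4, "US", "pending"),
]


def to_columns(rows):
    """Turn row-oriented tuples into one list per column."""
    return [list(column) for column in zip(*rows)]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, repeat_count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


order_ids, countries, statuses = to_columns(rows)
encoded_countries = run_length_encode(countries)
encoded_statuses = run_length_encode(statuses)
```

Stored row by row, the repeated country values are separated by other fields. Stored as a column, they sit next to each other and collapse into a couple of pairs.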
Billing
- Charged only for compute node hours. The leader node is not charged.
- Data transfer
- Backup.
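The compute part of the bill can be worked out with simple arithmetic. This is a rough sketch only: the hourly rate below is a made-up number, not a real AWS price, and data transfer and backup charges are left out.

```python
def compute_node_charge(compute_nodes, hours, hourly_rate):
    """Charge for compute nodes only; the leader node adds nothing to the bill.

    hourly_rate is a hypothetical per-node, per-hour price.
    """
    if compute_nodes < 1 or hours < 0 or hourly_rate < 0:
        raise ValueError("need at least one compute node and non-negative hours and rate")
    return compute_nodes * hours * hourly_rate


# Hypothetical cluster: 1 leader node + 4 compute nodes, running for 24 hours.
daily_charge = compute_node_charge(compute_nodes=4, hours=24, hourly_rate=0.25)
```

Only the 4 compute nodes enter the calculation, so the leader node does not change the result.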
Available in only one AZ.
Can be created within a VPC.
Redshift doesn't provide a data interface of its own.
- Connections require ODBC/JDBC and PostgreSQL drivers.
Redshift's block size is 1 MB (1024 KB).

Elastic Map Reduce (EMR)

Supports MapReduce and Apache Spark.
Big data ecosystem.

Elasticsearch

Elasticsearch on Amazon.
Elasticsearch is a distributed search and analytics platform (similar to Solr).

Kinesis for real-time message processing

Real-time stream processing.
Used for
- Real-time analytics
- Real-time notifications
- Complex event processing.
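To get a feel for real-time analytics on a stream, here is a small sketch that counts events per fixed time window as records arrive. It is plain Python and does not call Kinesis. The event tuples and the `tumbling_window_counts` helper are made up for illustration, and events are assumed to arrive in timestamp order.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, Counter of keys) each time a fixed-size window closes.

    events: iterable of (timestamp_seconds, key), assumed sorted by timestamp.
    """
    current_start = None
    counts = Counter()
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_seconds)
        if current_start is None:
            current_start = window_start
        elif window_start != current_start:
            # The previous window is complete, so emit it and start a new one.
            yield current_start, counts
            current_start, counts = window_start, Counter()
        counts[key] += 1
    if current_start is not None:
        yield current_start, counts  # emit the last, still-open window


# Hypothetical clickstream: (seconds since start, event type).
events = [(0, "login"), (3, "click"), (7, "click"), (12, "login"), (14, "click"), (21, "click")]
windows = list(tumbling_window_counts(events, window_seconds=10))
```

Because the function is a generator, each window's counts are available as soon as the first event of the next window shows up, without waiting for the whole stream.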

Amazon Machine Learning

Create predictive models.
Deploy models.
Do scoring.
A set of algorithms is available.
