These were my preparation notes for the AWS Solutions Architect – Associate exam. I cleared it on the first attempt with a good margin. I am sharing them here in the hope that they help beginners and aspirants.
AWS Data Pipeline
- AWS Data Pipeline is a web service that you can use to automate the movement and transformation of data.
- With AWS Data Pipeline, you can define data-driven workflows, so that tasks can be dependent on the successful completion of previous tasks.
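
The idea of tasks that run only after earlier tasks succeed can be sketched in plain Python. This is a minimal illustration and not the AWS Data Pipeline API; the function `run_pipeline` and the task names are hypothetical.

```python
from collections import deque


def run_pipeline(tasks, depends_on):
    """Run tasks in dependency order; skip a task if any upstream task did not succeed.

    tasks: dict of task name -> zero-argument callable returning True on success.
    depends_on: dict of task name -> list of task names that must succeed first.
    Returns a dict of task name -> "succeeded", "failed", or "skipped".
    """
    remaining = {name: len(depends_on.get(name, [])) for name in tasks}
    dependents = {name: [] for name in tasks}
    for name, upstream in depends_on.items():
        for dep in upstream:
            dependents[dep].append(name)

    status = {}
    ready = deque(name for name, count in remaining.items() if count == 0)
    while ready:
        name = ready.popleft()
        upstream_ok = all(status[d] == "succeeded" for d in depends_on.get(name, []))
        if not upstream_ok:
            status[name] = "skipped"
        else:
            status[name] = "succeeded" if tasks[name]() else "failed"
        # Release downstream tasks once all of their upstream tasks have finished.
        for child in dependents[name]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    return status


# Hypothetical three-step workflow in which the middle step fails.
tasks = {
    "copy_logs": lambda: True,
    "transform": lambda: False,
    "load_warehouse": lambda: True,
}
depends_on = {"transform": ["copy_logs"], "load_warehouse": ["transform"]}
result = run_pipeline(tasks, depends_on)
print(result)
```

Because `transform` fails, `load_warehouse` is skipped rather than run on incomplete data.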
Redshift
- Large scale data warehousing system.
- Based on PostgreSQL, but customized.
- AWS data warehousing solution for business intelligence.
- Supports
  - Single node (160 GB)
  - Multi node
    - Leader node
    - Compute nodes (up to 128)
- Redshift is a column-based database.
- Supports compression
- Billing
  - Charged only for compute node hours. The leader node is not charged.
  - Data transfer
  - Backup
- Available in only 1 AZ
- Can be created within VPC
- Redshift doesn't provide a data interface
- Connections require ODBC/JDBC and PostgreSQL drivers
- Block size in Redshift is 1 MB (1024 KB).
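
The column-based storage and compression points above can be illustrated with a small sketch. It is a toy model and not Redshift's internal format. The helper names are hypothetical, and run-length encoding is used only as a simple example of a scheme that works well on columns with repeated values.

```python
from itertools import groupby


def rows_to_columns(rows):
    """Pivot a list of row dicts into a dict of column name -> list of values."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical sales rows, stored column by column instead of row by row.
rows = [
    {"region": "us-east-1", "sales": 10},
    {"region": "us-east-1", "sales": 20},
    {"region": "us-east-1", "sales": 5},
    {"region": "eu-west-1", "sales": 7},
]
columns = rows_to_columns(rows)
encoded_region = run_length_encode(columns["region"])
total_sales = sum(columns["sales"])  # an aggregate reads only the "sales" column
print(encoded_region, total_sales)
```

An aggregate such as `total_sales` only touches one column, and a column of similar values compresses better than mixed row data, which is why the two features go together.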
Elastic MapReduce (EMR)
- Amazon Elastic MapReduce.
- Supports MapReduce and Apache Spark
- Big data ecosystem.
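
For readers new to MapReduce, the programming model can be sketched as a word count in plain Python. This runs in a single process and does not use EMR or Hadoop; the function names are hypothetical and only mirror the map, shuffle, and reduce phases.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data on EMR", "big clusters"])
print(counts)
```

On a real cluster the map and reduce calls run in parallel on many nodes; the sketch only shows the data flow.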
Elasticsearch
- Elasticsearch on Amazon.
- Elasticsearch is a distributed search and analytics platform (similar to Solr).
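
Search platforms of this kind are generally built around an inverted index, which maps each term to the documents that contain it. The sketch below is a toy, single-machine version of that idea and not the Elasticsearch API; `build_index` and `search` are hypothetical names.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents keyed by id.
documents = {
    1: "Redshift data warehouse",
    2: "Kinesis stream data",
    3: "EMR big data cluster",
}
index = build_index(documents)
print(search(index, "stream data"))
```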
Kinesis for real-time message processing
- Real-time stream processing.
- Used for
  - Real-time analytics
  - Real-time notifications
  - Complex event processing
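
Real-time analytics and notifications on a stream can be sketched as a rolling average that raises an alert when it crosses a threshold. This is a minimal illustration that reads from an ordinary Python iterable, not from Kinesis; the function name, window size, and threshold are assumptions.

```python
from collections import deque


def rolling_alerts(stream, window_size, threshold):
    """Yield (record, rolling_average) whenever the rolling average exceeds threshold."""
    window = deque(maxlen=window_size)  # old records fall out automatically
    for record in stream:
        window.append(record)
        average = sum(window) / len(window)
        if average > threshold:
            yield record, average


# Hypothetical sensor readings arriving one at a time.
readings = [10, 12, 11, 50, 60, 12]
alerts = list(rolling_alerts(readings, window_size=3, threshold=30))
print(alerts)
```

Each record is processed as it arrives and only the last few records are kept in memory, which is the main difference from batch processing.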
Amazon Machine Learning
- Create predictive models.
- Deploy models
- Do scoring
- A set of algorithms is available
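
The create, deploy, and score steps above can be sketched with a tiny logistic regression in plain Python. This is a generic textbook model chosen for illustration, not the Amazon Machine Learning API or its actual algorithms; the function names, data, and hyperparameters are assumptions.

```python
import math


def predict_proba(model, x):
    """Score one example: probability of the positive class."""
    weight, bias = model
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))


def train(examples, labels, epochs=2000, learning_rate=0.1):
    """Fit a one-feature logistic regression with plain gradient descent."""
    weight, bias = 0.0, 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(examples, labels):
            error = predict_proba((weight, bias), x) - y
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias


# Hypothetical training data: hours studied -> passed the exam (1) or not (0).
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
passed = [0, 0, 0, 1, 1, 1]

model = train(hours, passed)          # create the predictive model
low = predict_proba(model, 1.0)       # score new inputs with the trained model
high = predict_proba(model, 6.0)
print(low, high)
```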