AWS Concepts

This article lays out the basic AWS concepts to get a fundamental understanding of the AWS services and their usage.

Storage

Types

  1. S3
    • Types of S3
      • S3 Standard – data stored across 3+ AZs (Availability Zones)
      • S3 IA (Infrequently Accessed)
      • S3 One Zone IA
      • Reduced Redundancy Storage – 99.99% availability
      • Glacier
      • Intelligent Tiering – moves objects between S3 access tiers automatically; objects not accessed for 30 days move to an infrequent-access tier.
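    • Writing a new object to S3 has immediate (read-after-write) consistency. Updates and deletes used to be eventually consistent; since December 2020, S3 is strongly consistent for all reads after writes, overwrites and deletes.
    • A random prefix before folder/file names spreads objects across more partitions, which helps read/write throughput because request-rate limits apply per prefix.
    • Users can be allowed to read private data on S3 by using CloudFront with signed URLs.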
    • S3 charges are based on:
      1. Storage per GB
      2. Requests GET/PUT/POST/etc.
      3. Storage Management Pricing
      4. Data Management Pricing
      5. Transfer Acceleration
    • S3 encryption
      1. In transit (SSL/TLS)
      2. At rest (Server Side Encryption)
        • SSE-S3 – S3 managed key
        • SSE-KMS – key managed in AWS Key Management Service
        • SSE-C – encryption with a customer provided key
      3. Client Side Encryption
    • To request server side encryption, x-amz-server-side-encryption needs to be put in the request header. A bucket policy can then deny any upload request that arrives without this header. Valid values:
      • AES256 – SSE-S3
      • aws:kms – SSE-KMS
    • To cover the objects in a bucket, the resource ARN in a policy must include /* after the bucket name (arn:aws:s3:::bucket-name/*); the bare bucket ARN only covers bucket-level actions.
  2. Glacier
    Long-term storage at low cost. Retrieving data takes time. Used for compliance related data.
  3. EFS – Elastic File System
    Shared file storage that can be mounted by multiple EC2 instances. Used for installing applications.
  4. EBS – Elastic Block Store
    • Storage volumes that can be attached to EC2 instances.
    • These are created in specific Availability Zones
    • Types of EBS volumes:
      1. General Purpose SSD (GP2) – 3 IOPS per GB (100-16000 IOPS, so the baseline scales between 33.3 GB and 5.33 TB)
      2. Provisioned IOPS SSD (IO1) – 10000+ IOPS, suitable for databases
      3. Throughput Optimized HDD (ST1) – Data warehousing. Can’t be a boot volume
      4. Cold HDD (SC1) – For low cost, infrequently accessed data. Can’t be a boot volume
      5. Magnetic (standard) – Lowest cost bootable drive
    • Creating an encrypted volume from an unencrypted volume
      1. Create a snapshot of the active volume
      2. Copy the snapshot and select Encrypted
      3. Create a new volume from this encrypted snapshot
    • The root device can also be encrypted using OS level encryption (example – BitLocker)
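
The S3 notes above mention denying requests that lack the x-amz-server-side-encryption header, and the /* suffix on the resource ARN. A minimal sketch of such a bucket policy, built as a Python dictionary, could look like the following. The function name and the bucket name are hypothetical, and the policy is an illustration rather than a complete production policy.

```python
import json


def deny_unencrypted_uploads_policy(bucket_name: str) -> dict:
    """Build a bucket policy that rejects uploads sent without an SSE header."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedObjectUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                # Object-level actions need the /* suffix after the bucket name.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                "Condition": {
                    # "Null": "true" matches requests where the header is absent.
                    "Null": {"s3:x-amz-server-side-encryption": "true"}
                },
            }
        ],
    }


if __name__ == "__main__":
    # "example-bucket" is a placeholder name.
    print(json.dumps(deny_unencrypted_uploads_policy("example-bucket"), indent=2))
```

The printed JSON can be pasted into the bucket policy editor; uploads that set the header to AES256 or aws:kms are not matched by the Deny statement.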

Database

Types

  1. RDS – Relational Database Service
    • Types
      1. SQL Server
      2. Oracle
      3. MySQL
      4. PostgreSQL
      5. Aurora – Amazon’s own flagship, not available on Free tier
      6. MariaDB
    • Support for Backups, Multi-AZ and Read replicas
      1. Automated Backups – Recover the database to any point within a retention period of 1-35 days.
        • Backup storage up to the size of the database is provided for free.
      2. Database Snapshots
        • These are done manually and are stored even after the RDS instance is destroyed.
    • Multi AZ
      • Keeps an exact, synchronously replicated copy of the DB in another Availability Zone. It is used for Disaster Recovery only, not for scaling reads.
    • Read Replicas
      • Copies of the main DB to support reads only.
      • One DB can have up to 5 read replicas, and each one can have its own 5 read replicas.
      • Each read replica has its own DNS endpoint and can itself be Multi AZ.
    • Encrypting a database also encrypts its automated backups, snapshots and read replicas.
    • A restore can be done to a particular second: automated backups store the transaction logs along with the daily snapshot, and the logs are replayed on top of the snapshot when the restore is done.
    • Should not be available for direct public access. A private subnet is preferable, with a NAT gateway for any outbound access to the internet.
  2. DynamoDB
    • Fast and flexible NoSQL database with single digit millisecond latency.
    • Supports both documents and key-value pairs
    • Documents can be JSON, XML, HTML
    • Primary keys:
      1. Partition key
      2. Composite key (Partition + Sort key)
    • 2 consistency models
      1. Strongly consistent (requires explicit declaration)
      2. Eventually consistent (faster reads)
    • Indexes:
      1. LSI – Local Secondary Index – Can only be created while creating the table. It uses the same partition key as the table with an alternative sort key, which gives a different view of the table and can make queries on that key faster.
      2. GSI – Global Secondary Index – Can be created anytime. Uses a different partition key as well as a different sort key from the table.
    • Types of querying to DynamoDB
      1. Scan
        • It examines all the items in the table.
        • The attributes returned can be limited using ProjectionExpression
      2. Query
        • It finds items in a table based on the partition key (required), and an optional condition on the sort key.
        • The attributes returned can be limited using ProjectionExpression
        • Results come back in ascending order of sort key by default; setting the ScanIndexForward parameter to false reverses the order
    • A DynamoDB scan reads the table in 1MB increments and can be configured to run in parallel by dividing the table into segments.
    • DynamoDB Provisioned Throughput
      • 1 WCU (Write Capacity Unit) = 1 write per second of an item up to 1KB
      • 1 RCU (Read Capacity Unit) = 1 strongly consistent read per second of an item up to 4KB
        Or 2 eventually consistent reads per second of an item up to 4KB
        A transactional read of an item up to 4KB costs 2 RCU
      • Reading 80 items of 3KB per second: each item rounds up to one 4KB unit (3/4 = 0.75 ~ 1), so 1 RCU * 80 = 80 RCU with strong consistency
        Or 80/2 = 40 RCU for eventually consistent reads
      • Writing 100 items of 512 bytes per second: each item rounds up to one 1KB unit (512bytes/1KB = 0.5 ~ 1), so 1 WCU * 100 = 100 WCU
    • DynamoDB OnDemand capacity – For unpredictable workloads.
    • Charges apply for reading, writing and storing.
    • Can be switched between Provisioned and OnDemand once per day.
    • DAX (DynamoDB Accelerator)
      1. Clustered in-memory cache
      2. 10x read performance
      3. It caters to eventually consistent reads
      4. If not in cache, DAX will make an eventually consistent GetItem query to DynamoDB
    • DynamoDB Streams – Time-ordered log of item level modifications, stored for 24 hours.
    • If the request rate is too high, DynamoDB throws ProvisionedThroughputExceededException. When using the SDK, the request is retried automatically with progressively longer waits (50ms, 100ms… up to 1min), i.e. exponential backoff.
    • The throughput is divided evenly across all partitions, so the exception can be raised at the partition level even when the table as a whole is under its limit.
    • Use a condition expression on writes to avoid overwriting existing items.
  3. RedShift
    For data warehousing (OLAP).
  4. ElastiCache
    • Web service that makes it easy to deploy, operate and scale in-memory DB. Good for read operations to take load off the main DB.
    • Types
      1. Redis – In memory, Multi AZ key-value DB
      2. Memcached – Single AZ, object based caching system with support for multi-threading and multiple cores.
    • Caching strategies
      1. Lazy loading
        • Data is loaded into the cache only when a request for it is made.
        • So the first request for an item is always a cache miss and has to go to the database.
        • The cache also won’t update when the database is updated, as it only fetches an item which is not present in the cache. TTL can be used to mitigate this to some extent.
      2. Write-through
        • Update the cache when data is written in the DB.
        • This would also mean that there would be data present in the cache which might never get read.
    • OLAP – Online Analytics Processing
    • OLTP – Online Transaction Processing
    • DynamoDB ACID (Atomic, Consistent, Isolated, Durable) transactions can be used to perform actions across multiple tables, with everything rolled back on failure.
    • BatchWriteItem – Some of the writes in the batch can succeed while others fail
    • TransactWriteItems – All the writes need to succeed, or all of them are reverted
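
The provisioned-throughput arithmetic in the DynamoDB notes above can be sketched as a small calculator. The function names are hypothetical, and the doubling for transactional reads is an assumption based on transactional operations costing twice a strongly consistent one.

```python
import math

READ_UNIT_KB = 4   # one RCU covers an item of up to 4KB
WRITE_UNIT_KB = 1  # one WCU covers an item of up to 1KB

# Cost multiplier per read, relative to a strongly consistent read.
READ_FACTORS = {"strong": 1.0, "eventual": 0.5, "transactional": 2.0}


def read_capacity_units(item_size_kb, reads_per_sec, consistency="strong"):
    """RCUs needed to read `reads_per_sec` items of `item_size_kb` each."""
    units_per_read = math.ceil(item_size_kb / READ_UNIT_KB)  # round up per item
    return math.ceil(units_per_read * reads_per_sec * READ_FACTORS[consistency])


def write_capacity_units(item_size_kb, writes_per_sec):
    """WCUs needed to write `writes_per_sec` items of `item_size_kb` each."""
    return math.ceil(item_size_kb / WRITE_UNIT_KB) * writes_per_sec
```

With the figures used above, read_capacity_units(3, 80) gives 80, read_capacity_units(3, 80, "eventual") gives 40, and write_capacity_units(0.5, 100) gives 100. The rounding happens per item, before multiplying by the request rate.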

Analytics

  1. Athena – SQL queries on S3. Is serverless, and can’t be used to manage transactions.
  2. EMR – Elastic Map Reduce – Process large amounts of data
  3. CloudSearch – Creating search capabilities
  4. Elasticsearch Service – Creating search capabilities
  5. Kinesis – Analyzing large data sets in TBs/hr
    • Services:
      1. Kinesis Streams
        • Consists of shards. Each shard supports 2MB/sec of reads (up to 5 read transactions/sec) and 1MB/sec of writes (up to 1000 records/sec)
        • Retention time is between 24 hours and 7 days
      2. Kinesis Firehose
        • The data is processed and delivered as it arrives, without any retention time.
      3. Kinesis Analytics
        • Runs SQL queries over Streams/Firehose and stores the result afterwards.
    • Kinesis Client Library is used to read data from Kinesis, and creates one record processor for each shard.
    • Record processors run inside each consumer (example EC2).
    • By default, the 2MB/sec read output of each shard is shared by all applications consuming data from the stream. Enhanced Fan-Out gives each registered consumer its own 2MB/sec per shard, so the consuming capacity scales with the number of consumers as well as the number of shards.
  6. Data Pipeline – Move data from one place to another. Example: S3 to DynamoDB
  7. QuickSight – Analyze data in S3 and DBs, and provide insights
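
The per-shard limits quoted for Kinesis Streams above (1MB/sec or 1000 records/sec written, 2MB/sec read) suggest a simple way to estimate a shard count. The sketch below is only that estimate: the function name is hypothetical, and it assumes a single consumer without Enhanced Fan-Out.

```python
import math

# Per-shard limits as quoted in the Kinesis notes above.
SHARD_WRITE_MB_PER_SEC = 1.0
SHARD_WRITE_RECORDS_PER_SEC = 1000
SHARD_READ_MB_PER_SEC = 2.0


def shards_needed(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Return the smallest shard count that satisfies all three limits."""
    required = max(
        write_mb_per_sec / SHARD_WRITE_MB_PER_SEC,
        records_per_sec / SHARD_WRITE_RECORDS_PER_SEC,
        read_mb_per_sec / SHARD_READ_MB_PER_SEC,
    )
    # A stream always has at least one shard.
    return max(1, math.ceil(required))
```

For example, a stream receiving 5MB/sec in 3000 records/sec and read at 8MB/sec is bound by its write volume, so shards_needed(5, 3000, 8) comes out at 5.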

Security & Identity

  1. IAM
    1. 3 types of policies
      1. AWS Managed Policies – Created by AWS and non-editable
      2. Customer Managed Policies – Created by the customer
      3. Inline Policies – Applied directly to a role, group or user
  2. Inspector – An agent installed in the VM to report on its state
  3. ACM (AWS Certificate Manager) – Easily provision, manage, and deploy public/private SSL/TLS certificates for use with AWS services.
  4. Directory Service
  5. WAF – Web Application Firewall
  6. Artifact – On-demand access to AWS compliance reports
  7. KMS – Key Management Service
    • Manage encryption keys to be used across AWS environment
    • The encryption keys are regional, even when IAM is global
    • Envelope encryption: the CMK (Customer Master Key) is used to decrypt the data (envelope) key, and the envelope key is then used to decrypt the data.
    • A CMK can never be exported.
    • A CMK can directly encrypt only up to 4KB of data
    • GenerateDataKey returns a data key that can be used to encrypt larger files at the client end.
    • Lambda environment variables, which KMS can encrypt, can be up to a maximum of 4KB in total
  8. Web Identity Federation
    • Users authenticate with web identity providers like Amazon, Facebook or Google.
    • The provider’s token is traded for temporary AWS security credentials
    • Amazon Cognito is the identity broker between the application and the web based identity provider.
    • User Pools are user directories used to manage sign-up and sign-in functionality
    • Identity Pools create unique identities for the users and grant them temporary AWS credentials after they authenticate with an identity provider.
    • Cognito uses push synchronization to send silent push notifications of user data updates to all the devices of a user ID via SNS
  9. STS – Security Token Service – This is the service running beneath Cognito.
    • Regular web applications can call the STS AssumeRoleWithWebIdentity API, which returns a set of temporary credentials (access key ID, secret access key and security token)
    • The AssumedRoleUser ARN and AssumedRoleId are used to programmatically reference the temporary credentials.

Management Tools

  1. CloudWatch – Monitor environment resources, as well as applications running on AWS
    • Host level metrics
      • CPU
      • Network
      • Disk (not storage)
      • Status check of EC2 instance
    • RAM utilization is a custom metric
    • Standard Monitoring – 5 minute intervals – Default
    • Detailed Monitoring – 1 minute intervals
    • Custom metrics can be high resolution, down to 1 second intervals; alarms on them can be evaluated every 10 seconds.
    • CloudWatch Logs are stored indefinitely by default, but the retention can be modified
    • Metrics can be retrieved using the GetMetricStatistics API, or by 3rd party tools
    • CloudWatch can monitor on-premise servers too, by installing the SSM agent and CloudWatch agent
    • CloudWatch vs CloudTrail vs Config
      • CloudWatch tracks performance
      • CloudTrail tracks API calls
      • Config records the state of the AWS environment and can notify of changes
    • 3 states of CloudWatch Alarm
      • INSUFFICIENT_DATA
      • OK
      • ALARM
  2. Elastic Beanstalk – Deploy and manage applications without worrying about infrastructure.
    1. It’s a service to deploy, manage and scale applications by provisioning resources.
    2. It is driven from the console with a few clicks, whereas CloudFormation uses JSON or YAML templates to provision resources.
    3. Upgrade Policies:
      1. All at once – Outage
      2. Rolling – Reduced capacity
      3. Rolling with additional batch
      4. Immutable – Creates a new fleet
    4. Beanstalk can be customized with YAML or JSON files
    5. RDS shouldn’t be created inside the Elastic Beanstalk environment, as terminating the environment would also terminate the RDS instance. It can be created outside and used from Beanstalk by supplying the connection information and an additional security group.
  3. CloudFormation – Provision AWS infrastructure as code.
    1. Supports YAML and JSON. The template can be uploaded to S3, and CloudFormation reads it to create a complete stack.
    2. Can be used to roll back and delete entire stacks. Rollback is done when template processing fails.
    3. Template looks like
      • AWSTemplateFormatVersion: "2010-09-09"
      • Description: "Template for creating EC2 instance"
      • Metadata:
      • Parameters: # Input custom values
      • Conditions: # Provision resources based on environment
      • Mappings: # Create custom mappings, e.g. per region
      • Transform: # Reference code located in S3. Also used to specify SAM
      • Resources: # AWS resources to create
      • Outputs: # Values exposed by the stack
    4. Fn::GetAtt returns the value of an attribute from a resource in the template
    5. Fn::FindInMap returns values from a 2 level map declared in the Mappings section
    6. CloudFormation Change Sets give a preview of the impact of changes on running resources.
    7. The default limit is 200 for CloudFormation stacks
  4. SAM – Serverless Application Model
    • Extension of CloudFormation to define Serverless Applications
    • SAM CLI:
      1. Package the application and upload to S3
        sam package --template-file xxx --output-template-file yyy --s3-bucket zzz
      2. Deploy serverless app with CloudFormation
        sam deploy --template-file yyy --stack-name aaa --capabilities CAPABILITY_IAM
    • SAM template
      • Transform: AWS::Serverless-2016-10-31
      • Resources
  5. CloudFormation Nested Stacks
    • Nested stacks allow reuse of CloudFormation templates
    • Output section of one CloudFormation template can be used as an input to another template
  6. CloudTrail – Audit for changes on AWS environment
  7. OpsWorks – Automatic deployments using Chef
  8. Config – Monitor environment
  9. Service Catalog – Authorize/Unauthorize services to users
  10. Trusted Advisor – Gives recommendations on cost optimization, performance, security and fault tolerance.
  11. Access Advisor feature in IAM console – Reports last-used timestamps of AWS service access for roles. Can be used to identify, analyze and remove unused roles
  12. IAM Access Analyzer – Identifies resources shared with external entities.
  13. Amazon Inspector – Automated security assessment service that assesses applications for exposure, vulnerabilities and deviations from best practices.
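
To make the CloudFormation template sections and the Fn::FindInMap / Fn::GetAtt functions described above more concrete, here is a minimal sketch that assembles a JSON template in Python. Everything specific in it is an assumption for illustration: the EnvType parameter, the RegionMap entries, the placeholder AMI IDs, the instance types and the WebServer resource name.

```python
import json


def build_template() -> dict:
    """Assemble a small CloudFormation template as a plain dictionary."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Template for creating EC2 instance",
        # Parameters: input custom values.
        "Parameters": {
            "EnvType": {"Type": "String", "Default": "dev",
                        "AllowedValues": ["dev", "prod"]}
        },
        # Mappings: a 2 level map (region -> key -> value). AMI IDs are placeholders.
        "Mappings": {
            "RegionMap": {
                "us-east-1": {"AMI": "ami-00000000"},
                "us-west-2": {"AMI": "ami-11111111"},
            }
        },
        # Conditions: provision resources based on environment.
        "Conditions": {"IsProd": {"Fn::Equals": [{"Ref": "EnvType"}, "prod"]}},
        "Resources": {
            "WebServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": {"Fn::FindInMap":
                                ["RegionMap", {"Ref": "AWS::Region"}, "AMI"]},
                    "InstanceType": {"Fn::If": ["IsProd", "t3.large", "t3.micro"]},
                },
            }
        },
        # Outputs can feed another template, e.g. a nested stack.
        "Outputs": {
            "PublicIp": {"Value": {"Fn::GetAtt": ["WebServer", "PublicIp"]}}
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

The printed JSON is the kind of file that would be uploaded to S3 for CloudFormation to read; the same structure can be written directly in YAML.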

Application Service

  1. Step Functions – Coordination of components in a distributed application.
  2. X-Ray – Service that collects data about requests and provides tools to identify issue locations.
    • Contains:
      1. X-Ray SDK – Contains interceptors, client handlers and HTTP client to add to application for tracking.
      2. X-Ray Daemon (for EC2 and Elastic Beanstalk). For Docker, it needs to run in its own container.
      3. X-Ray API
      4. X-Ray Console
    • Integrates with:
      1. Elastic Load Balancer
      2. AWS Lambda
      3. Amazon API Gateway
      4. Amazon EC2
      5. AWS Elastic Beanstalk
      6. ECS
    • Annotation – Records additional information about the requests in key-value pairs, helping to index and filter data
  3. SWF – Simple Workflow Service – Coordinate work across distributed components.
  4. API Gateway – Create and deploy own REST and WebSocket API’s at scale
    • Can import Swagger 2 definition files
    • Default throttling is 10000 requests/sec steady state with a burst of 5000 requests, per AWS account. Requests beyond that get a 429 Too Many Requests error.
  5. API Caching – API responses can be cached, to decrease overhead on Lambda. API Gateway also comes with throttling and logging into CloudWatch.
  6. AppStream – Stream desktop applications to user
  7. Elastic Transcoder – Convert media files to required formats.
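
The API Gateway throttling described above, a steady request rate plus a burst allowance with 429 returned beyond it, behaves like a token bucket. The sketch below is a toy model under that assumption, not API Gateway's actual implementation; the class name is hypothetical and time is passed in explicitly so the behaviour is easy to follow.

```python
class TokenBucket:
    """Toy throttle: `rate` tokens refill per second, up to `burst` tokens."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous request, in seconds

    def allow(self, now: float) -> int:
        """Return the HTTP status a request arriving at time `now` would get."""
        # Refill for the time elapsed, never beyond the burst capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return 200
        return 429  # Too Many Requests
```

With TokenBucket(rate=10000, burst=5000), 5000 requests arriving at the same instant are served, the next one gets 429, and capacity comes back at 10000 tokens per second.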

Developer Tools

  1. CodeCommit – Version control system
  2. CodeBuild – Build service to compile source code, run unit-tests, and produce artifacts for deploying.
  3. CodeDeploy – Automate the deployment of application to instances, and update application as required.
    • Can deploy to:
      1. EC2 instances, and ECS (EC2 and Fargate launch types)
      2. On-premise servers
      3. Lambda
    • Approaches:
      1. In place – Rolling – Better suited to first time deployments, as it is disruptive for an application that is already serving traffic.
      2. Blue-Green – Immutable
        1. Easy to switch between old and new
        2. Need to pay for 2 environments until the old one is deleted.
    • Canary and linear traffic shifting are only an option for Lambda and ECS deployments, not for EC2/on-premise deployments
    • Deploying using AppSpec
      1. Define parameters to be used during deployment
      2. Uses YAML; JSON can also be used for Lambda
      3. Looks like:
        • version (0.0)
        • os
        • files
        • hooks
      4. For EC2/on-premise deployments, the file must be named appspec.yml and be located in the root directory
      5. Lifecycle events have a very specific order and can be divided into 3 phases
        • Deregister instances from Load Balancer
          1. BeforeBlockTraffic
          2. BlockTraffic
          3. AfterBlockTraffic
          4. ApplicationStop
          5. DownloadBundle
        • Perform the application deployment
          1. BeforeInstall
          2. Install
          3. AfterInstall
          4. ApplicationStart
          5. ValidateService
        • Reregister the instances to Load Balancer
          1. BeforeAllowTraffic
          2. AllowTraffic
          3. AfterAllowTraffic
  4. CodePipeline – Continuous Delivery service to model, visualize and automate release steps
    1. Workflow is defined
    2. New code appears
    3. Code is built and tested
    4. Application is deployed
  5. CodeStar – For easier CI/CD process without too much intervention
    • Continuous Integration – CodeCommit
    • Continuous Delivery – CodeBuild + CodeDeploy
    • Continuous Deployment – CodePipeline
    • CodeGuru Reviewer for Java can be selected while creating the repository, to receive recommendations for fixing Java code.
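
The fixed order of the CodeDeploy lifecycle events listed under AppSpec above can be captured in a small helper that sorts hook names into run order and reports the phase each belongs to. This is a sketch for illustration only; the helper names are hypothetical and the event list simply mirrors the three phases given above.

```python
# Lifecycle events in run order, grouped by the three phases listed above.
PHASES = {
    "deregister": ["BeforeBlockTraffic", "BlockTraffic", "AfterBlockTraffic",
                   "ApplicationStop", "DownloadBundle"],
    "deploy": ["BeforeInstall", "Install", "AfterInstall",
               "ApplicationStart", "ValidateService"],
    "reregister": ["BeforeAllowTraffic", "AllowTraffic", "AfterAllowTraffic"],
}

# Flattened list; dicts keep insertion order, so this is the overall run order.
LIFECYCLE_EVENTS = [event for events in PHASES.values() for event in events]


def sort_hooks(hooks):
    """Sort hook names into run order; raises ValueError on an unknown name."""
    return sorted(hooks, key=LIFECYCLE_EVENTS.index)


def phase_of(event):
    """Return the phase name an event belongs to."""
    for phase, events in PHASES.items():
        if event in events:
            return phase
    raise ValueError(f"unknown lifecycle event: {event}")
```

For instance, sort_hooks(["ValidateService", "BeforeInstall", "ApplicationStop"]) returns the three names with ApplicationStop first and ValidateService last, which is the order the scripts attached to those hooks would run in.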

Messaging

  1. SNS – Simple Notification Service
    • Can send notifications via Email, SQS, HTTP end points, as well as Lambda
    • Can group multiple recipients with ‘Topics’
    • Push based delivery system using publisher-subscriber (pub-sub) method.
  2. SQS – Simple Queue Service
    • It was the first AWS service, and gives access to a queue where messages wait to be processed.
    • It is a pull based system, and supports message size up to 256KB
    • Messages from 256KB up to 2GB can be stored in S3, with the AWS SDK or the Amazon SQS Extended Client Library for Java used to deliver them.
    • Types:
      1. Standard Queues – Guarantees delivery at least once, but no particular order can be guaranteed.
      2. FIFO – Preserves the order of messages and processes each message exactly once.
    • Messages can be kept for 60 seconds to 14 days (default 4 days)
    • Visibility Timeout can be set from 0sec to 12 hours (default 30sec), during which a received message is invisible to other consumers of the SQS queue.
    • Long polling (wait time of up to 20sec) avoids the cost of continuously polling an empty queue.
    • SQS Delay Queues can be used to postpone delivery of messages to the Queue (0sec to 900sec)
    • A dead letter queue with 3 Maximum Receives makes sure that after 3 failed receives, the message is moved to the dead letter queue for forensic purposes.
  3. SES – Simple Email Service
    • Highly scalable email service.
    • Incoming email can be delivered to an S3 bucket, and can trigger SNS notifications or Lambda functions
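
How the SQS visibility timeout, delay and dead letter queue described above interact is easier to see in a toy model. The sketch below is an in-memory simulation, not the SQS API: the class and method names are hypothetical, and time is passed in as a number of seconds instead of using a real clock.

```python
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    receive_count: int = 0
    visible_at: float = 0.0  # time from which the message can be received


class ToyQueue:
    def __init__(self, visibility_timeout=30.0, max_receives=3):
        self.visibility_timeout = visibility_timeout
        self.max_receives = max_receives
        self.messages = []
        self.dead_letters = []

    def send(self, body, now=0.0, delay=0.0):
        """Enqueue a message; `delay` postpones its first delivery."""
        self.messages.append(Message(body, visible_at=now + delay))

    def receive(self, now):
        """Return the first visible message and hide it for the timeout."""
        for msg in list(self.messages):
            if msg.visible_at > now:
                continue  # still invisible (delayed or being processed)
            if msg.receive_count >= self.max_receives:
                # Too many failed attempts: move to the dead letter queue.
                self.messages.remove(msg)
                self.dead_letters.append(msg)
                continue
            msg.receive_count += 1
            msg.visible_at = now + self.visibility_timeout
            return msg
        return None

    def delete(self, msg):
        """A consumer deletes the message once processing has succeeded."""
        self.messages.remove(msg)
```

A consumer that receives a message but never calls delete sees it reappear once the visibility timeout passes; after the third such receive, the next attempt moves it to dead_letters instead of delivering it again.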

Migration

  1. Snowball
  2. DMS – Migrate databases without any downtime
  3. SMS – Server Migration Service – Migrate VMs to AWS cloud

Mobile Services

  1. Mobile Hub – Gives access to all services to build a mobile application.
  2. Cognito – SignUp/SignIn
  3. Device Farm – Testing on a large number of real devices
  4. Mobile Analytics
  5. Pinpoint – User engagement data gathering

Business Productivity

  1. WorkDocs
  2. WorkMail

Desktop and App Streaming

  1. Workspaces – Having the whole setup in the cloud
  2. AppStream

Artificial Intelligence

  1. Alexa (it has Lex inside)
  2. Polly – Converts Text to Voice
  3. Machine Learning
  4. Rekognition – Understands images

Miscellaneous

Things in this part can be moved to the specific categories over time.

  1. VPC – Virtual Private Cloud – Virtual network acting as a private data center in the cloud
  2. Route 53 – DNS Service – Amazon’s Domain Name System web service, used to route domain names to EC2 instances or Load Balancers.
  3. CloudFront – Content Delivery Network (CDN) that caches files close to the users
    • Types
      1. Edge location – Location where content is cached. Can also be used for WRITE requests
      2. Origin – Origin of the file (S3, EC2, Load Balancer, etc.)
      3. Distribution – Collection of edge locations
      4. Web distribution – Used for websites
      5. RTMP – For media streaming
    • Cache is cleared on TTL (Time To Live) expiry. Manual invalidation of the cache can be done, but with a charge.
    • DynamoDB has a TTL too, expressed in epoch time: on TTL expiry, the item is marked for deletion, and will be deleted within 48 hours.
  4. Direct Connect – Dedicated physical network connection from on-premise to AWS
  5. EC2 – Elastic Compute Cloud – Virtual Machines in the cloud
    • Types
      1. On Demand
      2. Reserved
        1. Standard RI
        2. Convertible RI – Attributes of the RI can be changed, as long as the exchange is for equal or greater value
        3. Scheduled RI – Launch only within the reserved timeframe
      3. Spot
        • If the instance is terminated by Amazon, then the partial hour of usage isn’t charged.
      4. Dedicated Hosts
    • The default max is 20 EC2 instances per region
    • HTTPS connections can be terminated at the EC2 instance
    • To create an auto-scaling group, it’s required to create a launch template with all the parameters like the AMI (Amazon Machine Image).
    • Use curl or GET against http://169.254.169.254/latest/meta-data/ to access instance metadata.
  6. ECS – Elastic Container Service – Container Management Service to run Docker
    • Can be EC2 or Fargate (with Fargate there are no servers to manage; it provisions compute as required)
    • A container packages the code, runtime and libraries; it shares the kernel of the host
    • Containers run on the Docker engine, and Docker runs on an OS
    • ECS will help run containers in a cluster of virtual machines
    • Fargate is for serverless architecture
    • A Docker container is a standalone, lightweight executable package, which has everything that the software needs to run.
    • Example buildspec.yml commands for Docker:
      • docker build -t $REPOSITORY_URI:latest .
      • docker push $REPOSITORY_URI:latest
    • buildspec.yml commands can be overridden while launching the build in the console
    • Pull docker images from ECR using
      docker pull aws_account_id.dkr.ecr.us-west-2.amazonaws.com/mydockerrepo:latest
  7. ECR – Elastic Container Registry – Used for storage of container images.
  8. Lambda – Run code without servers. Pay for compute time.
    • 1 event can trigger 1 function, which in turn can trigger multiple functions.
    • For overly complicated Lambda architecture, AWS X-Ray can be used for debugging.
    • Each version has a unique ARN (Amazon Resource Name) and each version is immutable.
    • Qualified ARN (with a version or alias suffix), Unqualified ARN (without a suffix)
    • For getting the latest, use the Qualified ARN with the $LATEST suffix, or the unqualified ARN, which always resolves to the latest one.
    • If an alias is being used, make sure to update the alias when deploying new code.
    • Lambda Triggers – API Gateway, AWS IoT, Alexa Skills Kit, Alexa Smart Home, Cloudfront, Cloudwatch Events, Cloudwatch Logs, CodeCommit, Cognito Sync Trigger, DynamoDB, Kinesis, S3, SNS
    • Priced by:
      1. Number of requests
      2. Duration (rounded to 100ms)
    • Default concurrent execution is limited to 1000 per region, but can be increased by contacting AWS support.
    • Reserved concurrency can be made available for critical functions.
    • To access a VPC, Lambda requires the Private Subnet ID and the Security Group ID (with required access). Lambda uses this information to set up an ENI (Elastic Network Interface) in the private subnet.
    • Handler property specifies the entry point for a lambda function.
      • Example:
        Handler: lambda_function.lambda_handler
        This will execute the lambda_handler function inside lambda_function.py file
    • Runtime refers to the language of the lambda function
  9. Lightsail – A simple virtual private server.
  10. Elastic Load Balancer
    • Types
      1. Application Load Balancer – Operates at Layer 7 and is application aware. Routes requests based on their content, such as path or host.
      2. Network Load Balancer – Balances TCP traffic at Layer 4. Used for extreme performance, capable of handling millions of requests per second and is very costly.
      3. Classic Load Balancer – Can operate on either Layer 4 or Layer 7. Gives 504 error when server stops responding.
    • To know the public IP address of the client behind a request, look at the X-Forwarded-For header
  11. AWS Systems Manager Parameter Store
    • Store confidential information to be accessed across applications
    • Can be stored in plain text or encrypted.
  12. AWS CLI pagination – Control the items fetched when running a CLI command
    • By default the CLI uses a page size of 1000 and makes multiple API calls to fetch all the items.
    • It may cause timeout for large data sets.
    • --page-size can be used for a smaller page size
    • --max-items can be used to show limited items in output.
    • --starting-token resumes the listing from the NextToken of a previous truncated response
    • --no-paginate disables pagination, so only the first page is returned
  13. IAM Policy Simulator can be used to test the effects of IAM policies.
  14. S3, DynamoDB, Lambda, Kinesis, Fargate, SNS, API Gateway are considered serverless services.
  15. Client Errors are 4xx and Server side errors are 5xx HTTP Response Code
  16. Easiest way to serve a website via HTTPS is to create a CloudFront distribution, which supports HTTPS with a custom domain name.
  17. Provisioned IOPS to volume in GB is 50:1. So for 100GB of storage, max IOPS can be 50*100 = 5,000 IOPS
  18. EBS volumes are AZ locked.
  19. An Auto-Scaling group can span instances across multiple AZ, but not across multiple regions
  20. Memory available to Lambda is 128MB to 3008MB in 64MB increments. At 1792MB, it’s provided 1vCPU.
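
The Lambda Handler setting described above (lambda_function.lambda_handler) points at a function like the one sketched here, saved as lambda_function.py with a Python runtime. The event shape and the greeting are assumptions for illustration; the response format follows what an API Gateway proxy integration expects.

```python
# lambda_function.py
import json


def lambda_handler(event, context):
    """Entry point named by the setting Handler: lambda_function.lambda_handler."""
    # `event` carries the trigger payload; a "name" key is assumed here.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

The function can be exercised locally by calling lambda_handler({"name": "AWS"}, None), since this sketch never touches the context object.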
