Installing Apcera on AWS
This document describes how to install and configure Terraform in preparation for installing the Apcera Platform on AWS.
- AWS Cluster Description
- AWS Installation Prerequisites
- Create AWS Resources
- Configure Cluster Deployment
- Deploy Apcera Platform to AWS
- Complete Post Installation Tasks
AWS Cluster Description
The reference cluster installed on AWS as described here makes use of the following resources:
- 14 EC2 instances
- 1 ELB
- 4 Elastic IPs
- 6 Security Groups
- 3 Availability Zones
- 8 volumes (in addition to host root disks)
- 3 subnets
For reference, see also the following topics:
- List and description of Apcera cluster components.
- Sizing considerations for a minimum production deployment.
- Required Ports.
AWS Installation Prerequisites
Before you install Apcera on AWS you must complete the following prerequisites.
Prerequisite | Description |
---|---|
Create IAM keys | To provision AWS resources, you will need to provide the access_key and secret_key for an authorized IAM user. |
Select AWS region | To provision AWS resources using the default configuration, you will need to specify an aws_region with 3 availability zones (AZs) where you want to deploy the cluster. |
Upload public SSH key to AWS | To access the Orchestrator host and other cluster hosts remotely, you will need to generate a SSH public/private key pair and upload the public key to AWS. |
Configure Google Auth | For cluster access via APC and the web console, you will need to create a Google Auth project and generate the keys for client_id , client_secret , web_client_id . |
Generate SSL certificate chain and key | HTTPS is recommended for production clusters. |
Registered domain name | To deploy the cluster you will need to have a registered domain_name so you can update DNS records with the address of the ELB for the HTTP routers and IP address of the monitoring host. |
Install Terraform | You will need Terraform version 0.7.4 or later to provision AWS resources. |
Install Ruby | You will need Ruby to generate the cluster.conf file that is used to deploy a cluster. |
Create AWS Resources
-
Verify Terraform version 0.7.4 or later.
Run command
terraform version
to verify that you are using Terraform version 0.7.4 or later.If necessary install Terraform version 0.7.4 or later.
When you run Terraform commands as described below, a local state file is created (
terraform.tfstate
) that maintains the record of the resources created. For production clusters it is recommended that you store the state remotely. -
Download and unzip the AWS installation files.
Get the installation files from Apcera Support.
Unzip the file contents to a working directory, such as
apcera-aws-mpd
.Copy this directory to a known location, such as
$HOME/apcera-aws-mpd
.These files contain configuration information for deploying a minimum production cluster (MPD) on AWS. As instructed below, you will update portions of
main.tf
,terraform.tfvars
, andcluster.conf.erb
to deploy your cluster. -
Load the Terraform modules.
The
terraform-module
subdirectory includes theapcera/aws
andapcera/aws/ami-copy
modules which define the infrastructure for AWS. Themain.tf
file references these modules using local relative paths.CD to the working directory where you extracted the installation files and run the
terraform get
command. This command caches in the working directory the modules used by this particular Terraform configuration.If you receive an error running
terraform get
, editmain.tf
so that thesource
entry for both modules point to the local path where you have placed the modules. Such as:module "apcera-aws" { source = "Users/user/aws_example/terraform-module/apcera/aws" module "ami-copy" { source = "Users/user/aws_example/terraform-module/apcera/aws/ami-copy/"
Run
terraform get
and verify that the Terraform modules are loaded. -
Edit the
terraform.tfvars
file.Populate the following parameter values:
Parameter Value Description key_name
"SSH public key name that you uploaded to AWS" Enter the EC2 key pair name you specified when you uploaded your public SSH key to AWS. aws_region
"your-preferred-aws-region" Enter the AWS Region, such as "us-west-2". az_primary
"a" Primary subnet availability zone (AZ). You may need to adjust this if the AZ does not support the requested EC2 instance type. az_secondary
"b" Secondary subnet AZ. You may need to adjust this if the AZ does not support the requested EC2 instance type. az_tertiary
"c" Tertiary subnet AZ. You may need to adjust this if the AZ does not support the requested EC2 instance type. access_key
"REDACTED" Enter your AWS IAM access key. secret_key
"REDACTED" Enter your AWS IAM secret key. cluster_name
"your-cluster-name" Enter a unique cluster name using alphanumeric characters. monitoring_db_master_password
"EXAMPLE_PASSWORD" Enter a password for the monitoring DB. rds_postgres_db_master_password
"EXAMPLE_PASSWORD" Enter a password for the component DB. gluster_per_AZ
"0" Leave the default "0" unless you are using Gluster, in which case set it to the desired number of Gluster servers per Availability Zone. NOTE: Each password must be 8 characters or more and cannot have special characters "@", "/", or "".
-
Prevent a singleton from being built (optional).
The
aws_example
Terraform files will build a singleton host that is not required or used. Optionally, you can update the TF files so that this host is not built.In terraform.tfvars, add the following line at the bottom.
singleton-count = "0"
In variables.tf, add the following line at the bottom.
variable "singleton-count" {}
In main.tf, in the
module "apcera-aws"
section, add the following line at the bottom.singleton-count = "${var.singleton-count}"
-
Run the
terraform plan
command.This command displays the changes Terraform will attempt.
Using the default configuration, you should see the result:
Plan: 65 to add, 0 to change, 0 to destroy
. -
Run the
terraform apply
command.Use the
terraform apply
command to apply and run the changes. This command may take some time to complete.NOTE: If you receive an error, review the error message and troubleshoot accordingly. Some errors may only require that you simply run the
terraform apply
again. Note that Terraform does not roll back created resources. If you need to make edits to the Terraform files, repeat theplan
andapply
commands. Runterraform refresh
if you need to update the resource state. -
Verify creation of AWS resources.
When the
terraform apply
command completes successfully, run the following command to display the resources created:terraform output
You should see the output showing all AWS resources that were created by Terraform. You can also use
terraform refresh
. See Terraform commands for a complete list of commands.At this point you can log in to the AWS Console for your account. You should see the resources that were created, including several EC2 instances, volumes, elastic load balancers, and security groups.
Configure Cluster Deployment
Now that the AWS infrastructure is created, the next step is to configure the deployment.
-
Edit the
cluster.conf.erb
file.The
cluster.conf.erb
file is used to generate thecluster.conf
file.Section Description provisioner
Specifies information related to the creation of the machines that will run within the cluster. machines
Defines the various "zones" within the cluster, the machines that belong to the zone, and the roles within the cluster that are allowed to be assigned to those machines. components
Specifies the desired number of each of the component types. Changes here will either find a new place to run components or scale the cluster down if the numbers are decreased. chef
Configures the cluster and base domain names, the ID provider and users, SSL for HTTPS and cluster component monitoring. -
Verify the
provisioner
isgeneric
.The
provisioner
specifies information related to the creation of the machines that will run within the cluster. Thegeneric
provisioner uses IP addresses to identify the infrastructure.provisioner { type: generic }
-
Update the
machines
section if necessary.The machines section defines the various machine types within the cluster, the hosts that belong to that type, and the roles within the cluster that are allowed to be assigned to those machines
Refer to the configuration documentation if you want to change
machines
values.For example, you may want to comment out the entire Gluster block if you are not using Gluster. You may also want to comment out the IP Manager.
machines: { auditlog: { # TERRAFORM OUTPUT: auditlog-addresses <%= capture_or_die('terraform output auditlog-addresses') %> suitable_tags: [ "auditlog-database" ] } central: { # TERRAFORM OUTPUT: central-addresses <%= capture_or_die('terraform output central-addresses') %> suitable_tags: [ "component-database" "api-server" "job-manager" "router" "package-manager" "stagehand" "cluster-monitor" "health-manager" "metrics-manager" "nats-server" "events-server" "auth-server" "basic-auth-server" "google-auth-server" "app-auth-server" "kv-store" "vault" ] } instance_manager: { # TERRAFORM OUTPUT: instance-manager-addresses hosts: [ # TERRAFORM OUTPUT: instance-manager-addresses <%= capture_or_die('terraform output instance-manager-addresses') %>, ] suitable_tags: [ "instance-manager" ] } # Uncomment if using Gluster # gluster: { # TERRAFORM OUTPUT: gluster-addresses # <%= capture_or_die('terraform output gluster-addresses') %> # suitable_tags: [ # "gluster-server" # ] # } metricslogs: { # TERRAFORM OUTPUT: metricslogs-address <%= capture_or_die('terraform output metricslogs-address') %> suitable_tags: [ "graphite-server" "redis-server" ] } # Uncomment if using IP Manager. # ip_manager: { # TERRAFORM OUTPUT: ip-manager-address # <%= capture_or_die('terraform output ip-manager-address') %> # suitable_tags: [ # "ip-manager" # ] # } # TCP Router is on a dedicated host so that it has it own public IP. tcp_router: { # TERRAFORM OUTPUT: tcp-router-address <%= capture_or_die('terraform output tcp-router-address') %> suitable_tags: [ "tcp-router" ] } monitoring: { # TERRAFORM OUTPUT: monitoring-address <%= capture_or_die('terraform output monitoring-address') %> suitable_tags: [ "monitoring" ] } # Default NFS singleon. Comment if using Gluster HA NFS. # Gateway. nfs: { # TERRAFORM OUTPUT: nfs-address <%= capture_or_die('terraform output nfs-address') %> suitable_tags: [ "nfs-server" ] } }
-
Update the
components
counts if necessary.Refer to the configuration documentation if you want to change these values. The components section specifies the desired number of each of the component types. Changes here will either find a new place to run components or scale the cluster down if the numbers are decreased.
components: { monitoring: 1 component-database: 3 api-server: 3 job-manager: 3 router: 3 package-manager: 3 health-manager: 3 metrics-manager: 3 nats-server: 3 events-server: 3 cluster-monitor: 1 auth-server: 3 basic_auth_server: 3 google_auth_server: 3 app-auth-server: 3 kv-store: 3 vault: 3 auditlog-database: 2 # Uncomment if using Gluster. # gluster-server: 3 instance-manager: 3 tcp-router: 1 # ip-manager: 1 graphite-server: 1 redis-server: 1 nfs-server: 1 stagehand: 1 }
-
Specify the cluster_name and base_domain.
In the
chef.continuum
section provide a uniquecluster_name
andbase_domain
for which you have set up a DNS record. For example:chef: { "continuum": { "cluster_name": "example", "base_domain": "example.mycompany.com",
-
Specify the Package Manager S3 endpoint.
Change the
s3_store.endpoint
value (default is "endpoint": "s3.amazonaws.com") to point to the S3 endpoint for your region.For example, if your AWS region is
us-west-1
, thes3_store.endpoint
is as follows:"endpoint": "s3-us-west-1.amazonaws.com",
NOTE: If you are using the
us-east-1
region you do not need to change this. -
Optionally, configure HTTPS.
By default HTTPS is disabled (
chef.continuum.router.https_port.ssl
is disabled.If this is a production cluster, you will need to enable HTTPS by uncommenting this section the adding the SSL certificate chain and key.
This is how the
ssl
entry should be formatted. Note that each closing parenthesis must be on its own line.chef: { "continuum": { "router": { "http_port": 8080, "https_port": 8181, "ssl": { "enable": true, "tlshosts": [ { "server_names": [ "*.example.com" ], "certificate_chain": (-----BEGIN CERTIFICATE----- LONGSTRING -----END CERTIFICATE----- ) "private_key": (-----BEGIN RSA PRIVATE KEY----- LONGSTRING -----END RSA PRIVATE KEY----- ) }, ] # tlshosts } # ssl }, # router }
If you have an existing SSL cert and key, see configuring HTTPS for guidance on adding it cluster.conf.erb.
If necessary you can generate an SSL cert and key.
By default the Terraform module assumes that you are using HTTPS. If you do not want to use HTTPS, after deployment you update the ELB in the AWS console to use HTTP port 8080 as shown below.
Note that the installation instructions explain how to do this, so you don't have to do anything now to disable HTTPS, just be aware that it is the default.
-
Add your public SSH key.
In the
chef.continuum.ssh
section, add your SSH public key.chef: { "continuum": { "ssh": { "custom_keys":[ # Name and contanct in for this key here "ssh-rsa LONGSTRING" ] }, }
-
Configure cluster authentication.
In the
chef.continuum.auth_server
section ofcluster.conf.erb
, configure the identity provider and users.By default a cluster uses Google Device auth which allows the defined gmail user(s) access to the cluster via APC. To access the cluster via the web console, you will need to also include an identity provider.
The following configuration example uses the default Google Device auth and adds Basic Auth.
Configure Google Device auth by adding your gmail address to the
google.users
section, replacing "your-gmail-address@gmail.com" with your actual address. You should also add this email address to theauth_server.admins
section to give this user admin policy.Basic Auth is enabled by adding the "basic" section shown below, and including
NAME@apcera.me
in theadmins
section.If you want to enable [Google Auth](/config/auth-google/, provide the
client_id
,client_secret
, andweb_client_id
for a Google App project.chef: { "continuum": { ... "auth_server": { "identity": { "default_provider": "basic", # Configuration for Google OAuth "google": { "enabled": false "users": [ "your-gmail-address@gmail.com", ], "client_id": "690542023564-abcdefghbqrgpnopqrstuvwxyz.apps.googleusercontent.com" "client_secret": "byS5RFQsKqXXXbbqENhczoD" "web_client_id": "690542023564-abcdefghijklmnopqrstuvwxyz.apps.googleusercontent.com" }, "basic": { "enabled": true, "users": [ { "name": "admin", "password": "PaSsWoRd!" } ] } }, "admins": [ "your-gmail-address@gmail.com", "admin@apcera.me" ] "apcera_ops": [] }, }, }
Basic Auth is for demonstration and development purposes and is not supported for production clusters.
-
Configure Monitoring.
Enter passwords for the monitoring guest user (
chef.apzabbix.guest.pass
) and admin user (chef.apzabbix.admin.pass
).Enter the cluster name and domain for the
apzabbix.web_hostnames
parameter.See Monitoring Your Cluster for guidance on configuring this section.
Deploy Apcera Platform to AWS
At this point you can now deploy the cluster.
-
Generate the
cluster.conf
file.Run the following Ruby command to generate the
cluster.conf
file:erb cluster.conf.erb > cluster.conf
This command uses the
cluster.conf.erb
file in the cluster directory to generate thecluster.conf
file, which is used to deploy the cluster. If successful this command should exit silently.Verify that the generated
cluster.conf
file is output to your cluster directory. If you encounter an error, run the erb command again. -
SSH to Orchestrator as
root
.First, run the following command to get the Orchestrator IP address:
terraform output orchestrator-public-address
Using the SSH key you configured, SSH to the Orchestrator host.
ssh -A root@52.71.173.49
Type
yes
to confirm the remote connection.You should be connected, indicated by
root@<cluster-name>-orchestrator:~#
. -
Update the Orchestrator OS kernel and orchestrator-cli
Run the following command:
apt-get update && apt-get dist-upgrade
This command will update the Orchestrator host OS and also perform the update on
orchestrator-cli
bringing it up to the latest version. -
Reboot the Orchestrator host.
This can be accomplished by running
reboot
.Run the
uname -r
command to see the current running kernel.Run
orchestrator-cli version
and verify that Orchestrator is updated. -
Copy the SSH key to the
orchestrator
user.cd /etc/ssh/userauth cat root > orchestrator chown orchestrator: orchestrator chmod 600 orchestrator
Delete the ubuntu SSH key.
rm ubuntu
Change permissions on /etc/ssh/userauth
chmod 755 /etc/ssh/userauth/
Use
ls -ld /etc/ssh/userauth/
to verify. -
Disable orchestrator user password.
In the /etc/shadow file, modify the orchestrator user's encrypted password to
*
.grep orchestrator /etc/shadow
The output should show the encrypted password for the orchestrator user. For example:
orchestrator:$6$sf.w91gW$gS1QqmJtCvbx/UE.8yITZlnjOLPN1OYUvv92Fz5Hp3C1Iq08qk3K8cx4svg1q6Lsl5wMlGfFPsvqiS9eBA.N60:16876:0:99999:7:::
Modify the orchestrator entry by replacing the encrypted password with
*
using your preferred text editor. For example:vi /etc/shadow
.Use
grep orchestrator /etc/shadow
to verify that the orchestrator user has*
in the password field. For example:orchestrator:*:16876:0:99999:7:::
Type
exit
to log out the SSH session. -
Log in as
orchestrator
with agent forwarding.Verify that you can log in as the
orchestrator
user (with the password you used for your SSH key):ssh -A orchestrator@52.71.173.49
You should be connected, indicated by
orchestrator@ip-10.0.0.187:~$
.Once verified
exit
the log in. -
Upload
cluster.conf
to Orchestrator.SCP
cluster.conf
to Orchestrator.scp cluster.conf orchestrator@52.71.173.49:
If you see the message, “Are you sure you want to continue connecting (yes/no)?” Type “yes” to proceed.
-
SSH into Orchestrator as the
orchestrator
user.ssh -A orchestrator@34.202.245.235
Type
ls
and verify that cluster.conf is copied to the Orchestrator home directory. -
Initialize Orchestrator.
Required for an initial deployment of a cluster:
orchestrator-cli init
-
Perform a dry run.
orchestrator-cli deploy -c cluster.conf --update-latest-release --dry
Performing a dry run verifies the syntax of
cluster.conf
. If the dry run is successful, agraph.png
file is created. This is sufficient to verify the format of the configuration file.To view the deployment graph,
exit
the ssh session and run the following command to copy it to your local machine for viewing:scp orchestrator@34.202.245.235:graph.png ~/aws_example
-
Deploy the cluster.
Deploy the latest promoted release of the Apcera Platform.
orchestrator-cli deploy -c cluster.conf --update-latest-release
Successful deployment is indicated by the message "Done with cluster updates."
-
Troubleshoot depoloyment errors, if necessary.
If the deployment fails, run the
orchestrator-cli deploy
command again.If deployment still fails, check the latest chef client log for the error(s) and debug as necessary.
Run
ls -ltr
to sequence the log files. The last file shown is the most recent one to check.Run
less chef-client.log
to scroll through the log, or usecat
with/
and a search term; usen
to go to the next occurrence of the term. See also troubleshooting deployments. -
Reboot the cluster.
Because there is a new kernel, full cluster reboot is required.
orchestrator-cli reboot -c cluster.conf
Complete Post Installation Tasks
To verify your deployment, complete the following post-installation tasks.
-
Update DNS records.
DNS records are required for the HTTP routers (via the ELB) and monitoring host using the external (public) IP address for this hosts. You can use nslookup to verify the DNS entries you make.
DNS Entry Description base_domain
CNAME record with the address of the ELB (get this value using terraform output elb-address
).*.base_domain
CNAME record to base_domain
(alias or pointer, such as*.cluster.example.com
)monitoring.cluster-name.domain.tld
A record pointing to the public IP address of the monitoring host (get value using terraform output monitoring-public-address
). Note that this value cannot be underbase_domain
entry and should match what you entered in theapzabbix.webhostnames
section of the cluster.conf.erb file. For example:monitoring.cluster.example.com
.tcp-router
Public IP address of the TCP router (get value using terraform output tcp-router-public-address
). This entry is optional.For example, here is how DNS entries would exist for an example AWS cluster (using AWS Route 53):
-
Verify and bootstrap the deployment.
Log in to the web console and download and install APC.
Target the cluster and log in using APC.
-
Install Apcera packages.
Install Apcera packages that you want for your cluster.