{"id":1619,"date":"2020-10-23T00:31:34","date_gmt":"2020-10-22T18:31:34","guid":{"rendered":"http:\/\/techsatwork.com\/blog\/?p=1619"},"modified":"2022-08-22T09:07:43","modified_gmt":"2022-08-22T03:07:43","slug":"creating-an-emr-cluster-using-cli","status":"publish","type":"post","link":"https:\/\/techsatwork.com\/?p=1619","title":{"rendered":"Creating an EMR cluster using CLI"},"content":{"rendered":"\n<p>I normally start a cluster from the UI, so I decided to post how to create a cluster from the CLI. This assumes the AWS CLI is installed and configured on your machine.<\/p>\n\n\n\n<p>The command to use is <em>aws emr create-cluster<\/em>. You first need to decide which EMR release to launch, which applications to include, the instance types and number of instances, whether to use instance fleets (which I do here), and so on. Once you have chosen all that, open a text editor and build the command. Mine looks like this:<br><\/p>\n\n\n\n<p><em>aws emr create-cluster \\<br> --applications Name=Hadoop Name=Hive Name=Tez \\<br> --tags 'Project=Covid-19 Analysis' 'region=us' 'Contact=Raju Pillai' 'Name=Covid-19 Analysis' \\<br> --ec2-attributes '{<br>    &quot;KeyName&quot;:&quot;xxxx&quot;,<br>    &quot;InstanceProfile&quot;:&quot;EMR_EC2_DefaultRole&quot;,<br>    &quot;SubnetId&quot;:&quot;subnet-XXX&quot;,<br>    &quot;EmrManagedSlaveSecurityGroup&quot;:&quot;sg-xxx&quot;,<br>    &quot;EmrManagedMasterSecurityGroup&quot;:&quot;sg-yyyy&quot;<br> }' \\<br> --release-label emr-6.1.0 \\<br> --log-uri 's3n:\/\/raju-datalake-emr\/logs\/' \\<br> --configurations '[<br>    {<br>       &quot;Classification&quot;:&quot;emrfs-site&quot;,<br>       &quot;Properties&quot;:{<br>          &quot;fs.s3.consistent.retryPeriodSeconds&quot;:&quot;10&quot;,<br>          
&quot;fs.s3.consistent&quot;:&quot;true&quot;,<br>          &quot;fs.s3.consistent.retryCount&quot;:&quot;5&quot;,<br>          &quot;fs.s3.consistent.metadata.tableName&quot;:&quot;EmrFSMetadata&quot;<br>       }<br>    },<br>    {<br>       &quot;Classification&quot;:&quot;hive-site&quot;,<br>       &quot;Properties&quot;:{<br>          &quot;hive.metastore.client.factory.class&quot;:&quot;com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory&quot;<br>       }<br>    }<br> ]' \\<br> --instance-fleets '[<br>    {<br>       &quot;InstanceFleetType&quot;:&quot;MASTER&quot;,<br>       &quot;TargetOnDemandCapacity&quot;:1,<br>       &quot;TargetSpotCapacity&quot;:0,<br>       &quot;InstanceTypeConfigs&quot;:[<br>          {<br>             &quot;WeightedCapacity&quot;:1,<br>             &quot;BidPriceAsPercentageOfOnDemandPrice&quot;:100,<br>             &quot;InstanceType&quot;:&quot;m5d.xlarge&quot;<br>          }<br>       ],<br>       &quot;Name&quot;:&quot;Master - 1&quot;<br>    },<br>    {<br>       &quot;InstanceFleetType&quot;:&quot;CORE&quot;,<br>       &quot;TargetOnDemandCapacity&quot;:2,<br>       &quot;TargetSpotCapacity&quot;:2,<br>       &quot;LaunchSpecifications&quot;:{<br>          &quot;SpotSpecification&quot;:{<br>             &quot;TimeoutDurationMinutes&quot;:10,<br>             &quot;TimeoutAction&quot;:&quot;SWITCH_TO_ON_DEMAND&quot;<br>          }<br>       },<br>       &quot;InstanceTypeConfigs&quot;:[<br>          {<br>             &quot;WeightedCapacity&quot;:4,<br>             &quot;BidPriceAsPercentageOfOnDemandPrice&quot;:100,<br>             &quot;InstanceType&quot;:&quot;m5d.xlarge&quot;<br>          }<br>       ],<br>       &quot;Name&quot;:&quot;Core - 2&quot;<br>    },<br>    {<br>       &quot;InstanceFleetType&quot;:&quot;TASK&quot;,<br>       &quot;TargetOnDemandCapacity&quot;:1,<br>       
&quot;TargetSpotCapacity&quot;:3,<br>       &quot;LaunchSpecifications&quot;:{<br>          &quot;SpotSpecification&quot;:{<br>             &quot;TimeoutDurationMinutes&quot;:30,<br>             &quot;TimeoutAction&quot;:&quot;TERMINATE_CLUSTER&quot;<br>          }<br>       },<br>       &quot;InstanceTypeConfigs&quot;:[<br>          {<br>             &quot;WeightedCapacity&quot;:4,<br>             &quot;BidPriceAsPercentageOfOnDemandPrice&quot;:100,<br>             &quot;InstanceType&quot;:&quot;m5d.xlarge&quot;<br>          },<br>          {<br>             &quot;WeightedCapacity&quot;:4,<br>             &quot;EbsConfiguration&quot;:{<br>                &quot;EbsBlockDeviceConfigs&quot;:[<br>                   {<br>                      &quot;VolumeSpecification&quot;:{<br>                         &quot;SizeInGB&quot;:32,<br>                         &quot;VolumeType&quot;:&quot;gp2&quot;<br>                      },<br>                      &quot;VolumesPerInstance&quot;:2<br>                   }<br>                ]<br>             },<br>             &quot;BidPriceAsPercentageOfOnDemandPrice&quot;:100,<br>             &quot;InstanceType&quot;:&quot;m5.xlarge&quot;<br>          }<br>       ],<br>       &quot;Name&quot;:&quot;Task - 3&quot;<br>    }<br> ]' \\<br> --bootstrap-actions '[<br>    {<br>       &quot;Path&quot;:&quot;s3:\/\/raju-datalake-emr\/scripts\/dev\/bootstrap_scripts\/Covid_Sync_emrfs.sh&quot;,<br>       &quot;Name&quot;:&quot;Copy Scripts&quot;<br>    }<br> ]' \\<br> --ebs-root-volume-size 50 \\<br> --service-role EMR_DefaultRole \\<br> --enable-debugging \\<br> --name 'Covid-19-EMR6.1-AutoScale' \\<br> --scale-down-behavior TERMINATE_AT_TASK_COMPLETION \\<br> --region us-east-1<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I normally start a cluster from the UI, so I decided to post how to create a cluster from 
the CLI. This is assuming the AWS CLI is installed and configured on your machine. The command you want to use is aws emr create-cluster. You will have to figure out what release of emr do you [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-container-style":"default","site-container-layout":"default","site-sidebar-layout":"default","site-transparent-header":"default","disable-article-header":"default","disable-site-header":"default","disable-site-footer":"default","disable-content-area-spacing":"default","footnotes":""},"categories":[961,957,917,925,358,962],"tags":[686,918,1048],"class_list":["post-1619","post","type-post","status-publish","format-standard","hentry","category-aws","category-big-data","category-bigdata","category-hadoop","category-how-to","category-s3","tag-aws","tag-big-data","tag-emr"],"_links":{"self":[{"href":"https:\/\/techsatwork.com\/index.php?rest_route=\/wp\/v2\/posts\/1619","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techsatwork.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techsatwork.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techsatwork.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techsatwork.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1619"}],"version-history":[{"count":1,"href":"https:\/\/techsatwork.com\/index.php?rest_route=\/wp\/v2\/posts\/1619\/revisions"}],"predecessor-version":[{"id":1620,"href":"https:\/\/techsatwork.com\/index.php?rest_route=\/wp\/v2\/posts\/1619\/revisions\/1620"}],"wp:attachment":[{"href":"https:\/\/techsatwork.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1619"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techsatwork.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1619"},{"taxonom
y":"post_tag","embeddable":true,"href":"https:\/\/techsatwork.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1619"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}