Friday, December 8, 2023

How to Automate Apache NiFi Data Flow Deployments in the Public Cloud

With the latest release of Cloudera DataFlow for the Public Cloud (CDF-PC) we added new CLI capabilities that allow you to automate data flow deployments, making it easier than ever before to incorporate Apache NiFi flow deployments into your CI/CD pipelines. This blog post walks you through the data flow development lifecycle and how you can use APIs in CDP Public Cloud to fully automate your flow deployments.

Understanding the data flow development lifecycle

Like any other software application, NiFi data flows go through a development, testing and production phase. While key NiFi features like visual flow design and interactive data exploration are front and center during the development phase, operational features like resource management, auto-scaling and performance monitoring become crucial once a data flow has been deployed in production and business functions depend on it.

CDF-PC, the first cloud-native runtime for Apache NiFi data flows, is focused on operationalizing NiFi data flows in production by providing resource isolation, auto-scaling and detailed KPI monitoring for flow deployments.

At the same time, Flow Management for Cloudera Data Hub provides a traditional NiFi experience focused on visual flow design and interactive data exploration. Together, Flow Management for Data Hub and Cloudera DataFlow for the Public Cloud provide all the capabilities you need to support the entire data flow development lifecycle from development to production.

Figure 1: Develop your data flows using Flow Management for Data Hub and operationalize them using Cloudera DataFlow for the Public Cloud (CDF-PC)

Developing data flows with version control

As Figure 1 shows, Flow Management for Data Hub provides an ideal environment that allows developers to quickly iterate on their data flows until they are ready to be deployed in production. Every Flow Management cluster comes preinstalled with NiFi Registry, making it easy for developers to version control their data flows.

Note: While version controlling data flows is not required for manually exporting data flows from the NiFi canvas, it is a prerequisite for automating data flow export using the NiFi Registry API.

To start version controlling a data flow, right-click the process group you want to version, select Version and then Start version control.

Figure 2: Starting version control stores process groups in the NiFi Registry and makes them available via the NiFi Registry API

In the next window, use the Bucket option to associate your data flow with a specific project or team and specify a Flow Name. Optionally, you can also provide a Flow Description and Version Comments.

Figure 3: When you start version control you can select a Bucket and provide a name for your flow definition

Once your data flow version has been saved to the NiFi Registry, you will notice a green tick appearing on your NiFi process group, indicating that the process group is up to date and represents the latest version stored in the NiFi Registry.

Figure 4: The green tick indicates that this process group is using the latest version of the flow definition

Changing your data flow logic in the NiFi canvas introduces local changes that are not yet synchronized to the NiFi Registry. Right-click the process group, select Version and then Commit local changes to create a new version that includes your latest changes.

Figure 5: A grey star indicates that local changes need to be committed to the NiFi Registry, resulting in a new version of the data flow

Note: If you are planning to export your data flows from the development environment using the NiFi Registry API, make sure that any local changes you want to include have been committed back to the Registry.

Now that you are familiar with versioning your data flows in your development environment, let’s look at how you can export these versions and deploy them using CDF-PC.

Exporting data flows from Flow Management for Data Hub

Apache NiFi 1.11 introduced a new Download flow definition capability which exports the data flow logic of a process group. The export includes any controller services that exist in the selected process group as well as parameter contexts that have been assigned to the selected process group.

Figure 6: Exporting data flows using the “Download flow definition” capability in the NiFi canvas works even when you are not versioning your process groups

To manually export a flow definition from the NiFi canvas, right-click the process group you want to export and select Download flow definition to obtain the flow definition in JSON format. This method exports the current process group from NiFi, including any local changes that might not have been committed to the NiFi Registry yet. Since this operation does not rely on the NiFi Registry, you can download flow definitions without versioning your data flows.

Exporting data flows using the NiFi Registry API

Downloading flow definitions right from the NiFi canvas is easy, but it requires a manual action. One way to automate this process is to use the NiFi Registry API directly, which allows you to programmatically export any version of your data flow that has been stored in the Registry.

Note: To use the NiFi Registry approach you must version your data flows as explained in the previous section.

In CDP Public Cloud, endpoints like the NiFi Registry API are protected and exposed through a central Apache Knox proxy. To obtain the NiFi Registry API endpoint, navigate to your Flow Management Data Hub cluster and select the Endpoints tab.

Figure 7: Flow Management cluster endpoints exposed through Knox

Copy the NiFi Registry REST URL and use it as the base URL to construct your REST calls. Refer to the Apache NiFi Registry REST API documentation for all available API calls. First, you want to export the latest version of your data flow from the Registry, so the endpoint you need to use is /buckets/{bucketId}/flows/{flowId}/versions/latest.

After obtaining the Registry REST URL and the API endpoint, you need to obtain the bucketId and flowId to construct the full API path. To do this, navigate to your Flow Management Data Hub cluster and click the NiFi Registry icon, which logs you into the NiFi Registry UI.

Figure 8: Navigating to the NiFi Registry UI

In the NiFi Registry UI, find the flow definition that you want to export by looking for the flow name that you provided when you started versioning your process group. Expand the corresponding entry and copy the BUCKET IDENTIFIER and the FLOW IDENTIFIER.

Using the NiFi Registry REST URL as well as the bucket and flow identifiers, you can now construct the final URL by appending /buckets/{bucketId}/flows/{flowId}/versions/latest, with both identifiers filled in, to the NiFi Registry REST URL.

Figure 9: Obtaining the bucketId and flowId from NiFi Registry
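The URL assembly described above can be sketched in a few lines of Python. This is only an illustration: the function name build_export_url and the host gateway.example.com are made up for the example, while the path segments and the two identifiers are the ones that appear in this post.

```python
from urllib.parse import quote


def build_export_url(registry_rest_url: str, bucket_id: str, flow_id: str) -> str:
    """Return the URL of the latest version of a flow stored in NiFi Registry."""
    # Drop a trailing slash so the joined path never contains "//".
    base = registry_rest_url.rstrip("/")
    bucket = quote(bucket_id, safe="")
    flow = quote(flow_id, safe="")
    return f"{base}/buckets/{bucket}/flows/{flow}/versions/latest"


# Hypothetical host: replace it with the NiFi Registry REST URL
# copied from the Endpoints tab of your Flow Management cluster.
url = build_export_url(
    "https://gateway.example.com/flow-management/cdp-proxy-api/nifi-registry-app/nifi-registry-api",
    "caea6227-2bde-452f-a325-3eac0424868f",
    "45f308ce-9dc2-4ac7-9ff2-153d714b52dd",
)
print(url)
```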

Since the NiFi Registry API is exposed through a Knox proxy, you need to authenticate your REST API call using a CDP workload user and password. You can use your personal CDP workload user or a machine user for this purpose, as long as the EnvironmentUser role has been assigned to the CDP workload user for the CDP environment that hosts your Flow Management cluster.

To add the EnvironmentUser role, navigate to your CDP environment, select “Manage Access” from the Actions menu and assign the EnvironmentUser role to the CDP workload user you want to use.

Figure 10: Assigning the EnvironmentUser role to a CDP workload user

In CDP Public Cloud, access to versioned NiFi data flows in the NiFi Registry is controlled by Apache Ranger. The CDP workload user that you are planning to use to call the NiFi Registry REST API needs to be allowed access to the flow definition that you want to export. To allow the nifi-kafka-ingest user access to the bucket caea6227-2bde-452f-a325-3eac0424868f, you need to create a corresponding policy in Ranger:

Figure 11: This Ranger policy allows your previously created machine user to access the NiFi Registry bucket which stores the flow definition you want to export.

Now that you have set up your CDP workload user, ensured that it can access the flow definition in the Registry, and obtained all the required IDs, you can go ahead and export your flow definition from the NiFi Registry.

Let’s combine the endpoint URL information you collected earlier with the bucket and flow identifiers and CDP workload user details to construct your final REST API call. The response will be the flow definition in JSON format, and you can choose to save it to a file using the redirect operator >

curl -u CDP_WORKLOAD_USER:CDP_WORKLOAD_USER_PASSWORD "NIFI_REGISTRY_EXPORT_URL" > /home/youruser/myflowdefinition.json
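For readers who prefer a script over curl, the same call can be sketched with the Python standard library. The helper names below are hypothetical, and the sketch assumes that the Knox proxy accepts HTTP Basic authentication with the workload credentials, which is what curl -u sends.

```python
import base64
import urllib.request


def build_export_request(url, user, password):
    """Build a GET request carrying HTTP Basic credentials, like `curl -u`."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


def download_flow_definition(url, user, password, target_path):
    """Fetch the flow definition and write the raw JSON response to a file."""
    request = build_export_request(url, user, password)
    with urllib.request.urlopen(request) as response, open(target_path, "wb") as out:
        out.write(response.read())
```

Calling download_flow_definition with the export URL, the workload user name, its password and a target path such as /home/youruser/myflowdefinition.json mirrors the redirect in the curl command.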

Note: If you are running the command on one of the NiFi instances, replace “gateway” by “management0” to ensure the Registry endpoint can be reached.

Note: In this example we are using curl to invoke the Registry REST endpoint. If you are using Python, check out nipyapi, which already provides Python wrappers for the NiFi and NiFi Registry API endpoints.

Note: To automate exporting data flows even further you can use NiFi Registry Hooks, which allow you to execute a script when a certain action in the Registry is triggered. You could set up a Registry hook that automatically exports the flow definition and uploads it to the CDF-PC Flow Catalog every time a new version is created.
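As a rough sketch of what such a hook script might start with, the snippet below picks the bucket and flow identifiers out of the hook arguments. It assumes the script-based hook provider passes the event type first, followed by bucket ID, flow ID, version, user and comment for a CREATE_FLOW_VERSION event; verify that order against the NiFi Registry administration guide for your version. The function name is hypothetical.

```python
def parse_flow_version_event(argv):
    """Return the identifiers of a CREATE_FLOW_VERSION event, or None otherwise.

    Assumed argument order: event type, bucket ID, flow ID, version, user, comment.
    """
    if len(argv) < 4 or argv[0] != "CREATE_FLOW_VERSION":
        return None
    return {"bucket_id": argv[1], "flow_id": argv[2], "version": argv[3]}


# Example arguments as the hook might receive them (values are illustrative).
event = parse_flow_version_event(
    ["CREATE_FLOW_VERSION", "caea6227-2bde-452f-a325-3eac0424868f",
     "45f308ce-9dc2-4ac7-9ff2-153d714b52dd", "2", "nifi-kafka-ingest", "fixes"]
)
print(event)
```

A real hook would pass sys.argv[1:] to this function and then run the export and catalog upload only when the result is not None.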

Exporting data flows using the NiFi CLI

You can also use the NiFi CLI to export flow definitions from the Registry. The NiFi CLI is part of the NiFi toolkit, which is installed on every NiFi node in your Flow Management cluster.

To use the NiFi CLI, establish an SSH connection with any NiFi node and log in with your CDP workload user name. Start the NiFi CLI by executing the following command:


In addition to the flow identifier, NiFi Registry REST endpoint and CDP workload user credentials, this approach also requires you to explicitly specify a truststore configuration to establish a secure connection. While the truststore location (/hadoopfs/fs4/working-dir/cm-auto-global_truststore.jks) and the truststore type (JKS) are the same on every Flow Management cluster, the truststore password is unique for each cluster and needs to be obtained from /etc/hadoop/conf/ssl-client.xml
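Looking up the password by hand works, but it can also be scripted. The sketch below reads a property from a Hadoop-style configuration file; it assumes that ssl-client.xml uses the usual configuration/property/name/value layout and that the password is stored under ssl.client.truststore.password, so check the property name in your own file. The helper name and the sample content are illustrative.

```python
import io
import xml.etree.ElementTree as ET


def read_hadoop_property(source, property_name):
    """Return the value of a named property from a Hadoop-style XML config.

    `source` may be a file path or an open file object.
    """
    root = ET.parse(source).getroot()
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == property_name:
            return prop.findtext("value")
    return None


# Illustrative stand-in for /etc/hadoop/conf/ssl-client.xml.
sample = io.StringIO(
    "<configuration>"
    "<property><name>ssl.client.truststore.type</name><value>jks</value></property>"
    "<property><name>ssl.client.truststore.password</name><value>example-password</value></property>"
    "</configuration>"
)
print(read_hadoop_property(sample, "ssl.client.truststore.password"))
```

On a NiFi node you would pass the file path /etc/hadoop/conf/ssl-client.xml instead of the in-memory sample.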

With the Registry REST endpoint, CDP workload user credentials, flow identifier and truststore information you can now construct the full registry export-flow-version command:

registry export-flow-version --baseUrl site/flow-management/cdp-proxy-api/nifi-registry-app/nifi-registry-api --flowIdentifier 45f308ce-9dc2-4ac7-9ff2-153d714b52dd --basicAuthUsername CDP_WORKLOAD_USER --basicAuthPassword CDP_WORKLOAD_USER_PASSWORD --truststore /hadoopfs/fs4/working-dir/cm-auto-global_truststore.jks --truststorePasswd TRUSTSTORE_PASSWORD --truststoreType jks --outputType json --outputFile /home/youruser/myflowdefinition.json

The command will return the flow definition in JSON format and write it to the location specified using --outputFile.

Note: If you are running the NiFi toolkit on one of the NiFi instances, replace “gateway” by “management0” to ensure the Registry endpoint can be reached.
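Because the command has many options, it can help to assemble it from named values. The following sketch only builds the argument list using the options shown above; the function name and the example host are hypothetical, and how you hand the result to the NiFi CLI is left open.

```python
def export_flow_version_args(base_url, flow_id, user, password,
                             truststore, truststore_password, output_file):
    """Assemble the arguments of the NiFi CLI `registry export-flow-version` command."""
    return [
        "registry", "export-flow-version",
        "--baseUrl", base_url,
        "--flowIdentifier", flow_id,
        "--basicAuthUsername", user,
        "--basicAuthPassword", password,
        "--truststore", truststore,
        "--truststorePasswd", truststore_password,
        "--truststoreType", "jks",
        "--outputType", "json",
        "--outputFile", output_file,
    ]


args = export_flow_version_args(
    # Hypothetical host: use your own NiFi Registry REST URL.
    "https://gateway.example.com/flow-management/cdp-proxy-api/nifi-registry-app/nifi-registry-api",
    "45f308ce-9dc2-4ac7-9ff2-153d714b52dd",
    "CDP_WORKLOAD_USER",
    "CDP_WORKLOAD_USER_PASSWORD",
    "/hadoopfs/fs4/working-dir/cm-auto-global_truststore.jks",
    "TRUSTSTORE_PASSWORD",
    "/home/youruser/myflowdefinition.json",
)
print(" ".join(args))
```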

Importing data flows into CDF for the Public Cloud

Now that you have exported the flow definition from the Flow Management development environment, you need to import it into CDF-PC’s central Flow Catalog before you can create deployments.

Most of the actions that you can perform in CDF-PC’s UI can also be automated using the CDP CLI. Before you can start using the CDP CLI to upload your flow definition to the Flow Catalog you need to download and configure it correctly.

Note: CDF-PC CLI commands are currently only available in the CDP Beta CLI. Use these instructions to install and configure the Beta CLI.

Once you have set up the CDP CLI you can explore all available CDF-PC commands simply by running cdp df.

The command for importing flow definitions into the catalog is df import-flow-definition and requires you to specify the path to the flow definition you want to upload and provide a name for it in the catalog.

cdp df import-flow-definition --file myflowdefinition.json --name MyFlowDefinition --description "This is my first uploaded Flow Definition" --comments "Version 1"

You have now successfully imported your flow definition and can explore it in the Flow Catalog.

Figure 12: The flow definition has been imported successfully to the catalog

If you want to upload new versions of this flow definition, use the import-flow-definition-version command. It requires you to specify the CRN of the existing flow definition in the catalog as well as the new flow definition JSON file that you want to upload as a new version.

To get the flow definition CRN, navigate to the catalog, select your flow definition and copy the CRN. Use the CRN to construct the final import-flow-definition-version command:

cdp df import-flow-definition-version --file myflowdefinition_v2.json --flow-crn crn:cdp:df:us-west-1:558bc1d2-8867-4357-8524-311d51259233:flow:MyFlowDefinition --comments "Version 2 with fixes for processing data"

After successful execution, you will now see a second version for the flow definition in the catalog.

Figure 13: A new version has been created for the imported flow definition
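In a CI/CD job the choice between the two import commands usually depends on whether the flow already exists in the catalog. The sketch below captures that decision with a hypothetical helper that returns the CDP CLI arguments; it uses only the options shown in the two commands above.

```python
def catalog_import_args(file_path, comments, name=None, description=None, flow_crn=None):
    """Return the CDP CLI arguments for a first import or for a new version.

    With `flow_crn` the flow already exists, so a new version is imported;
    otherwise a `name` is required to create the catalog entry.
    """
    if flow_crn:
        return ["cdp", "df", "import-flow-definition-version",
                "--file", file_path, "--flow-crn", flow_crn, "--comments", comments]
    if not name:
        raise ValueError("a name is required for the first import")
    args = ["cdp", "df", "import-flow-definition", "--file", file_path, "--name", name]
    if description:
        args += ["--description", description]
    return args + ["--comments", comments]


first = catalog_import_args("myflowdefinition.json", "Version 1", name="MyFlowDefinition")
second = catalog_import_args(
    "myflowdefinition_v2.json",
    "Version 2 with fixes for processing data",
    flow_crn="crn:cdp:df:us-west-1:558bc1d2-8867-4357-8524-311d51259233:flow:MyFlowDefinition",
)
print(first)
print(second)
```

Passing either list to subprocess.run(args, check=True) would execute the command, provided the CDP Beta CLI is installed and configured.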

Deploying data flows with CDF for the Public Cloud

After importing your flow definition into the catalog you can use the create-deployment command to automate flow deployments.

To create a flow deployment in CDF-PC, you must provide the flow definition CRN from the Flow Catalog, any parameter values the flow might require, any KPIs you want to set up as well as deployment configurations like the NiFi node size or whether the deployment should automatically scale up and down.

The easiest way to construct the full create-deployment command is to walk through the Deployment Wizard once and use the View CLI Command feature in the Review step to generate the corresponding CLI command and the required parameter and KPI files.

Figure 14: The Review step in the Deployment Wizard creates parameter and KPI property files and constructs the final create-deployment command

If your flow deployment contains flow parameters and KPIs, download the Flow Deployment Parameters JSON and Flow Deployment KPIs JSON files. These files define all parameters and their values as well as the KPIs that you defined in the wizard.

Note: Values for parameters marked as sensitive will not be included in the generated parameters file. Update the parameter value after downloading the file.
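Filling in the missing sensitive values can be scripted so that secrets never need to be edited into the file by hand. The sketch below assumes the downloaded file is a JSON list of parameter groups, each with a "parameters" list of objects carrying "name" and "value" keys. That layout, the group name and the parameter names are assumptions for illustration, so inspect your own file before relying on it.

```python
import json


def set_parameter_value(parameter_groups, parameter_name, value):
    """Set `value` on every parameter called `parameter_name`; return the match count."""
    updated = 0
    for group in parameter_groups:
        for parameter in group.get("parameters", []):
            if parameter.get("name") == parameter_name:
                parameter["value"] = value
                updated += 1
    return updated


# Assumed file layout with a hypothetical sensitive parameter left empty.
groups = json.loads(
    '[{"name": "my-group", "parameters": ['
    '{"name": "Kafka Broker", "value": "broker:9093"},'
    '{"name": "Workload Password", "value": null}]}]'
)
count = set_parameter_value(groups, "Workload Password", "value-from-secret-store")
print(count, json.dumps(groups, indent=2))
```

In a pipeline the value would typically come from a secret store or an environment variable, and the updated structure would be written back with json.dump before running the deployment command.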

With these two files downloaded, all you have left to do is copy the CLI command from the wizard and adjust the parameter-groups and kpis file paths before you hit enter and programmatically create your first flow deployment.

  cdp df create-deployment \
  --service-crn crn:cdp:df:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:service:e7aef078-aa34-44eb-8bb7-79e89a734911 \
  --flow-version-crn crn:cdp:df:us-west-1:558bc1d2-8867-4357-8524-311d51259233:flow:MyFlowDefinition/v.2 \
  --deployment-name "MyFirstDeployment" \
  --cluster-size-name EXTRA_SMALL \
  --auto-scale-min-nodes 1 \
  --auto-scale-max-nodes 3 \
  --parameter-groups file://PATH_TO_UPDATE/flow-parameter-groups.json \
  --kpis file://PATH_TO_UPDATE/flow-kpis.json
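When the deployment is triggered from a pipeline, the file paths and node counts typically come from variables. The sketch below assembles the same command from named values; it uses only the options shown above, and the function name and the /tmp paths are hypothetical.

```python
def create_deployment_args(service_crn, flow_version_crn, deployment_name,
                           parameter_groups_file, kpis_file,
                           cluster_size="EXTRA_SMALL", min_nodes=1, max_nodes=3):
    """Assemble the arguments of `cdp df create-deployment`."""
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    return [
        "cdp", "df", "create-deployment",
        "--service-crn", service_crn,
        "--flow-version-crn", flow_version_crn,
        "--deployment-name", deployment_name,
        "--cluster-size-name", cluster_size,
        "--auto-scale-min-nodes", str(min_nodes),
        "--auto-scale-max-nodes", str(max_nodes),
        # The CLI reads both JSON documents from file:// references.
        "--parameter-groups", "file://" + parameter_groups_file,
        "--kpis", "file://" + kpis_file,
    ]


deploy = create_deployment_args(
    "crn:cdp:df:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:service:e7aef078-aa34-44eb-8bb7-79e89a734911",
    "crn:cdp:df:us-west-1:558bc1d2-8867-4357-8524-311d51259233:flow:MyFlowDefinition/v.2",
    "MyFirstDeployment",
    "/tmp/flow-parameter-groups.json",
    "/tmp/flow-kpis.json",
)
print(" ".join(deploy))
```

Building a list rather than a single string avoids shell quoting problems when the list is passed to subprocess.run.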

After issuing the create-deployment command, you can navigate to the Dashboard in CDF-PC and watch the deployment process. Once the deployment has been created successfully you can manage it by using both the UI and the CLI.


Automating flow deployments with a single command is a key feature of CDF-PC and helps you focus on data flow development, deployment and monitoring instead of worrying about creating infrastructure and setting up complex CI/CD pipelines. Going forward we will continue to improve the CDF-PC CLI capabilities to further optimize the flow development lifecycle. Take the CDF-PC Product Tour and learn more about CDF-PC in the documentation.



