Monday, June 15, 2026

Lessons learned from deploying Cloudera Data Platform for IBM Cloud Pak for Data – IBM Developer


In this final blog post in our series, we focus on lessons learned from installing, maintaining, and verifying the connectivity of Cloudera Data Platform and IBM Cloud Pak for Data. If you haven’t read the first two posts, "A technical deep-dive on integrating Cloudera Data Platform and IBM Cloud Pak for Data" and "Installing Cloudera’s CDP Private Cloud Base on IBM Cloud with Ansible", then I’d invite you to go back and read them for more context.

In this installment, we’d like to share some useful tips and tricks and show you how to avoid common mistakes made by first-time installers.

Lesson 1: Use a bastion host

Our Cloudera cluster had a total of 8 VMs (3 master nodes, 3 worker nodes, and 2 edge nodes). We wanted easy access to each node and wanted to limit public network traffic to the Cloudera cluster as much as possible. Luckily, there’s already a well-known solution to this problem: using a bastion host.

We spun up a small VM on the same subnet as our Cloudera cluster and could then easily communicate over private network interfaces (10.x.y.z IP addresses). For the installation process, this choice provided the benefit of not dropping connections during long-running Ansible playbooks.
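
As a rough illustration of reaching a private node through the bastion from a workstation, the sketch below builds an OpenSSH command that uses the ProxyJump option (`-J`). The host names, user name, and IP address are hypothetical placeholders, not values from our environment, and the function only prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Build (but do not run) an ssh command that hops through a bastion host.
# All hosts and users passed in are hypothetical placeholders.
bastion_ssh_cmd() {
  local bastion="$1" target="$2"
  # -J is the OpenSSH ProxyJump option: connect to $bastion first,
  # then open the session to $target from there.
  printf 'ssh -J %s %s\n' "$bastion" "$target"
}

# Example: print the command for reaching a node on its private address
bastion_ssh_cmd "root@bastion.example.com" "root@10.0.0.11"
```

With a jump like this, the cluster nodes only need to accept SSH on their private interfaces; the bastion is the single host exposed for administration.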

Figure 1. The architecture of our Cloudera for Cloud Pak for Data environment

Lesson 2: Use VS Code’s Remote Development extension pack

When installing Cloudera Data Platform with Ansible playbooks, you’re likely going to need to change a few config options and values in the playbooks. We’re not against using Vim, but we opted to use the Visual Studio Code Remote Development Extension Pack. This made searching through the files, editing values, and uploading and downloading files much easier.

Figure 2. VS Code’s Remote Development extension, useful for editing files and running commands against our remote machines

Lesson 3: Stick to private networks

This point may seem obvious, but it’s more about being consistent. Anywhere an IP address had to be entered, we always made sure to use the private network IP address. This ensured that all traffic stayed on the IBM Cloud network and off the public Internet.
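
One way to stay consistent is to check addresses before they go into a playbook or config file. The following is a minimal sketch, assuming IPv4 and the RFC 1918 private ranges; it is a simple prefix match rather than a full address validator, and the sample addresses are made up for illustration.

```shell
#!/usr/bin/env bash
# Return success if the argument looks like an RFC 1918 private IPv4 address.
# This is a prefix check only; it does not validate the full address.
is_private_ipv4() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical sample addresses (203.0.113.7 is a documentation address)
for ip in 10.0.0.11 203.0.113.7; do
  if is_private_ipv4 "$ip"; then
    echo "$ip is private"
  else
    echo "$ip is NOT private"
  fi
done
```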

Lesson 4: Eliminate all inbound traffic except RDP on the Windows Active Directory server

Here is a subtle lesson that might otherwise be a little tricky to pin down. After a few days of uptime, the health checks on our Cloudera Data Platform were indicating that the hosts couldn’t reach our Active Directory (AD) server. Indeed, we discovered that our AD had hung. When we rebooted the AD server, things would return to normal for a day or so, and then the problem would repeat.

We looked over the capacity and performance of the server. When we looked at network usage, we noticed a high level of traffic going to and from the system on the Internet-facing interface. After looking at the server configuration and the traffic, we were able to determine that the vast majority was over the LDAP port.

Since our only use of LDAP is internal, the solution to this problem was to limit the inbound traffic to the AD by creating a rule that only allowed traffic on the RDP protocol, which is used for remote desktop administration. On IBM Cloud, we created a custom security group permitting inbound TCP on port 3389 for RDP.
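
To confirm that a rule like this took effect, a quick probe from a machine outside the private network can help. The sketch below is one possible approach using bash’s built-in /dev/tcp redirection and the coreutils `timeout` command; the `AD_PUBLIC_IP` variable is a hypothetical placeholder you would set yourself, and the check only runs when it is set.

```shell
#!/usr/bin/env bash
# Succeed if a TCP connection to host:port can be opened within 3 seconds.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Report the state of the LDAP (389), LDAPS (636), and RDP (3389) ports.
report_ports() {
  local host="$1" port
  for port in 389 636 3389; do
    if port_open "$host" "$port"; then
      echo "$port open"
    else
      echo "$port closed or filtered"
    fi
  done
}

# AD_PUBLIC_IP is a hypothetical variable holding the AD server's public address.
if [ -n "${AD_PUBLIC_IP:-}" ]; then
  report_ports "$AD_PUBLIC_IP"
fi
```

After the security group change, you would expect only 3389 to be reported as open from the public side.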

Lesson 5: Mount secondary drives to /data/dfs automatically

The storage requirements for installing Cloudera required us to purchase additional drives to go along with our virtual machines. These drives had to be mounted before running any playbooks. We used a little bit of bash and SSH to do it in an automated way. In our case, we chose to mount the drives to /data/dfs:

# Format, mount, and persist the secondary drive on each of the 8 cluster VMs
for i in {1..8}
do
  ssh cid-vm-0$i mkfs.ext4 -m0 -O sparse_super,dir_index,extent,has_journal /dev/xvdc
  ssh cid-vm-0$i mkdir -p /data/dfs
  ssh cid-vm-0$i mount /dev/xvdc /data/dfs
  # Append an fstab entry so the mount survives a reboot
  ssh cid-vm-0$i 'echo "/dev/xvdc  /data/dfs   ext4  defaults,noatime 1 2" | tee -a /etc/fstab'
done
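
One caveat with appending to the fstab file is that running the loop a second time adds a duplicate entry. The helper below is a small sketch of one way to guard against that; the function name is our own invention for illustration, and the demonstration writes to a scratch file rather than the real fstab.

```shell
#!/usr/bin/env bash
# Append an fstab entry only when the device has none yet, so re-running
# the provisioning step does not create duplicate lines.
# ensure_fstab_entry is a hypothetical helper name.
ensure_fstab_entry() {
  local fstab="$1" device="$2" mountpoint="$3"
  if ! grep -qE "^${device}[[:space:]]" "$fstab" 2>/dev/null; then
    echo "${device}  ${mountpoint}   ext4  defaults,noatime 1 2" >> "$fstab"
  fi
}

# Demonstrate against a scratch file instead of the real fstab
scratch="$(mktemp)"
ensure_fstab_entry "$scratch" /dev/xvdc /data/dfs
ensure_fstab_entry "$scratch" /dev/xvdc /data/dfs   # second call adds nothing
cat "$scratch"
rm -f "$scratch"
```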

Lesson 6: Update the OpenShift DNS operator so it knows the Cloudera node hostnames

We wanted our IBM Cloud Pak for Data instance, which runs on OpenShift, to be able to communicate with our newly deployed Cloudera Data Platform cluster. We stuck to our “always use private network interfaces” rule, but that resulted in 404s since OpenShift didn’t know how to resolve those hostnames. To get around this, we needed to edit the DNS operator on our OpenShift instance. It’s documented in the OpenShift DNS Documentation, but for brevity, we’ve added what worked for us.

Edit the DNS operator default CR with oc edit dns.operator/default and update it by adding the following to the spec section:

spec:
  servers:
  - forwardPlugin:
      upstreams:
      - <your private ip>
      - <your public ip>
    name: cdplab-server
    zones:
    - cdplab.local

Then verify that the ConfigMap for CoreDNS is updated: oc get configmap/dns-default -n openshift-dns -o yaml

apiVersion: v1
data:
  Corefile: |
    # cdplab-server
    cdplab.local:5353 {
        forward . <your private ip> <your public ip>
    }

Then create a pod and try to access CDP from the pod. HTML should be returned, not a 404 error message.

bash-4.4$ curl -k https://cid-vm-01.cdplab.local:7183/cmf/home
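
If you would rather check the HTTP status code than read the returned HTML, something along these lines may help. This is a sketch under our own assumptions: the function names and the `CDP_URL` variable are hypothetical, and the probe only runs when `CDP_URL` is set (for example, to the URL in the curl command above).

```shell
#!/usr/bin/env bash
# Map an HTTP status code (as printed by curl's %{http_code}) to a short verdict.
classify_status() {
  case "$1" in
    2??|3??) echo "reachable" ;;
    404)     echo "not found" ;;
    000)     echo "no HTTP response" ;;   # curl prints 000 when no response arrived
    *)       echo "unexpected status: $1" ;;
  esac
}

# Fetch only the status code; -k skips certificate verification as in the post.
probe_url() {
  local code
  code="$(curl -k -s -o /dev/null --max-time 10 -w '%{http_code}' "$1" || true)"
  classify_status "$code"
}

# CDP_URL is a hypothetical variable; set it inside the pod before running.
if [ -n "${CDP_URL:-}" ]; then
  probe_url "$CDP_URL"
fi
```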

Lesson 7: Ensure the AD self-signed certificate can be used as a certificate authority

This lesson can be broadly applied to other LDAP and AD scenarios. In our case, we could successfully connect to the Impala service running on Cloudera through Kerberos, but not through LDAP. After double-checking that our LDAP-specific Impala configuration was correct, we were still getting a not-so-helpful “Can’t contact LDAP server” error.

We slowly started to peel back the layers of the problem. We managed to isolate the problem to our LDAP configuration, and we knew this was the case because when we ran ldapsearch in an attempt to bind the user, it gave us the same error message. Ah-ha! Impala was using an OpenLDAP library under the covers.

$ ldapsearch -H ldaps://cid-adc.cdplab.local:636 -D "stevemar@CDPLAB.LOCAL" -b "dc=cdplab,dc=local" '(uid=stevemar)' -W
Enter LDAP Password:
ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)

After double-checking that the Windows firewall wasn’t the culprit, we narrowed down the problem to a missing piece of information in the self-signed certificate we had created for the AD. We needed to add the -TextExtension "2.5.29.19={text}CA=true" flag to the Windows New-SelfSignedCertificate command. Our new command looked like this (previously, it was missing the last parameter):

New-SelfSignedCertificate -Subject *.$dnsName `
  -NotAfter $lifetime.AddDays(365) -KeyUsage DigitalSignature, KeyEncipherment `
  -Type SSLServerAuthentication -DnsName *.$dnsName, $dnsName `
  -TextExtension "2.5.29.19={text}CA=true"
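
To check from a Linux host whether the certificate served over LDAPS carries the CA flag, one option is to pull it with openssl and look for CA:TRUE in the Basic Constraints extension (OID 2.5.29.19). The sketch below assumes openssl is installed; the function names and the `LDAPS_HOSTPORT` variable are hypothetical, and the check only runs when that variable is set (for example, to cid-adc.cdplab.local:636).

```shell
#!/usr/bin/env bash
# Read `openssl x509 -noout -text` output on stdin; succeed if CA:TRUE appears.
has_ca_true() {
  grep -q 'CA:TRUE'
}

# Fetch the server certificate from host:port and print it as text.
fetch_cert_text() {
  local hostport="$1"
  openssl s_client -connect "$hostport" </dev/null 2>/dev/null \
    | openssl x509 -noout -text
}

# LDAPS_HOSTPORT is a hypothetical variable, e.g. "cid-adc.cdplab.local:636".
if [ -n "${LDAPS_HOSTPORT:-}" ]; then
  if fetch_cert_text "$LDAPS_HOSTPORT" | has_ca_true; then
    echo "certificate is marked as a CA"
  else
    echo "certificate is NOT marked as a CA"
  fi
fi
```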

There’s no real single piece of advice here, other than this: if you’re going to use Kerberos to secure your Cloudera cluster, get familiar with Kerberos concepts, like keytabs, and tools like ktutil and ktpass.

Summary and next steps

We hope you enjoyed reading about some of the pitfalls we encountered and that you remember some of the tips we shared the next time you’re deploying a data and AI platform. You can learn more about the Cloudera Data Platform for IBM Cloud Pak for Data joint offering.
