This post was written in collaboration with Abdulsalam Alshallah (Salam), Software Architect, and Hans Roessler, Principal Software Engineer at SEEK Asia.

SEEK is a market leader in online employment marketplaces with deep and rich insights into the future of work. As a global business, SEEK has a presence in Australia, New Zealand, Hong Kong, Southeast Asia, Brazil, and Mexico, and its websites attract over 400 million visits per year. SEEK Asia’s business operates across seven countries, includes leading portal brands such as jobsdb.com and jobstreet.com, and leverages data and technology to create innovative solutions for candidates and hirers.
In this post, we share how SEEK Asia modernized their search-based system with a continuous integration and continuous delivery (CI/CD) pipeline and Amazon OpenSearch Service (successor to Amazon Elasticsearch Service).
Challenges associated with a self-managed search system
SEEK Asia provides a search-based system that enables employers to manage interactions between hirers and candidates. Although the system was already on AWS, it was a self-managed system running on Amazon Elastic Compute Cloud (Amazon EC2) with limited automation.
The self-managed system posed several challenges:
- Slower release cycles – Deploying new configurations or new field mappings into the Elasticsearch cluster was a high-risk activity because changes affected the stability of the system. The lack of automation on both the self-managed cluster and the workflows led to slower release cycles.
- Higher operational overhead – Sizing the cluster to deliver greater performance while keeping it affordable was the other challenge. As with every other distributed system, even with sizing guidance, identifying the appropriate number of shards per node and the number of nodes to meet performance requirements still required some amount of trial and error, turning the exercise into a tedious and time-consuming activity. This also led to slower release cycles. To overcome this challenge, on many occasions oversizing the cluster became the fastest way to achieve the desired time to market, at the expense of cost.
Further challenges the team faced with self-managing their own Elasticsearch cluster included keeping up with new security patches, and with minor and major platform upgrades.
Automating search delivery with Amazon OpenSearch Service
SEEK Asia knew that automation would be the key to solving the challenges of their existing search service. Automating the undifferentiated heavy lifting would enable them to deliver more value to their customers quickly and improve staff productivity.
With the problems defined, the team set out to solve the challenges by automating the following:
- Search infrastructure deployment
- Search A/B testing infrastructure deployment
- Redeployment of search infrastructure for any new infrastructure configuration (such as security patches or platform upgrades) and index mapping updates
The key enablers of the automation would be Amazon OpenSearch Service and a search infrastructure CI/CD pipeline.
Architecture overview
The following diagram illustrates the architecture of the SEEK infrastructure and CI/CD pipeline with Amazon OpenSearch Service.

The workflow includes the following steps:
- Before the workflow kicks off, a live feeder hydrates the existing Amazon OpenSearch Service cluster. The live feeder is a serverless application built on Amazon Simple Queue Service (Amazon SQS) via Amazon Simple Notification Service (Amazon SNS) and AWS Lambda. Amazon SQS queues documents for processing, Amazon SNS enables data fanout (if required), and a Lambda function is invoked to process messages in the SQS queue and import data into Amazon OpenSearch Service. The feeder receives live updates for changes that need to be reflected on the cluster. Write concurrency to Amazon OpenSearch Service is managed by limiting the number of concurrent Lambda function invocations.
- The Amazon OpenSearch Service index mapping is version controlled in SEEK’s Git repository. Whenever an update to the index mapping is committed, the CI/CD pipeline kicks off a new Amazon OpenSearch Service cluster provisioning workflow.
- As part of the workflow, a new data hydration initialization feeder is deployed. The initialization feeder construct is similar to the live feeder, with one additional component: a script that runs within the CI/CD pipeline to calculate the number of batches required to hydrate the newly provisioned Amazon OpenSearch Service cluster up to a specific timestamp. The feeder systems were designed for idempotent processing. This means unique identifiers (UIDs) from the source data stores are reused for each document, so a duplicated document updates the existing document with the exact same values.
- Concurrently with the previous step (Step 3), an Amazon OpenSearch Service cluster is deployed. To temporarily accelerate the initial data hydration process, the new cluster may be sized two or three times larger than the sizing guidance suggests, with shard replicas and the index refresh interval disabled until the hydration process is complete. The existing Amazon OpenSearch Service cluster remains as is, which means that two clusters are running concurrently.
- The script inspects the number of documents in the source data store and groups the documents into batches. After running numerous experiments, SEEK identified that 1,000 documents per batch provided the optimal ingestion time.
- Each batch is represented as one message and is queued into Amazon SQS via Amazon SNS. Every message that lands in Amazon SQS invokes a Lambda function. The Lambda function queries a separate data store, builds the document, and loads it into Amazon OpenSearch Service. The more messages that go into the queue, the more functions are invoked. To create baselines that allowed for further indexing optimization, the team took the following configurations into consideration and iterated to achieve higher ingestion performance:
  - Memory of the Lambda function
  - Size of batch
  - Size of each document in the batch
  - Size of cluster (memory, vCPU, and number of primary shards)
- With the initialization feeder running, new documents are streamed to the cluster until it’s synced with the data source. Eventually, the newly provisioned Amazon OpenSearch Service cluster catches up and is in the same state as the existing cluster. The hydration is complete when there are no remaining messages in the SQS queue.
- The initialization feeder is deleted and the Amazon OpenSearch Service cluster is downsized automatically to complete the deployment workflow, with replica shards created and the index refresh interval configured.
- Live search traffic is routed to the newly provisioned cluster when A/B testing is enabled via the API layer built on Application Load Balancer, Amazon Elastic Container Service (Amazon ECS), and Amazon CloudFront. The API layer decouples the client interface from the backend implementation that runs on Amazon OpenSearch Service.
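The batch calculation performed by the initialization feeder script in the steps above can be sketched roughly as follows. This is a minimal illustration, not SEEK's actual script: only the batch size of 1,000 documents comes from the post, while the function names and the message fields (`offset`, `limit`, `cutoff`) are hypothetical.

```python
import json
from typing import Iterator

BATCH_SIZE = 1000  # batch size the post reports as optimal for SEEK's workload


def batch_count(total_documents: int, batch_size: int = BATCH_SIZE) -> int:
    """Return how many batches are needed to cover every document."""
    if total_documents < 0 or batch_size <= 0:
        raise ValueError("total_documents must be >= 0 and batch_size > 0")
    # Integer ceiling division: a partially filled last batch still counts.
    return (total_documents + batch_size - 1) // batch_size


def batch_messages(
    total_documents: int, cutoff_timestamp: str, batch_size: int = BATCH_SIZE
) -> Iterator[str]:
    """Yield one JSON message per batch (hypothetical message schema)."""
    for n in range(batch_count(total_documents, batch_size)):
        offset = n * batch_size
        yield json.dumps(
            {
                "offset": offset,
                "limit": min(batch_size, total_documents - offset),
                "cutoff": cutoff_timestamp,  # hydrate up to this timestamp
            }
        )
```

Each yielded string would be published as one message, so the number of messages in the queue equals the number of batches still to be ingested.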
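The idempotent processing described for the feeders can be illustrated by how a bulk request body might be built. The sketch below assumes each source document carries its identifier in a field named `uid` (a hypothetical name) and uses the standard newline-delimited format of the `_bulk` API, where an `index` action with an explicit `_id` either creates the document or overwrites the one already stored under that ID.

```python
import json
from typing import Iterable, Mapping


def bulk_body(index: str, documents: Iterable[Mapping]) -> str:
    """Build an NDJSON body for the _bulk API, keyed by each document's UID."""
    lines = []
    for doc in documents:
        # Reusing the source UID as _id makes a repeated write overwrite
        # the same document instead of creating a duplicate.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["uid"]}}))
        lines.append(json.dumps(dict(doc)))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)
```

Because the document ID is derived from the source data rather than generated at write time, a message that is delivered or processed twice leaves the index in the same state.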
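The index settings that change between hydration and normal serving can be expressed as two request bodies for the index `_settings` API. This is a sketch under assumptions: `index.number_of_replicas` and `index.refresh_interval` are real dynamic index settings, and a refresh interval of `-1` disables refresh, but the post-hydration values of one replica and a one-second refresh are illustrative defaults, not values stated by SEEK.

```python
# Settings applied while the new cluster is being hydrated:
# no replica shards and no periodic refresh, to maximize indexing throughput.
HYDRATION_SETTINGS = {
    "index": {
        "number_of_replicas": 0,
        "refresh_interval": "-1",  # -1 disables automatic refresh
    }
}


def serving_settings(replicas: int = 1, refresh_interval: str = "1s") -> dict:
    """Settings to apply once hydration is complete (assumed example values)."""
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    return {
        "index": {
            "number_of_replicas": replicas,
            "refresh_interval": refresh_interval,
        }
    }
```

Either dictionary could be sent as the JSON body of a settings update on the target index, first before hydration starts and then again when the cluster is downsized.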
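The completion condition for hydration, no remaining messages in the SQS queue, could be checked along these lines. The helper name is hypothetical, and the sketch assumes the caller has already fetched the queue attributes (for example with the SQS `GetQueueAttributes` action), which returns the message counts as strings.

```python
from typing import Mapping

# SQS reports visible, in-flight, and delayed messages separately.
_COUNT_ATTRIBUTES = (
    "ApproximateNumberOfMessages",
    "ApproximateNumberOfMessagesNotVisible",
    "ApproximateNumberOfMessagesDelayed",
)


def queue_is_drained(attributes: Mapping[str, str]) -> bool:
    """Return True when no messages are waiting, in flight, or delayed."""
    return all(int(attributes[name]) == 0 for name in _COUNT_ATTRIBUTES)
```

These counts are approximate, so a pipeline relying on this check would likely want to see the queue empty on several consecutive polls before deleting the initialization feeder.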
Improved time to market and other outcomes
With Amazon OpenSearch Service, SEEK was able to automate the deployment of an entire cluster, complete with Kibana, in a secure, managed environment. If testing didn’t produce the desired results, the team could change the size of the cluster horizontally or vertically using different instance options within minutes. This enabled them to perform stress tests quickly to identify the sweet spot between performance and cost for the workload.
“By integrating Amazon OpenSearch Service with our existing CI/CD tools, we’re able to fully automate our search function deployments, which accelerated software delivery time,” says Abdulsalam Alshallah, APAC Software Architect. “The newly found confidence in the modern stack, alongside improved engineering practices, allowed us to mitigate the risk of changes, improving our time to market by 89% with zero impact to uptime.”
With the adoption of Amazon OpenSearch Service, other teams also saw improvements, including the following:
- The number of Common Vulnerabilities and Exposures (CVEs) has dropped to zero, with Amazon OpenSearch Service handling the underlying hardware security updates on SEEK’s behalf, improving their security posture
- Improved availability with the Amazon OpenSearch Service Availability Zone awareness feature
Conclusion
The managed capabilities of Amazon OpenSearch Service have helped SEEK Asia improve customer experience with speed and automation. By removing the undifferentiated heavy lifting, teams can deploy changes quickly to their search engines, allowing customers to get the latest search features faster and ultimately contributing to the SEEK purpose of helping people live more productive working lives and organisations succeed.
To learn more about Amazon OpenSearch Service, see Amazon OpenSearch Service features, the Developer Guide, or Introducing OpenSearch.
About the Authors
Fabian Tan is a Principal Solutions Architect at Amazon Web Services. He has a strong passion for software development, databases, data analytics, and machine learning. He works closely with the Malaysian developer community to help them bring their ideas to life.
Hans Roessler is a Principal Software Architect at SEEKAsia. He is passionate about new technologies and upgrading legacy systems to newer stacks. Staying in touch with the latest technologies is one of his passions.
Abdulsalam Alshallah (Salam) is a Software Architect at SEEK. Previously a Lead Cloud Architect for SEEKAsia, Salam has always been passionate about new technologies, cloud, serverless, and DevOps, along with eliminating wasted time, effort, and resources. He is also one of the leaders of AWS User Group Malaysia.
