Tuesday, June 30, 2026

Feathr: LinkedIn’s feature store is now available on Azure | Azure Blog and Updates


This blog post is co-authored by David Stein, Senior Staff Software Engineer, Jinghui Mo, Staff Software Engineer, and Hangfei Lin, Staff Software Engineer, all from the Feathr team.

Feature store motivation

With the advance of AI and machine learning, companies have started to use complex machine learning pipelines in various applications, such as recommendation systems, fraud detection, and more. These complex systems usually require hundreds to thousands of features to support time-sensitive business applications, and the feature pipelines are maintained by different team members across various business groups.

In these machine learning systems, we see many problems that consume a lot of the energy of machine learning engineers and data scientists, in particular duplicated feature engineering, online-offline skew, and feature serving with low latency.

Figure 1: Illustration of the problems that a feature store solves.

Duplicated feature engineering

  • In an organization, thousands of features are buried in different scripts and in different formats; they are not captured, organized, or preserved, and thus cannot be reused and leveraged by teams other than those who generated them.
  • Because feature engineering is so important for machine learning models and features cannot be shared, data scientists must duplicate their feature engineering efforts across teams.

Online-offline skew

  • For features, offline training and online inference usually require different data serving pipelines, and ensuring consistent features across different environments is expensive.
  • Teams are deterred from using real-time data for inference because of the difficulty of serving the right data.
  • Providing a convenient way to ensure data point-in-time correctness is key to avoiding label leakage.
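
Point-in-time correctness can be illustrated with a small, self-contained sketch. This is not Feathr's implementation; the function name, the tuple layout, and the sample data are assumptions made for illustration. For each labeled event, the join picks the latest feature value observed at or before the event's timestamp, so no value from the future leaks into the training row.

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_join(labels, feature_rows):
    """Attach to each label the latest feature value seen at or before its timestamp.

    labels:       iterable of (key, timestamp, label)
    feature_rows: iterable of (key, timestamp, value)
    Returns a list of (key, timestamp, feature_value_or_None, label).
    """
    # Group feature observations by key and sort each group by timestamp.
    by_key = defaultdict(list)
    for key, ts, value in feature_rows:
        by_key[key].append((ts, value))
    for rows in by_key.values():
        rows.sort(key=lambda row: row[0])

    joined = []
    for key, label_ts, label in labels:
        rows = by_key.get(key, [])
        timestamps = [ts for ts, _ in rows]
        # Index of the first observation strictly after the label timestamp.
        idx = bisect_right(timestamps, label_ts)
        value = rows[idx - 1][1] if idx > 0 else None
        joined.append((key, label_ts, value, label))
    return joined


# Hypothetical sample data: (user, time, value) and (user, time, label).
features = [("u1", 1, 10), ("u1", 5, 20), ("u1", 9, 30), ("u2", 4, 7)]
labels = [("u1", 6, 1), ("u1", 0, 0), ("u2", 4, 1)]
training_rows = point_in_time_join(labels, features)
```

In this sketch the label for "u1" at time 6 is paired with the value observed at time 5, not the later value from time 9, and a label that predates every observation gets no feature value at all.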

Serving features with low latency

  • For real-time applications, getting feature lookups from a database for real-time inference without compromising response latency and with high throughput can be challenging.
  • Easily accessing features with very low latency is key in many machine learning scenarios, and optimizations need to be done to combine different REST API calls to features.

To solve these problems, a concept called the feature store was developed, so that:

  • Features are centralized in an organization and can be reused
  • Features can be served in a synchronized way between offline and online environments
  • Features can be served in real time with low latency

Introducing Feathr, a battle-tested feature store

Developing a feature store from scratch takes time, and it takes much more time to make it stable, scalable, and user-friendly. Feathr is the feature store that has been used in production and battle-tested at LinkedIn for over 6 years, serving all of the LinkedIn machine learning feature platform with thousands of features in production.

At Microsoft, the LinkedIn team and the Azure team have worked very closely to open source Feathr, make it extensible, and build native integration with Azure. It’s available in this GitHub repository and you can read more about Feathr on the LinkedIn Engineering Blog.

Some of the highlights for Feathr include:

  • Scalable with built-in optimizations. For example, based on some internal use cases, Feathr can process billions of rows and PB-scale data with built-in optimizations such as bloom filters and salted joins.
  • Rich support for point-in-time joins and aggregations: Feathr has highly performant built-in operators designed for feature stores, including time-based aggregation, sliding window joins, and look-up features, all with point-in-time correctness.
  • Highly customizable user-defined functions (UDFs) with native PySpark and Spark SQL support to lower the learning curve for data scientists.
  • Pythonic APIs to access everything with a low learning curve, integrated with model building so data scientists can be productive from day one.
  • Rich type system including support for embeddings for advanced machine learning/deep learning scenarios. One of the common use cases is to build embeddings for customer profiles, and those embeddings can be reused across an organization in all the machine learning applications.
  • Native cloud integration with a simplified and scalable architecture, which is illustrated in the next section.
  • Feature sharing and reuse made easy: Feathr has a built-in feature registry so that features can be easily shared across different teams, boosting team productivity.
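
To make the salted join mentioned in the first highlight more concrete, here is a minimal pure-Python sketch of the general technique. It does not show how Feathr or Spark implement it; `salted_join`, `num_salts`, and the sample rows are hypothetical. Each row on the large side gets a random salt, and each row on the small side is replicated once per salt value, so a single hot key is spread over several partitions instead of one.

```python
import random
from collections import defaultdict


def salted_join(big_rows, small_rows, num_salts=4, seed=0):
    """Join (key, payload) rows with (key, feature) rows using salted keys.

    Returns the joined rows and the size of each (key, salt) partition.
    """
    rng = random.Random(seed)

    # Replicate every small-side row once per salt value.
    small_index = defaultdict(list)
    for key, feature in small_rows:
        for salt in range(num_salts):
            small_index[(key, salt)].append(feature)

    # Give every big-side row a random salt, spreading a hot key
    # over up to num_salts partitions.
    partitions = defaultdict(list)
    for key, payload in big_rows:
        partitions[(key, rng.randrange(num_salts))].append(payload)

    # Join partition by partition on the salted key.
    joined = []
    for (key, salt), payloads in partitions.items():
        for payload in payloads:
            for feature in small_index.get((key, salt), []):
                joined.append((key, payload, feature))

    sizes = {salted_key: len(payloads) for salted_key, payloads in partitions.items()}
    return joined, sizes


# Hypothetical skewed input: one hot key with 100 rows, one cold key with 1 row.
big = [("hot", i) for i in range(100)] + [("cold", 100)]
small = [("hot", "a"), ("cold", "b")]
joined_rows, partition_sizes = salted_join(big, small)
```

The joined rows match those of a plain key join; only the partitioning differs. The cost is that the small side is replicated `num_salts` times, which is why the technique suits joins where one side is much smaller or the key distribution is heavily skewed.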

Feathr on Azure architecture

The high-level architecture diagram below shows how a user interacts with Feathr on Azure:

Figure 2: Feathr on Azure architecture.

  1. A data or machine learning engineer creates features using their preferred tools (like pandas, Azure Machine Learning, Azure Databricks, and more). These features are ingested into offline stores, which can be either:

    • Azure SQL Database (including serverless), Azure Synapse Dedicated SQL Pool (formerly SQL DW).
    • Object storage, such as Azure Blob Storage, Azure Data Lake Store, and more. The format can be Parquet, Avro, or Delta Lake.

  2. The data or machine learning engineer can persist the feature definitions into a central registry, which is built with Azure Purview.
  3. The data or machine learning engineer can join all the feature datasets in a point-in-time correct way, with the Feathr Python SDK and with Spark engines such as Azure Synapse or Databricks.
  4. The data or machine learning engineer can materialize features into an online store such as Azure Cache for Redis with Active-Active, enabling a multi-primary, multi-write architecture that ensures eventual consistency between clusters.
  5. Data scientists or machine learning engineers consume offline features with their favorite machine learning libraries, for example scikit-learn, PyTorch, or TensorFlow, to train a model in their favorite machine learning platform such as Azure Machine Learning, then deploy the models in their favorite environment with services such as Azure Machine Learning endpoint.
  6. The backend system makes a request to the deployed model, which makes a request to Azure Cache for Redis to get the online features with the Feathr Python SDK.
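
Steps 4 and 6 can be sketched without any cloud services. The example below is an illustration only: it does not use the Feathr SDK, a plain dictionary stands in for the online store, and `materialize_latest`, `lookup_features`, and the feature names are hypothetical. Materialization keeps the most recent value of each feature per key, and the lookup returns several features for one key in a single call.

```python
def materialize_latest(offline_rows):
    """Build an online view holding the newest value of each feature per key.

    offline_rows: iterable of (key, feature_name, timestamp, value)
    Returns {key: {feature_name: value}}, a dict standing in for an online store.
    """
    latest = {}
    for key, feature, ts, value in offline_rows:
        current = latest.get((key, feature))
        # Keep the row with the greatest timestamp for each (key, feature) pair.
        if current is None or ts >= current[0]:
            latest[(key, feature)] = (ts, value)

    online_store = {}
    for (key, feature), (_, value) in latest.items():
        online_store.setdefault(key, {})[feature] = value
    return online_store


def lookup_features(online_store, key, feature_names):
    """Fetch several features for one key in a single lookup; missing ones are None."""
    row = online_store.get(key, {})
    return [row.get(name) for name in feature_names]


# Hypothetical offline feature table.
offline = [
    ("u1", "clicks_7d", 1, 3),
    ("u1", "clicks_7d", 2, 5),
    ("u1", "avg_spend", 1, 9.5),
    ("u2", "clicks_7d", 1, 0),
]
store = materialize_latest(offline)
model_input = lookup_features(store, "u1", ["clicks_7d", "avg_spend"])
```

Keeping all features for a key in one record lets the serving path fetch them in one round trip, which is the kind of call-combining that low-latency serving depends on.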

A sample notebook containing all of the above flow is located in the Feathr repository for further reference.

Feathr has native integration with Azure and other cloud services. The table below shows these integrations:

Feathr component | Cloud integrations
Offline store – Object Store | Azure Blob Storage, Azure ADLS Gen2, AWS S3
Offline store – SQL | Azure SQL DB, Azure Synapse Dedicated SQL Pools (formerly SQL DW), Azure SQL in VM, Snowflake
Online store | Azure Cache for Redis
Feature Registry | Azure Purview
Compute Engine | Azure Synapse Spark Pools, Databricks
Machine Learning Platform | Azure Machine Learning, Jupyter Notebook
File Format | Parquet, ORC, Avro, Delta Lake

Table 1: Feathr on Azure Integration with Azure Services.

Installation and getting started

Feathr has a Pythonic interface to access all Feathr components, including feature definition and cloud interactions, and is open sourced on GitHub. The Feathr Python client can be easily installed with pip:

pip install -U feathr

For more details on getting started, please refer to the Feathr Quickstart Guide. The Feathr team can also be reached in the Feathr community.

Going forward

In this blog, we’ve introduced a battle-tested feature store, called Feathr, which is scalable and enterprise ready, with native Azure integrations. We are dedicated to bringing more functionality into Feathr and the Feathr on Azure integrations. Feel free to give any feedback by raising issues in the Feathr GitHub repository.
