📜 Real-time Embeddings Inference
Text embeddings are condensed and semantically rich representations of textual data that capture the contextual and semantic meaning of words, sentences, or documents.
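To make that concrete: an embedding is just a vector of floats, and "semantic similarity" between two texts is typically measured as the cosine similarity of their vectors. A toy sketch (the 4-dimensional vectors below are made up for illustration; real models produce hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" (real models produce hundreds of dimensions).
query = [0.1, 0.9, 0.2, 0.0]
similar_doc = [0.15, 0.85, 0.25, 0.05]
unrelated_doc = [0.9, 0.0, 0.1, 0.8]

# Texts with similar meaning get vectors that point in similar directions.
print(cosine_similarity(query, similar_doc) > cosine_similarity(query, unrelated_doc))  # True
```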
While real-time text embeddings can boost model performance for applications such as recommendation systems, fraud detection, and content moderation, building a production embeddings inference system can be quite the challenge. With Tecton, you can run text embedding inference in real-time with plain Python and deploy it in production in minutes!
This tutorial assumes some basic familiarity with Tecton. If you are new to Tecton, we recommend first checking out Building a Production AI Application with Tecton which walks through an end-to-end journey of building a real-time ML application with Tecton.
In this tutorial we will:
- Create a custom environment with an embeddings library
- Create an On-Demand Feature View that runs the embedding model
- Retrieve the embedding with the HTTP API
Before you start:
- Install the Tecton SDK with `pip install tecton`. We recommend doing so in a virtual environment.
- Run `tecton login [your-org-account-name].tecton.ai` in your CLI. Be sure to fill in your organization's Tecton account name.
- Run these commands to create a new Tecton repo:
mkdir tecton-feature-repo
cd tecton-feature-repo
tecton init
🌏 Create a custom environment with an embeddings library
Environments are isolated compute instances with Python packages where transformations are run during online feature retrieval. In this tutorial, we will use the FastEmbed package for embeddings generation. In the feature repo, let's create a requirements.txt file with the following entries:
tecton-runtime==0.0.3
fastembed
urllib3==1.26.6
Next, we'll use the Tecton CLI to create an environment in Tecton by running the following commands:
tecton environment create --name "embeddings-env" --description "Embeddings custom environment" --requirements /path/to/requirements.txt
Once the environment creation command completes, we can check the status of the environment with:
tecton environment get --name "embeddings-env"
Id Name Status Created At
========================================================================================================================
5c83b9014e2e4f1eb68c58eba6bc0796 embeddings-env REMOTE_ENVIRONMENT_STATUS_PENDING 2023-12-21 20:18:55 UTC
You will require admin privileges to create custom environments via the Tecton CLI.
👩‍💻 Define an On-Demand Feature View that runs the embeddings model
Once we have an environment created, we'll create an On-Demand Feature View that receives a text query as an input and computes the embedding, and a Feature Service that serves the embedding online. Create features.py and copy-paste the following code:
from tecton import on_demand_feature_view, RequestSource, FeatureService
from tecton.types import Field, Float32, String, Array
request_schema = [Field("text", String)]
user_request = RequestSource(schema=request_schema)
output_schema = [Field("embedding", Array(Float32))]
@on_demand_feature_view(
    sources=[user_request],
    mode="python",
    schema=output_schema,
    environments=["embeddings-env"],
    owner="demo-user@tecton.ai",
)
def user_request_embedding(request):
    from fastembed.embedding import FlagEmbedding as Embedding

    embedding_model = Embedding(model_name="sentence-transformers/all-MiniLM-L6-v2", max_length=384)
    # Embedding.embed returns a generator, which we cast to a list
    request_embedding = list(embedding_model.embed([request["text"]]))[0]
    result = {"embedding": request_embedding}
    return result

embedding_service = FeatureService(
    name="embedding_service",
    features=[user_request_embedding],
    online_serving_enabled=True,
    on_demand_environment="embeddings-env",
)
The next step is to apply our repo to a production workspace.
tecton workspace create embeddings-inference --live
tecton apply
Using workspace "embeddings-inference" on cluster https://[your-org-account-name].tecton.ai
✅ Imported 1 Python module from the feature repository
⚠️ Running Tests: No tests found.
✅ Collecting local feature declarations
✅ Performing server-side feature validation: Initializing.
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓
+ Create Transformation
name: user_request_embedding
owner: demo-user@tecton.ai
+ Create On-Demand Feature View
name: user_request_embedding
owner: demo-user@tecton.ai
+ Create Feature Service
name: embedding_service
↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
Generated plan ID is 04d26a37b2674f8c90479d30e24f6ddb
View your plan in the Web UI: https://[your-org-account-name].tecton.ai/app/embeddings-inference/plan-summary/04d26a37b2674f8c90479d30e24f6ddb
Note: Updates to Feature Services may take up to 60 seconds to be propagated to the real-time feature-serving endpoint.
Are you sure you want to apply this plan to: "embeddings-inference"? [y/N]> y
🎉 Done! Applied changes to 3 objects in workspace "embeddings-inference".
🌐 Generate an embedding in real-time with the HTTP API
Now let's use Tecton's HTTP API to generate and retrieve embeddings at low latency.
To do this, you need to create an API Key so you can securely access the FeatureService's HTTP API.
Follow these commands in your terminal:
tecton service-account create --name "[your-name]-embeddings" --description "Embeddings service account"
tecton access-control assign-role -r consumer -w embeddings-inference -s [service account id from last command]
Use the API key from the first command in the curl command below to retrieve embeddings online!
curl -X POST https://[your-org-account-name].tecton.ai/api/v1/feature-service/get-features \
  -H "Authorization: Tecton-key [your-api-key]" -d \
'{
"params": {
"feature_service_name": "embedding_service",
"request_context_map": {
"text": "A string we want to embed with Tecton in real-time"
},
"workspace_name": "embeddings-inference"
}
}'
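If you'd rather call the endpoint from application code, the same request can be built with the Python standard library. This is a sketch, not Tecton's client SDK: the `build_get_features_request` helper and the `account`/`api_key` placeholders are ours, and you must substitute your own account name and API key before sending:

```python
import json
import urllib.request

def build_get_features_request(account, api_key, feature_service, workspace, text):
    """Assemble the HTTP request for Tecton's get-features endpoint."""
    url = f"https://{account}.tecton.ai/api/v1/feature-service/get-features"
    headers = {
        "Authorization": f"Tecton-key {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "params": {
            "feature_service_name": feature_service,
            "request_context_map": {"text": text},
            "workspace_name": workspace,
        }
    }
    return urllib.request.Request(url, data=json.dumps(body).encode(), headers=headers)

# req = build_get_features_request("your-org-account-name", "your-api-key",
#                                  "embedding_service", "embeddings-inference",
#                                  "A string we want to embed with Tecton in real-time")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```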
When you first create an environment, it can take a bit to warm up. If your first request times out, try again in about 30 seconds.
The above request returns a JSON response containing the 384-dimensional embedding (truncated here for brevity):
{"result":{"features":[[-0.03619580343365669,-0.0071067302487790585,-0.030815867707133293, ... ,-0.024123916402459145,-0.07344403117895126]]}}
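Pulling the vector out of that response is straightforward: the embedding is the first (and only) entry under `result.features`. A minimal sketch, using a trimmed stand-in for the real response:

```python
import json

# A trimmed example of the response shape returned by the HTTP API.
raw = '{"result": {"features": [[-0.0361958, -0.0071067, -0.0308158]]}}'

response = json.loads(raw)
# The embedding is the first (and only) entry in "features".
embedding = response["result"]["features"][0]
print(len(embedding))  # 3 in this trimmed example; 384 for all-MiniLM-L6-v2
```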
And voilà! Within minutes, we've created a real-time embeddings inference endpoint that works at scale and can power production AI models that need text embeddings.
And of course, given that you've built this FeatureService with Tecton, you can generate embeddings over large batches of data as well without making a single change to your feature repository! Check out the offline retrieval methods documentation to see how you can leverage Tecton to execute your transformations efficiently over hundreds of millions of rows.