[FEEDBACK] Inference Providers

#49
by julien-c - opened
Hugging Face org

Any inference provider you love, and that you'd like to be able to access directly from the Hub?

Love that I can call DeepSeek R1 directly from the Hub 🔥

from huggingface_hub import InferenceClient

# Route the request through the "together" provider; replace the
# placeholder with your own API key (or a Hugging Face token)
client = InferenceClient(
    provider="together",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx"
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]

# Standard OpenAI-style chat completion call against DeepSeek R1
completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    max_tokens=500
)

print(completion.choices[0].message)
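
For longer generations you can also stream tokens as they arrive; here's a small sketch along the same lines (same provider and placeholder key as above):

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="together",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx"
)

# stream=True yields incremental chunks instead of one final object
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=500,
    stream=True
)

for chunk in stream:
    # each chunk carries a delta with the next piece of the message
    print(chunk.choices[0].delta.content or "", end="")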

Is it possible to set a monthly payment budget or rate limits for all the external providers? I don't see such options in the Billing tab. If an API key or session token is stolen, it could be quite dangerous for my thin wallet :(

Hugging Face org

@benhaotang you already get spending notifications when crossing important thresholds ($10, $100, $1,000), but we'll add spending limits in the future.

Thanks for your quick reply, good to know!

Would be great if you could add Nebius AI Studio to the list :) It's a new inference provider on the market, with the absolute cheapest prices and the highest rate limits...

Could be good to add featherless.ai

TitanML!!

Hi Hugging Face team, 👋

I’m with GPT Proto (https://www.gptproto.com/), an AI API platform focused on providing safe, stable, fast, and affordable inference for developers and enterprises. With a single API key, our users can access most mainstream models, including Hugging Face models, while enjoying reliable infrastructure and cost efficiency.

We’ve seen strong adoption from teams who value predictable performance, security, and competitive pricing for both experimentation and production workloads. Many of them already integrate Hugging Face models through GPT Proto to streamline deployment and reduce costs.

We’d love to explore becoming an official Inference Provider on Hugging Face, so that more builders in your ecosystem can benefit from a secure, high-performance, and budget-friendly option for model inference.
Looking forward to collaborating!

Contact us: [email protected]

Best regards,
Team GPT Proto

Would be great to add Simplismart to the list!

Hi!

Would be great if Snowcell could be added to the list. We build complete inference solutions from the ground up.

I couldn't find a specific contact point to reach about this, but for any questions we are available at [email protected].

Best Regards

Hello,

At FAIM, we are building an inference platform for time-series foundation models: https://faim.it.com/.
All models we currently support are available on Hugging Face.

I would like to ask whether it’s possible for us to become an inference provider on Hugging Face for time-series models.

Thank you and best regards,
Andrei
[email protected]

Hello, Gatewayz is ready for integration. Please email me at [email protected]

Hi Hugging Face team!

We're gcube (https://gcube.ai), a GPU sharing platform from South Korea. We make AI inference super affordable by connecting idle GPUs from cloud providers and even PC cafes across Korea, basically turning unused computing power into a distributed GPU network.

Our customers are seeing 55-70% cost savings, and we work with major Korean cloud partners like Naver Cloud, NHN Cloud, and KT Cloud.

We'd love to become an official Inference Provider on Hugging Face. Would really appreciate any guidance on the next steps!

Our HF org: https://huggingface.co/gcube-ai (Team plan subscribed)

Thanks!

Best,
Koo
Data Alliance (gcube)
[email protected]

Hello Hugging Face team 👋

We’re from Simplismart.ai, a Series A startup backed by Accel, building a modular MLOps platform focused on high-performance inference. We’re currently exploring the process of listing our inference APIs as a provider on Hugging Face.

We’ve gone through the inference provider documentation and are preparing for the next steps, but before raising a PR, we’d appreciate some clarity around the billing flow, specifically:

Questions:

  1. What is the expected delay between a successful inference request and Hugging Face calling the billing endpoint?
  2. If we’re unable to return cost details within one minute when Hugging Face hits the billing endpoint, does Hugging Face retry the request? If so, what’s the retry behavior?

We want to ensure our implementation aligns closely with Hugging Face’s billing expectations, so any guidance on the above would be very helpful. To make question 2 concrete, a rough sketch of how we’d keep the billing endpoint idempotent under retries is below.
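
This is a minimal, hypothetical sketch only, since the billing spec isn't public: the POST /billing path, the request_id field, and the response shape are all our assumptions, not the documented API. The point is just the design: record the cost when the inference request completes, key it by request ID, and return the same answer on every retry.

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

# request_id -> cost recorded when the inference request completed
# (the unit and the whole payload shape are assumptions)
recorded_costs: dict[str, int] = {}

class BillingQuery(BaseModel):
    request_id: str  # assumed field name, not from the documented spec

@app.post("/billing")  # assumed path, not from the documented spec
def report_cost(query: BillingQuery):
    cost = recorded_costs.get(query.request_id)
    if cost is None:
        # Cost not computed yet: answer fast with a retryable status so a
        # later attempt can succeed instead of hitting the one-minute timeout.
        raise HTTPException(status_code=503, detail="cost not ready, retry later")
    # Idempotent: every retry with the same request_id gets the same answer.
    return {"request_id": query.request_id, "cost": cost}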

Thanks in advance for the support! 🤗

-- Pratik Parmar
Developer Advocate @ Simplismart.ai
[email protected]
