github.com/jina-ai/serve @v3.34.0 sqlite

5,000 symbols 20,446 edges 655 files 1,295 documented · 26%
README

Jina-Serve

Jina-serve is a framework for building and deploying AI services that communicate via gRPC, HTTP and WebSockets. Scale your services from local development to production while focusing on your core logic.

Key Features

  • Native support for all major ML frameworks and data types
  • High-performance service design with scaling, streaming, and dynamic batching
  • LLM serving with streaming output
  • Built-in Docker integration and Executor Hub
  • One-click deployment to Jina AI Cloud
  • Enterprise-ready with Kubernetes and Docker Compose support

Comparison with FastAPI

Key advantages over FastAPI:

  • DocArray-based data handling with native gRPC support
  • Built-in containerization and service orchestration
  • Seamless scaling of microservices
  • One-command cloud deployment

Install

pip install jina

See guides for Apple Silicon and Windows.

Core Concepts

Three main layers:

  • Data: BaseDoc and DocList for input/output
  • Serving: Executors process Documents, Gateway connects services
  • Orchestration: Deployments serve Executors, Flows create pipelines

Build AI Services

Let's create a gRPC-based AI service using StableLM:

from jina import Executor, requests
from docarray import DocList, BaseDoc
from transformers import pipeline


class Prompt(BaseDoc):
    text: str


class Generation(BaseDoc):
    prompt: str
    text: str


class StableLM(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.generator = pipeline(
            'text-generation', model='stabilityai/stablelm-base-alpha-3b'
        )

    @requests
    def generate(self, docs: DocList[Prompt], **kwargs) -> DocList[Generation]:
        generations = DocList[Generation]()
        prompts = docs.text
        llm_outputs = self.generator(prompts)
        for prompt, output in zip(prompts, llm_outputs):
            # each output is a list of candidates; keep the first candidate's text
            generations.append(
                Generation(prompt=prompt, text=output[0]['generated_text'])
            )
        return generations
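
When a `transformers` text-generation pipeline is called with a list of prompts, it returns one list of candidate dictionaries per prompt, each carrying a `generated_text` key. The sketch below uses a hypothetical stand-in, `fake_generator`, in place of the real model so that it runs without downloading weights. It only illustrates how that nested result can be flattened into plain strings before building `Generation` documents; `first_texts` is likewise an illustrative helper, not part of Jina or `transformers`.

```python
from typing import Dict, List


def fake_generator(prompts: List[str]) -> List[List[Dict[str, str]]]:
    """Hypothetical stand-in that mimics the nested shape a text-generation
    pipeline returns for a list of prompts."""
    return [[{'generated_text': f'{p} ... (generated)'}] for p in prompts]


def first_texts(llm_outputs: List[List[Dict[str, str]]]) -> List[str]:
    """Keep the first candidate's text for every prompt."""
    return [candidates[0]['generated_text'] for candidates in llm_outputs]


prompts = ['a cat', 'a dog']
texts = first_texts(fake_generator(prompts))
pairs = list(zip(prompts, texts))  # (prompt, generated text) per input
print(pairs)
```

Each pair can then be passed to `Generation(prompt=..., text=...)` with a plain string in both fields.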

Deploy with Python or YAML:

from jina import Deployment
from executor import StableLM

dep = Deployment(uses=StableLM, timeout_ready=-1, port=12345)

with dep:
    dep.block()

jtype: Deployment
with:
 uses: StableLM
 py_modules:
   - executor.py
 timeout_ready: -1
 port: 12345

Use the client:

from jina import Client
from docarray import DocList
from executor import Prompt, Generation

prompt = Prompt(text='suggest an interesting image generation prompt')
client = Client(port=12345)
response = client.post('/', inputs=[prompt], return_type=DocList[Generation])

Build Pipelines

Chain services into a Flow:

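from jina import Flow
from executor import StableLM
from text_to_image import TextToImage  # assumes TextToImage is defined in text_to_image.py

flow = Flow(port=12345).add(uses=StableLM).add(uses=TextToImage)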

with flow:
    flow.block()

Scaling and Deployment

Local Scaling

Boost throughput with built-in features:

  • Replicas for parallel processing
  • Shards for data partitioning
  • Dynamic batching for efficient model inference
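
To picture the difference between the first two features, here is a minimal plain-Python sketch. It is a conceptual illustration only, not Jina's scheduling code: the function names are hypothetical, and both the round-robin dispatch and the modulo partitioning are assumptions chosen for simplicity.

```python
from itertools import cycle
from typing import Dict, List


def dispatch_round_robin(requests: List[str], replicas: int) -> Dict[int, List[str]]:
    """Replicas are identical copies: each request goes to exactly one of them."""
    assignment: Dict[int, List[str]] = {i: [] for i in range(replicas)}
    for request, replica in zip(requests, cycle(range(replicas))):
        assignment[replica].append(request)
    return assignment


def partition_into_shards(doc_ids: List[int], shards: int) -> Dict[int, List[int]]:
    """Shards split the data: each shard owns a disjoint slice of it."""
    partition: Dict[int, List[int]] = {i: [] for i in range(shards)}
    for doc_id in doc_ids:
        partition[doc_id % shards].append(doc_id)
    return partition


by_replica = dispatch_round_robin(['r0', 'r1', 'r2', 'r3', 'r4'], replicas=2)
by_shard = partition_into_shards(list(range(6)), shards=3)
print(by_replica)  # {0: ['r0', 'r2', 'r4'], 1: ['r1', 'r3']}
print(by_shard)    # {0: [0, 3], 1: [1, 4], 2: [2, 5]}
```

Replicas raise throughput because requests are spread over copies of the same service, while shards let each instance hold only part of the data.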

Example scaling a Stable Diffusion deployment:

jtype: Deployment
with:
 uses: TextToImage
 timeout_ready: -1
 py_modules:
   - text_to_image.py
 env:
  CUDA_VISIBLE_DEVICES: RR
 replicas: 2
 uses_dynamic_batching:
   /default:
     preferred_batch_size: 10
     timeout: 200
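
The two `uses_dynamic_batching` settings can be read as "flush a batch once it holds `preferred_batch_size` items, or once the wait exceeds `timeout`". The sketch below is a simplified, deterministic model of that rule and not Jina's implementation: `plan_batches` is a hypothetical helper, it works on request arrival times rather than Documents, and it assumes the timeout is in milliseconds and is measured from the oldest item in the open batch.

```python
from typing import List


def plan_batches(
    arrivals_ms: List[int], preferred_batch_size: int, timeout_ms: int
) -> List[List[int]]:
    """Group sorted arrival times (in ms) into batches."""
    batches: List[List[int]] = []
    current: List[int] = []
    for t in arrivals_ms:
        # The open batch expires once its oldest item has waited timeout_ms.
        if current and t - current[0] >= timeout_ms:
            batches.append(current)
            current = []
        current.append(t)
        # A full batch is flushed immediately.
        if len(current) == preferred_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)  # flush whatever is left at the end
    return batches


arrivals = [0, 10, 20, 30, 500, 510, 900]
batches = plan_batches(arrivals, preferred_batch_size=3, timeout_ms=200)
print(batches)  # [[0, 10, 20], [30], [500, 510], [900]]
```

A larger batch size favours model throughput, while a shorter timeout bounds how long a lone request waits for company.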

Cloud Deployment

Containerize Services

  1. Structure your Executor:
TextToImage/
├── executor.py
├── config.yml
└── requirements.txt
  2. Configure:
# config.yml
jtype: TextToImage
py_modules:
 - executor.py
metas:
 name: TextToImage
 description: Text to Image generation Executor
  3. Push to Hub:
jina hub push TextToImage

Deploy to Kubernetes

jina export kubernetes flow.yml ./my-k8s
kubectl apply -R -f my-k8s

Use Docker Compose

jina export docker-compose flow.yml docker-compose.yml
docker-compose up

JCloud Deployment

Deploy with a single command:

jina cloud deploy jcloud-flow.yml

LLM Streaming

Enable token-by-token streaming for responsive LLM applications:

  1. Define schemas:
from docarray import BaseDoc


class PromptDocument(BaseDoc):
    prompt: str
    max_tokens: int


class ModelOutputDocument(BaseDoc):
    token_id: int
    generated_text: str
  2. Initialize service:
import torch
from jina import Executor, requests
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# module-level tokenizer used by the streaming endpoint
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')


class TokenStreamingExecutor(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = GPT2LMHeadModel.from_pretrained('gpt2')
  3. Implement streaming:
    # method of TokenStreamingExecutor: yields one document per generated token
    @requests(on='/stream')
    async def task(self, doc: PromptDocument, **kwargs) -> ModelOutputDocument:
        input = tokenizer(doc.prompt, return_tensors='pt')
        input_len = input['input_ids'].shape[1]
        for _ in range(doc.max_tokens):
            output = self.model.generate(**input, max_new_tokens=1)
            if output[0][-1] == tokenizer.eos_token_id:
                break
            yield ModelOutputDocument(
                token_id=int(output[0][-1]),
                generated_text=tokenizer.decode(
                    output[0][input_len:], skip_special_tokens=True
                ),
            )
            # feed the extended sequence back in for the next token
            input = {
                'input_ids': output,
                'attention_mask': torch.ones(1, len(output[0])),
            }
  4. Serve and use:
# Server
from jina import Deployment

with Deployment(uses=TokenStreamingExecutor, port=12345, protocol='grpc') as dep:
    dep.block()


# Client (run in a separate process while the server is up)
import asyncio
from jina import Client


async def main():
    client = Client(port=12345, protocol='grpc', asyncio=True)
    async for doc in client.stream_doc(
        on='/stream',
        inputs=PromptDocument(prompt='what is the capital of France ?', max_tokens=10),
        return_type=ModelOutputDocument,
    ):
        print(doc.generated_text)


asyncio.run(main())
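
The streaming endpoint and client above rest on Python's async generators: the server side uses `yield` inside an `async def`, and the client consumes results with `async for`. The following library-free sketch shows that pattern in isolation. `stream_tokens` and `collect` are hypothetical names, and the fixed token list stands in for a real model; no Jina or `transformers` call is involved.

```python
import asyncio
from typing import AsyncIterator, List, Tuple


async def stream_tokens(tokens: List[str]) -> AsyncIterator[Tuple[int, str]]:
    """Yield (index, text_so_far) after each token, like a streaming endpoint."""
    text_so_far = ''
    for index, token in enumerate(tokens):
        await asyncio.sleep(0)  # hand control back to the event loop, as real I/O would
        text_so_far += token
        yield index, text_so_far


async def collect() -> List[str]:
    """Consume the stream the way a client would, one partial result at a time."""
    received: List[str] = []
    async for _, partial in stream_tokens(['The', ' capital', ' is', ' Paris']):
        received.append(partial)  # a client could print each partial result here
    return received


chunks = asyncio.run(collect())
print(chunks[-1])  # The capital is Paris
```

Because each partial result is delivered as soon as it is yielded, the consumer can display text while generation is still in progress.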

Support

Jina-serve is backed by Jina AI and licensed under Apache-2.0.

Extension points exported contracts — how you extend this code

Core symbols (most depended-on inside this repo)

join · called by 517 · jina/orchestrate/pods/__init__.py
post · called by 368 · jina/clients/mixin.py
get · called by 196 · jina/serve/consensus/jina_raft/snapshot.go
Client · called by 193 · jina/clients/__init__.py
debug · called by 189 · tests/k8s/executor-merger/exec_merger.py
start · called by 172 · jina/orchestrate/flow/base.py
random_port · called by 154 · jina/helper.py
load_config · called by 105 · jina/jaml/__init__.py

Shape

Method 2,098
Function 1,861
Class 856
Struct 75
Route 67
Interface 40
TypeAlias 3

Languages

Python 89%
Go 11%
TypeScript 1%

Modules by API surface

jina/serve/consensus/jina-go-proto/jina_grpc.pb.go · 185 symbols
jina/serve/consensus/jina-go-proto/jina.pb.go · 181 symbols
tests/integration/docarray_v2/test_v2.py · 171 symbols
jina/serve/consensus/docarray-go-proto/docarray.pb.go · 132 symbols
jina/helper.py · 87 symbols
tests/unit/serve/executors/test_executor.py · 78 symbols
jina/proto/docarray_v2/pb2/jina_pb2_grpc.py · 77 symbols
jina/proto/docarray_v2/pb/jina_pb2_grpc.py · 77 symbols
jina/proto/docarray_v1/pb2/jina_pb2_grpc.py · 77 symbols
jina/proto/docarray_v1/pb/jina_pb2_grpc.py · 77 symbols
jina/orchestrate/deployments/__init__.py · 74 symbols
jina/orchestrate/flow/base.py · 60 symbols

Dependencies from manifests, versioned

github.com/Jille/raft-grpc-leader-rpc v1.1.0 · 1×
github.com/Jille/raft-grpc-transport v1.1.1 · 1×
github.com/Jille/raftadmin v1.2.0 · 1×
github.com/armon/go-metrics v0.3.9 · 1×
github.com/boltdb/bolt v1.3.1 · 1×
github.com/hashicorp/go-hclog v0.16.2 · 1×
github.com/hashicorp/go-immutable-radix v1.3.1 · 1×
github.com/hashicorp/go-msgpack v0.5.5 · 1×
github.com/hashicorp/raft v1.3.11 · 1×

For agents

$ claude mcp add serve \
  -- python -m otcore.mcp_server <graph>
