Stream the LLM on the given prompt. This method should be overridden by subclasses that support streaming. If it is not implemented, calls to stream fall back to the non-streaming version of the model and return the output as a single chunk.
(self, prompt: str, stop: Optional[List[str]] = None, run_manager: Optional[CallbackManagerForLLMRun] = None, **kwargs: Any)
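The override-or-fall-back behavior described above can be illustrated with a small, self-contained sketch. It does not import the real library: `GenerationChunk` here is a stand-in dataclass, and `BaseLLMSketch`, `EchoLLM`, `StreamingEchoLLM`, and the public `stream` wrapper are hypothetical names invented for this example. The fallback check is an assumption about how such a base class could detect a missing override, and `stop` handling is left out for brevity.

```python
from dataclasses import dataclass
from typing import Any, Iterator, List, Optional


@dataclass
class GenerationChunk:
    """Stand-in for the library's GenerationChunk: one piece of output text."""

    text: str


class BaseLLMSketch:
    """Hypothetical base class that mimics the documented fallback."""

    def _call(self, prompt: str, stop: Optional[List[str]] = None, **kwargs: Any) -> str:
        # Non-streaming entry point; subclasses must provide it.
        raise NotImplementedError()

    def _stream(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[Any] = None,
        **kwargs: Any,
    ) -> Iterator[GenerationChunk]:
        # Same contract as the documented method: override to support streaming.
        raise NotImplementedError()

    def stream(self, prompt: str, stop: Optional[List[str]] = None, **kwargs: Any) -> Iterator[str]:
        if type(self)._stream is BaseLLMSketch._stream:
            # No override: fall back to the non-streaming call, emitted as one chunk.
            yield self._call(prompt, stop=stop, **kwargs)
        else:
            for chunk in self._stream(prompt, stop=stop, **kwargs):
                yield chunk.text


class EchoLLM(BaseLLMSketch):
    """Toy model without streaming support; it returns the prompt unchanged."""

    def _call(self, prompt: str, stop: Optional[List[str]] = None, **kwargs: Any) -> str:
        return prompt


class StreamingEchoLLM(EchoLLM):
    """Toy model that streams the prompt back one word at a time."""

    def _stream(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[Any] = None,
        **kwargs: Any,
    ) -> Iterator[GenerationChunk]:
        for i, word in enumerate(prompt.split(" ")):
            # Re-attach the separating space so the chunks join back to the prompt.
            yield GenerationChunk(text=word if i == 0 else " " + word)
```

With these toy classes, `list(EchoLLM().stream("hello world"))` produces a single chunk containing the whole string, while `list(StreamingEchoLLM().stream("hello world"))` produces one chunk per word.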
        )

    def _stream(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> Iterator[GenerationChunk]:
        """Stream the LLM on the given prompt.

        This method should be overridden by subclasses that support streaming.

        If not implemented, the default behavior of calls to stream will be to
        fallback to the non-streaming version of the model and return
        the output as a single chunk.

        Args:
            prompt: The prompt to generate from.
            stop: Stop words to use when generating. Model output is cut off at the
                first occurrence of any of these substrings.
            run_manager: Callback manager for the run.
            **kwargs: Arbitrary additional keyword arguments. These are usually passed
                to the model provider API call.

        Returns:
            An iterator of GenerationChunks.
        """
        raise NotImplementedError()

    async def _astream(
        self,