Repository: github.com/huggingface/transformers

Function multinomial_sample_one_no_sync

Defined in benchmark/benches/llama.py, lines 143–145.
Signature: multinomial_sample_one_no_sync(probs_sort)


```python
import torch  # module-level import in the original file

# Copied from the gpt-fast repo
def multinomial_sample_one_no_sync(probs_sort): # Does multinomial sampling without a cuda synchronization
    q = torch.empty_like(probs_sort).exponential_(1)
    return torch.argmax(probs_sort / q, dim=-1, keepdim=True).to(dtype=torch.int)

def logits_to_probs(logits, temperature: float = 1.0, top_k: int | None = None):
    logits = logits / max(temperature, 1e-5)
    # ... (remaining lines of logits_to_probs not shown in this excerpt)
```
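Why the two lines of `multinomial_sample_one_no_sync` yield a categorical sample: below is a sketch of the standard argument (the exponential-race form of the Gumbel-max trick), assuming `probs_sort` holds non-negative weights $p_1,\dots,p_n$ along its last dimension, not necessarily normalized. Because $q_i \sim \mathrm{Exp}(1)$, the ratio $q_i/p_i$ is exponential with rate $p_i$, and the argmax of $p_i/q_i$ is the argmin of these independent exponentials:

$$
\begin{aligned}
q_i &\overset{\text{iid}}{\sim} \mathrm{Exp}(1)
  \;\Rightarrow\; \frac{q_i}{p_i} \sim \mathrm{Exp}(p_i) \quad (p_i > 0),\\[4pt]
\arg\max_i \frac{p_i}{q_i} &= \arg\min_i \frac{q_i}{p_i},\\[4pt]
\Pr\Big[\arg\min_i \tfrac{q_i}{p_i} = k\Big]
  &= \int_0^\infty p_k e^{-p_k t} \prod_{j \neq k} e^{-p_j t}\,dt
   = \frac{p_k}{\sum_j p_j}.
\end{aligned}
$$

So each row is sampled in proportion to its entries, and entries with $p_k = 0$ are never selected. Every step (`empty_like`, `exponential_`, the division, `argmax`, `.to`) is a device-side tensor operation with no value read back to the host, which is presumably what the source comment means by sampling "without a cuda synchronization". The final `.to(dtype=torch.int)` casts the int64 result of `argmax` to int32, and `keepdim=True` keeps a trailing dimension of size 1.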

Callers 1

sample (function) · 0.85

Calls 1

to (method) · 0.45

Tested by

no test coverage detected
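Since no test coverage is detected, a minimal statistical check of the same technique is sketched below. To stay runnable without PyTorch or a GPU, it re-implements the argmax(p / q) trick over a plain Python list using `random.expovariate`; the helper names `sample_one_exponential_race` and `empirical_frequencies` are hypothetical and not part of the transformers repo. A real test would instead call `multinomial_sample_one_no_sync` on a tensor and compare observed frequencies against `probs_sort` in the same way.

```python
import random
from collections import Counter


def sample_one_exponential_race(probs, rng=random):
    """Hypothetical helper: return index i with probability probs[i] / sum(probs).

    Mirrors argmax(probs / q) with q_i ~ Exp(1), as in
    multinomial_sample_one_no_sync, but for a plain Python list.
    """
    best_index, best_score = -1, float("-inf")
    for i, p in enumerate(probs):
        q = rng.expovariate(1.0)  # q_i ~ Exp(1); 0.0 is possible but vanishingly rare
        if q > 0.0:
            score = p / q
        else:
            # Guard the division: p / 0 behaves like +inf for p > 0.
            score = float("inf") if p > 0 else 0.0
        if score > best_score:
            best_index, best_score = i, score
    return best_index


def empirical_frequencies(probs, n_draws, seed=0):
    """Hypothetical helper: draw n_draws samples and return per-index frequencies."""
    rng = random.Random(seed)
    counts = Counter(sample_one_exponential_race(probs, rng) for _ in range(n_draws))
    return [counts[i] / n_draws for i in range(len(probs))]


if __name__ == "__main__":
    probs = [0.1, 0.2, 0.7]
    freqs = empirical_frequencies(probs, 100_000)
    for p, f in zip(probs, freqs):
        print(f"target {p:.2f}  observed {f:.3f}")
```

Because the check is statistical, the tolerance is a judgment call; with 100,000 draws the standard error of each frequency is below 0.002, so a tolerance of 0.01 leaves a wide margin.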