github.com/huggingface/transformers

Class GPT2Config

src/transformers/models/gpt2/configuration_gpt2.py:25–103



````python
@auto_docstring(checkpoint="openai-community/gpt2")
@strict
class GPT2Config(PreTrainedConfig):
    r"""
    summary_type (`str`, *optional*, defaults to `"cls_index"`):
        Argument used when doing sequence summary, used in the model [`GPT2DoubleHeadsModel`].
        Has to be one of the following options:
            - `"last"`: Take the last token hidden state (like XLNet).
            - `"first"`: Take the first token hidden state (like BERT).
            - `"mean"`: Take the mean of all tokens' hidden states.
            - `"cls_index"`: Supply a tensor of classification token positions (like GPT/GPT-2).
            - `"attn"`: Not implemented yet; would use multi-head attention.
    summary_use_proj (`bool`, *optional*, defaults to `True`):
        Argument used when doing sequence summary, used in the model [`GPT2DoubleHeadsModel`].
        Whether or not to add a projection after the vector extraction.
    summary_activation (`str`, *optional*):
        Argument used when doing sequence summary. Used for the multiple-choice head in
        [`GPT2DoubleHeadsModel`].
        Pass `"tanh"` to apply a tanh activation to the output; any other value results in no activation.
    summary_proj_to_labels (`bool`, *optional*, defaults to `True`):
        Argument used when doing sequence summary, used in the model [`GPT2DoubleHeadsModel`].
        Whether the projection outputs should have `config.num_labels` or `config.hidden_size` classes.
    summary_first_dropout (`float`, *optional*, defaults to 0.1):
        Argument used when doing sequence summary, used in the model [`GPT2DoubleHeadsModel`].
        The dropout ratio to be used after the projection and activation.
    scale_attn_by_inverse_layer_idx (`bool`, *optional*, defaults to `False`):
        Whether to additionally scale attention weights by `1 / (layer_idx + 1)`.
    reorder_and_upcast_attn (`bool`, *optional*, defaults to `False`):
        Whether to scale keys (K) before computing the attention dot-product and to upcast the
        attention dot-product/softmax to float() when training with mixed precision.

    Example:

    ```python
    >>> from transformers import GPT2Config, GPT2Model

    >>> # Initializing a GPT2 configuration
    >>> configuration = GPT2Config()

    >>> # Initializing a model (with random weights) from the configuration
    >>> model = GPT2Model(configuration)

    >>> # Accessing the model configuration
    >>> configuration = model.config
    ```"""

    model_type = "gpt2"
    keys_to_ignore_at_inference = ["past_key_values"]
    attribute_map = {
        "hidden_size": "n_embd",
        "max_position_embeddings": "n_positions",
        "num_attention_heads": "n_head",
        "num_hidden_layers": "n_layer",
    }

    vocab_size: int = 50257
    n_positions: int = 1024
    n_embd: int = 768
    n_layer: int = 12
    n_head: int = 12
    # ... (rest of the class not shown in this excerpt)
````
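The `summary_type` options in the docstring each reduce a sequence of hidden states to a single vector. What follows is a minimal pure-Python sketch of that reduction, not the `transformers` implementation: `summarize_sequence` is a hypothetical helper, the learned projection and the dropouts (`summary_use_proj`, `summary_first_dropout`) are deliberately omitted, and falling back to the last position when no `cls_index` is supplied is an assumption of this sketch.

```python
import math
from typing import Optional


def summarize_sequence(
    hidden_states: list[list[float]],
    summary_type: str = "cls_index",
    cls_index: Optional[int] = None,
    summary_activation: Optional[str] = None,
) -> list[float]:
    """Hypothetical helper: collapse [seq_len][hidden_size] states into one vector."""
    if summary_type == "last":
        summary = hidden_states[-1]
    elif summary_type == "first":
        summary = hidden_states[0]
    elif summary_type == "mean":
        # Average each hidden dimension across all token positions.
        n = len(hidden_states)
        summary = [sum(column) / n for column in zip(*hidden_states)]
    elif summary_type == "cls_index":
        # Assumption: with no index supplied, use the last token position.
        index = len(hidden_states) - 1 if cls_index is None else cls_index
        summary = hidden_states[index]
    elif summary_type == "attn":
        raise NotImplementedError("summary_type='attn' is not implemented")
    else:
        raise ValueError(f"unknown summary_type: {summary_type!r}")

    # Only "tanh" activates; any other value leaves the vector unchanged.
    if summary_activation == "tanh":
        return [math.tanh(x) for x in summary]
    return list(summary)


# Three tokens, hidden size 2.
states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(summarize_sequence(states, "mean"))  # [3.0, 4.0]
print(summarize_sequence(states, "cls_index", cls_index=0))  # [1.0, 2.0]
```

In `GPT2DoubleHeadsModel` the `"cls_index"` position comes from the input (one classification token per choice), which is why it is the default for GPT-2 rather than a fixed first or last token.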

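To make `scale_attn_by_inverse_layer_idx` concrete, the sketch below computes attention scores for one query in plain Python. It assumes the usual `1 / sqrt(head_dim)` scaling is also applied and that `layer_idx` counts layers from 0, so the first layer is unaffected; `attention_scores` and `softmax` are hypothetical helpers, not `transformers` functions.

```python
import math


def attention_scores(
    query: list[float],
    keys: list[list[float]],
    layer_idx: int,
    scale_attn_by_inverse_layer_idx: bool = False,
) -> list[float]:
    """Hypothetical helper: scaled dot-product scores of one query against each key."""
    head_dim = len(query)
    scores = []
    for key in keys:
        score = sum(q * k for q, k in zip(query, key)) / math.sqrt(head_dim)
        if scale_attn_by_inverse_layer_idx:
            # Assumption: layer_idx is 0-based, so layer 0 divides by 1.
            score /= layer_idx + 1
        scores.append(score)
    return scores


def softmax(xs: list[float]) -> list[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


query = [1.0, 1.0, 1.0, 1.0]
keys = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
plain = attention_scores(query, keys, layer_idx=3)
scaled = attention_scores(query, keys, layer_idx=3, scale_attn_by_inverse_layer_idx=True)
print(plain)   # [2.0, 0.0]
print(scaled)  # [0.5, 0.0]
```

Dividing by `layer_idx + 1` shrinks the gaps between scores in deeper layers, so their softmax distributions come out flatter; here `max(softmax(scaled))` is smaller than `max(softmax(plain))`.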
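The `attribute_map` lets generic code read names such as `hidden_size` or `num_hidden_layers` while GPT-2 stores `n_embd` and `n_layer`. The sketch below imitates that aliasing with a plain dataclass (`MiniGPT2Config` and `approx_param_count` are hypothetical, not part of `PreTrainedConfig`) and uses the defaults from the excerpt to derive the per-head width and an approximate parameter count. The count rests on assumptions this excerpt does not show: an MLP inner size of `4 * n_embd`, biases on every projection, LayerNorms with weight and bias, and an LM head tied to the token embedding.

```python
from dataclasses import dataclass


@dataclass
class MiniGPT2Config:
    """Hypothetical stand-in for GPT2Config, using the defaults from the excerpt."""

    # Plain class attribute (no annotation), so it is not a dataclass field.
    attribute_map = {
        "hidden_size": "n_embd",
        "max_position_embeddings": "n_positions",
        "num_attention_heads": "n_head",
        "num_hidden_layers": "n_layer",
    }

    vocab_size: int = 50257
    n_positions: int = 1024
    n_embd: int = 768
    n_layer: int = 12
    n_head: int = 12

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for alias names.
        aliases = type(self).attribute_map
        if name in aliases:
            return getattr(self, aliases[name])
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Writes to an alias land on the underlying GPT-2 field.
        super().__setattr__(type(self).attribute_map.get(name, name), value)


def approx_param_count(cfg: MiniGPT2Config) -> int:
    d = cfg.hidden_size
    inner = 4 * d  # assumption: MLP inner size is 4 * n_embd
    embeddings = cfg.vocab_size * d + cfg.max_position_embeddings * d
    layer_norm = 2 * d  # weight + bias
    attention = (d * 3 * d + 3 * d) + (d * d + d)  # fused QKV projection + output projection
    mlp = (d * inner + inner) + (inner * d + d)  # up-projection + down-projection
    per_block = 2 * layer_norm + attention + mlp
    # Final LayerNorm; the LM head is assumed tied to the token embedding, so it adds nothing.
    return embeddings + cfg.num_hidden_layers * per_block + layer_norm


cfg = MiniGPT2Config()
print(cfg.hidden_size // cfg.num_attention_heads)  # 64 (per-head width)
print(approx_param_count(cfg))  # 124439808
cfg.num_hidden_layers = 24  # updates cfg.n_layer through the alias
print(cfg.n_layer)  # 24
```

Under these assumptions the defaults give roughly 124M parameters, in line with the size usually quoted for the smallest GPT-2 checkpoint.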