hub / github.com/huggingface/transformers / GPT2Config

Class GPT2Config

src/transformers/models/gpt2/configuration_gpt2.py:25–103

@auto_docstring(checkpoint="openai-community/gpt2")
@strict
class GPT2Config(PreTrainedConfig):
    r"""
    summary_type (`str`, *optional*, defaults to `"cls_index"`):
        Argument used when doing sequence summary, used in the models [`GPT2DoubleHeadsModel`].
        Has to be one of the following options:
            - `"last"`: Take the last token hidden state (like XLNet).
            - `"first"`: Take the first token hidden state (like BERT).
            - `"mean"`: Take the mean of all tokens' hidden states.
            - `"cls_index"`: Supply a Tensor of classification token position (like GPT/GPT-2).
            - `"attn"`: Not implemented now, use multi-head attention.
    summary_use_proj (`bool`, *optional*, defaults to `True`):
        Argument used when doing sequence summary, used in the models [`GPT2DoubleHeadsModel`].
        Whether or not to add a projection after the vector extraction.
    summary_activation (`str`, *optional*):
        Argument used when doing sequence summary. Used for the multiple choice head in
        [`GPT2DoubleHeadsModel`].
        Pass `"tanh"` for a tanh activation to the output, any other value will result in no activation.
    summary_proj_to_labels (`bool`, *optional*, defaults to `True`):
        Argument used when doing sequence summary, used in the models [`GPT2DoubleHeadsModel`].
        Whether the projection outputs should have `config.num_labels` or `config.hidden_size` classes.
    summary_first_dropout (`float`, *optional*, defaults to 0.1):
        Argument used when doing sequence summary, used in the models [`GPT2DoubleHeadsModel`].
        The dropout ratio to be used after the projection and activation.
    scale_attn_by_inverse_layer_idx (`bool`, *optional*, defaults to `False`):
        Whether to additionally scale attention weights by `1 / (layer_idx + 1)`.
    reorder_and_upcast_attn (`bool`, *optional*, defaults to `False`):
        Whether to scale keys (K) prior to computing attention (dot-product) and upcast attention
        dot-product/softmax to float() when training with mixed precision.

    Example:

    ```python
    >>> from transformers import GPT2Config, GPT2Model

    >>> # Initializing a GPT2 configuration
    >>> configuration = GPT2Config()

    >>> # Initializing a model (with random weights) from the configuration
    >>> model = GPT2Model(configuration)

    >>> # Accessing the model configuration
    >>> configuration = model.config
    ```"""

    model_type = "gpt2"
    keys_to_ignore_at_inference = ["past_key_values"]
    attribute_map = {
        "hidden_size": "n_embd",
        "max_position_embeddings": "n_positions",
        "num_attention_heads": "n_head",
        "num_hidden_layers": "n_layer",
    }

    vocab_size: int = 50257
    n_positions: int = 1024
    n_embd: int = 768
    n_layer: int = 12
    n_head: int = 12
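
The `attribute_map` in the listing pairs the common configuration names (`hidden_size`, `num_hidden_layers`, and so on) with GPT-2's original names (`n_embd`, `n_layer`, and so on). The snippet below is a minimal, library-free sketch of how such aliasing could behave. `AliasedConfig` is a hypothetical class written only for illustration; it is an assumption about the general idea and not the way `PreTrainedConfig` is implemented.

```python
class AliasedConfig:
    """Hypothetical stand-in for a config class (not the transformers implementation)."""

    # Same mapping as in the listing: common name -> GPT-2 name.
    attribute_map = {
        "hidden_size": "n_embd",
        "max_position_embeddings": "n_positions",
        "num_attention_heads": "n_head",
        "num_hidden_layers": "n_layer",
    }

    def __init__(self, n_positions=1024, n_embd=768, n_layer=12, n_head=12):
        self.n_positions = n_positions
        self.n_embd = n_embd
        self.n_layer = n_layer
        self.n_head = n_head

    def __getattr__(self, name):
        # Runs only when normal lookup fails, so real attributes are unaffected.
        target = type(self).attribute_map.get(name)
        if target is None:
            raise AttributeError(name)
        return getattr(self, target)

    def __setattr__(self, name, value):
        # Redirect a write through an alias to the underlying GPT-2 attribute.
        super().__setattr__(type(self).attribute_map.get(name, name), value)


config = AliasedConfig()
alias_value = config.hidden_size  # read through the alias for n_embd
config.num_hidden_layers = 6      # write through the alias for n_layer
```

In this sketch both names always refer to the same stored value, so code written against the common names and code written against the GPT-2 names cannot drift apart.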

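The `summary_type` options in the docstring each describe a way to reduce the per-token hidden states of a sequence to a single vector. The following is a rough pure-Python sketch of those options on plain lists, not the library's tensor code. The function name `summarize` and the fallback to the last token when no `cls_index` is given are assumptions made for this illustration.

```python
def summarize(hidden_states, summary_type="cls_index", cls_index=None):
    """Reduce the token vectors of ONE sequence to a single vector.

    hidden_states: list of token vectors, each a list of floats.
    cls_index: token position, used only when summary_type == "cls_index".
    """
    if summary_type == "last":
        return hidden_states[-1]
    if summary_type == "first":
        return hidden_states[0]
    if summary_type == "mean":
        n = len(hidden_states)
        # zip(*...) walks the vectors dimension by dimension.
        return [sum(column) / n for column in zip(*hidden_states)]
    if summary_type == "cls_index":
        # Assumption for this sketch: use the last token when no index is given.
        position = len(hidden_states) - 1 if cls_index is None else cls_index
        return hidden_states[position]
    # "attn" is documented as not implemented.
    raise NotImplementedError(summary_type)


tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
last = summarize(tokens, "last")
first = summarize(tokens, "first")
mean = summarize(tokens, "mean")
picked = summarize(tokens, "cls_index", cls_index=1)
```

The remaining `summary_*` options (projection, activation, dropout) would apply to the vector this function returns and are left out of the sketch.
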
Callers 8

convert_checkpoint_from_megatron_to_transformers · Function · 0.90

test_config_from_string · Method · 0.90

test_nested_wrapper_recursion · Method · 0.90

test_neftune · Method · 0.90

test_logging_inf_nan_filter · Method · 0.90

test_get_eval_dataloader_without_persistent_workers · Method · 0.90

test_get_eval_dataloader_with_persistent_workers · Method · 0.90

_get_gpt2_and_dataset · Method · 0.90

Calls

no outgoing calls

Tested by 7

test_config_from_string · Method · 0.72

test_nested_wrapper_recursion · Method · 0.72

test_neftune · Method · 0.72

test_logging_inf_nan_filter · Method · 0.72

test_get_eval_dataloader_without_persistent_workers · Method · 0.72

test_get_eval_dataloader_with_persistent_workers · Method · 0.72

_get_gpt2_and_dataset · Method · 0.72