hub / github.com/huggingface/transformers / PreTrainedConfig

Class PreTrainedConfig

src/transformers/configuration_utils.py:125–1294



@strict(accept_kwargs=True)
@dataclass(repr=False)
class PreTrainedConfig(PushToHubMixin, RotaryEmbeddingConfigMixin):
    # no-format
    r"""
    Base class for all configuration classes. Handles a few parameters common to all models' configurations as well as
    methods for loading/downloading/saving configurations.

    <Tip>

    A configuration file can be loaded and saved to disk. Loading the configuration file and using this file to
    initialize a model does not load the model weights. It only affects the model's configuration.

    </Tip>

    Class attributes (overridden by derived classes):

    - model_type (`str`) -- An identifier for the model type, serialized into the JSON file, and used to recreate
      the correct object in [`~transformers.AutoConfig`].
    - has_no_defaults_at_init (`bool`) -- Whether the config class can be initialized without providing input arguments.
      Some configurations require inputs to be defined at init and have no default values. These are usually (but not
      necessarily) composite configs, such as [`~transformers.EncoderDecoderConfig`] or [`~RagConfig`], which have to be
      initialized from two or more configs of type [`~transformers.PreTrainedConfig`].
    - keys_to_ignore_at_inference (`list[str]`) -- A list of keys to ignore by default when looking at dictionary
      outputs of the model during inference.
    - attribute_map (`dict[str, str]`) -- A dict that maps model-specific attribute names to the standardized
      naming of attributes.
    - base_model_tp_plan (`dict[str, Any]`) -- A dict that maps the FQNs (fully qualified names) of a base model's
      sub-modules to a tensor parallel plan applied to the sub-module when `model.tensor_parallel` is called.
    - base_model_pp_plan (`dict[str, tuple[list[str]]]`) -- A dict that maps child modules of a base model to a
      pipeline parallel plan that enables users to place the child module on the appropriate device.

    Common attributes (present in all subclasses):

    - vocab_size (`int`) -- The number of tokens in the vocabulary, which is also the first dimension of the
      embeddings matrix (this attribute may be missing for models that don't have a text modality, like ViT).
    - hidden_size (`int`) -- The hidden size of the model.
    - num_attention_heads (`int`) -- The number of attention heads used in the multi-head attention layers of the
      model.
    - num_hidden_layers (`int`) -- The number of blocks in the model.

    <Tip warning={true}>

    Setting parameters for sequence generation in the model config is deprecated. For backward compatibility, loading
    some of them will still be possible, but attempting to overwrite them will throw an exception -- you should set
    them in a [`~transformers.GenerationConfig`]. Check the documentation of [`~transformers.GenerationConfig`] for more
    information about the individual parameters.

    </Tip>

    Args:
        name_or_path (`str`, optional, defaults to `""`):
            Stores the string that was passed to [`PreTrainedModel.from_pretrained`] as `pretrained_model_name_or_path`
            if the configuration was created with such a method.
        output_hidden_states (`bool`, optional, defaults to `False`):
            Whether or not the model should return all hidden states.
        output_attentions (`bool`, optional, defaults to `False`):
            Whether or not the model should return all attentions.
        return_dict (`bool`, optional, defaults to `True`):
            Whether or not the model should return a [`~transformers.utils.ModelOutput`] instead of a plain tuple.
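
The docstring above describes three behaviors that are easier to see in code: a config round-trips through a JSON file without touching any weights, `model_type` is serialized so that the right class can be recreated, and `attribute_map` aliases model-specific names to the standardized ones. The following is a minimal stdlib-only sketch of those ideas, not the transformers implementation. `ToyConfig`, its `save`/`load` methods, `CONFIG_REGISTRY`, and `load_auto` are hypothetical names invented for illustration; the real class exposes its own methods (for example `save_pretrained` and `from_pretrained`) and resolves classes through `AutoConfig`.

```python
import json
import tempfile
from pathlib import Path


class ToyConfig:
    """Hypothetical stand-in for a config class; NOT the transformers code."""

    model_type = "toy"  # identifier written into the JSON file
    # Model-specific alias -> standardized attribute name (assumed aliases).
    attribute_map = {"n_embd": "hidden_size", "n_head": "num_attention_heads"}

    def __init__(self, hidden_size=64, num_attention_heads=4,
                 num_hidden_layers=2, **kwargs):
        self.hidden_size = hidden_size
        self.num_attention_heads = num_attention_heads
        self.num_hidden_layers = num_hidden_layers
        for key, value in kwargs.items():
            setattr(self, key, value)  # aliases are redirected in __setattr__

    def __setattr__(self, key, value):
        # Writes to an alias land on the standardized attribute.
        key = type(self).attribute_map.get(key, key)
        super().__setattr__(key, value)

    def __getattr__(self, key):
        # Called only when normal lookup fails, so aliases resolve here.
        attribute_map = type(self).attribute_map
        if key in attribute_map:
            return getattr(self, attribute_map[key])
        raise AttributeError(key)

    def to_dict(self):
        return {"model_type": self.model_type, **vars(self)}

    def save(self, directory):
        path = Path(directory) / "config.json"
        path.write_text(json.dumps(self.to_dict(), indent=2, sort_keys=True))
        return path

    @classmethod
    def load(cls, directory):
        data = json.loads((Path(directory) / "config.json").read_text())
        data.pop("model_type", None)  # class attribute, not an init argument
        return cls(**data)


# Hypothetical registry playing the role the docstring assigns to AutoConfig.
CONFIG_REGISTRY = {ToyConfig.model_type: ToyConfig}


def load_auto(directory):
    """Pick the config class from the serialized `model_type`, then load it."""
    data = json.loads((Path(directory) / "config.json").read_text())
    return CONFIG_REGISTRY[data["model_type"]].load(directory)


config = ToyConfig(n_embd=128)  # alias write, stored as hidden_size
with tempfile.TemporaryDirectory() as tmp:
    config.save(tmp)            # only configuration values are written
    restored = load_auto(tmp)

print(type(restored).__name__, restored.hidden_size, restored.n_embd)
# ToyConfig 128 128
```

Only the standardized names reach the JSON file in this sketch, so a file written through an alias stays readable by code that knows nothing about that alias. Whether the real library serializes aliases the same way is not stated in the docstring and should be checked against the source.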

Callers 15

test_moe_and_qkv_conversion · Method · 0.90

test_moe_and_qkv_conversion_reversed · Method · 0.90

test_qkv_chunk_rope_permute_with_fp8_quantization · Method · 0.90

test_scoped_renaming_does_not_leak_to_sibling_or_parent · Method · 0.90

test_scoped_match_falls_back_when_checkpoint_omits_base_prefix · Method · 0.90

test_scoped_match_strips_one_base_prefix_level_for_nested_scope · Method · 0.90

test_scoped_match_round_trips_when_scope_equals_base_model_prefix · Method · 0.90

test_interleaved_renaming_and_converter_round_trip · Method · 0.90

test_ernie4_5_vl_moe_conversion · Method · 0.90

test_ernie4_5_vl_moe_conversion_reversed · Method · 0.90

test_can_remove_prefix · Method · 0.90

test_can_add_prefix · Method · 0.90

Calls

no outgoing calls

Tested by 15

test_moe_and_qkv_conversion · Method · 0.72

test_moe_and_qkv_conversion_reversed · Method · 0.72

test_qkv_chunk_rope_permute_with_fp8_quantization · Method · 0.72

test_scoped_renaming_does_not_leak_to_sibling_or_parent · Method · 0.72

test_scoped_match_falls_back_when_checkpoint_omits_base_prefix · Method · 0.72

test_scoped_match_strips_one_base_prefix_level_for_nested_scope · Method · 0.72

test_scoped_match_round_trips_when_scope_equals_base_model_prefix · Method · 0.72

test_interleaved_renaming_and_converter_round_trip · Method · 0.72

test_ernie4_5_vl_moe_conversion · Method · 0.72

test_ernie4_5_vl_moe_conversion_reversed · Method · 0.72

test_can_remove_prefix · Method · 0.72

test_can_add_prefix · Method · 0.72