github.com/huggingface/transformers

Class TrainingArguments

src/transformers/training_args.py:179–2848

Configuration class for controlling all aspects of model training with the Trainer. TrainingArguments centralizes all hyperparameters, optimization settings, logging preferences, and infrastructure choices needed for training. [`HfArgumentParser`] can turn this class into [argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the command line.
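The idea behind turning a dataclass into command-line arguments is that each field becomes one `--flag`, with the field's annotated type used as the converter and its default used as the flag's default. The stdlib-only sketch below illustrates that idea under stated assumptions; it is not `HfArgumentParser`'s implementation, which also handles booleans, enums, `Optional` fields, lists, and config files. `MiniTrainingArgs`, `dataclass_to_parser`, and `parse_into` are hypothetical names, and the fields are a small illustrative subset of TrainingArguments.

```python
import argparse
import dataclasses
from dataclasses import dataclass


@dataclass
class MiniTrainingArgs:
    # Hypothetical subset of TrainingArguments fields, for illustration only.
    output_dir: str = "trainer_output"
    per_device_train_batch_size: int = 8
    num_train_epochs: float = 3.0
    learning_rate: float = 5e-5


def dataclass_to_parser(cls) -> argparse.ArgumentParser:
    """Build an argparse parser with one --flag per dataclass field (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    for f in dataclasses.fields(cls):
        # The annotated type (str, int, float) doubles as the argparse converter.
        parser.add_argument(f"--{f.name}", type=f.type, default=f.default)
    return parser


def parse_into(cls, argv):
    """Parse argv and instantiate the dataclass from the resulting namespace."""
    namespace = dataclass_to_parser(cls).parse_args(argv)
    return cls(**vars(namespace))


args = parse_into(MiniTrainingArgs, ["--learning_rate", "3e-5", "--num_train_epochs", "1"])
print(args.learning_rate, args.num_train_epochs, args.per_device_train_batch_size)
# prints: 3e-05 1.0 8
```

With the real library, the usual entry point is `HfArgumentParser(TrainingArguments).parse_args_into_dataclasses()`, which returns a tuple containing one instance per dataclass passed to the parser.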


```python
@dataclass
class TrainingArguments:
    """
    Configuration class for controlling all aspects of model training with the Trainer.
    TrainingArguments centralizes all hyperparameters, optimization settings, logging preferences, and infrastructure choices needed for training.

    [`HfArgumentParser`] can turn this class into
    [argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the
    command line.

    Parameters:
        output_dir (`str`, optional, defaults to `"trainer_output"`):
            The output directory where the model predictions and checkpoints will be written.

        > Training Duration and Batch Size

        per_device_train_batch_size (`int`, optional, defaults to 8):
            The batch size per device. In multi-GPU or distributed setups, the global batch size is
            `per_device_train_batch_size * number_of_devices`; with gradient accumulation, each optimizer update
            sees that many samples multiplied by `gradient_accumulation_steps`.
        num_train_epochs (`float`, optional, defaults to 3.0):
            Total number of training epochs to perform. If not an integer, the fractional part is the share of the
            last epoch that is run before training stops.
        max_steps (`int`, optional, defaults to -1):
            If set to a positive number, the total number of training steps to perform; this overrides
            `num_train_epochs`. For a finite dataset, training loops over the dataset again whenever all data is
            exhausted, until `max_steps` is reached.

        > Learning Rate & Scheduler

        learning_rate (`float`, optional, defaults to 5e-5):
            The base learning rate passed to the optimizer. With a warmup scheduler, this is the peak learning rate
            reached at the end of warmup.
        lr_scheduler_type (`str` or [`SchedulerType`], optional, defaults to `"linear"`):
            The learning rate scheduler type to use. See [`SchedulerType`] for all possible values. Common choices:
                - "linear" = [`get_linear_schedule_with_warmup`]
                - "cosine" = [`get_cosine_schedule_with_warmup`]
                - "constant" = [`get_constant_schedule`]
                - "constant_with_warmup" = [`get_constant_schedule_with_warmup`]
        lr_scheduler_kwargs (`dict` or `str`, optional, defaults to `None`):
            Extra arguments for the learning rate scheduler. See the documentation of each scheduler for possible values.
        warmup_steps (`int` or `float`, optional, defaults to 0):
            Number of steps for a linear warmup from 0 to `learning_rate`. Warmup helps stabilize the initial phase
            of training. Can be:
                - An integer: the exact number of warmup steps
                - A float in the range [0, 1): interpreted as a ratio of the total number of training steps

        > Optimizer

        optim (`str` or [`training_args.OptimizerNames`], optional, defaults to `"adamw_torch"` (for torch>=2.8 `"adamw_torch_fused"`)):
            The optimizer to use. Common options:
                - `"adamw_torch"`: PyTorch's AdamW (recommended default)
                - `"adamw_torch_fused"`: PyTorch's AdamW using the fused kernel
                - `"adamw_hf"`: Hugging Face's AdamW implementation
                - `"sgd"`: Stochastic Gradient Descent with momentum
                - `"adafactor"`: Memory-efficient optimizer for large models
                - `"adamw_8bit"`: 8-bit AdamW (requires bitsandbytes)
            See [`OptimizerNames`] for the complete list.
        optim_args (`str`, optional):
            Optional arguments supplied to optimizers such as AnyPrecisionAdamW, AdEMAMix, and GaLore.
        weight_decay (`float`, optional, defaults to 0):
            Weight decay coefficient applied by the optimizer (not the loss function). Adds L2
```
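The duration, batch-size, and warmup arguments interact: the batch size and device count fix how many optimizer updates one epoch contains, `num_train_epochs` or a positive `max_steps` sets the total, and a fractional `warmup_steps` is resolved against that total. The following is a simplified, hypothetical sketch of that arithmetic, not the Trainer's code: `effective_batch_size`, `total_training_steps`, `resolve_warmup_steps`, and `linear_lr` are illustrative names, the sketch assumes every device receives an equal share of batches, and it ignores dataloader details, so the Trainer's exact rounding may differ between versions. `gradient_accumulation_steps` is a real TrainingArguments field that is not shown in this excerpt. `linear_lr` follows the shape of the `"linear"` schedule (`get_linear_schedule_with_warmup`): ramp from 0 to `learning_rate` during warmup, then decay linearly to 0.

```python
import math


def effective_batch_size(per_device_train_batch_size: int, num_devices: int,
                         gradient_accumulation_steps: int = 1) -> int:
    # Samples consumed per optimizer update.
    return per_device_train_batch_size * num_devices * gradient_accumulation_steps


def total_training_steps(num_examples: int, per_device_train_batch_size: int,
                         num_devices: int, num_train_epochs: float,
                         max_steps: int = -1, gradient_accumulation_steps: int = 1) -> int:
    if max_steps > 0:
        # A positive max_steps overrides num_train_epochs entirely.
        return max_steps
    # Batches per epoch across all devices (a final partial batch still counts).
    batches_per_epoch = math.ceil(num_examples / (per_device_train_batch_size * num_devices))
    updates_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
    # A fractional num_train_epochs runs only part of the final epoch.
    return math.ceil(num_train_epochs * updates_per_epoch)


def resolve_warmup_steps(warmup_steps: float, total_steps: int) -> int:
    # A float in [0, 1) is a ratio of the total steps; anything else is an absolute count.
    if isinstance(warmup_steps, float) and 0 <= warmup_steps < 1:
        return math.ceil(warmup_steps * total_steps)
    return int(warmup_steps)


def linear_lr(step: int, learning_rate: float, warmup: int, total: int) -> float:
    # Linear warmup from 0 to learning_rate, then linear decay to 0 at `total`.
    if step < warmup:
        return learning_rate * step / max(1, warmup)
    return learning_rate * max(0.0, (total - step) / max(1, total - warmup))


steps = total_training_steps(num_examples=10_000, per_device_train_batch_size=8,
                             num_devices=2, num_train_epochs=3.0)
warmup = resolve_warmup_steps(0.1, steps)
print(effective_batch_size(8, 2), steps, warmup)  # prints: 16 1875 188
print(linear_lr(warmup, 5e-5, warmup, steps), linear_lr(steps, 5e-5, warmup, steps))  # prints: 5e-05 0.0
```

In this sketch, setting `num_train_epochs=0.5` gives 313 steps (half of 625 updates, rounded up), while passing `max_steps=500` returns 500 regardless of the epoch count, matching the override behavior described for `max_steps`.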

Callers 15

- test_14_valid_dict_input_parsing · Method · 0.90
- test_default_lr_scheduler_type_unchanged · Method · 0.90
- test_prefix_tuning_trainer_load_best_model_at_end_error · Method · 0.90
- setUp · Method · 0.90
- test_eval_use_gather_object · Method · 0.90
- setUp · Method · 0.90
- test_trainer_eval_mrpc · Method · 0.90
- test_trainer_eval_multiple · Method · 0.90
- test_default_output_dir · Method · 0.90
- test_custom_output_dir · Method · 0.90
- test_output_dir_creation · Method · 0.90
- test_torch_empty_cache_steps_requirements · Method · 0.90

Calls 2

- is_torch_available · Function · 0.85
- keys · Method · 0.45

Tested by 15

- test_14_valid_dict_input_parsing · Method · 0.72
- test_default_lr_scheduler_type_unchanged · Method · 0.72
- test_prefix_tuning_trainer_load_best_model_at_end_error · Method · 0.72
- setUp · Method · 0.72
- test_eval_use_gather_object · Method · 0.72
- setUp · Method · 0.72
- test_trainer_eval_mrpc · Method · 0.72
- test_trainer_eval_multiple · Method · 0.72
- test_default_output_dir · Method · 0.72
- test_custom_output_dir · Method · 0.72
- test_output_dir_creation · Method · 0.72
- test_torch_empty_cache_steps_requirements · Method · 0.72