github.com/huggingface/transformers

Function spectrogram

src/transformers/audio_utils.py:645–853

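Calculates a spectrogram over one waveform using the Short-Time Fourier Transform. This function can create the following kinds of spectrograms:

- amplitude spectrogram (`power = 1.0`)
- power spectrogram (`power = 2.0`)
- complex-valued spectrogram (`power = None`)
- log spectrogram (use `log_mel` argument)
- mel spectrogram (provide `mel_filters`)
- log-mel spectrogram (provide `mel_filters` and `log_mel`)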
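The `power` settings differ only in what is done to the complex STFT values. A minimal numpy sketch, assuming the amplitude and power variants are the element-wise magnitude raised to `power`; the helper `apply_power` is hypothetical and is not part of transformers.

```python
from typing import Optional

import numpy as np


def apply_power(stft: np.ndarray, power: Optional[float]) -> np.ndarray:
    """Hypothetical helper: map a complex STFT to the spectrogram kind picked by `power`."""
    if power is None:
        return stft                  # complex-valued spectrogram, unchanged
    return np.abs(stft) ** power     # 1.0 -> amplitude, 2.0 -> power


# Toy 2x2 complex STFT with easy magnitudes (|3+4j| = 5).
stft = np.array([[3 + 4j, 1j], [1 + 0j, 0j]])
amplitude = apply_power(stft, 1.0)   # [[5, 1], [1, 0]]
power_spec = apply_power(stft, 2.0)  # [[25, 1], [1, 0]]
```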

(
    waveform: np.ndarray,
    window: np.ndarray,
    frame_length: int,
    hop_length: int,
    fft_length: int | None = None,
    power: float | None = 1.0,
    center: bool = True,
    pad_mode: str = "reflect",
    onesided: bool = True,
    dither: float = 0.0,
    preemphasis: float | None = None,
    mel_filters: np.ndarray | None = None,
    mel_floor: float = 1e-10,
    log_mel: str | None = None,
    reference: float = 1.0,
    min_value: float = 1e-10,
    db_range: float | None = None,
    remove_dc_offset: bool = False,
    dtype: np.dtype = np.float32,
)
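The output size is fixed by `frame_length`, `hop_length`, `fft_length`, `center` and `onesided`. The sketch below computes the expected (frequency bins, frames) shape under two assumptions: `fft_length` defaults to `frame_length`, and `center=True` pads `frame_length // 2` samples on each end before framing. `spectrogram_shape` is a hypothetical helper, not a transformers function.

```python
from typing import Optional, Tuple


def spectrogram_shape(
    num_samples: int,
    frame_length: int,
    hop_length: int,
    fft_length: Optional[int] = None,
    center: bool = True,
    onesided: bool = True,
) -> Tuple[int, int]:
    """Hypothetical helper: (frequency bins, frames) a framed STFT would produce."""
    if fft_length is None:
        fft_length = frame_length  # assumed default: FFT buffer as long as one frame
    if center:
        # Assumed centering: pad frame_length // 2 samples on each side.
        num_samples += 2 * (frame_length // 2)
    num_frames = 1 + (num_samples - frame_length) // hop_length
    # A real input's DFT is conjugate-symmetric, so one side holds fft_length // 2 + 1 bins.
    num_bins = fft_length // 2 + 1 if onesided else fft_length
    return num_bins, num_frames


# One second of 16 kHz audio, 25 ms frames, 10 ms hop, 512-point FFT.
print(spectrogram_shape(16000, 400, 160, 512))                # (257, 101)
print(spectrogram_shape(16000, 400, 160, 512, center=False))  # (257, 98)
```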


# Note: This method processes a single waveform. For batch processing, use spectrogram_batch().
def spectrogram(
    waveform: np.ndarray,
    window: np.ndarray,
    frame_length: int,
    hop_length: int,
    fft_length: int | None = None,
    power: float | None = 1.0,
    center: bool = True,
    pad_mode: str = "reflect",
    onesided: bool = True,
    dither: float = 0.0,
    preemphasis: float | None = None,
    mel_filters: np.ndarray | None = None,
    mel_floor: float = 1e-10,
    log_mel: str | None = None,
    reference: float = 1.0,
    min_value: float = 1e-10,
    db_range: float | None = None,
    remove_dc_offset: bool = False,
    dtype: np.dtype = np.float32,
) -> np.ndarray:
    """
    Calculates a spectrogram over one waveform using the Short-Time Fourier Transform.

    This function can create the following kinds of spectrograms:

      - amplitude spectrogram (`power = 1.0`)
      - power spectrogram (`power = 2.0`)
      - complex-valued spectrogram (`power = None`)
      - log spectrogram (use `log_mel` argument)
      - mel spectrogram (provide `mel_filters`)
      - log-mel spectrogram (provide `mel_filters` and `log_mel`)

    How this works:

      1. The input waveform is split into frames of size `frame_length` that overlap by `frame_length - hop_length`
         samples.
      2. Each frame is multiplied by the window and placed into a buffer of size `fft_length`.
      3. The DFT of each windowed frame is computed.
      4. The results are stacked into a spectrogram.

    We make a distinction between the following "blocks" of sample data, each of which may have a different length:

      - The analysis frame. This is the size of the time slices that the input waveform is split into.
      - The window. Each analysis frame is multiplied by the window to reduce spectral leakage.
      - The FFT input buffer. Its length determines how many frequency bins are in the spectrogram.

    In this implementation, the window is assumed to be zero-padded to have the same size as the analysis frame. A
    padded window can be obtained from `window_function()`. The FFT input buffer may be larger than the analysis frame,
    typically the next power of two.

    Note: This function is not optimized for speed yet. It should be mostly compatible with `librosa.stft` and
    `torchaudio.transforms.Spectrogram`, although it is more flexible due to the different ways spectrograms
    can be constructed.

    Args:
        waveform (`np.ndarray` of shape `(length,)`):
            The input waveform. This must be a single real-valued, mono waveform.
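Steps 1–4 of the docstring can be sketched in plain numpy. This is a simplified re-implementation for illustration, not the transformers source: it omits dither, preemphasis, DC-offset removal, mel filtering and log scaling, always returns a one-sided complex spectrogram, and `np.hanning(401)[:-1]` stands in for a periodic Hann window from `window_function()`. The names `simple_stft`, `tone` and `hann` are hypothetical.

```python
import numpy as np


def simple_stft(waveform, window, frame_length, hop_length, fft_length=None,
                center=True, pad_mode="reflect"):
    """Sketch of steps 1-4: frame, window, FFT, stack (one-sided only)."""
    if fft_length is None:
        fft_length = frame_length
    if window.shape[0] != frame_length:
        raise ValueError("window must be padded to frame_length")
    if fft_length < frame_length:
        raise ValueError("fft_length must be >= frame_length")
    if center:
        pad = frame_length // 2
        waveform = np.pad(waveform, (pad, pad), mode=pad_mode)
    num_frames = 1 + (len(waveform) - frame_length) // hop_length
    frames = np.empty((num_frames, fft_length // 2 + 1), dtype=np.complex128)
    buffer = np.zeros(fft_length)  # FFT input buffer; the tail past frame_length stays zero
    for i in range(num_frames):
        start = i * hop_length
        # Steps 1-2: take one analysis frame, window it, copy it into the buffer.
        buffer[:frame_length] = waveform[start:start + frame_length] * window
        # Step 3: one-sided DFT of the zero-padded, windowed frame.
        frames[i] = np.fft.rfft(buffer)
    # Step 4: stack the frames, laid out as (frequency bins, frames).
    return frames.T


sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000.0 * t)  # 1 kHz test tone
hann = np.hanning(401)[:-1]            # periodic Hann window of length 400
spec = np.abs(simple_stft(tone, hann, frame_length=400, hop_length=160, fft_length=512))
peak_bin = int(spec[:, 50].argmax())
print(spec.shape, peak_bin * sr / 512)  # (257, 101) 1000.0
```

Because `fft_length` (512) is larger than `frame_length` (400), each windowed frame is zero-padded in the buffer, which yields 257 bins spaced 16000 / 512 = 31.25 Hz apart; the 1 kHz tone lands exactly on bin 32.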

Calls (5):

- amplitude_to_db · Function · 0.85
- power_to_db · Function · 0.85
- pad · Method · 0.45
- mean · Method · 0.45
- log · Method · 0.45
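The Calls list shows that dB scaling is delegated to `amplitude_to_db` and `power_to_db`. A minimal sketch of such a conversion, assuming the conventional 10 * log10 for power and 20 * log10 for amplitude, flooring by `min_value`, dividing by `reference`, and clipping to at most `db_range` below the peak; `to_db` is a hypothetical stand-in, not either transformers function.

```python
from typing import Optional

import numpy as np


def to_db(spec: np.ndarray, power: float, reference: float = 1.0,
          min_value: float = 1e-10, db_range: Optional[float] = None) -> np.ndarray:
    """Hypothetical dB conversion in the style of power_to_db / amplitude_to_db."""
    if power == 2.0:
        multiplier = 10.0  # power spectrogram: 10 * log10
    elif power == 1.0:
        multiplier = 20.0  # amplitude spectrogram: 20 * log10
    else:
        raise ValueError("dB conversion is only defined here for power 1.0 or 2.0")
    spec = np.maximum(spec, min_value)  # floor to avoid log10(0)
    reference = max(reference, min_value)
    db = multiplier * (np.log10(spec) - np.log10(reference))
    if db_range is not None:
        db = np.maximum(db, db.max() - db_range)  # keep at most db_range dB below the peak
    return db


print(to_db(np.array([1.0, 0.1, 0.0]), power=2.0))               # approximately [0, -10, -100]
print(to_db(np.array([1.0, 0.1, 0.0]), power=2.0, db_range=80))  # approximately [0, -10, -80]
```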