Calculates a spectrogram over one waveform using the Short-Time Fourier Transform.

This function can create the following kinds of spectrograms:

- amplitude spectrogram (`power = 1.0`)
- power spectrogram (`power = 2.0`)
- complex-valued spectrogram (`power = None`)
(
waveform: np.ndarray,
window: np.ndarray,
frame_length: int,
hop_length: int,
fft_length: int | None = None,
power: float | None = 1.0,
center: bool = True,
pad_mode: str = "reflect",
onesided: bool = True,
dither: float = 0.0,
preemphasis: float | None = None,
mel_filters: np.ndarray | None = None,
mel_floor: float = 1e-10,
log_mel: str | None = None,
reference: float = 1.0,
min_value: float = 1e-10,
db_range: float | None = None,
remove_dc_offset: bool = False,
dtype: np.dtype = np.float32,
)
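To make the roles of `frame_length`, `hop_length`, `fft_length`, and `power` more concrete, here is a minimal NumPy sketch of a centered, one-sided STFT. It is an illustration under stated assumptions, not the library's implementation: the name `stft_sketch` is hypothetical, the output layout `(num_frequency_bins, num_frames)` is assumed, the window is assumed to already have length `frame_length`, and the dithering, pre-emphasis, mel filtering, and log scaling options are omitted.

```python
import numpy as np


def stft_sketch(waveform, window, frame_length, hop_length, fft_length=None, power=1.0):
    """Illustrative STFT (hypothetical helper, not the library function)."""
    if fft_length is None:
        fft_length = frame_length

    # Mimic center=True with pad_mode="reflect": pad half a frame on each side
    # so that frame i is centered on sample i * hop_length.
    pad = frame_length // 2
    padded = np.pad(waveform, (pad, pad), mode="reflect")

    num_frames = 1 + (len(padded) - frame_length) // hop_length
    num_bins = fft_length // 2 + 1  # one-sided spectrum

    spec = np.empty((num_bins, num_frames), dtype=np.complex64)
    buffer = np.zeros(fft_length)  # FFT input buffer; the tail stays zero

    for i in range(num_frames):
        start = i * hop_length
        # Window the analysis frame and place it at the start of the buffer.
        buffer[:frame_length] = padded[start:start + frame_length] * window
        spec[:, i] = np.fft.rfft(buffer)

    if power is None:
        return spec  # complex-valued spectrogram
    return np.abs(spec) ** power  # amplitude (1.0) or power (2.0) spectrogram


# One second of a 440 Hz sine sampled at 16 kHz.
sampling_rate = 16000
t = np.arange(sampling_rate) / sampling_rate
wave = np.sin(2 * np.pi * 440 * t)

spec = stft_sketch(wave, np.hanning(400), frame_length=400, hop_length=160, fft_length=512)
print(spec.shape)
```

With a 400-sample frame and a 512-point FFT, each frequency bin spans 16000 / 512 = 31.25 Hz, so the 440 Hz tone should show up around bin 14.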
| 643 | |
| 644 | # Note: This function processes a single waveform. For batch processing, use spectrogram_batch(). |
| 645 | def spectrogram( |
| 646 | waveform: np.ndarray, |
| 647 | window: np.ndarray, |
| 648 | frame_length: int, |
| 649 | hop_length: int, |
| 650 | fft_length: int | None = None, |
| 651 | power: float | None = 1.0, |
| 652 | center: bool = True, |
| 653 | pad_mode: str = "reflect", |
| 654 | onesided: bool = True, |
| 655 | dither: float = 0.0, |
| 656 | preemphasis: float | None = None, |
| 657 | mel_filters: np.ndarray | None = None, |
| 658 | mel_floor: float = 1e-10, |
| 659 | log_mel: str | None = None, |
| 660 | reference: float = 1.0, |
| 661 | min_value: float = 1e-10, |
| 662 | db_range: float | None = None, |
| 663 | remove_dc_offset: bool = False, |
| 664 | dtype: np.dtype = np.float32, |
| 665 | ) -> np.ndarray: |
| 666 | """ |
| 667 | Calculates a spectrogram over one waveform using the Short-Time Fourier Transform. |
| 668 | |
| 669 | This function can create the following kinds of spectrograms: |
| 670 | |
| 671 | - amplitude spectrogram (`power = 1.0`) |
| 672 | - power spectrogram (`power = 2.0`) |
| 673 | - complex-valued spectrogram (`power = None`) |
| 674 | - log spectrogram (use `log_mel` argument) |
| 675 | - mel spectrogram (provide `mel_filters`) |
| 676 | - log-mel spectrogram (provide `mel_filters` and `log_mel`) |
| 677 | |
| 678 | How this works: |
| 679 | |
| 680 | 1. The input waveform is split into frames of size `frame_length`; consecutive frames overlap by |
| 681 | `frame_length - hop_length` samples. |
| 682 | 2. Each frame is multiplied by the window and placed into a buffer of size `fft_length`. |
| 683 | 3. The DFT is taken of each windowed frame. |
| 684 | 4. The results are stacked into a spectrogram. |
| 685 | |
| 686 | We make a distinction between the following "blocks" of sample data, each of which may have a different length: |
| 687 | |
| 688 | - The analysis frame. This is the size of the time slices that the input waveform is split into. |
| 689 | - The window. Each analysis frame is multiplied by the window to reduce spectral leakage. |
| 690 | - The FFT input buffer. The length of this determines how many frequency bins are in the spectrogram. |
| 691 | |
| 692 | In this implementation, the window is assumed to be zero-padded to have the same size as the analysis frame. A |
| 693 | padded window can be obtained from `window_function()`. The FFT input buffer may be larger than the analysis frame, |
| 694 | typically the next power of two. |
| 695 | |
| 696 | Note: This function is not optimized for speed yet. It should be mostly compatible with `librosa.stft` and |
| 697 | `torchaudio.transforms.Spectrogram`, although it is more flexible due to the different ways spectrograms |
| 698 | can be constructed. |
| 699 | |
| 700 | Args: |
| 701 | waveform (`np.ndarray` of shape `(length,)`): |
| 702 | The input waveform. This must be a single real-valued, mono waveform. |