Audio Fingerprinting API

Compare two audio files by URL and find the segments where they overlap. The MediaLayer audio fingerprinting API powers duplicate audio detection, audio matching, and music reuse detection — useful for catalog deduplication, copyright workflows, and any pipeline that needs to know whether two clips share the same recording.

How the audio matching API works

Send a POST request with source_url and target_url. MediaLayer downloads each file, extracts spectral peaks, and turns them into a compact audio fingerprint using a hashing scheme similar to the one popularized by commercial recognition services. Matching is offset-aligned, so the API can tell you not just whether the same recording is present but where in each file the overlap sits.
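The request can be built with nothing but the standard library. A minimal sketch, assuming the `/audio/match` path and JSON body shown on this page; the base URL is a placeholder, and any API-key headers your plan requires would be added alongside `Content-Type`:

```python
import json
import urllib.request

def build_match_request(source_url: str, target_url: str,
                        base_url: str = "https://api.example.com") -> urllib.request.Request:
    """Build (but do not send) a POST request to the /audio/match endpoint."""
    body = json.dumps({"source_url": source_url, "target_url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/audio/match",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_match_request("https://example.com/source.mp3",
                          "https://example.com/candidate.mp3")
# urllib.request.urlopen(req) would send it; omitted so the sketch stays offline.
```

Separating request construction from sending also makes it easy to unit-test the payload without touching the network.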

Real-world duplicate audio detection has to survive lossy transcoding, sample-rate changes, partial reuse, and noise added during distribution. Spectral fingerprints handle those transformations far better than raw waveform hashing — the characteristic peaks of a recording stay recognizable even after re-encoding from FLAC to a 96 kbps MP3, or after a clip has been mixed under a voiceover.
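To see why peak pairs are robust, here is a toy illustration (not MediaLayer's implementation): pick local maxima in a time-frequency grid, then hash each anchor peak against the next few peaks. Because each hash stores only the frequency pair and the time *delta*, the fingerprint is unchanged when the recording is shifted in time, and the loudest peaks tend to survive re-encoding:

```python
from typing import List, Tuple

def pick_peaks(spec: List[List[float]], threshold: float) -> List[Tuple[int, int]]:
    """Return (time, freq) cells that are strict local maxima above threshold."""
    peaks = []
    for t in range(len(spec)):
        for f in range(len(spec[t])):
            v = spec[t][f]
            if v < threshold:
                continue
            neighbors = [spec[t + dt][f + df]
                         for dt in (-1, 0, 1) for df in (-1, 0, 1)
                         if (dt or df)
                         and 0 <= t + dt < len(spec) and 0 <= f + df < len(spec[t])]
            if all(v > n for n in neighbors):
                peaks.append((t, f))
    return peaks

def hash_pairs(peaks: List[Tuple[int, int]], fan_out: int = 3) -> List[Tuple[int, int, int]]:
    """Pair each anchor peak with the next few peaks. Each hash keeps the two
    frequencies and the time delta, so it is invariant to absolute offset."""
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            hashes.append((f1, f2, t2 - t1))
    return hashes
```

Running `hash_pairs` over the same two peaks shifted one frame later in time produces the identical hash list, which is exactly the property that makes partial reuse findable at any offset.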

The endpoint accepts MP3, WAV, M4A, and AAC. There is a per-request duration cap to keep response times predictable; for longer-form audio, split the input into segments and call the API in parallel. The matcher is stateless: no fingerprint database to maintain on your side, no index versioning, no warm-up.
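Splitting long-form input can be a few lines. A sketch, assuming an illustrative 300-second cap (the actual limit is whatever your plan documents): a small overlap between windows keeps a match that straddles a boundary from being missed by both requests.

```python
def chunk_windows(duration_sec: float, cap_sec: float = 300.0,
                  overlap_sec: float = 10.0) -> list[tuple[float, float]]:
    """Split a duration into cap-sized (start, end) windows with overlap."""
    windows = []
    start = 0.0
    step = cap_sec - overlap_sec
    while start < duration_sec:
        windows.append((start, min(start + cap_sec, duration_sec)))
        start += step
    return windows
```

Each window maps to one `/audio/match` call; because the matcher is stateless, the calls can be issued concurrently and the per-window segments merged afterwards (shifting each segment by its window's start offset).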

For production audio matching, the segment-level data in matched_segments is usually more useful than the top-level score on its own. A single overlapping bar can drive a high similarity_score on a short clip; what most workflows actually want is total matched duration or the longest contiguous run. The audio fingerprinting API returns the raw segments so you can compute that aggregation against your own policy.
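A sketch of that aggregation, using the segment field names shown in the response below: merge overlapping source-side intervals first so a match reported as several adjacent segments is not double-counted, then derive total matched duration and the longest contiguous run.

```python
def aggregate(segments: list[dict]) -> dict:
    """Summarize matched_segments on the source timeline."""
    spans = sorted((s["source_start_sec"], s["source_end_sec"]) for s in segments)
    merged: list[list[float]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    lengths = [end - start for start, end in merged]
    return {
        "total_matched_sec": sum(lengths),
        "longest_run_sec": max(lengths, default=0.0),
    }
```

Thresholding on `total_matched_sec` or `longest_run_sec` instead of the top-level score is how you encode a policy like "flag only if at least 30 contiguous seconds match."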

Audio fingerprinting API example

Two URLs in. JSON response with offset-aligned segments out. Same request shape as MediaLayer's image and video endpoints.

REQUEST
POST /audio/match
{
  "source_url": "https://example.com/source.mp3",
  "target_url": "https://example.com/candidate.mp3"
}
RESPONSE
{
  "match": true,
  "confidence": "medium",
  "similarity_score": 0.78,
  "processing_time_ms": 1240,
  "media_type": "audio",
  "matched_segments": [
    {
      "source_start_sec": 12.0,
      "source_end_sec": 18.0,
      "target_start_sec": 0.0,
      "target_end_sec": 6.0,
      "score": 0.86
    }
  ]
}

Each entry in matched_segments tells you exactly where the recordings line up in time, with a per-segment score. That is what you need to power audit UIs, generate evidence for copyright take-down requests, or surface timestamped flags inside a moderation review queue.
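Turning those segments into reviewer-facing flags is a formatting exercise. A sketch using the response fields above; the mm:ss format, score cutoff, and message wording are this example's choices, not API behavior:

```python
def format_flags(segments: list[dict], min_score: float = 0.5) -> list[str]:
    """Render matched_segments as timestamped flags for a review queue."""
    def mmss(sec: float) -> str:
        return f"{int(sec) // 60:02d}:{int(sec) % 60:02d}"

    flags = []
    for s in segments:
        if s["score"] < min_score:
            continue  # Drop weak segments below the review threshold.
        flags.append(
            f"source {mmss(s['source_start_sec'])}-{mmss(s['source_end_sec'])} "
            f"matches target {mmss(s['target_start_sec'])}-{mmss(s['target_end_sec'])} "
            f"(score {s['score']:.2f})"
        )
    return flags
```

Fed the example segment above, this yields `source 00:12-00:18 matches target 00:00-00:06 (score 0.86)`, which drops straight into an audit log or moderation card.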

Use cases for audio matching

Music reuse detection

Identify uploads that contain a known recording, even after lossy re-encoding or partial reuse.

Podcast / show segment matching

Detect repeated intros, ads, and sponsorships across episode catalogs.

Copyright and licensing workflows

Compare uploaded audio against a known catalog and surface overlap with timestamps.

Catalog deduplication

Collapse duplicate audio entries that differ only by encoding or metadata.

Voice / sample reuse audit

Find sample reuse across a back catalog without manual A/B listening.

Broadcast monitoring

Match streaming audio against known assets for compliance and reporting.

A hosted audio fingerprinting API beats a DIY pipeline

Rolling your own audio fingerprinting means picking a spectral-peak algorithm, tuning hash density, building a chunk scheduler that survives long-form input, and handling the codec soup of MP3, WAV, M4A, and AAC. It also means keeping a fingerprint index in sync with whatever reference catalog you are matching against — and keeping all of it healthy as the rest of your product evolves.

MediaLayer abstracts the decoding, fingerprinting, and aligned matching behind a single JSON request. You send two URLs, you get back time-aligned matched segments. No DSP code in your monolith, no model versions to track, no separate service to page on call. Engineering effort goes into the differentiated parts of your product instead.

Start matching audio by URL

Free plan to start on RapidAPI. Scale as your volume grows.