Concept
Audio fingerprinting API explained
Audio fingerprinting compares two recordings without comparing the raw bytes. This article explains what fingerprinting actually does, why hashes don't work for transcoded audio, and how POST /audio/match returns offset-aligned matched segments.
Hashes don't survive audio in the wild
If you SHA-256 an MP3 and SHA-256 a re-encoded copy of the same song, the hashes are completely different. The audio is the same; the byte stream isn't. That's the gap that audio fingerprinting fills.
A good fingerprint depends on perceptual features of the audio (spectral peaks, time–frequency landmarks) rather than on the byte stream. The fingerprint is derived from the content; the content is what matters.
What fingerprinting actually does
Take a short window of the audio, transform it into the frequency domain, find the locally salient peaks, and store their relative time positions. That set of (frequency, relative-time) anchors is the fingerprint.
Two recordings match when enough anchors agree — and crucially, when those agreements are time-consistent. A real match has a bunch of anchor pairs at a fixed time offset. A coincidental match has anchors scattered across offsets. The fingerprinter scores by counting the largest time-consistent group, which is also why /audio/match can return the offset-aligned segment, not just a yes / no.
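The offset-consistency idea above can be sketched in a few lines. This is a toy illustration, not the service's actual implementation: anchors are modeled as (frequency-bin, time) pairs, and the score is simply the size of the largest group of anchor matches that agree on a single time offset.

```python
from collections import Counter

def best_offset_score(source_anchors, target_anchors):
    """Score two anchor sets by their largest time-consistent group.

    Anchors are (frequency_bin, time_seconds) pairs. A real match piles
    agreements onto one fixed offset; a coincidence scatters them.
    Returns (count, offset) for the winning offset, or (0, None).
    """
    # Index target anchors by frequency bin for quick lookup.
    target_by_freq = {}
    for freq, t in target_anchors:
        target_by_freq.setdefault(freq, []).append(t)

    offsets = Counter()
    for freq, t_src in source_anchors:
        for t_tgt in target_by_freq.get(freq, []):
            # Quantize so small timing jitter lands in the same bin.
            offsets[round(t_tgt - t_src, 1)] += 1

    if not offsets:
        return 0, None
    offset, count = offsets.most_common(1)[0]
    return count, offset
```

The winning offset is exactly what lets the endpoint report aligned segments: it is the shift that maps source timestamps onto target timestamps.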
What POST /audio/match returns
- match — Boolean decision based on a sensible threshold for audio.
- similarity_score — Float in [0, 1]. Use this directly for custom thresholds — different domains tolerate different false-positive rates.
- matched_segments — Aligned start/end timestamps. source_start to source_end maps to target_start to target_end at a fixed time offset. This is the fingerprint alignment, exposed.
{
"match": true,
"confidence": "medium",
"similarity_score": 0.74,
"processing_time_ms": 612,
"media_type": "audio",
"matched_segments": [
{ "source_start": 5.1, "source_end": 18.9, "target_start": 0.0, "target_end": 13.8, "score": 0.78 }
]
}
Calling the endpoint
curl -X POST https://medialayer-image-audio-video-matching-api.p.rapidapi.com/audio/match \
-H "x-rapidapi-key: YOUR_RAPIDAPI_KEY" \
-H "x-rapidapi-host: medialayer-image-audio-video-matching-api.p.rapidapi.com" \
-H "Content-Type: application/json" \
-d '{
"source_url": "https://example.com/source.mp3",
"target_url": "https://example.com/target.wav"
}'
What survives, what doesn't
- Survives — Codec changes (MP3 ↔ AAC ↔ Opus), bitrate drops, container swaps, sample-rate conversions, light EQ, modest pitch / time stretching, and re-uploads of clips trimmed from longer originals.
- Doesn't survive — Heavy time-stretching that distorts the time–frequency relationships, very aggressive pitch-shifts, or recordings whose speech / content is fundamentally re-performed (a re-recorded cover, not a re-encoded copy).
- Edge cases — Silence, room tone, and very short clips (< 2 s) make for unreliable fingerprints. Per-request duration caps are 300 s for audio, which keeps unbounded files out of the queue.
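Those limits are cheap to enforce client-side before spending a request. A minimal pre-flight check, assuming you already know each clip's duration (the constants mirror the 2 s floor and 300 s cap stated above; the helper name is ours):

```python
MIN_CLIP_SECONDS = 2.0    # clips shorter than this fingerprint unreliably
MAX_CLIP_SECONDS = 300.0  # per-request audio duration cap

def is_fingerprintable(duration_seconds: float) -> bool:
    """Cheap pre-flight check before submitting a clip for matching."""
    return MIN_CLIP_SECONDS <= duration_seconds <= MAX_CLIP_SECONDS
```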
Choosing a threshold
match defaults to a sensible per-medium threshold. If you want a stricter or looser cutoff — e.g., for monetization workflows that share revenue based on overlap — use similarity_score directly and pick the threshold that matches your false-positive tolerance.
A simple, durable pattern is to combine score with overlap duration: require similarity_score >= 0.6 AND total_overlap_seconds >= 5. That discards short coincidental matches without depending on a single fragile cutoff.
def is_real_audio_match(response: dict) -> bool:
    score = response["similarity_score"]
    segs = response.get("matched_segments", [])
    # Total seconds of the source covered by aligned segments.
    overlap = sum(s["source_end"] - s["source_start"] for s in segs)
    return score >= 0.6 and overlap >= 5.0
Production checklist
- Public URLs — URL validation rejects private, loopback, and cloud-metadata addresses, so signed-only-from-VPC URLs won't work for the public endpoint.
- Bound timeouts — Audio matching is fast for short clips (sub-second to a few seconds) but can run several seconds for long sources. Set a 30–60 s timeout in your client.
- Server-side keys — Never expose x-rapidapi-key in browser or mobile clients. Call from your backend.
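Putting the checklist together, here is a sketch using only the Python standard library: it builds the same request as the curl example and sends it with a bounded timeout, keeping the key server-side. The helper names are ours; the endpoint, host, and headers come from the article.

```python
import json
import urllib.request

API_HOST = "medialayer-image-audio-video-matching-api.p.rapidapi.com"

def build_match_request(source_url: str, target_url: str, api_key: str):
    """Build the POST /audio/match request without sending it."""
    body = json.dumps({"source_url": source_url,
                       "target_url": target_url}).encode()
    return urllib.request.Request(
        f"https://{API_HOST}/audio/match",
        data=body,
        headers={
            "x-rapidapi-key": api_key,   # keep this server-side only
            "x-rapidapi-host": API_HOST,
            "Content-Type": "application/json",
        },
        method="POST",
    )

def match_audio(source_url: str, target_url: str, api_key: str,
                timeout: float = 60.0) -> dict:
    """Send the request with a bounded timeout; return the JSON envelope."""
    req = build_match_request(source_url, target_url, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

Pair `match_audio` with `is_real_audio_match` from the threshold section to turn the raw envelope into a decision.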
Endpoints used in this article
POST /audio/match
Compare two audio recordings and return offset-aligned matched segments. Survives transcoding, partial reuse, and modest pitch / time stretching.
See full reference →
POST /video/match
When the audio you care about lives inside a video, /video/match returns aligned matched segments using the same envelope shape.
See full reference →
Related articles
Near-duplicate media detection: image, video, and audio in one API
Why near-duplicate detection is harder than exact-match hashing — and how the same envelope handles all three media types.
Read article →
How to detect duplicate videos using an API
Same matching primitives applied to video — including how to use matched_segments for clip-level overlap.
Read article →
Keep exploring
Audio playground
Try /audio/match against two URLs in your browser before wiring it into your pipeline.
Open →
Copyright / reuse detection
How aligned matched segments drive monetization and ownership workflows.
Open →
Content moderation
Re-uploaded harmful media and audio-only evasion — where audio fingerprinting closes the loop.
Open →
Ready to wire it in?
Subscribe on RapidAPI to call the public API on your own key, or talk to MediaLayer AI Labs about enterprise direct API access.