MediaLayer

Concept

Near-duplicate media detection: image, video, and audio in one API

Concepts · 7 min read

Exact-match hashing is the easy half — it's the near-duplicates that consume review queues. This article covers what 'near-duplicate' actually means across image, video, and audio, and how to handle all three with a single JSON envelope.

Exact-match vs. near-duplicate

An exact-match check is a hash lookup: same bytes, same hash, match. It's fast, deterministic, and useful for catching the laziest re-uploads.

A near-duplicate is content that any human would call the same — but the bytes don't match. Re-encoded video, watermarked image, transcoded audio, cropped frame, mirrored asset, recompressed JPEG. Hashes can't represent any of those.

The cost of missing near-duplicates is concrete: re-uploaded harmful media, recycled ad creatives that re-enter brand-safety queues, copied marketplace listings, and rights-violating clips that slip past dedupe.

What survives, by media type

  • Image: Re-encoding (JPEG ↔ PNG ↔ WebP), quality drops, resizing, cropping, watermarking, mirroring, light color edits.
  • Video: Codec changes, container swaps, aspect-ratio changes (with letterboxing), trimmed clips of a longer source, and audio swaps. matched_segments returns the aligned overlap.
  • Audio: Codec changes, bitrate drops, sample-rate conversion, light EQ, modest pitch / time-stretching, and partial reuse. matched_segments returns offset-aligned overlap.

One envelope, three endpoints

MediaLayer exposes the same request shape on /image/match, /video/match, and /audio/match: two URLs in, structured envelope out. The response shape is uniform; only matched_segments changes (empty for image, populated for video and audio).

REQUEST BODY
{
  "source_url": "https://example.com/source.{jpg|mp4|mp3}",
  "target_url": "https://example.com/target.{jpg|mp4|mp3}"
}

Reading the response

match is the convenience boolean — sensible defaults per medium. similarity_score is the source of truth; pick a threshold that matches your false-positive tolerance. matched_segments is where near-duplicate detection earns its keep: aligned start/end timestamps in seconds, so review tools can show reviewers exactly which seconds of the source map to which seconds of the target.

RESPONSE · VIDEO
{
  "match": true,
  "confidence": "high",
  "similarity_score": 0.91,
  "processing_time_ms": 1840,
  "media_type": "video",
  "matched_segments": [
    { "source_start": 0.0, "source_end": 14.8, "target_start": 2.3, "target_end": 17.1, "score": 0.93 }
  ]
}
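A sketch of turning that response into reviewer-facing text. The field names come from the response above; the formatting choices are ours, not anything the API prescribes:

```python
def summarize_segments(response: dict) -> list[str]:
    """Render matched_segments as human-readable lines for a review tool."""
    lines = []
    for seg in response.get("matched_segments", []):
        lines.append(
            f"source {seg['source_start']:.1f}-{seg['source_end']:.1f}s "
            f"maps to target {seg['target_start']:.1f}-{seg['target_end']:.1f}s "
            f"(score {seg['score']:.2f})"
        )
    return lines

response = {
    "match": True,
    "similarity_score": 0.91,
    "media_type": "video",
    "matched_segments": [
        {"source_start": 0.0, "source_end": 14.8,
         "target_start": 2.3, "target_end": 17.1, "score": 0.93}
    ],
}
# summarize_segments(response)
# → ["source 0.0-14.8s maps to target 2.3-17.1s (score 0.93)"]
```

For image responses matched_segments is empty, so the function simply returns an empty list.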

Same call, three media types

A typical pattern: a single dedupe service that dispatches to the right endpoint based on the media type extracted from the URL. The call site is uniform; only the path changes.

PYTHON · DISPATCHER
import requests
from urllib.parse import urlparse

API_HOST = "medialayer-image-audio-video-matching-api.p.rapidapi.com"
HEADERS = {
    "x-rapidapi-key": "YOUR_RAPIDAPI_KEY",
    "x-rapidapi-host": API_HOST,
    "Content-Type": "application/json",
}

EXT_TO_PATH = {
    ".jpg": "/image/match", ".jpeg": "/image/match", ".png": "/image/match",
    ".webp": "/image/match", ".gif": "/image/match",
    ".mp4": "/video/match", ".mov": "/video/match", ".webm": "/video/match",
    ".mp3": "/audio/match", ".wav": "/audio/match", ".aac": "/audio/match",
    ".m4a": "/audio/match", ".flac": "/audio/match", ".ogg": "/audio/match",
}

def match(source_url: str, target_url: str) -> dict:
    # Parse the URL path so query strings ("?v=2") don't break extension detection.
    ext = "." + urlparse(source_url).path.rsplit(".", 1)[-1].lower()
    path = EXT_TO_PATH.get(ext)
    if not path:
        raise ValueError(f"Unsupported extension: {ext}")
    r = requests.post(
        f"https://{API_HOST}{path}",
        json={"source_url": source_url, "target_url": target_url},
        headers=HEADERS,
        timeout=60,
    )
    r.raise_for_status()
    return r.json()

Choosing thresholds

There isn't one threshold that works across every domain. T&S and copyright workflows need a higher score (you're acting on someone's content). Marketplace dedupe can tolerate lower scores because near-duplicates still warrant review.

Two patterns work well in production. First: use similarity_score with a per-vertical threshold and route into review / monitor / pass lanes. Second: combine score with matched-segment duration so 'high score, 1-second overlap' doesn't auto-block a real-life incidental match.
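The second pattern can be sketched as a small routing function. The thresholds and the 1.0-second minimum overlap below are illustrative starting points, not API defaults; tune them per vertical:

```python
def total_overlap_seconds(segments: list[dict]) -> float:
    """Sum of matched source-side durations across segments."""
    return sum(s["source_end"] - s["source_start"] for s in segments)

def route(similarity_score: float, segments: list[dict],
          review_threshold: float = 0.85, monitor_threshold: float = 0.70,
          min_overlap_s: float = 1.0) -> str:
    """Route a match result into review / monitor / pass lanes.

    A high score backed by only a sliver of overlap is demoted to
    'monitor' so incidental one-second matches don't auto-escalate.
    Image results have no segments, so the overlap gate is skipped.
    """
    overlap_ok = (not segments) or total_overlap_seconds(segments) >= min_overlap_s
    if similarity_score >= review_threshold and overlap_ok:
        return "review"
    if similarity_score >= monitor_threshold:
        return "monitor"
    return "pass"
```

With the video response shown earlier, route(0.91, segments) lands in the review lane; the same score with only half a second of overlap drops to monitor.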

When to graduate to one-to-many

The two-URL endpoints are great for pairwise comparisons against a small reference catalog. Past a few thousand reference assets, pairwise calls become wasteful — you're doing N comparisons to find a top-K match.

That's where Enterprise media search comes in: ingest the catalog into a similarity index, run one-to-many lookups on every new upload, and get top-K matches with scores in a single call. Same matching primitives, different access pattern. Public users stay on RapidAPI; enterprise direct API access is available after onboarding.

Ready to wire it in?

Subscribe on RapidAPI to call the public API on your own key, or talk to MediaLayer AI Labs about enterprise direct API access.