MediaLayer


How to detect duplicate videos using an API

Tutorials · 5 min read

Duplicate-video detection isn't a hash lookup — re-encoding, cropping, and partial reuse defeat exact-match approaches. This walkthrough shows how to call POST /video/match with two URLs and turn the response into an actionable match decision.

Why exact-match hashing fails on video

If you're comparing two video files byte-for-byte (or with an MD5 / SHA hash), any re-encode breaks the match. Drop the bitrate, change the container, transcode to a different codec — the hash flips, even though the human-perceptible content is unchanged.
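To make the failure mode concrete, here is a minimal sketch using Python's standard hashlib. The byte strings are made-up stand-ins for two encodes of the same clip, not real video data; a real re-encode changes far more than one bit.

```python
import hashlib

# Hypothetical stand-in for a video file's bytes (not a real MP4).
original = b"\x00\x00\x00\x18ftypmp42" + b"frame-data" * 1000

# Simulate the smallest possible change: flip one bit in the last byte.
altered = bytearray(original)
altered[-1] ^= 0x01

digest_a = hashlib.sha256(original).hexdigest()
digest_b = hashlib.sha256(bytes(altered)).hexdigest()

# The digests no longer match, so an exact-match lookup misses the copy.
print(digest_a == digest_b)  # False
```

A cryptographic hash is built so that any input change produces an unrelated digest, which is exactly the wrong property for finding near-duplicates.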

Real-world duplicate detection has to handle: re-encoded copies (e.g., H.264 → H.265 or different CRF), aspect-ratio changes (16:9 → 9:16 with letterboxing), trimmed clips (a 15-second segment of a 60-second source), watermark adds and removes, and audio swaps. Hash-based approaches can't represent any of those.

Perceptual matching instead compares fingerprints derived from the content itself: frame-level features that survive re-encoding, combined with alignment so that partial overlap between two videos is detectable.
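As a rough intuition for what "content-derived" means, the sketch below implements a toy average hash over a tiny grayscale frame. This is a textbook illustration only and an assumption on our part: the article does not describe MediaLayer's actual fingerprinting method, and the frames here are invented 4×4 grids.

```python
def average_hash(frame):
    """One bit per pixel: 1 if the pixel is brighter than the frame's mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

# Invented 4x4 grayscale frame with alternating dark and bright columns.
frame = [
    [10, 200, 30, 220],
    [15, 210, 25, 230],
    [12, 205, 35, 225],
    [18, 215, 28, 235],
]
# The same picture dimmed by 20%, as a crude stand-in for a re-encode.
dimmed = [[p * 0.8 for p in row] for row in frame]
# A different picture: the bright and dark columns are swapped.
other = [[row[1], row[0], row[3], row[2]] for row in frame]

print(hamming(average_hash(frame), average_hash(dimmed)))  # 0
print(hamming(average_hash(frame), average_hash(other)))   # 16
```

The dimmed frame has different pixel values, so a byte hash would differ, yet its fingerprint is identical. Production systems use far more robust features and add alignment across time.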

What POST /video/match returns

The MediaLayer video endpoint takes two URLs, fetches both server-side, and returns a single JSON envelope. The shape is the same one used by /image/match and /audio/match, so call sites stay uniform across media types.

  • match: Boolean decision based on a sensible default threshold for video.
  • similarity_score: Float in [0, 1]. Use this directly when you need a custom threshold (e.g., a stricter cutoff for auto-block lanes).
  • matched_segments: Array of aligned start/end timestamps in seconds, giving the source-to-target alignment of overlapping content. Empty for image, populated for video and audio.
RESPONSE · /VIDEO/MATCH
{
  "match": true,
  "confidence": "high",
  "similarity_score": 0.91,
  "processing_time_ms": 1840,
  "media_type": "video",
  "matched_segments": [
    { "source_start": 0.0, "source_end": 14.8, "target_start": 2.3, "target_end": 17.1, "score": 0.93 },
    { "source_start": 30.5, "source_end": 38.2, "target_start": 60.0, "target_end": 67.8, "score": 0.87 }
  ]
}
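One way to consume this envelope is to parse it into small typed objects before making any decision. The sketch below does that with the standard library; the class and function names (Segment, MatchResult, parse_match_response) are our own, and it reads only the fields shown in the sample response above.

```python
import json
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    source_start: float
    source_end: float
    target_start: float
    target_end: float
    score: float

@dataclass
class MatchResult:
    match: bool
    similarity_score: float
    segments: List[Segment]

def parse_match_response(body: str) -> MatchResult:
    """Parse the JSON envelope into typed objects (hypothetical helper)."""
    data = json.loads(body)
    segments = [
        Segment(
            source_start=s["source_start"],
            source_end=s["source_end"],
            target_start=s["target_start"],
            target_end=s["target_end"],
            score=s["score"],
        )
        # Treat a missing array the same as an empty one.
        for s in data.get("matched_segments", [])
    ]
    return MatchResult(
        match=data["match"],
        similarity_score=data["similarity_score"],
        segments=segments,
    )

# The sample response from this article, trimmed to one segment.
SAMPLE = """{
  "match": true,
  "similarity_score": 0.91,
  "matched_segments": [
    {"source_start": 0.0, "source_end": 14.8,
     "target_start": 2.3, "target_end": 17.1, "score": 0.93}
  ]
}"""

result = parse_match_response(SAMPLE)
print(result.match, result.similarity_score, len(result.segments))  # True 0.91 1
```

Parsing up front means a renamed or missing field fails loudly at the boundary instead of deep inside your routing logic.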

Calling the endpoint

Authentication is via RapidAPI: subscribe to the MediaLayer listing, copy your x-rapidapi-key, and pass three headers on every call.

CURL
curl -X POST https://medialayer-image-audio-video-matching-api.p.rapidapi.com/video/match \
  -H "x-rapidapi-key: YOUR_RAPIDAPI_KEY" \
  -H "x-rapidapi-host: medialayer-image-audio-video-matching-api.p.rapidapi.com" \
  -H "Content-Type: application/json" \
  -d '{
    "source_url": "https://example.com/source.mp4",
    "target_url": "https://example.com/target.mp4"
  }'

Python — sync with requests

Drop-in pattern for a backend that processes uploads one at a time. For a high-throughput pipeline, use httpx with an async client and bound concurrency to your RapidAPI plan's per-second limit.

PYTHON · REQUESTS
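import requests

def is_duplicate_video(source_url: str, target_url: str, threshold: float = 0.85) -> bool:
    """Return True if the API's similarity_score meets or exceeds `threshold`."""
    # The same three headers as the curl example. In real code, load the key
    # from server-side configuration instead of hard-coding it.
    headers = {
        "x-rapidapi-key": "YOUR_RAPIDAPI_KEY",
        "x-rapidapi-host": "medialayer-image-audio-video-matching-api.p.rapidapi.com",
        "Content-Type": "application/json",
    }
    payload = {"source_url": source_url, "target_url": target_url}

    r = requests.post(
        "https://medialayer-image-audio-video-matching-api.p.rapidapi.com/video/match",
        json=payload,
        headers=headers,
        timeout=60,  # seconds; video matching can be slow for long clips
    )
    # Raise on 4xx/5xx so failures are not mistaken for "no match".
    r.raise_for_status()
    data = r.json()
    # Apply our own cutoff rather than relying on the default `match` flag.
    return data["similarity_score"] >= threshold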
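For the high-throughput case mentioned above, the core pattern is a semaphore that caps how many calls are in flight. The sketch below shows only that pattern with asyncio; match_many and fake_match are hypothetical names, and fake_match stands in for a coroutine that would wrap an httpx.AsyncClient POST in a real pipeline. Note that capping in-flight calls only approximates a per-second limit.

```python
import asyncio

async def match_many(pairs, match_fn, max_concurrency=5):
    """Run match_fn(source_url, target_url) for each pair,
    with at most max_concurrency calls in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(source_url, target_url):
        async with semaphore:
            return await match_fn(source_url, target_url)

    # gather() returns results in the same order as the input pairs.
    return await asyncio.gather(*(bounded(s, t) for s, t in pairs))

# Stand-in for the real HTTP call so the sketch runs offline (assumption).
async def fake_match(source_url, target_url):
    await asyncio.sleep(0)
    return {"similarity_score": 1.0 if source_url == target_url else 0.2}

pairs = [
    ("https://example.com/a.mp4", "https://example.com/a.mp4"),
    ("https://example.com/a.mp4", "https://example.com/b.mp4"),
]
results = asyncio.run(match_many(pairs, fake_match, max_concurrency=2))
print([r["similarity_score"] >= 0.85 for r in results])  # [True, False]
```

Set max_concurrency from your plan's limit, and reuse a single HTTP client across calls so connections are pooled.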

Acting on matched_segments

The boolean match is fine for binary block / pass decisions, but real workflows usually want to act on overlap duration. Sum the duration of matched_segments to get total overlapping seconds, and compare that against your policy threshold.

For ownership and monetization workflows, the same calculation drives the share-revenue / hold / takedown lane. For trust-and-safety workflows, it's the difference between actioning a 1-second incidental match and a 45-second full-clip lift.

TYPESCRIPT
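// Sums source-side durations. Assumes the segments do not overlap each other;
// if they can, merge the intervals first to avoid double-counting.
function totalOverlapSeconds(
  segments: { source_start: number; source_end: number }[],
): number {
  return segments.reduce(
    (acc, s) => acc + (s.source_end - s.source_start),
    0,
  );
}

// `response` is the parsed /video/match body and `routeTo` is your own
// routing function; both are supplied by the surrounding application.
// Use overlap duration, not just the boolean match flag.
const overlap = totalOverlapSeconds(response.matched_segments);
if (overlap >= 30) routeTo("review");
else if (overlap >= 5) routeTo("monitor");
else routeTo("pass");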
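The same policy can be sketched in Python for backends that use the requests client shown earlier. This version also merges overlapping source intervals before summing, which is a defensive assumption on our part: the article does not say whether returned segments can overlap. The function names are hypothetical and the 30 s / 5 s cutoffs are the example values used above.

```python
def total_overlap_seconds(segments):
    """Total source-side seconds covered by the segments, counting
    any overlapping intervals only once."""
    intervals = sorted((s["source_start"], s["source_end"]) for s in segments)
    total = 0.0
    current_start = current_end = None
    for start, end in intervals:
        if current_end is None or start > current_end:
            # Close the previous merged interval and start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps (or touches) the current interval: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total

def lane_for(overlap_seconds):
    """Map overlap duration to a routing lane using the example cutoffs."""
    if overlap_seconds >= 30:
        return "review"
    if overlap_seconds >= 5:
        return "monitor"
    return "pass"

# Source-side timestamps from the sample response in this article.
segments = [
    {"source_start": 0.0, "source_end": 14.8},
    {"source_start": 30.5, "source_end": 38.2},
]
print(lane_for(total_overlap_seconds(segments)))  # monitor
```

The two sample segments cover roughly 22.5 seconds of the source, which falls between the two cutoffs and lands in the monitor lane.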

Production checklist

  • Keep keys server-side: Never embed x-rapidapi-key in browser or mobile clients. Proxy from your backend and inject the header there.
  • Bound timeouts: Video matching can take a few seconds for long clips. Pick a per-call timeout that matches your queue's SLA (60s is a sane upper bound).
  • Use public URLs: URL validation rejects private, loopback, and cloud-metadata addresses. Make sure source_url and target_url are publicly reachable.
  • Move to one-to-many: If you're doing N-vs-N comparisons against a growing reference catalog, the public two-URL API will get expensive fast. Switch to Enterprise media search.
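For the public-URL item, a cheap client-side pre-check can catch obviously unreachable URLs before you spend an API call. The sketch below is an approximation under our own assumptions, not the service's validation logic: looks_publicly_reachable is a hypothetical helper, it inspects only literal IP hosts and localhost names, and it performs no DNS resolution.

```python
import ipaddress
from urllib.parse import urlparse

def looks_publicly_reachable(url: str) -> bool:
    """Best-effort check that a URL does not point at a private,
    loopback, or link-local (cloud-metadata) address."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    host = parsed.hostname
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name rather than an IP literal. This sketch does not
        # resolve it, so it cannot rule out a private target here.
        return True
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local  # includes 169.254.169.254
        or ip.is_reserved
        or ip.is_unspecified
    )

print(looks_publicly_reachable("https://example.com/source.mp4"))  # True
print(looks_publicly_reachable("http://169.254.169.254/latest/"))  # False
```

Treat a True result as "not obviously wrong" rather than a guarantee; the API's own validation remains the source of truth.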

Ready to wire it in?

Subscribe on RapidAPI to call the public API on your own key, or talk to MediaLayer AI Labs about enterprise direct API access.