SONAR: Statistical Observation Network for Attestation and Reach

Blockcast Inc
Berkeley CA US
omar@blockcast.network
https://blockcast.network

Area: ops
Workgroup: mboned
Keywords: multicast, authentication, coverage, proof of delivery

Abstract

This document specifies SONAR (Statistical Observation Network for Attestation and Reach), a protocol for verifiable multicast delivery claims without trusted intermediaries. SONAR combines: (1) the O(1) bandwidth cost of IP multicast versus the O(N) cost of unicast, which makes unicast-based cheating detectable, (2) cryptoeconomic accountability via on-chain stake deposits, VRF-based unpredictable sampling, and blockchain attestations, and (3) ALTA-based real-time multicast authentication.

SONAR separates content authentication from coverage verification: ALTA authenticates all packets with roughly 6-7% bandwidth overhead, while statistical coverage verification adds minimal overhead (one 320 KB challenge message per 15-60 minute test period, or 0.7-2.8 Kbps). Coverage estimation samples 0.1% of receivers using German Tank Problem inference. For privacy and cost efficiency at scale, zkSNARK proof aggregation (recommended for more than 1,000 sampled users) maintains O(1) on-chain verification cost, enabling populations exceeding 10^8 receivers.

Introduction

Multicast distribution offers significant efficiency advantages over unicast for large-scale content delivery, reducing bandwidth costs by 99.99% or more. However, the lack of verifiable delivery mechanisms prevents widespread commercial adoption. Content providers cannot verify that infrastructure operators actually delivered content to claimed receivers, while infrastructure operators cannot prove delivery to enable billing. This bilateral trust deficit blocks the formation of liquid markets for multicast capacity.

Existing multicast authentication schemes (TESLA, ALTA, AMBI) address content authentication but do not provide per-receiver coverage proof. Per-receiver encryption defeats multicast efficiency by requiring O(N) bandwidth, where N is the number of receivers.

SONAR solves this problem through statistical sampling: rather than proving delivery to every receiver, SONAR proves delivery to a random sample with known statistical confidence. This enables verification of populations exceeding 10^7 receivers with constant bandwidth overhead.

Requirements Notation

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 when, and only when, they appear in all capitals, as shown here.

Terminology

This document uses the following terms:

Coverage Group:: A set of receivers that can be addressed with a single multicast transmission, typically defined by geographic location or network topology.
Attestation:: A cryptographically signed statement by a receiver asserting successful reception of specific content with packet statistics.
Sample Size:: The number of receivers randomly selected to provide attestations in a given test window. Denoted as m.
Population Size:: The total number of receivers in the coverage group. Denoted as N.
German Tank Estimate:: A statistical estimator for population size based on maximum observation in a sampled sequence.
Test Window:: A time period during which packet reception is tracked for coverage verification. Duration denoted as P.
zkSNARK:: Zero-Knowledge Succinct Non-Interactive Argument of Knowledge. A cryptographic proof system enabling verification of aggregate statements with constant-size proofs.

Architecture Overview

Design Principles

SONAR is designed around the following principles:

Content authentication is independent of coverage verification
Statistical sampling provides O(1) verification cost regardless of population size
Broadcast bandwidth overhead MUST be <10% to preserve multicast efficiency
Return path (user attestations) uses existing internet infrastructure
Cryptographic security complemented by economic incentives

Protocol Layers

Layer 1: Content Authentication

All receivers verify content authenticity using the ALTA protocol. This provides:

Non-repudiation: Content provider cannot deny transmission
Source authentication: Receivers verify authorized origin
Integrity: Content not modified in transit
Low latency: 1-10ms authentication delay

ALTA is chosen over alternatives (TESLA, AMBI) because:

No time synchronization required (vs TESLA)
Real-time authentication (vs TESLA 100-6000ms delay)
Broadcast-only (vs AMBI unicast manifests)
Strong non-repudiation (periodic Ed25519 signatures)

Bandwidth overhead: Approximately 6-7% for typical configurations (6.75% for 1500-byte packets with MAC_Count=4 and K=50).

Layer 2: Statistical Coverage Verification

A random sample of m receivers (typically 0.1% of the population) provides attestations via the internet return path. Statistical inference yields a population coverage estimate with a confidence interval. Sample selection uses a Verifiable Random Function (VRF) to prevent adversarial selection. Attestations include packet statistics, enabling loss rate estimation and fraud detection.

Broadcast overhead: Only the sample selection message (320 KB per test for m=10,000).
Return path overhead: Distributed across m users (128 bps per selected user).

Layer 3: Zero-Knowledge Aggregation (Recommended for m > 1,000)

zkSNARK proofs SHOULD be employed when sample size m exceeds 1,000 users, primarily for privacy protection. Without aggregation, individual viewing patterns become correlatable on-chain, enabling de-anonymization attacks. zkSNARKs provide an aggregated proof of coverage statistics while hiding individual user attestations.

Additional benefits: 80-90% cost reduction via off-chain storage, a constant-size on-chain submission (328 bytes regardless of m), and scalability to populations exceeding 10^8. A challenge protocol enables spot-checking of individual attestations via Merkle proofs while maintaining aggregate privacy.

Content Authentication

ALTA Protocol Configuration

SONAR employs ALTA with the following parameters:

Scheme Parameters:

a = 3 (backward reference interval)
p = 5 (redundancy factor)
K = 50 (signature interval)

Algorithms:

MAC: HMAC-SHA256 truncated to 128 bits
Signature: Ed25519 (64 bytes)

Packet Format

Each multicast packet MUST include ALTA authentication data:

Sequence Number:: Monotonically increasing packet identifier
S (Signature Present):: 1-bit flag indicating Ed25519 signature included
Reserved:: 7 bits reserved for future use, MUST be zero
MAC Count:: Number of MACs included (typically 3-5)
Previous Packet Hash:: SHA-256 hash of previous packet for chain verification
MAC n:: HMAC-SHA256 of packet at offset (i - n*a) truncated to 128 bits
Ed25519 Signature:: Signature of packets i through i-K+1 (when S=1, every Kth packet)
Content Payload:: Application data

Total overhead calculation:

Base: 4 + 32 = 36 bytes
MACs: 16 * MAC_Count bytes
Signature (amortized): 64 / K bytes
For MAC_Count=4, K=50: 36 + 64 + 1.28 = 101.28 bytes
Percentage for 1500-byte packets: 6.75%
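The overhead arithmetic above can be reproduced with a short sketch. The function and parameter names are illustrative and not part of the protocol; the constants (36-byte base, 16-byte MACs, 64-byte signature) are taken from the calculation above.

```python
def alta_overhead_bytes(mac_count: int, sig_interval: int,
                        base: int = 36, mac_len: int = 16,
                        sig_len: int = 64) -> float:
    """Average per-packet overhead: base fields, MACs, and the
    signature amortized over sig_interval (K) packets."""
    return base + mac_len * mac_count + sig_len / sig_interval


def overhead_percent(overhead: float, packet_size: int = 1500) -> float:
    """Overhead as a percentage of the packet size."""
    return 100.0 * overhead / packet_size


overhead = alta_overhead_bytes(mac_count=4, sig_interval=50)
print(f"{overhead:.2f} bytes ({overhead_percent(overhead):.2f}%)")
```

Varying mac_count over the typical range of 3-5 shows how sensitive the percentage is to the number of MACs carried per packet.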

Verification Procedure

Upon receiving packet i, the receiver performs the following steps:

Verify sequence number is monotonically increasing
Compute SHA-256(packet_{i-1}) and compare with Previous Packet Hash field
For each MAC_j in packet, retrieve stored packet at offset (i - j*a)
Recompute HMAC-SHA256 for each referenced packet and compare
If S=1, verify Ed25519 signature over packets [i-K+1, i]
If all verifications pass, accept packet as authentic

The receiver MUST buffer packets until sufficient MACs have been received for verification (depth = p packets).
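A minimal sketch of the sequence, hash-chain, and MAC checks follows. It covers only those steps and omits Ed25519 verification. How the receiver obtains the MAC key is defined by ALTA and is not modeled here: the key argument, the function names, and the exact bytes covered by each hash are assumptions for illustration.

```python
import hashlib
import hmac

MAC_LEN = 16  # HMAC-SHA256 truncated to 128 bits


def truncated_mac(key: bytes, packet: bytes) -> bytes:
    """HMAC-SHA256 over the packet bytes, truncated to MAC_LEN."""
    return hmac.new(key, packet, hashlib.sha256).digest()[:MAC_LEN]


def check_chain(prev_seq: int, prev_packet: bytes,
                seq: int, prev_hash_field: bytes) -> bool:
    """Sequence number must increase and the Previous Packet Hash
    field must equal SHA-256 of the previous packet."""
    if seq <= prev_seq:
        return False
    expected = hashlib.sha256(prev_packet).digest()
    return hmac.compare_digest(expected, prev_hash_field)


def check_mac(key: bytes, referenced_packet: bytes, mac_field: bytes) -> bool:
    """Recompute the truncated MAC for a buffered packet and compare
    it in constant time with the MAC carried in a later packet."""
    return hmac.compare_digest(truncated_mac(key, referenced_packet), mac_field)
```

Constant-time comparison via hmac.compare_digest avoids leaking how many leading bytes of a forged MAC were correct.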

Statistical Coverage Verification

Sample Selection Protocol

VRF-Based Random Selection

The content provider generates a verifiable random sample using a VRF to prevent adversarial selection:

Obtain blockchain randomness source (e.g., block hash at height H)
Apply VRF with content provider private key: seed = VRF_prove(sk, blockhash || session_id)
Use seed for Fisher-Yates shuffle of registered user public keys
Select first m users from shuffled list

VRF properties ensure:

Unpredictability: Adversary cannot predict selection before blockhash revealed
Verifiability: Anyone can verify selection was computed correctly
Uniqueness: Only one valid output for given input
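The shuffle-and-select steps can be sketched as below. The VRF itself is out of scope: the seed is assumed to be the already-verified VRF output bytes. Deriving shuffle indices from SHA-256(seed || counter) and sorting the keys into a canonical order first are illustrative choices, not requirements of this document.

```python
import hashlib


def _rand_below(seed: bytes, counter: int, n: int) -> int:
    """Deterministic integer in [0, n) derived from the seed.
    The modulo bias of a 256-bit value is negligible here."""
    digest = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") % n


def select_sample(seed: bytes, pubkeys: list[bytes], m: int) -> list[bytes]:
    """Partial Fisher-Yates shuffle: finalize only the first m slots."""
    keys = sorted(pubkeys)  # canonical order so all verifiers agree
    for i in range(min(m, len(keys) - 1)):
        j = i + _rand_below(seed, i, len(keys) - i)  # j uniform in [i, n)
        keys[i], keys[j] = keys[j], keys[i]
    return keys[:m]
```

Because the procedure is deterministic in the seed and the registered key set, any party holding the same inputs can recompute the sample and compare it with the challenge message.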

Challenge Message Format

Size calculation for m=10,000 users:

Header: 24 bytes
VRF Proof: 80 bytes
User pubkeys: 32 * 10,000 = 320,000 bytes
Signature: 64 bytes
Total: 320,168 bytes ≈ 320 KB

Broadcast frequency: Once per test period P (recommended: P = 900-7200 seconds)
Bandwidth: 320 KB / P seconds

P = 900s (15 min): 356 bytes/s = 2.8 Kbps
P = 3600s (1 hour): 89 bytes/s = 0.7 Kbps
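The size and bandwidth figures above follow directly from the field sizes. A small sketch, with illustrative function names:

```python
def challenge_size_bytes(m: int, header: int = 24, vrf_proof: int = 80,
                         pubkey: int = 32, signature: int = 64) -> int:
    """Challenge message size: fixed fields plus one public key per user."""
    return header + vrf_proof + pubkey * m + signature


def challenge_bandwidth_kbps(m: int, period_s: float) -> float:
    """Average broadcast rate when one challenge is sent per test period."""
    return challenge_size_bytes(m) * 8 / period_s / 1000


for period in (900, 3600):
    print(period, round(challenge_bandwidth_kbps(10_000, period), 1), "Kbps")
```

Since the message grows linearly with m, doubling the sample size doubles this broadcast cost, while lengthening P reduces it proportionally.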

User Attestation Protocol

Attestation Message Format

Selected users MUST respond within response window T_response (recommended: 60 seconds):

Packet Min/Max Observed:: First and last sequence numbers received in test window
Packets Received:: Total count of packets successfully received and authenticated
Sample Content Hash:: SHA-256 hash of concatenated payload from sampled packets (e.g., every 100th packet) to prove actual content reception
Response Timestamp:: Unix timestamp when attestation created

Total size: 160 bytes per attestation
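As an illustration of the Sample Content Hash field, the sketch below hashes the payloads of every 100th packet. Two details are assumptions, since the field description does not fix them: "every 100th packet" is taken to mean sequence numbers divisible by the stride, and payloads are concatenated in ascending sequence order.

```python
import hashlib


def sample_content_hash(payloads: dict[int, bytes], stride: int = 100) -> bytes:
    """SHA-256 over the concatenated payloads of sampled packets.
    payloads maps sequence number -> authenticated payload bytes."""
    h = hashlib.sha256()
    for seq in sorted(payloads):
        if seq % stride == 0:  # assumed sampling rule
            h.update(payloads[seq])
    return h.digest()
```

A verifier holding the original stream can recompute the same digest, so a receiver that never saw the sampled payloads cannot produce a matching value.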

Submission Methods

Direct Blockchain Submission

Each selected user submits an attestation as a blockchain transaction:

Attestation payload: 160 bytes
Transaction overhead: ~40 bytes
Total: ~200 bytes per user
For m=10,000: 2 MB per test
Cost at $0.0001/tx: $1.00 per test

Advantages: Simple, immediate verification
Disadvantages: High on-chain cost for large m

Aggregated Submission via zkSNARK

Users submit to an off-chain aggregator that creates a zkSNARK proof:

Users submit 160-byte attestations to off-chain storage
Aggregator collects m attestations
Aggregator builds Merkle tree with root R
Aggregator computes aggregate statistics
Aggregator generates zkSNARK proof π
Aggregator submits {R, statistics, π} on-chain

On-chain size: 328 bytes (constant regardless of m)
On-chain data reduction: approximately 99.98% versus direct submission for m=10,000 (328 bytes versus 2 MB)

Coverage Estimation

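German Tank Problem Estimator

Given that the sender transmitted N packets and user j reports packet_max_j and packets_received_j, the standard German Tank estimator gives the population size (here, the stream length) implied by user j's observations:

N_hat_j = packet_max_j * (1 + 1/packets_received_j) - 1

Loss rate estimate for user j:

loss_j = 1 - packets_received_j / N

Aggregate loss rate over the m sampled users:

loss = (1/m) * sum(loss_j for j = 1..m)

If packet_max_j ≈ N, user kept pace with real-time stream (multicast reception). If packet_max_j << N, user lagged significantly (potential unicast forwarding).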
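A minimal sketch of these estimates, assuming the standard German Tank estimator (sequence numbers starting at 1) and a per-user loss rate measured against the sender's known packet count. Both the function names and that choice of denominator are assumptions for illustration.

```python
def german_tank_estimate(max_seen: int, k: int) -> float:
    """Standard estimator: N_hat = max * (1 + 1/k) - 1, where k is the
    number of observations and max_seen the largest sequence number."""
    if k <= 0:
        raise ValueError("need at least one observation")
    return max_seen * (1 + 1 / k) - 1


def loss_rate(packets_received: int, packets_sent: int) -> float:
    """Fraction of the sender's packets this user did not receive."""
    return 1 - packets_received / packets_sent


def aggregate_loss_rate(received_counts: list[int], packets_sent: int) -> float:
    """Mean of the per-user loss rates across the sample."""
    rates = [loss_rate(r, packets_sent) for r in received_counts]
    return sum(rates) / len(rates)
```

A user whose estimate falls far below the sender's N stopped keeping pace with the stream, which is the lag signal described above.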

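Confidence Intervals

For sample size m from population N, the coverage estimate p_hat has the following confidence interval (normal approximation, z = 1.96 for 95% confidence; the finite-population correction is negligible when m << N):

CI = p_hat ± z * sqrt(p_hat * (1 - p_hat) / m)

Example: N = 10,000,000 users, m = 10,000 sample, p_hat = 0.95:

CI = 0.95 ± 1.96 * sqrt(0.95 * 0.05 / 10,000) ≈ 0.95 ± 0.0043

Minimum sample size for desired margin of error E:

m_min = z^2 * p_hat * (1 - p_hat) / E^2

For E = 0.001 (0.1% margin) with p_hat = 0.95:

m_min = 1.96^2 * 0.95 * 0.05 / 0.001^2 ≈ 182,476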
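The interval and sample-size calculations can be checked numerically. This sketch assumes the usual normal approximation for a binomial proportion with z = 1.96 at 95% confidence, and ignores the finite-population correction, which is negligible when m is much smaller than N. Function names are illustrative.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile


def margin_of_error(p_hat: float, m: int, z: float = Z_95) -> float:
    """Half-width of the confidence interval for a sampled proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / m)


def min_sample_size(p_hat: float, e: float, z: float = Z_95) -> int:
    """Smallest m whose margin of error does not exceed e."""
    return math.ceil(z * z * p_hat * (1 - p_hat) / (e * e))


moe = margin_of_error(0.95, 10_000)
print(f"0.95 +/- {moe:.4f}")
print(min_sample_size(0.95, 0.001))
```

Because the margin shrinks only with the square root of m, tightening it from about 0.4% to 0.1% requires a sample roughly 18 times larger.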

Zero-Knowledge Proof Aggregation

Zero-knowledge proof aggregation via zkSNARKs SHOULD be employed when sample size m exceeds 1,000 users. This threshold is determined by three factors:

Privacy Protection: Individual attestations become correlatable on-chain, enabling de-anonymization attacks. For small communities (N < 100,000), users are more easily identifiable, making privacy protection critical.
Cost Efficiency: Off-chain storage via Data Anchor costs $0.00001 per attestation versus $0.0001 for direct on-chain submission. For m=1,000, this represents 80-90% cost reduction: $0.02 (zkSNARK aggregated) versus $0.10 (direct).
Constant Verification: zkSNARK proofs maintain 200-byte size and O(1) verification cost regardless of m, enabling scalability to populations exceeding 10^8.

Implementation:

User attestations sent to aggregator (off-chain)
zkSNARK proof generated in Trusted Execution Environment (TEE)
Proof verified on-chain via smart contract
Challenge protocol enables spot-checking via Merkle proofs

Merkle Tree Construction

The aggregator constructs a binary Merkle tree from attestations:

Collect m attestations: A_1, A_2, ..., A_m
Compute leaf hashes: L_j = SHA256(A_j)
Build tree bottom-up: H_parent = SHA256(H_left || H_right)
Compute root: R

Merkle proof for attestation A_j:

Path: Sibling hashes from leaf to root
Length: ceil(log2(m)) hashes
For m=10,000: 14 hashes * 32 bytes = 448 bytes
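The construction and proof steps can be sketched as follows. One detail is an assumption, because the steps above do not say how odd-sized levels are handled: here the last node of an odd level is paired with itself. Function names are illustrative.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(attestations: list[bytes]) -> list[list[bytes]]:
    """Return all tree levels, leaves first; levels[-1][0] is the root R."""
    level = [_h(a) for a in attestations]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # odd level: pair last node with itself
            level = level + [level[-1]]
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_proof(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Sibling hashes from leaf to root for the leaf at index."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):  # unpaired node is its own sibling
            sibling = index
        proof.append(level[sibling])
        index //= 2
    return proof


def merkle_verify(attestation: bytes, index: int,
                  proof: list[bytes], root: bytes) -> bool:
    """Recompute the path to the root and compare with R."""
    node = _h(attestation)
    for sibling in proof:
        node = _h(node + sibling) if index % 2 == 0 else _h(sibling + node)
        index //= 2
    return node == root
```

For 10,000 leaves the proof contains 14 sibling hashes, matching the 448-byte figure above.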

zkSNARK Proof Generation

The aggregator generates proof π for statement S:

"[...] threshold (e.g., 0.95 * N)"

Public inputs: {R, m, aggregate_stats, thresholds}
Witness: {A_1, ..., A_m, Merkle_paths, signatures}
Proof size: Approximately 200 bytes (constant regardless of m)

On-Chain Verification

The smart contract verifies the zkSNARK proof:

Extract public inputs {R, m, statistics} and proof π from the submission
Verify proof: valid = Verify(vk, public_inputs, π)
If valid: Accept coverage claim for m users
If invalid: Reject and slash aggregator stake

Verification cost: ~100,000 gas (constant regardless of m)

Challenge Protocol

Any party may challenge the aggregator by requesting a Merkle proof for a specific user:

Challenger submits challenge: "Prove user_j is in tree R"
Aggregator MUST respond within T_challenge (recommended: 24 hours)
Aggregator provides: {A_j, merkle_path}
Challenger verifies: MerkleVerify(A_j, path, R) and signature validity
If verification fails: Aggregator stake slashed, challenger rewarded
If verification succeeds: Challenge bond returned to challenger

Test Period Optimization

Unicast Replication Detection

A malicious relay might receive content via unicast forwarding and claim multicast reception. Detection relies on bandwidth constraints. For unicast replication to B recipients, the relay needs aggregate bandwidth B * R_content; if B * R_content > R_relay, the relay must lag. Lag accumulates at rate:

(B * R_content - R_relay)

The minimum test period for detection follows from this lag rate and the parameter L.

Example: B=1M recipients, R_content=25 Mbps, R_relay=10 Gbps, L=0.05: the relay would need B * R_content = 25 Tbps, which is 2,500 times R_relay.

Conclusion: For typical broadcast scenarios, any test period P > 1 second provides overwhelming detection certainty. Optimal P is determined by cost-benefit tradeoff, not detection requirements.
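The bandwidth constraint behind this argument can be checked with a few lines. The sketch computes only the aggregate unicast rate and its ratio to relay capacity for the example parameters; it does not attempt to derive the minimum test period. Function names are illustrative.

```python
def required_unicast_bps(recipients: int, content_bps: int) -> int:
    """Aggregate rate needed to replicate the stream by unicast."""
    return recipients * content_bps


def relay_must_lag(recipients: int, content_bps: int, relay_bps: int) -> bool:
    """True when unicast replication exceeds the relay's capacity."""
    return required_unicast_bps(recipients, content_bps) > relay_bps


def oversubscription(recipients: int, content_bps: int, relay_bps: int) -> float:
    """How many times the required rate exceeds relay capacity."""
    return required_unicast_bps(recipients, content_bps) / relay_bps


# Example parameters: B = 1M recipients, 25 Mbps content, 10 Gbps relay
B, R_CONTENT, R_RELAY = 1_000_000, 25_000_000, 10_000_000_000
print(required_unicast_bps(B, R_CONTENT) / 1e12, "Tbps required")
print(oversubscription(B, R_CONTENT, R_RELAY), "x relay capacity")
```

With a shortfall of this size, a relay replicating by unicast falls behind the live stream almost immediately, which is why detection does not constrain the choice of P.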

Recommended Test Periods

Test period selection based on use case:

Use Case	Test Period P	Tests/Hour	Cost/Hour*	Detection Latency
Live Events	300s (5 min)	12	$12	<5 min
Prime Time TV	900s (15 min)	4	$4	<15 min
Off-Peak Content	3600s (1 hour)	1	$1	<1 hour
ISP SLA Reporting	7200s (2 hours)	0.5	$0.50	<2 hours

*Assumes m=10,000 users, $0.0001 per transaction

Maximum recommended P: 7200 seconds (2 hours)
Rationale: Beyond 2 hours, network state staleness reduces the actionable value of coverage data.

Security Considerations

Threat Model

SONAR must resist the following adversarial behaviors:

Sybil Attacks:: Attacker creates multiple fake receiver identities to inflate coverage claims. Mitigated by stake requirements and VRF-based random sampling.
Replay Attacks:: Adversary captures authenticated content and replays it to different receivers. Mitigated by sequence numbers, timestamps, and hash chaining.
Man-in-the-Middle:: Intermediate node modifies content while maintaining valid authentication. Prevented by ALTA MAC chains and periodic signatures.
DoS Attacks:: Attacker floods network with fake packets to exhaust receiver buffers. Mitigated by instant ALTA authentication enabling immediate rejection.
Sample Manipulation:: Adversary attempts to influence which users are selected for sampling. Prevented by VRF unpredictability.

Economic Security

Cryptographic security is complemented by economic incentives:

Stake requirements: $10-$100K deposits create accountability
Slashing penalties: Fraudulent attestations result in stake loss
Fraud detection rewards: 5-10x multiplier encourages honest reporting
Rational behavior: Cost of fraud exceeds expected benefit

Game-theoretic analysis shows that honest participation is a Nash equilibrium when the fraud detection probability exceeds 0.001% (easily achieved through random spot checks).

Privacy Considerations

SONAR reveals the following information:

Which users are registered for coverage verification (public keys on-chain)
Which users were selected for sampling (challenge message)
Aggregate statistics about selected users (loss rates, packet counts)

SONAR does NOT reveal:

Content of multicast stream (encrypted separately if needed)
Individual user consumption patterns (when zkSNARK aggregation used)
Non-selected users' reception status

Privacy Attack Vector for Small Communities: Without zkSNARK aggregation, direct on-chain attestations enable correlation attacks. For small communities (N < 100,000), attackers can:

Link public keys to known wallet addresses
Correlate viewing times with other on-chain activity
Cross-reference with geographic or demographic data

Privacy protection weakens as community size shrinks. For N=5,000, individual identification probability exceeds 75% through cross-referencing. For N=10,000,000, crowd anonymity provides natural protection.

RECOMMENDATION: zkSNARK aggregation MUST be used when sample size m > 1,000, SHOULD be used when m > 100. This protects small community viewers from de-anonymization while maintaining cryptographic coverage proof.

IANA Considerations

This document requests IANA to create a new registry for SONAR message types:

Registry Name: SONAR Message Types
Registration Procedure: IETF Review
Reference: This document

Initial allocations:

Value	Description	Reference
0x01	Sample Challenge	Section 5.1.2
0x02	User Attestation	Section 5.2.1
0x03	zkSNARK Aggregated Proof	Section 6.3

References

Normative References

[ALTA] Asymmetric Loss-Tolerant Authentication. Akamai Technologies.

Informative References

[AMBI] Asymmetric Manifest Based Integrity. Akamai Technologies.

Example Deployment

NYC Television Station Scenario

This appendix provides a concrete deployment example for a New York City television station broadcasting to 10 million concurrent viewers.

Network Configuration

Content Provider: NBC New York
Content: Live sports broadcast
Bitrate: 25 Mbps H.265 video
Target Coverage: 10,000,000 concurrent viewers
Geographic Area: NYC metropolitan area

SONAR Configuration

Content Authentication (ALTA):

MAC algorithm: HMAC-SHA256 (128-bit)
Signature: Ed25519 every 50th packet
Overhead: 6.75% (1.69 Mbps)

Statistical Sampling:

Sample size: m = 10,000 users (0.1%)
Test period: P = 900 seconds (15 minutes)
Confidence: 95%
Margin of error: approximately ±0.4%

Broadcast Overhead:

ALTA: 1.69 Mbps
Challenge message: 320 KB / 900s = 2.8 Kbps
Total: 1.69 Mbps (6.76%)

Cost-Benefit Analysis

Per-Hour Costs:

User sampling: 10,000 users × 4 tests/hour × $0.0001 = $4.00
zkSNARK aggregation: 4 tests/hour × $0.0001 = $0.0004
Total: $4.00 per hour

Revenue Impact:

Traditional CPM (unverified): $10 per 1000 impressions
Verified CPM: $15 per 1000 impressions (50% premium)
Traditional revenue: 10M × $10/1000 = $100,000/hour
Verified revenue: 10M × $15/1000 = $150,000/hour
Additional revenue: $50,000/hour
Net benefit: $50,000 - $4 = $49,996/hour
ROI: 1,249,900%
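The revenue arithmetic above can be reproduced as follows. The hourly cost is taken as the rounded $4.00 total from the Per-Hour Costs list, and revenue is computed per 1,000 impressions exactly as in the list. Function names are illustrative.

```python
def hourly_revenue(viewers: int, cpm: float) -> float:
    """Revenue for one impression per viewer at the given CPM."""
    return viewers / 1000 * cpm


def net_benefit(viewers: int, base_cpm: float, verified_cpm: float,
                hourly_cost: float) -> float:
    """Additional revenue from the verified premium, minus SONAR cost."""
    extra = hourly_revenue(viewers, verified_cpm) - hourly_revenue(viewers, base_cpm)
    return extra - hourly_cost


def roi_percent(net: float, hourly_cost: float) -> float:
    return 100.0 * net / hourly_cost


net = net_benefit(10_000_000, 10.0, 15.0, hourly_cost=4.0)
print(f"net ${net:,.0f}/hour, ROI {roi_percent(net, 4.0):,.0f}%")
```

The result is dominated by the assumed 50% CPM premium, so the conclusion is far more sensitive to that assumption than to the verification cost.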

Performance Metrics

Broadcast bandwidth: 25 Mbps + 1.69 Mbps = 26.69 Mbps
Overhead: 6.76%
Authentication latency: 1-10ms (ALTA)
Detection latency: <15 minutes
Coverage confidence: 95% (9,458,000 - 9,542,000 users)
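Blockchain TPS: approximately 0.001 with zkSNARK aggregation (one transaction per 900-second test); approximately 11 with direct submission (10,000 transactions per test)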

Acknowledgments

The author thanks Jake Holland and Kyle Rose (Akamai) for the ALTA protocol specification and insights on multicast authentication. Special thanks to Lenny Giuliano (Juniper Networks), Chris Lenart (Verizon), and Neil Chatterjee (DAWN Internet) for real-world deployment experience with decentralized multicast networks and feedback on earlier revisions of this work.