sinew R2S2R

Force-from-Vision on FMB

Executive plan view - 2026-05-21

sinew team - team-lead, Researcher, SimWorker, RLWorker, VisionWorker

Deep-dive companion: r2s2r_research_report.html

The deliverable

A v2f predictor: RGBD → end-effector wrench, deployable on a real Franka Panda.

What ships

  • predictor_real_finetune.pt - DINOv2-S backbone, RGBD patch embed, 4 heads (force-direction, 6D wrench, contact-point, in-contact gate)
  • Inference at 10 Hz from 2 D405 cameras (side_left + wrist_left) on real FMB testbed
  • Trained on a sim corpus generated entirely inside the sinew stack, fine-tuned on the FMB-real insert subset

What the predictor does NOT consume

  • No F/T input - RGBD in, wrench out. Single-direction flow.
  • No proprio. No commanded action. No history beyond a per-frame RGBD tensor.

Why force-from-vision

Contact-rich manipulation - peg insertion, assembly, tool-use - depends on force feedback. Real F/T sensors are noisy, slow, and absent on most low-cost hardware.

Approach | What it needs | Limitation
Real F/T sensor | Wrist-mounted load cell or Franka K_F_ext_hat_K | Noisy, lagged, biased; not on most arms
Tactile gripper | Custom fingers (GelSight, DIGIT, etc.) | Hardware lock-in; contact-only signal
v2f predictor (sinew) | RGBD cameras (already on the testbed) | Direction transfers sim→real; magnitude needs real anchor

FMB upstream is a benchmark for contact-rich insertion but ships no force-prediction baseline. sinew fills that gap with a predictor trained on sim-generated (image, force) pairs and anchored on the FMB real subset.

Three building blocks, one ship artifact

  • Goal 1 - RL data factory: PPO + curriculum; 11 sub-expert checkpoints; SimDist action corruption; policy is NOT shipped. Output: trajectories.
  • Goal 2 - Sim dataset: FmbRecorder + heavy DR; read_eef_wrench_ee labels; substage predicate written; ~243 GB @ 224, 22k eps. Output: (image, force) pairs.
  • Goal 3 - v2f predictor: RGBD → wrench (4 heads); DINOv2-S frozen, fused; sim pretrain → real FT. THE SHIP ARTIFACT. Output: shipped predictor.
  • Real Franka (downstream): deployed v2f reads RGBD, emits F/T. FMB real subset → stage 2 FT.

Two gaps, separately handled

Visual and F/T sim-to-real gaps need different fixes. Conflating them is the easy mistake.

Gap | What it is | How sinew closes it
Visual | Sim renders too clean: chi2=1.88, edges +39%, brightness skew 35-62% | Heavy visual DR + Tier-1 scene authoring. Stage-2 real fine-tune absorbs residual.
F/T | Real Franka K_F_ext_hat_K is noisy + biased + lagged. Sim is pristine. | Noise/bias/lag model on the recorded label, not on inputs. No domain adaptation for force.

Visual gap → real fine-tune required

Fix lives in the image pipeline: heavy DR sim-side + stage-2 backbone freeze + direction-head FT (v2f stage 2).

F/T gap → label-side noise only

Fix lives in the recorder: noise/bias/lag injected on label, never on input. No force-side domain adaptation (sim labels).

Predictor sees clean sim images, learns to match noisy real-Franka F/T. The visual gap requires a real fine-tune; the F/T gap is handled entirely inside the sim label pipeline.

Sim environment foundation

Goal 2 - isaac_twins + isaaclab_sinew

cfg = FmbInsertionEnvCfg()  # defaults: fmb_big_demo + big_long/rect peg
                            # + side+wrist cams
env = FmbInsertionEnv(cfg)
obs, info = env.reset()

# obs keys: side_left, side_right, wrist_left, wrist_right (RGBA 720x1280),
#           q (7,), dq (7,), tcp_pose (7,),
#           tcp_force (3,), tcp_torque (3,),
#           gripper_pose (1,)

for _ in range(N):
    action = policy(obs)          # 7-vec in [-1, 1]
    obs, rew, term, trunc, info = env.step(action)

Key API surface

  • read_eef_wrench_ee(art, sensor, *, noisy, state, rng) - the sim F/T pipeline (clean + noisy paths)
  • SubstageDetector at isaac_twins/src/isaac_twins/fmb/substage.py - 5 primitive predicates, single source of truth for reward + labels
  • grab_franka_view(num_envs), SceneConfigurator.reset_episode(env_ids) - multi-env handles

How we detect substages

Goal 2 - sim_substage_detection.md + addenda Q1/Q2

Three ContactSensors cover all five primitives

  • finger_contact - antagonistic finger-force pattern → grasp closed
  • peg_contact - per-partner zero/non-zero distinguishes floor / fixture / hole. The substage-defining signal.
  • board_contact - sanity heartbeat (board should stay seated)

Predicate accuracy bounds (full-state detector)

Primitive | FP | FN
grasp | 1-3% | 2-5%
place_on_fixture | 2-5% | 5-10%
rotate | 5-10% | 10-20%
regrasp | 3-7% | 5-10%
insert | 5-10% | 10-15%

Detector is single-source-of-truth for reward gates AND recorded v2f labels. Threshold-envelope assertion + inverted-physics probes + temporal-smoothness check land pre-RL (~2 days, addenda Q1).

Sim F/T sensor pipeline

Goal 2 - sim_ft_sensor.md - replicates Panda K_F_ext_hat_K

Pipeline stages (same function, two flags → two label streams):

  1. Contact sensor, world frame: F_w (3,), τ_w (3,), clean.
  2. Coordinate transform: rotate by R_world←EE (6×6 adjoint) → F_clean_ee, τ_clean_ee in the EE frame.
  3. DR noise model (noisy/lagged path only): bias drift (σ=0.05 N) + Gauss (σ_f=0.025 N) + LP lag τ ∼ U(20,80) ms. noisy=False bypasses this stage.

Outputs:

  • obs/eef_force, obs/eef_torque - noisy, matches real Franka distribution → policy obs, v2f wrench label
  • obs/sim_eef_force_clean - geometry-derived, zero-noise → reward, detector, direction label
  • obs/sim_in_contact (bool) - ‖F_clean‖ > 0.1 N → contact-gate boolean

Two callsites, one API

  • noisy=False → reward, SubstageDetector, v2f direction-head label (geometry-derived)
  • noisy=True → policy obs, v2f wrench-head label (matches real Franka distribution)

Force sigma=0.025 N/axis, torque sigma=0.01 Nm/axis (Franka resolution /2). Per-episode bias drift sigma=0.05 N / 0.02 Nm. LP lag tau_lag = U(20, 80) ms.
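
A minimal sketch of the label-side noise path, using the sigmas and lag range quoted above. The class name, the first-order low-pass form, the constant per-episode reading of "bias drift", and the sample period dt are assumptions for illustration; this is not the actual read_eef_wrench_ee implementation.

```python
import numpy as np


class WrenchNoiseModel:
    """Hypothetical label-side noise: per-episode bias + low-pass lag + Gaussian."""

    BIAS_SIGMA = np.array([0.05] * 3 + [0.02] * 3)    # N, N, N, Nm, Nm, Nm
    NOISE_SIGMA = np.array([0.025] * 3 + [0.01] * 3)  # per-axis, per-sample

    def __init__(self, dt, rng):
        self.dt = dt      # seconds between samples (assumed, not from the deck)
        self.rng = rng
        self.reset()

    def reset(self):
        # Drawn once per episode: constant bias and lag time constant.
        self.bias = self.rng.normal(0.0, self.BIAS_SIGMA)
        self.tau = self.rng.uniform(0.020, 0.080)  # tau_lag ~ U(20, 80) ms
        self.filtered = None

    def __call__(self, clean_wrench):
        clean = np.asarray(clean_wrench, dtype=float)
        if self.filtered is None:
            self.filtered = clean.copy()
        else:
            # First-order low-pass step toward the clean signal.
            alpha = self.dt / (self.tau + self.dt)
            self.filtered = self.filtered + alpha * (clean - self.filtered)
        return self.filtered + self.bias + self.rng.normal(0.0, self.NOISE_SIGMA)
```

The state (bias, tau, filter memory) is what makes the noisy path stateful; calling reset() at episode start matches the per-episode knobs, while the clean path needs none of it.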

RL as a data factory

Goal 1 - rl_revised_plan.md + sim_dist_review.md

Element | Locked value
Algorithm | PPO with Andrychowicz 2020 defaults; SAC fallback if PPO plateaus below 50% success
Curriculum | grasp -> +place -> +rotate -> +regrasp -> +insert; phase advance on IQM > 0.7
Checkpoint preservation | Every 50 PPO iters → 11 sub-expert checkpoints (iters 0, 50, ..., 2000)
Data-gen disturbance | Action-only Gaussian burst, ~9% noised fraction, gripper bit excluded, 2.5% never-noised
Policy mixture at data-gen | 50% expert + 50% from 11 sub-expert ckpts; per-env assignment persists per episode
Disturbance applied | Data-gen pass only - NOT during PPO training
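
A sketch of the data-gen disturbance row above: Gaussian bursts on the six motion DOFs only, with the gripper bit left alone. The function name and the values of start_prob, burst_len, and sigma are hypothetical; they were picked only so the long-run noised fraction lands near 9%. The caller is assumed to draw never_noised once per environment with probability 0.025.

```python
import numpy as np


def corrupt_actions(actions, rng, never_noised,
                    start_prob=0.032, burst_len=3, sigma=0.3):
    """Add Gaussian bursts to a (T, 7) action trajectory.

    start_prob, burst_len and sigma are illustrative, not values from the plan.
    Returns (noised_actions, per_step_noised_flags).
    """
    actions = np.asarray(actions, dtype=float)
    out = actions.copy()
    flags = np.zeros(len(actions), dtype=bool)
    if never_noised:
        return out, flags
    remaining = 0
    for t in range(len(actions)):
        if remaining == 0 and rng.random() < start_prob:
            remaining = burst_len
        if remaining > 0:
            # Index 6 is the gripper bit and is never perturbed.
            out[t, :6] = np.clip(out[t, :6] + rng.normal(0.0, sigma, size=6), -1.0, 1.0)
            flags[t] = True
            remaining -= 1
    return out, flags
```

With these settings the expected noised fraction is burst_len / (burst_len + (1 - start_prob) / start_prob), about 0.09. The returned flags are the kind of per-step signal a field like sim_action_noised would record.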

Yield gates (data-gen pass)

  • ≥ 1.6M (image, force) pairs total
  • ≥ 5% contact-transient frames
  • F/T magnitude coverage [0, 30] N
  • Per-cam diversity chi2 ≥ 0.3
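
The four gates can be checked mechanically once the corpus statistics exist. The sketch below is one possible reading: "coverage" is interpreted as every histogram bin over [0, 30] N being non-empty, and the function name, argument layout, and bin count are assumptions.

```python
import numpy as np


def check_yield_gates(force_mag, contact_transient, per_cam_chi2, n_bins=6):
    """Return a pass/fail flag per yield gate (hypothetical helper)."""
    force_mag = np.asarray(force_mag, dtype=float)
    contact_transient = np.asarray(contact_transient, dtype=bool)
    # Coverage is read here as: every bin across [0, 30] N has at least one sample.
    counts, _ = np.histogram(force_mag, bins=n_bins, range=(0.0, 30.0))
    return {
        "total_pairs": len(force_mag) >= 1_600_000,
        "contact_transient_frac": bool(contact_transient.mean() >= 0.05),
        "force_coverage": bool((counts > 0).all()),
        "per_cam_diversity": all(v >= 0.3 for v in per_cam_chi2.values()),
    }
```

Returning one flag per gate, rather than a single boolean, keeps the yield report readable when only one gate fails.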

Sim dataset shape

Goal 2 - sim_recording_spec.md + rl_revised_plan.md 4

Knob | Value
Episodes | 5-10k insert-primitive rollouts (FMB-equivalent at 22k eps)
Camera config | 2mixed_rgbd (side_left + wrist_left); 3cam_rgbd for data leverage
Resolution | 224 (DINOv2-S patch match); 256 opt-in ablation
Storage | ~243 GB at 224 - ~45% of FMB upstream's 545 GB zip
Schema | FMB-RLDS parity + obs/sim_* extras; validator (3-mode autodetect)

Per-episode trajectory labels (4-way)

Label | Definition | Frac | v2f use
successful | insert.success() == True | ~60% | high-quality direction labels
disturbed | any tick with sim_action_noised == 1 | ~55% | off-policy + contact-transient diversity
failed | not successful | ~40% | off-manifold coverage; gate negatives
clean_expert | successful AND not disturbed AND policy=expert | ~1.25% | held-out nominal-regime eval slice
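
The four definitions above reduce to a few boolean expressions. A small sketch, where the function name and argument layout are hypothetical and only the label logic comes from the table:

```python
def episode_labels(insert_success, action_noised, policy):
    """Derive the 4-way episode labels from per-episode facts.

    insert_success: bool result of the insert success predicate
    action_noised:  iterable of per-tick sim_action_noised flags (0/1)
    policy:         "expert" or a sub-expert identifier (naming is assumed)
    """
    successful = bool(insert_success)
    disturbed = any(bool(flag) for flag in action_noised)
    return {
        "successful": successful,
        "disturbed": disturbed,
        "failed": not successful,
        "clean_expert": successful and not disturbed and policy == "expert",
    }
```

Note that the labels overlap by design: successful and failed partition the corpus, while disturbed cuts across both.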

v2f architecture (~43M params)

Goal 3 - v2f_arch.md + v2f_pipeline_revised.md - 22M trainable + 21M frozen

Component | Value
Backbone | DINOv2-S ViT-S/14, 384-d features, frozen (depth channel trainable)
Input patch embed | 4-channel RGBD patch embed; depth conv1 init from RGB mean (ForceSight pattern)
Cross-cam fusion | 2-layer transformer encoder, 4 heads, GELU
Cameras | side_left + wrist_left RGBD 224x224, letterboxed
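
A sketch of the "depth init from RGB mean" step in the patch-embed row, written on a plain NumPy array so it runs without a model checkpoint. The function name is hypothetical; the (384, 3, 14, 14) kernel shape follows from a ViT-S/14 patch embed with 384-d features.

```python
import numpy as np


def inflate_patch_embed(rgb_weight):
    """Extend an (out, 3, k, k) RGB patch-embed kernel to (out, 4, k, k).

    The new depth channel starts as the mean of the three RGB kernels,
    so a depth map initially behaves like a grayscale image.
    """
    depth_kernel = rgb_weight.mean(axis=1, keepdims=True)
    return np.concatenate([rgb_weight, depth_kernel], axis=1)
```

Keeping the first three input channels bit-identical is what allows the RGB weights to stay frozen while only the depth channel trains.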

Four heads

Head | Output | Loss | Weight
force-direction (load-bearing) | unit-vec 3D in EE frame | L1 on coords, NaN-masked when ‖F‖ < 8 N | 1.0
6D wrench | EE-frame [f; tau], noisy-lagged label | MSE, gate-gated | 0.1
contact-point | per-pixel 64x64 heatmap on wrist cam | BCE + soft-L2 | 0.5
in-contact gate | binary probability | BCE (FoAR pattern) | 0.1
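
A NumPy sketch of how the direction label, its NaN mask, and the head weights from the table could fit together. Function names are hypothetical, and the per-head losses other than the masked L1 are passed in as plain numbers rather than implemented.

```python
import numpy as np

# Loss weights copied from the heads table.
HEAD_WEIGHTS = {"direction": 1.0, "wrench": 0.1, "contact_point": 0.5, "gate": 0.1}


def direction_label(force_ee, min_force=8.0):
    """Unit force direction in the EE frame; NaN rows where ||F|| < min_force."""
    force_ee = np.asarray(force_ee, dtype=float)
    norm = np.linalg.norm(force_ee, axis=-1, keepdims=True)
    safe = np.where(norm > 0.0, norm, 1.0)  # avoid divide-by-zero on masked rows
    return np.where(norm >= min_force, force_ee / safe, np.nan)


def masked_l1(pred, target):
    """Mean L1 over samples whose target is not NaN; 0.0 if none are valid."""
    valid = ~np.isnan(target).any(axis=-1)
    if not valid.any():
        return 0.0
    return float(np.abs(pred[valid] - target[valid]).mean())


def total_loss(per_head):
    """Weighted sum of already-computed per-head losses."""
    return sum(HEAD_WEIGHTS[name] * value for name, value in per_head.items())
```

Masking low-force samples keeps near-zero forces, whose direction is mostly noise, out of the direction loss.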

"Force direction transfers sim→real; magnitude does not" (Direction Matters) - dictates head weights, freeze list, and DR-vs-no-DR split.

v2f training schedule

Goal 3 - v2f_pipeline_revised.md 3

Stage | Data | Trained | Frozen
1: sim pretrain | 1.6M (image, force) pairs, 2-cam RGBD, clean+noisy GT wrench, heavy visual DR | backbone (depth ch) + all 4 heads + gate | DINOv2 RGB weights
2: real fine-tune | ~3-4k FMB insert real eps x ~100 steps x 2cam RGB, EE-frame F/T zero-bias-subtracted | direction + gate heads only | backbone, wrench head, contact-point head

Why this partition

  • Direction transfers - geometric constraint normals are sim/real-identical
  • Magnitude doesn't - real F/T has payload model error, gravity-comp residual, thermal drift
  • Contact-point doesn't have a real label - no per-pixel "contact happened here" ground truth on FMB-real
  • Backbone frozen at stage 2 - protects visual-DR features from catastrophic forgetting

Stage 1: 300 epochs, AdamW lr=3e-4 cosine + 2k warmup, bs=128, bf16. Stage 2: 30-50 epochs, lr=3e-5, no warmup. Output: predictor_real_finetune.pt - the ship artifact.
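
The stage-1 learning-rate shape (cosine with 2k warmup) can be written as a pure function of the step index. This sketch assumes linear warmup, per-step scheduling, and decay to zero, none of which the schedule line specifies; the function name is hypothetical.

```python
import math


def lr_at(step, total_steps, base_lr=3e-4, warmup_steps=2000):
    """Linear warmup to base_lr, then cosine decay to zero (assumed form)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * min(1.0, progress)))
```

The stage-2 setting corresponds to lr_at(step, total_steps, base_lr=3e-5, warmup_steps=0), which starts directly at the peak rate.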

Visual sim-to-real strategy

Goal 3 - v2f_dr_spec.md + sim2real_visual_gap.md - heavy visual DR, NO dynamics DR

Bucket | Knobs | Where applied
Per-episode (~26) | lighting, cam K, cam extrinsics, materials, placement, F/T bias+lag, mode, DR profile | SceneConfigurator.randomize(step); USD writes > runtime calls
Per-frame (sim, ~6) | depth dropout + Gauss + clamp + RGBD jitter, F/T additive | Recording loop, before writing labeled tuple
Per-frame (train aug, ~8) | brightness/contrast/hue/sat/gauss/gamma/JPEG, chromatic aberration | Training dataloader (torchvision.transforms.v2)

Critical knob: depth dropout on low-texture pixels. D405 is passive stereo - textureless 3D-printed pegs give sparse depth in real but clean in sim. Without this, any RGBD head will be sim-tuned.

Five hard-required categories: lighting + color jitter + material BRDF + cam intrinsic jitter + background clutter. Tier-1 scene fixes (cable mesh, lab clutter, background plane) land alongside DR to drive chi2 from 1.88 to ~1.4.
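
A sketch of the depth-dropout-on-low-texture knob: pixels whose local image gradient is small lose their depth value with some probability, mimicking passive-stereo failure on textureless surfaces. The function name, the gradient threshold, the drop probability, and the use of 0 as the invalid-depth marker are all assumptions.

```python
import numpy as np


def texture_depth_dropout(rgb, depth, rng, grad_thresh=4.0, drop_prob=0.7):
    """Zero out depth where the RGB image has little local texture.

    grad_thresh and drop_prob are illustrative values, not from the DR spec.
    """
    gray = rgb.astype(float).mean(axis=-1)
    gy, gx = np.gradient(gray)                      # per-pixel intensity gradient
    low_texture = np.hypot(gx, gy) < grad_thresh
    drop = low_texture & (rng.random(depth.shape) < drop_prob)
    out = depth.astype(float).copy()
    out[drop] = 0.0                                 # 0 marks invalid depth (assumed)
    return out
```

Applied in the recording loop, this makes sim depth on flat peg faces as sparse as the real camera's, so an RGBD head cannot lean on depth that will not exist at deployment.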

Locked decisions (1 of 3)

Lock | Value
End deliverable | v2f predictor (RGBD → wrench). RL policy is not shipped.
Sim F/T provenance | Isaac contact reporter via read_eef_wrench_ee. Never vision-predicted.
F/T frame | EE frame end-to-end; 6x6 EE->base adjoint applied only at FMB-checkpoint boundary.
Quat order | (qx, qy, qz, qw) everywhere.
Image storage | BGR on disk -> RGB at parse time (FMB convention).

Locks compress design debate into known frames. Re-litigation gate: open a new beads issue, don't rewrite the lock in place.
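
For the F/T frame and quat order locks, a sketch of the 6x6 EE->base wrench adjoint with the (qx, qy, qz, qw) convention. It assumes the wrench is stacked [f; tau], that the EE-frame torque is taken about the EE origin and the base-frame torque about the base origin, and that (p, q) is the EE pose in the base frame; function names are hypothetical.

```python
import numpy as np


def quat_xyzw_to_matrix(q):
    """Rotation matrix from a (qx, qy, qz, qw) quaternion."""
    x, y, z, w = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])


def skew(p):
    """Cross-product matrix: skew(p) @ v == np.cross(p, v)."""
    px, py, pz = p
    return np.array([[0.0, -pz, py], [pz, 0.0, -px], [-py, px, 0.0]])


def wrench_ee_to_base(wrench_ee, p_base_ee, q_base_ee):
    """Map an EE-frame wrench [f; tau] to the base frame.

    f_base = R f_ee;  tau_base = R tau_ee + p x (R f_ee)
    """
    R = quat_xyzw_to_matrix(q_base_ee)
    adj = np.zeros((6, 6))
    adj[:3, :3] = R
    adj[3:, :3] = skew(p_base_ee) @ R
    adj[3:, 3:] = R
    return adj @ np.asarray(wrench_ee, dtype=float)
```

Confining this transform to one function at the boundary is the practical meaning of "EE frame end-to-end": nothing upstream of it should ever see a base-frame wrench.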

Locked decisions (2 of 3)

Lock | Value
Action contract | 7-vec EE-delta normalized [-1, 1]; scaled +-0.06 m / +-0.25 rad / gripper bit; 10 Hz; base frame.
SimDist recipe | Action-only burst, ~9% noised, gripper bit excluded, 2.5% never-noised, data-gen only.
Stage-2 real fine-tune | Non-optional. Direction + gate heads only; backbone, wrench, contact-point frozen.
Recording resolution | 224 (DINOv2-S patch match) is v1 default; 256 is opt-in ablation.
Camera subset (v2f) | 2mixed_rgbd minimum (side_left + wrist_left); 3cam_rgbd for data leverage.

Locked decisions (3 of 3)

Lock | Value
Noisy / clean wrench discipline | Noisy → policy obs + v2f wrench label. Clean → reward + substage detector + direction-head label.
Reward shaping | PBRS with clean wrench; Phi_insert force coefficient 0.2; gates on alignment + seat depth.
Substage detector role | Full-state detector canonical for reward + recorded labels; state-only adapter for offline audit only.
RL evaluation | IQM + 95% stratified bootstrap CIs (never mean/median); P(A > B) > 0.7; N=5 seeds default.
v2f primary metric | Monotonic improvement post-fine-tune; aspirational direction cos-sim ≥ 0.70 global / ≥ 0.60 worst shape.

The two F/T disciplines (noisy/clean + EE-frame everywhere) are the most-touched rules - they thread through reward, recorder, predictor labels, and policy obs.

The sim-to-real visual gap

Goal 3 - sim2real_visual_gap.md - chi2=1.88, FoAR threshold for "visibly distinct" is 1.0

Metric | Sim | Real | Gap
Color hist chi2 (mean 4 cams) | - | 1.88 | Above FoAR chi2=1.0 "visibly distinct"
Edge density | 0.056 | 0.079 | Real +39%
Per-channel brightness (RGB) | (152, 159, 152) | (94, 117, 104) | Sim 1.35-1.62x brighter
Per-channel std (RGB) | (40, 37, 38) | (51, 56, 55) | Real 30-52% wider

Per-camera χ² (both wrist cams at 2.0 or above; side_left is the single highest) - FoAR "visibly distinct" threshold = 1.0

Cam | χ²
side_left | 2.10
side_right | 1.35
wrist_left | 2.00
wrist_right | 2.06

Cam | Sim edge | Real edge | Δ edge
side_left | 0.059 | 0.069 | +17%
side_right | 0.049 | 0.057 | +16%
wrist_left | 0.055 | 0.083 | +51%
wrist_right | 0.063 | 0.105 | +67%

Headline: wrist cams have the worst gap because they see hand, fingers, board screws, peg layer-lines - all of which the sim does not model. This is what makes a real fine-tune non-optional and what the contact-point head freezes against.

Sim vs real - 4-camera grid

Figure. Side-by-side sim (top row) vs real (bottom row) for the four cameras, at 256 resolution. Sim is uniformly bright, with monochrome peg silhouettes and no foreground hand or cables. Real has darker shading, a hand visible in the wrist cams, ambient lab clutter, and peg surface texture.

Scene fix plan

Goal 3 - substage_and_scene_addenda.md Q3 - leverage x inverse-cost ranking

Tier 1 (~2.5 days, biggest leverage)

Fix | Time | delta chi2 mean | delta chi2 wrist
Procedural cable mesh in wrist FOV | 1 day | -0.05 to -0.1 | -0.3 to -0.5
Lab-clutter distractor spawning | 0.5 day | -0.15 to -0.25 | -0.05
Background plane workshop texture | 0.5 day | -0.1 to -0.2 | -0.05

Expected χ² trajectory - before → after (target: χ² ≤ 1.0)

Stage | Mean | Wrist
Pre-fix | 1.88 | ~2.0
Tier 1 | ~1.4 | ~1.4
Tier 1+2 | ~1.0-1.2 | ~1.1-1.3

Out of scope: PathTracing (3-5x render cost), hand simulation (stage-2 FT carries this), photographed bin texture (not on critical path).

Six epics, 37 tasks, ~26 wall-days floor

impl_epic_plan.md - one wave per epic; Wave 4 splits into 4a sim pretrain + 4b real fine-tune

Epic | Scope | Owner | Wall-days | GPU-days
A (Wave 1) | Sim surface foundation | SimWorker (+ RLWorker) | ~4 | 0
B (Wave 2) | RL data factory | RLWorker (+ SimWorker, Researcher) | ~6 | ~4
C (Wave 3) | Data-gen pass | SimWorker (+ RLWorker) | ~3 | <1
D (Wave 4a) | v2f sim pretrain | VisionWorker | ~3 | ~1-2
E (Wave 4b) | v2f real fine-tune | VisionWorker (+ Researcher) | ~3 | ~1
F (Wave 5) | Real-robot eval (hardware-gated) | VisionWorker (+ User) | ~7 | 0

Aggregate: ~7 GPU-days across Epics B + D + E; ~26 wall-days floor if teammates fully available + GPU not contested; ~45 days realistic with contention.

Dependency graph

Critical path: Wave 1 -> 2 -> 3 -> 4 -> 5. Design items feed the waves as design dependencies.

  • Design items: 5.3 + 5.19 substage; 5.21 sim F/T sensor; 5.22 1 reward; 5.4 multi-env handles; 5.16 + 5.6 scene + DR; 5.18 + 5.22 3 SimDist; 5.2 eval protocol; 5.5 + 5.7 cams + schema
  • Wave 1 - sim surface: SubstageDetector, read_eef_wrench_ee, grab_franka_view, Phi_insert reward
  • Wave 2 - RL data factory: PPO + curriculum, 11 sub-expert ckpts, CorruptedActionMapper, eval protocol
  • Wave 3 - data-gen pass: FmbDataRecorder, NAS sync, HDF5 / TFDS shards, ~1.6M pairs target
  • Wave 4 - v2f training: stage 1 sim pretrain, branch on 5.17 A-E, stage 2 real FT, ship predictor
  • Wave 5 - real-robot eval: predicted F/T vs real K_F_ext_hat_K

Epic A - Wave 1 sim surface foundation

~4 wall-days - 0 GPU-days - SimWorker (+ RLWorker)

Deliverables (11 tasks)

  1. SubstageDetector class at isaac_twins/src/isaac_twins/fmb/substage.py per spec - offline unit tests + Kit runtime smokes
  2. Threshold-envelope startup assertion + inverted-physics probes (5 per primitive) + temporal-smoothness check
  3. Bake peg_tip_local_offset attribute on each peg USD via the author script
  4. read_eef_wrench_ee(art, sensor, *, noisy, state, rng) - stateful when noisy=True, stateless when noisy=False; smoke: push peg into board, wrench grows monotonically with depth
  5. Multi-env handles: batched grab_franka_view(num_envs), SceneConfigurator.reset_episode(env_ids), observation packager
  6. Phi_insert reward function (clean wrench, PBRS form) - offline unit tests against synthetic histories
  7. Reward-decomposition logging hook - every 10K steps writes all 8 reward components + per-primitive share dict

Pass criterion: Integration smoke runs FmbInsertionEnv -> reset -> 100 steps -> reward returns correct decomposition AND read_eef_wrench_ee returns non-zero on contact in the same loop. Test suite green.

Epic B - Wave 2 RL data factory

~6 wall-days + ~4 GPU-days - RLWorker (+ SimWorker, Researcher)

Deliverables (10 tasks)

  1. Promote FmbInsertionEnv to DirectRLEnv subclass; multi-env smoke at N=4
  2. EEDeltaCorruptedActionMapper subclass with per-DOF Gaussian burst; gripper bit excluded; per-env sigma fixed at run start
  3. Sub-expert checkpoint preservation every 50 PPO iters -> 11 .pt files (iters 0, 50, ..., 2000)
  4. Curriculum scaffolding - phase advance on IQM > 0.7, eval cadence every 100K steps
  5. PPO training run - phase 1 (grasp-only), 1.5 GPU-day, N=5 seeds
  6. PPO training run - phase 2-5 (full chain), 2-3 GPU-days, full curriculum
  7. Scene Tier 1 fixes #1-3 in parallel (SimWorker): procedural cable mesh + lab clutter + workshop background
  8. Re-measure chi2 post-Tier 1 (Researcher) - target drop from 1.88 toward ~1.4

Pass criterion: 11 sub-expert + 1 expert PPO checkpoint on disk for full grasp->insert curriculum; post-Tier-1 chi2 re-measured and documented; scene fixes committed.

SimWorker scene fixes run alongside RLWorker PPO training (different repos, no contention).

Epic C - Wave 3 data-gen pass

~3 wall-days + <1 GPU-day - SimWorker (+ RLWorker)

Deliverables (7 tasks)

  1. FmbDataRecorder outer loop - burst-noise applied here (NOT during PPO training); 10-episode test corpus validated
  2. HDF5/zarr schema additions: 4 obs/sim_* keys (sim_action_noised, sim_policy_iter, sim_never_noised, sim_noise_std) + 4 episode-meta labels (successful, disturbed, failed, clean_expert)
  3. episode_uploader.py - nohup-resumable, plain FTP via curl --ftp-method nocwd, idempotent on NAS
  4. Data-gen yield eval script - reports total pairs, contact-transient fraction, F/T distribution, per-cam diversity chi2
  5. Small pilot - 100 episodes; label distribution matches expectation
  6. Full sweep - 5k-10k episodes, ~150-300 GB on NAS at 224 2mixed_rgbd
  7. Build TFDS sinew_fmb_v2f shards on A6000 - tfds.load succeeds locally

Storage budget

  • ~6 MB per 50-step episode at 224 2mixed_rgbd; 22k FMB-equivalent corpus ~243 GB (45% of FMB upstream's 545 GB zip)
  • NAS: ftp://143.248.121.169:7002/IntelligentManipulationTeam/DomrachevIvan/sinew/recordings/

Pass criterion: Sim corpus on NAS (label distribution per spec; yield eval green); TFDS shards verified loadable on A6000.

Epic D - Wave 4a v2f sim pretrain

~3 wall-days + ~1-2 GPU-days - VisionWorker

Deliverables (4 tasks)

  1. V2FPredictor model class: DINOv2-S frozen + 4-ch RGBD patch embed + 2-layer cross-cam fusion + 4 heads (~43M params, 22M trainable)
  2. Training script - AdamW lr=3e-4 cosine + 2k warmup, bs=128, 300 epochs, bf16; checkpoint save every 50 epochs; one-epoch wall-time matches budget (~7 min for 5k episodes)
  3. Stage 1 sim pretrain run on A6000 - all 4 heads trained; gate-head loss upweighted 1.5x on noised steps
  4. Per-shape stratified eval on sim test split - 9-row per-shape table, identifies worst shape, per-shape gate F1

GPU budget

  • Single A6000, bf16, batch 128
  • Stage 1: ~1-2 A6000-days for 300 epochs over 5-10k episodes
  • Held-out validation: clean_expert subset (~1.25% of corpus)

Pass criterion: predictor_sim_pretrain.pt on A6000; sim-test direction-acc > 0.85 (loose sanity bar; if not hit, pipeline is broken).

Epic E - Wave 4b v2f real fine-tune

~3 wall-days + ~1 GPU-day - VisionWorker (+ Researcher)

Deliverables (5 tasks)

  1. fmb_parse.py - filter chain + zero-bias subtract + 8N gate threshold + (qx,qy,qz,qw) + BGR->RGB + cam renames; parses 100 episodes without warning
  2. Pull FMB insert-only filtered subset - ~20-35 GB via GCS HTTPS-range; 15-30 shards; ~3000-4000 episodes; per-shape distribution recorded
  3. Stage 2 real fine-tune run on A6000 - direction + gate heads only; backbone, wrench, contact-point heads frozen; 30-50 epochs, lr=3e-5, no warmup
  4. Per-shape stratified eval on FMB-real test - direction-acc + gate F1 table; worst-shape direction-acc ≥ 0.45 (acceptable tier)
  5. Branch decision per outcome matrix A-E (team-lead + VisionWorker) - written addendum, next-epic plan adjusted
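
A sketch of three of the per-step operations listed for fmb_parse.py: BGR->RGB, zero-bias subtraction, and the 8 N threshold. It reads "8N gate threshold" as a cutoff on force magnitude and estimates the bias as the mean of the first few frames of an episode; both readings, and the function names, are assumptions rather than the parser's actual behaviour.

```python
import numpy as np


def estimate_bias(wrenches, n_frames=5):
    """Mean wrench over the first n_frames (assumed to be contact-free)."""
    return np.asarray(wrenches, dtype=float)[:n_frames].mean(axis=0)


def parse_step(image_bgr, wrench_ee, bias, gate_threshold=8.0):
    """Convert one recorded step into (RGB image, debiased wrench, gate label)."""
    image_rgb = np.ascontiguousarray(image_bgr[..., ::-1])  # reverse channel order
    wrench = np.asarray(wrench_ee, dtype=float) - bias
    in_contact = bool(np.linalg.norm(wrench[:3]) >= gate_threshold)
    return image_rgb, wrench, in_contact
```

Subtracting the bias before thresholding matters: a constant sensor offset would otherwise shift which frames clear the 8 N cutoff.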

GPU budget

  • ~1 A6000-day for stage-2 fine-tune (30-50 epochs at 1/10 stage-1 lr)
  • Output: predictor_real_finetune.pt - the ship artifact

Pass criterion: Real-test direction cos-sim ≥ 0.60 global, ≥ 0.45 worst-shape (acceptable tier); aspirational ≥ 0.70 / ≥ 0.60. Outcome decision recorded.

Epic F - Wave 5 real-robot eval (hardware-gated)

~7 wall-days - 0 GPU-days - VisionWorker (+ User) - DEFERRED until hardware access confirmed

Deliverables (5 tasks)

  1. Hardware setup confirmation - Franka + 4x D405 + FMB workspace match sim layout within +-5 mm
  2. Deploy v2f predictor inference on real D405 streams at 10 Hz; publish to ROS topic / shared mem
  3. Collect 50 real rollouts with predictor running alongside real F/T ground truth
  4. Real eval bench - predicted vs real direction-acc + gate F1 + per-shape stratified report card
  5. Final results memo + decision on next steps - docs/research/final_real_eval.md

Open chunks

  • Real-robot data-collection pipeline
  • Safety envelope review (presumed via FMB testbed; safety task added if not)
  • Impedance gains tuning

Pass criterion: 50-trajectory real eval done; per-shape direction-acc table on real Franka; final memo committed.

RL eval protocol (IQM + bootstrap CIs) carries over for real-eval reporting.
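
A sketch of the reporting statistics named in that protocol: IQM, a stratified percentile bootstrap CI, and P(A > B). It assumes scores are arranged as (strata, seeds) with resampling done over seeds within each stratum, which is one common convention; the plan does not say what the strata are, and the function names are hypothetical.

```python
import numpy as np


def iqm(scores):
    """Interquartile mean: mean of the middle 50% of all scores."""
    s = np.sort(np.asarray(scores, dtype=float).ravel())
    cut = len(s) // 4
    return float(s[cut:len(s) - cut].mean())


def stratified_bootstrap_ci(scores, rng, n_boot=2000, level=0.95):
    """Percentile CI for the IQM, resampling seeds within each stratum (row)."""
    scores = np.asarray(scores, dtype=float)
    n_strata, n_seeds = scores.shape
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n_seeds, size=(n_strata, n_seeds))
        stats[b] = iqm(np.take_along_axis(scores, idx, axis=1))
    tail = 100.0 * (1.0 - level) / 2.0
    lo, hi = np.percentile(stats, [tail, 100.0 - tail])
    return float(lo), float(hi)


def prob_a_beats_b(a, b):
    """Fraction of (a, b) score pairs where a wins; ties count as half."""
    a = np.asarray(a, dtype=float).ravel()[:, None]
    b = np.asarray(b, dtype=float).ravel()[None, :]
    return float((a > b).mean() + 0.5 * (a == b).mean())
```

For the real eval, a per-shape direction-acc table would play the role of the score matrix, with shapes as strata.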

Open questions and unknowns

Item | Source | Trigger
chi2 re-measurement post-DR + post-Tier-1 scene fix | sim2real_visual_gap 5 | End of Epic B
Detector vs recorded-label cross-check (non-determinism audit) | addenda Q1.3 | Only if non-determinism observed in recordings
Per-shape threshold ablation for substage detector | addenda Q1.6 | After FMB raw arrives locally
Soft-success bonus shape (reward shaping refinement) | rl_reward_design 8 #2 | First PPO training data
Ablations: backbone freeze depth, RGB vs RGBD, temporal stack, voxel contact-point | v2f_arch 6 | Epic D if sim-pretrain underperforms
Production asset root pin (S3 staging) | isaac_twins-36 | Before any S3 staging issue resurfaces
FMB raw .npy download (545 GB) - currently 5-frame smokes only | FMB pull | Required for full per-shape calibration; partial pull (~35 GB) is enough for Epic E

Each open item has a "when" trigger - they enter the plan on a specific event, not on a schedule. The chi2 re-measurement is the most important; it gates how aggressively we interpret v2f real-test numbers.

Compute envelope summary

Epic | Wall-days | GPU-days | Bottleneck
A - sim surface | ~4 | 0 | SimWorker bandwidth
B - RL data factory | ~6 | ~4 | per-seed wall time
C - data-gen pass | ~3 | <1 | I/O on FMB pull and NAS push
D - v2f sim pretrain | ~3 | ~1-2 | A6000 training compute
E - v2f real fine-tune | ~3 | ~1 | training compute
F - real eval | ~7 | 0 | real-robot access
Total (A-E) | ~19 wall-days floor | ~7 GPU-days | SimWorker queue (18 tasks)
Total (A-F) | ~26 wall-days floor | ~7 GPU-days | hardware availability for F

Cross-cutting

  • Critical path: A -> B -> C -> D -> E -> F. Each wave gates the next.
  • Intra-wave parallelism: Epic B (SimWorker scene fixes alongside RLWorker PPO), Epic E (Researcher parser alongside VisionWorker training).
  • Single A6000 covers the entire training budget.
  • Realistic with GPU contention + integration churn: ~45 days end-to-end.
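
The envelope totals can be re-derived from the per-epic rows. A short check, treating "<1" as the range 0-1 and "~1-2" as 1-2 (a reading of the approximate figures, not extra data):

```python
# Per-epic figures copied from the compute envelope table.
wall_days = {"A": 4, "B": 6, "C": 3, "D": 3, "E": 3, "F": 7}
gpu_days = {"A": (0, 0), "B": (4, 4), "C": (0, 1), "D": (1, 2), "E": (1, 1), "F": (0, 0)}

floor_a_to_e = sum(wall_days[k] for k in "ABCDE")   # serial critical path, A-E
floor_a_to_f = sum(wall_days.values())              # serial critical path, A-F
gpu_lo = sum(lo for lo, _ in gpu_days.values())
gpu_hi = sum(hi for _, hi in gpu_days.values())

print(floor_a_to_e, floor_a_to_f, (gpu_lo, gpu_hi))
```

The wall-day floors are plain sums because each wave gates the next; the ~7 GPU-day headline sits inside the 6-8 range the rows imply.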

End - and the deep-dive

This deck is the executive view of the sinew R2S2R plan. For the full analysis - locks, per-memo findings, validator schemas, dependency graph cross-refs, wave deliverable detail - see the long-form report:

docs/r2s2r_research_report.html

Next: Epic A kickoff (sim surface foundation). Source memos: docs/research/. Cloned references: isaac_twins/references/.