This report is the unified research synthesis for the sinew project, a real-to-sim-to-real (R2S2R) pipeline whose end deliverable is a video-to-force (v2f) predictor that maps RGBD frames to an end-effector wrench on the FMB benchmark. The project converged on the following framing: v2f is the ship artifact, sim contact reports are the direct training signal (not vision-predicted), and reinforcement learning serves as a data factory that produces diverse trajectories for v2f training. The body of the report covers the locked decisions, per-goal findings from 21 research memos, an implementation sequencing plan over five waves, and the small set of open questions deferred to the next epic.
read_eef_wrench_ee API. Never vision-predicted during training. Real Franka noise / bias / lag is modelled on the recorded label, not on inputs; the two-gap separation (visual gap closed by DR, F/T gap closed by label noise) is load-bearing throughout.
These are the deliverables from the research epic. Every row is an irreversible choice that downstream impl tickets honor without re-litigating.
| Lock | Locked value | Origin |
|---|---|---|
| End deliverable | v2f predictor (RGBD → wrench). The RL policy is not shipped. | reframe; project-v2f-is-end-goal |
| Sim F/T provenance during training | Isaac contact reporter via read_eef_wrench_ee. Never vision-predicted. | sinew-5.21 |
| Two-gap separation | Visual gap → heavy visual DR (no dynamics DR). F/T gap → noise/bias/lag on the recorded label, not the input. | sinew-5.13, sinew-5.16, sinew-5.21 |
| Noisy vs clean wrench usage | Noisy → policy obs + v2f wrench-head label. Clean → reward + substage detector + direction-head label. | sinew-5.21, sinew-5.22, sinew-5.23 |
| BC warmstart | DROPPED. χ²=1.88 visual gap invalidates BC-from-real-demos. | sinew-5.22 §2 (supersedes sinew-5.12) |
| Stage-2 real fine-tune | Non-optional. Direction + gate heads only; backbone, wrench head, contact-point head frozen. | sinew-5.16, sinew-5.23 §3.2 |
| Disturbance recipe | SimDist action-only burst noise applied at data-gen time only (not during PPO training). Per-DOF σ ∈ [0.02, 0.30] m / [0.02, 0.40] rad; gripper bit excluded; 2.5% never-noised envs. | sinew-5.18, adopted by sinew-5.22 §3 |
| Reward Φ_insert v2 | Adds -0.2 · ||f_clean|| gated on d_xy < r_align AND d_z < z_align. No "degenerate-when-zero" branch; force signal is reliable. | sinew-5.22 §1 (supersedes sinew-5.1 §3.5) |
| Direction is the load-bearing head | L1 on unit-vec coords, λ=1.0. Wrench-magnitude head λ=0.1. Direction Matters recipe. | sinew-5.9, sinew-5.23 §1 |
| Force / quat / image conventions | F/T in EE frame everywhere. Quat (qx, qy, qz, qw). Images BGR on disk → RGB at parse time. side_{left,right}, wrist_{left,right} camera names. | sinew-1.5, sinew-5.7 |
| Action contract | 7-vec EE-delta normalized [-1, 1], scaled ±0.06 m / ±0.25 rad / gripper bit, @ 10 Hz, base frame, Plan A DifferentialIK at panda_hand. | sinew-2 + sinew-3 |
| RL evaluation | IQM + 95% stratified bootstrap CIs (never mean/median); P(A>B) > 0.7 to claim improvement; N=5 seeds default at {0, 7, 42, 314, 2718}. | sinew-5.2 (rliable) |
| Recording resolution | 224² (DINOv2-S patch match) is the v1 default; 256² is opt-in ablation. | sinew-5.6 |
| Camera subset (v2f) | 2mixed_rgbd minimum (side_left + wrist_left); 3cam_rgbd for data leverage. | sinew-5.5 winner |
| FMB trajectory replay reliability | Not reliable in current sim. Substage verification falls back to a state-only adapter (calibration-only, never in the training loop). | sinew-5.20 (honest negative) |
FmbInsertionEnv runs end-to-end on a 7-vec EE-delta action layer (Plan A DifferentialIK), gripper drives are live, 5/5 tests green.
docs/research/; 3 reference code repos under isaac_twins/references/.
sinew is the Real-to-Sim-to-Real (R2S2R) force-from-vision project anchored on the FMB (Functional Manipulation Benchmark) testbed at IRIS Lab. The hardware target is a real Franka Emika Panda with four Intel RealSense D405 cameras (two side, two wrist) running contact-rich single-object insertion. Three sibling repositories host the work:
| Repository | Role |
|---|---|
| isaac_twins/ | Isaac Sim digital twin (Python 3.12, isaacsim 6.0). Owned by SimWorker. Scenes, FMB asset spawning, articulation control, cameras, USD baking, replay. Public API is single-env stable; multi-env vectorization is impl follow-up. |
| isaaclab_sinew/ | Isaac Lab RL training repo (pixi-managed Python 3.12 + isaacsim 6.0 + Isaac Lab v3.0.0-beta). Owned by RLWorker. Wraps isaac_twins in a gymnasium.Env today; promotes to DirectRLEnv once the batched articulation handle lands. |
| sinew/ | Orchestration root. Cross-cutting docs under docs/, beads issue tracker under .beads/, shared references under isaac_twins/references/. |
| Goal | Role under the reframe | Validation criterion |
|---|---|---|
| Goal 1 — RL data factory | Train a PPO policy in sim that solves the FMB insertion task well enough to produce diverse trajectories. Curriculum: grasp → +place → +rotate → +regrasp → +insert. Sub-expert checkpoints from every 50 iters are kept for data-gen mixture. The policy is not the deliverable. | Data-gen yield: ≥ 1.6M (image, force) pairs, ≥ 5% contact transient frames, force coverage [0, 30] N, per-cam diversity χ² ≥ 0.3. |
| Goal 2 — Sim dataset | Record sim rollouts with full FMB-RLDS-parity observations plus sim ground truth: clean and noisy EE wrench, per-pair contact info, peg/board world poses, per-camera intrinsics/extrinsics. Per-episode DR. Substage labels written from the canonical detector. NAS pipeline to ferry zarr/HDF5 to the DL_A6000 for training. | 5–10k insert-primitive rollouts (~243 GB at 224²), schema 1:1 with FMB RLDS + namespaced obs/sim_* extras, validator clean. |
| Goal 3 — v2f predictor | Train an RGBD → wrench predictor with three task heads (contact-point on wrist cam, force direction unit-vec, 6D wrench) plus an in-contact gate. Frozen DINOv2-S backbone, multi-view fusion. Two-stage training: sim pretrain (all heads) → real fine-tune (direction + gate only). | Direction cos-sim on real, tiered: aspirational ≥ 0.70 global / ≥ 0.60 per-shape min; acceptable ≥ 0.60 / ≥ 0.45 with +0.05 fine-tune lift; soft-fail → per-shape ensembles; hard-fail → halt and audit. Primary metric is monotonic improvement until χ² is re-measured post-DR. |
The single most important architectural insight from the reopened-epic wave is that the visual sim2real gap and the F/T sim2real gap are different problems with different fixes. Conflating them was the bug in the original plan.
| Gap | What it is | How sinew closes it |
|---|---|---|
| Visual gap | Sim renders too clean: χ²=1.88, edge density +39% in real, brightness skew 35–62%. Wrist cams worst (hand, cables, peg layer-lines). | Heavy visual DR (sinew-5.13) plus Tier-1/Tier-2 scene authoring fixes (sinew-5.16 + addenda Q3). Stage-2 real fine-tune absorbs what can't be simulated. |
| F/T gap | Real Franka K_F_ext_hat_K is noisy + biased + lagged. Sim contact reporter is pristine. | Noise/bias/lag model on the recorded label (sinew-5.21). Predictor sees clean sim images and learns to match noisy real-Franka F/T. No domain adaptation for force. |
Answers to the lead's original 12-question intake are integrated throughout §§4–6 of this report. A separate Q&A section is not maintained; each question is addressed in the section where its findings live.
The setup epic sinew-1 took the project from a sim with no Franka articulation to a closed-loop Isaac Lab environment that ingests a 7-vec EE-delta action and returns FMB-shaped observations. All five children closed.
| Issue | Owner | Deliverable |
|---|---|---|
| sinew-1.1 | Researcher | docs/researcher.md + docs/fmb_reference.md: FMB benchmark deep-dive + force-from-vision landscape v1. |
| sinew-1.2 | SimWorker | docs/sim_worker.md: isaac_twins audit (API surface, known limitations, perf benchmarks). |
| sinew-1.3 | RLWorker | isaaclab_sinew/ bootstrap (pixi + IsaacLab v3.0.0-beta + FmbInsertionEnv skeleton). |
| sinew-1.4 | SimWorker | Re-baked three USDs for Isaac Sim 6.0 asset paths after the based_robotics repo rename. Gripper-unpin landed as a side effect. |
| sinew-1.5 | Researcher | fmb_reference.md §3.1: pin EE frame as canonical wrench convention; 6×6 EE→base adjoint applied only at the FMB-checkpoint boundary. |
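The EE→base adjoint named in the sinew-1.5 row can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the function name `wrench_ee_to_base` and its arguments are hypothetical, `R_base_ee` is assumed to be the 3×3 rotation whose columns are the EE axes expressed in the base frame, `p_base_ee` is assumed to be the EE origin in the base frame, and the output torque is assumed to be taken about the base origin.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def mat_vec(R: Sequence[Sequence[float]], v: Sequence[float]) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))


def cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    """Right-handed cross product a x b."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def wrench_ee_to_base(R_base_ee, p_base_ee, force_ee, torque_ee):
    """Re-express an EE-frame wrench in the base frame (hypothetical helper).

    Equivalent to applying the 6x6 block matrix [[R, 0], [[p]x R, R]]
    to the stacked (force, torque) vector.
    """
    f_base = mat_vec(R_base_ee, force_ee)
    t_rot = mat_vec(R_base_ee, torque_ee)
    t_arm = cross(p_base_ee, f_base)  # moment of the force about the base origin
    t_base = tuple(t_rot[i] + t_arm[i] for i in range(3))
    return f_base, t_base
```

Keeping this transform in one helper matches the stated intent that the conversion happens only at the FMB-checkpoint boundary, while everything else stays in the EE frame.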
| Issue | Owner | Deliverable |
|---|---|---|
| sinew-2 | RLWorker | docs/rl_action_layer_sketch.md: 7-vec EE-delta action layer design (Plan A DifferentialIK, Plan B serl-impedance contingency). |
| sinew-3 | RLWorker | EEDeltaActionMapper class wired into FmbInsertionEnv. Jacobian sourced via art._articulation_view.get_jacobians() at panda_hand. 5/5 tests green. |
| sinew-4 | RLWorker | Env _get_observations bug fix: frames[cam] for single-env is ndarray, not list[ndarray]. Dropped stale [0] index; documented the multi-env migration path. |
| isaac_twins-36 | SimWorker (backlog P3) | Pin production asset root explicitly — staging-S3 is currently 200-OK but transient. |
| isaac_twins-37 | SimWorker (backlog P3) | Bake peg_tip_local_offset attribute on each peg USD at author time. Required by the insert predicate. |
cfg = FmbInsertionEnvCfg()  # defaults: fmb_big_demo + big_long/rect peg + N=1 + side+wrist cams
env = FmbInsertionEnv(cfg)
obs, info = env.reset()
# obs keys: side_left, side_right, wrist_left, wrist_right (RGBA 720x1280),
#           q (7,), dq (7,), tcp_pose (7,), tcp_force (3,), tcp_torque (3,), gripper_pose (1,)
for _ in range(N):  # N here is the number of control steps to roll out
    action = policy(obs)  # 7-vec in [-1, 1]
    obs, rew, term, trunc, info = env.step(action)
Today tcp_force and tcp_torque are zero-tensor placeholders. The Wave-1 impl deliverable read_eef_wrench_ee (sinew-5.21) replaces them; reward and substage detector consume the noisy=False branch, policy obs and v2f labels consume the noisy=True branch.
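The 7-vec action consumed by the loop above follows the action contract in the locks table (normalized to [-1, 1], scaled to ±0.06 m / ±0.25 rad, plus a gripper bit). A minimal sketch of that scaling, assuming the gripper bit is derived from the sign of the last component; the function name `scale_action` and the sign convention are illustrative assumptions, not the EEDeltaActionMapper API.

```python
from typing import List, Sequence

POS_SCALE_M = 0.06     # per-step translation limit from the action contract
ROT_SCALE_RAD = 0.25   # per-step rotation limit from the action contract


def scale_action(action: Sequence[float]) -> List[float]:
    """Map a normalized 7-vec to [dx, dy, dz, drx, dry, drz, gripper_bit]."""
    if len(action) != 7:
        raise ValueError(f"expected a 7-vec, got length {len(action)}")
    clipped = [max(-1.0, min(1.0, float(a))) for a in action]
    delta_pos = [a * POS_SCALE_M for a in clipped[:3]]
    delta_rot = [a * ROT_SCALE_RAD for a in clipped[3:6]]
    # Assumption: positive last component means "close", otherwise "open".
    gripper_bit = 1.0 if clipped[6] > 0.0 else 0.0
    return delta_pos + delta_rot + [gripper_bit]
```

Clipping before scaling keeps out-of-range policy outputs inside the per-step limits regardless of the policy's output transform.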
Six research deliverables shape the RL workstream: evaluation protocol, substage detection (with gate verification per the addenda), reward design v2 with clean wrench, the historical BC record, the SimDist disturbance recipe, and the four trajectory-label classes the recorder writes.
The R2S2R compute budget forces a permanent few-run regime (3–10 seeds per config). Per rliable §4, point estimates at this seed count carry only 50–70% probability of being real improvements. The protocol stays defensible by reporting interquartile mean (IQM) with stratified bootstrap CIs.
| Choice | Value | Why |
|---|---|---|
| Aggregate metric | IQM + 95% stratified bootstrap CIs | Mean is outlier-dominated; median is insensitive (it is unchanged even when close to half the tasks score zero); IQM trims the top and bottom 25%. |
| Improvement claim | P(A > B) > 0.7 from rliable | P ∈ (0.5, 0.7] inconclusive at N=5; P ≤ 0.5 no-difference or regression. |
| Seed budget | N=5 default at {0, 7, 42, 314, 2718}; bump to N=10 only on decision-gating overlap | Selection-bias safe; disaster-stopped runs still count as seeds. |
| Per-seed score | AUC of per-step eval-return curve (Andrychowicz §2) | Rewards data-efficient policies, not just final return. |
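The protocol above can be illustrated with a standard-library sketch. The rliable package provides its own implementations; the function names below are hypothetical and do not mirror rliable's API. The sketch assumes per-seed scores are grouped by task (the strata) and that P(A > B) is estimated by pairwise comparison with ties counted as half a win.

```python
import random
from statistics import mean
from typing import Dict, Sequence, Tuple

SEEDS = (0, 7, 42, 314, 2718)  # default seed set from the protocol table


def iqm(scores: Sequence[float]) -> float:
    """Interquartile mean: drop the lowest and highest 25%, average the rest."""
    ordered = sorted(scores)
    cut = int(0.25 * len(ordered))
    return mean(ordered[cut:len(ordered) - cut])


def stratified_bootstrap_ci(
    per_task: Dict[str, Sequence[float]],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile CI for the IQM, resampling seeds within each task."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        pooled = []
        for scores in per_task.values():
            # Resample with replacement inside the stratum only.
            pooled.extend(rng.choices(list(scores), k=len(scores)))
        stats.append(iqm(pooled))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def prob_of_improvement(a: Sequence[float], b: Sequence[float]) -> float:
    """Fraction of (a_i, b_j) pairs where A beats B; ties count as 0.5."""
    wins = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    return wins / (len(a) * len(b))
```

Under the rule in the table, a comparison would only be reported as an improvement when `prob_of_improvement` exceeds 0.7.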
| Every | Metrics |
|---|---|
| 10K control steps | train/return_mean, policy_loss, value_loss, entropy, grad_norm, kl_div, explained_variance, env/safety_clip_count, sys/sps |
| 100K control steps | eval/return_mean (100 episodes, stochastic policy), eval/success_rate, eval/tcp_to_peg_dist_mean |
| Choice | Value | Choice | Value |
|---|---|---|---|
| clip ε | 0.25 | γ | 0.99 |
| GAE λ | 0.9 | activation | tanh |
| policy MLP | 2 × 64 | action transform | tanh (NOT clip) |
| value MLP | 2 × 256 | initial action std | 0.5 |
| obs norm | YES (crucial) | advantage norm | per-minibatch |
| value loss clip | NO (hurts) | optimizer | Adam, lr=3e-4 |
| Top "surprising" finding: last policy-layer init 100× smaller. | |||
SAC is the fallback only if PPO plateaus below 50% success after the 5-seed pass.
Per sinew-5.22 §5 the eval surface forks into two streams:
FMB upstream has no automated substage detection: the operator hits Enter to advance primitives at rollout time (sequential_rollout.py:250). Sim has full ground truth, so its deterministic detectors are tighter than anything available on the real robot; the resulting deltas are documented as intentional.
Three Isaac Lab ContactSensor instances using filter_prim_paths_expr cover all five FMB primitives plus transitions:
- finger_contact: sense = panda_(left|right)finger, filter = peg. Antagonistic finger-force pattern → grasp closed.
- peg_contact: sense = peg, filter = [board, fixture, both fingers, Bin]. This is the substage-defining signal: per-partner zero/non-zero distinguishes "peg on floor" / "peg on fixture" / "peg in hole".
- board_contact: sense = board, filter = Bin. Sanity heartbeat: the board shouldn't tip during insertion.

| Primitive | Success predicate |
|---|---|
| grasp | antagonistic finger contact (force dot < -0.7), F_grip > 1.0 N, peg z > 5 cm above bin, no slip ≥ 83 ms (10 physics ticks @ 120 Hz) |
| place_on_fixture | peg↔fixture force > 0.3 N, peg↔fingers force = 0 (released), peg vel < 5 mm/s, peg z in fixture-height window |
| rotate | peg long-axis rotation > 90° from entry, peg-on-fixture maintained, ±15° verticality. Axisymmetric pegs auto-pass. |
| regrasp | grasp.success + peg long-axis vertical ±15° |
| insert | peg↔board force > 0.3 N, peg-tip z within ±3 mm of hole bottom, peg xy within 5 mm of hole center, verticality ±20°, stable 10 steps |
The SubstageDetector class lives at isaac_twins/src/isaac_twins/fmb/substage.py. It exposes {p}_success(), {p}_failure(), transition_ready(p) (success AND TCP at z_safe ≥ 0.20 m), and diagnostics(). Reward authors never recompute distances — the detector is single source of truth, eliminating reward-vs-eval drift by construction.
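As an illustration of how one row of the predicate table could be evaluated, here is a minimal sketch of the insert predicate with its 10-step stability requirement. The class and field names are hypothetical and do not reflect the actual SubstageDetector interface; only the threshold values come from the table.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class InsertThresholds:
    min_board_force_n: float = 0.3   # peg<->board force must exceed this
    z_tol_m: float = 0.003           # peg-tip z within +/-3 mm of hole bottom
    xy_tol_m: float = 0.005          # peg xy within 5 mm of hole center
    max_tilt_deg: float = 20.0       # verticality window
    stable_steps: int = 10           # consecutive passing steps required


class InsertPredicate:
    """Stateful success check: all conditions must hold for N steps in a row."""

    def __init__(self, thresholds: Optional[InsertThresholds] = None):
        self.thr = thresholds or InsertThresholds()
        self._streak = 0

    def update(
        self,
        board_force_n: float,
        tip_xyz: Sequence[float],
        hole_xy: Sequence[float],
        hole_z_bottom: float,
        tilt_deg: float,
    ) -> bool:
        t = self.thr
        d_xy = math.hypot(tip_xyz[0] - hole_xy[0], tip_xyz[1] - hole_xy[1])
        ok = (
            board_force_n > t.min_board_force_n
            and abs(tip_xyz[2] - hole_z_bottom) <= t.z_tol_m
            and d_xy <= t.xy_tol_m
            and abs(tilt_deg) <= t.max_tilt_deg
        )
        # Any failing step resets the stability counter.
        self._streak = self._streak + 1 if ok else 0
        return self._streak >= t.stable_steps
```

Holding the streak counter inside the predicate is one way to keep the stability rule in a single place, in line with the detector being the single source of truth.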
Beyond the three core tests already specced in sinew-5.3 §6 (offline unit, runtime smoke, negative case), four additional checks are ranked by ROI. The recommendation lands three before RL kickoff (~2 days total):
| # | Check | Cost | What it catches | Verdict |
|---|---|---|---|---|
| 1.2 | Threshold-envelope startup assertion | 1 hr | Silent misconfig (e.g. F_grip_min tuned to 100 N during debug and forgotten) | land pre-RL |
| 1.5 | Inverted-physics sanity probes (5 per primitive, "should say False") | 1 day | FP catches: zero-force fake grasps, one-sided pegs, jammed-not-lifted, halfway-not-seated | land pre-RL |
| 1.4 | Temporal smoothness flag (predicate flips > 3× per primitive window) | 2 hr | Sensor-noise artifacts masquerading as substage transitions | land pre-RL |
| 1.3 | Detector ↔ recorded-label cross-check (two implementations compared) | 0.5 day | Hidden state / non-determinism in detector across calls | defer unless non-determinism observed |
| 1.6 | Per-shape threshold ablation | 1 day | Per-peg-size FP/FN drift | defer until FMB raw arrives |
Landing the threshold-envelope + inverted-physics + temporal-smoothness checks lifts predicate confidence high enough to trust the recorded obs/sim_substage_predicate as a v2f gate-head training label.
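Two of the three pre-RL checks are simple enough to sketch directly. The envelope bounds and config keys below are hypothetical placeholders (the source only gives `F_grip_min` as an example of a threshold that could be misconfigured); the flip limit of 3 comes from the check table.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical sane ranges; real envelopes would be set per threshold.
ENVELOPES: Dict[str, Tuple[float, float]] = {
    "F_grip_min": (0.5, 5.0),
    "insert_force_min": (0.1, 1.0),
}


def assert_threshold_envelope(cfg: Dict[str, float], envelopes=ENVELOPES) -> None:
    """Startup check: fail loudly if any threshold sits outside its envelope."""
    for name, (lo, hi) in envelopes.items():
        value = cfg[name]
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} outside envelope [{lo}, {hi}]")


def count_flips(predicate_trace: Sequence[bool]) -> int:
    """Number of False<->True transitions in a per-step predicate trace."""
    return sum(1 for prev, cur in zip(predicate_trace, predicate_trace[1:]) if prev != cur)


def is_temporally_noisy(predicate_trace: Sequence[bool], max_flips: int = 3) -> bool:
    """Flag a primitive window whose predicate flips more than max_flips times."""
    return count_flips(predicate_trace) > max_flips
```

A window flagged by `is_temporally_noisy` would be a candidate for audit rather than an automatic failure, since the check targets sensor-noise artifacts.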
| Primitive | Full-state detector (canonical) FP | Full-state detector FN | State-only fallback P | State-only fallback R |
|---|---|---|---|---|
| grasp | 1–3% | 2–5% | 0.90–0.95 | 0.85–0.95 |
| place_on_fixture | 2–5% | 5–10% | 0.75–0.85 | 0.80–0.90 |
| rotate | 5–10% | 10–20% | 0.40–0.60 | 0.50–0.70 |
| regrasp | 3–7% | 5–10% | 0.80–0.90 | 0.75–0.90 |
| insert | 5–10% | 10–15% | 0.60–0.75 | 0.65–0.80 |
The gap is big — full-state has 1–15% error, state-only has 5–50%, with rotate and insert worst because peg orientation isn't observable from state alone.
Recommendation: option (d) for the training loop, (b) for offline audit. Full-state detector is canonical for reward + recorded labels. State-only adapter is offline calibration only — never enters training. Mixing them as a confidence-weighted auxiliary loss (option a) would inject the state-only 5–50% error into the reward gradient, correlated with the actual physics on the weak primitives (rotate, insert) — exactly the wrong shape for reward shaping. State-only's only role is flagging audit-worthy disagreements (option b) and measuring sim-bias when state-only is the proxy for FMB-real labels at stage 2 (option c).
Composition: dense potential-based shaping + sparse substage bonus + small action regularizer. Per-primitive activation — only the current primitive's reward contributes per tick.
R_p(s, a, s') = λ_success · 1[p_success(s')] # sparse terminal
- λ_failure · 1[p_failure(s')] # sparse anti-terminal
+ γ · Φ_p(s') - Φ_p(s) # dense potential-based (Ng 1999)
- λ_action · ||a[:6]||² # small action regularizer
- λ_clip · Δsafety_clip_count # safety-box pressure
PBRS form preserves the optimal policy (Ng 1999): cumulative dense return ≈ Φ(final) − Φ(initial), so the policy cannot bank arbitrary shaping reward.
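The composition above can be written as a small function. This is a sketch under assumptions: the function name and argument layout are hypothetical, the potential values are passed in precomputed, and the constants are the ones listed in this section's constants table.

```python
from typing import Sequence

LAMBDA_SUCCESS = 20.0
LAMBDA_FAILURE = 5.0
LAMBDA_ACTION = 0.001
LAMBDA_CLIP = 0.5


def primitive_reward(
    phi_s: float,
    phi_next: float,
    action: Sequence[float],
    success: bool,
    failure: bool,
    delta_clip_count: int,
    gamma: float = 0.99,
) -> float:
    """Per-tick reward for the currently active primitive."""
    sparse = LAMBDA_SUCCESS * float(success) - LAMBDA_FAILURE * float(failure)
    shaping = gamma * phi_next - phi_s                # potential-based term
    action_cost = LAMBDA_ACTION * sum(a * a for a in action[:6])  # gripper bit excluded
    clip_cost = LAMBDA_CLIP * delta_clip_count        # safety-box pressure
    return sparse + shaping - action_cost - clip_cost


# Telescoping check with gamma = 1: the dense terms sum to phi(final) - phi(initial).
potentials = [-1.0, -0.6, -0.3, 0.0]
dense_return = sum(
    primitive_reward(p, q, [0.0] * 7, False, False, 0, gamma=1.0)
    for p, q in zip(potentials, potentials[1:])
)
```

With γ = 0.99 the telescoping is only approximate, which is why the text states the cumulative dense return with ≈ rather than equality.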
Under the reframe sim F/T is a real, reliable signal via read_eef_wrench_ee(art, noisy=False). The v1 "degenerate-when-F/T-zero" branch is dropped — clean wrench is non-zero only in real contact, so the force term naturally degenerates.
d_xy = ||peg_tip_xy - hole_xy||
d_z = max(0, peg_tip_z - hole_z_bottom)
align = 1 - cos²(peg_long_axis, hole_axis)
f_clean = read_eef_wrench_ee(art, sensor, noisy=False)[:3] # (3,) clean N, EE frame
f_mag = ||f_clean||
Φ_insert(s) = -d_xy
- 1.5 · d_z · 1[d_xy < r_align_xy]
- 1.0 · align · 1[d_z < z_align_thresh]
- 0.2 · f_mag · 1[d_xy < r_align_xy AND d_z < z_align_thresh]
Coefficient lifted 0.1 → 0.2 because the signal is now reliable. The force-term gate ensures we only penalize contact force when we should be making controlled contact (inside the alignment and seat-depth window); outside that window force is exploration cost and is not penalized.
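A runnable version of the Φ_insert v2 potential, as a sketch. The clean force vector is passed in directly rather than read through read_eef_wrench_ee, and the default values for `r_align_xy` and `z_align_thresh` are hypothetical placeholders because the source does not state them.

```python
import math
from typing import Sequence

R_ALIGN_XY_M = 0.005      # hypothetical default, not from the source
Z_ALIGN_THRESH_M = 0.010  # hypothetical default, not from the source


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def phi_insert(
    peg_tip: Sequence[float],
    hole_xy: Sequence[float],
    hole_z_bottom: float,
    peg_axis: Sequence[float],
    hole_axis: Sequence[float],
    f_clean: Sequence[float],
    r_align_xy: float = R_ALIGN_XY_M,
    z_align_thresh: float = Z_ALIGN_THRESH_M,
) -> float:
    """Potential for the insert primitive; higher (closer to 0) is better."""
    d_xy = math.hypot(peg_tip[0] - hole_xy[0], peg_tip[1] - hole_xy[1])
    d_z = max(0.0, peg_tip[2] - hole_z_bottom)
    cos_a = sum(p * h for p, h in zip(peg_axis, hole_axis)) / (_norm(peg_axis) * _norm(hole_axis))
    align = 1.0 - cos_a ** 2
    f_mag = _norm(f_clean)
    xy_ok = d_xy < r_align_xy
    z_ok = d_z < z_align_thresh
    return (
        -d_xy
        - 1.5 * d_z * xy_ok
        - 1.0 * align * z_ok
        - 0.2 * f_mag * (xy_ok and z_ok)  # force penalized only inside both gates
    )
```

Far from the hole only the `-d_xy` term is active, so large exploratory contact forces there leave the potential unchanged, which is the behaviour the gate is meant to provide.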
| Constant | Value | Rationale |
|---|---|---|
| λ_success | 20.0 | Dominates dense return (≤5 per primitive) by 4× |
| λ_failure | 5.0 | Smaller than success; eval cares about success rate |
| γ (PBRS) | 0.99 | Matches PPO discount; Ng's theorem requires the same γ |
| λ_action | 0.001 | Tiny — just enough to break "wave arm around" ties |
| λ_clip | 0.5 | A 50-tick all-clipped trajectory costs 50 × 0.5 = 25, which exceeds λ_success (20) |
place_on_fixture.rotate, + regrasp, + insert one at a time.
The original plan (sinew-5.12) recommended Option C: BC warmstart → PPO/SAC fine-tune using the 22,550 FMB demos. The χ²=1.88 visual gap (sinew-5.16) invalidated the precondition that FMB-real images are drop-in compatible with sim-camera obs. Three options were considered post-reframe:
| Option | Trade-off | Verdict |
|---|---|---|
| (a) State-conditioned-only BC (drop image obs) | No visual gap; throws away ~95% of FMB demo signal (images dominate the input dimension). State-only policy can't represent visual feature dependencies. | reject |
| (b) BC on sim-rendered FMB-replay images | BC sees sim-distribution images directly. Requires sinew-5.20 replay to be production-ready — it is not. | reject |
| (c) Drop BC entirely | PPO from scratch is harder. Under the reframe (RL = data factory, policy not deployed) "task hard for pure-RL" is tolerable; ~10–20% wall-clock lost, scope clarity gained. | decision |
Curriculum + sub-expert mixture compensate for losing BC. The 22,550 FMB demos are kept on the project shelf for future real-robot work but do not enter the sinew RL training loop. The four data-side issues from the original sinew-5.12 spec (frame convention, quat order, F/T mismatch, BGR-on-disk) remain documented as a reference for any future BC revival.
The only piece of SimDist sinew adopts is the action-only burst noise recipe. The latent world model + MPC planning machinery is out of v2f scope.
| Parameter | Value |
|---|---|
| Noise scope | Action only. No push/wrench, no observation noise at data-gen. |
| Per-env σ draw | Once per env at run start (fixed for that env's entire run), sampled from U[σ_min, σ_max]. |
| Burst pattern | On for 1–50 control steps, off for 25–500, alternating. Net ≈ 9% noised time fraction. |
| Never-noised fraction | 2.5% of envs run completely clean — produces the clean_expert trajectory label. |
| Policy mixture | 50% expert + 50% from 11 sub-expert PPO checkpoints (iter 0, 50, …, 2000), re-rolled per env at reset. |
| Per-DOF σ_max | Translation 0.30 (5× per-step limit → saturating perturbation that generates contact transients), rotation 0.40. |
| Gripper bit | Excluded. Flipping mid-burst drops the peg. |
| When applied | Data-gen pass only. Not during PPO training — cleaner reward attribution. |
| Recording | Every step recorded regardless of noise state. Per-env noise flag becomes an HDF5 column. |
Adoption cost is ~2 days: an EEDeltaCorruptedActionMapper subclass that wraps the existing mapper (~0.5 d), the burst-state machine, and the HDF5 schema additions (sim_action_noised, sim_policy_iter, sim_never_noised).
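The burst-state machine from the recipe table can be sketched as follows. This is not the EEDeltaCorruptedActionMapper itself: the class name is hypothetical, and the sketch assumes the noise is added to an EE-delta already expressed in metres and radians, with one σ drawn per DOF per env and the run starting in an "off" phase.

```python
import random
from typing import List, Sequence, Tuple


class BurstActionNoise:
    """Per-env action-only burst noise (hypothetical sketch of the recipe)."""

    def __init__(self, rng: random.Random, never_noised_frac: float = 0.025):
        self.rng = rng
        self.never_noised = rng.random() < never_noised_frac
        # One sigma per DOF, drawn once and fixed for the env's entire run.
        self.sigma = [rng.uniform(0.02, 0.30) for _ in range(3)] + [
            rng.uniform(0.02, 0.40) for _ in range(3)
        ]
        self.noised = False
        self.steps_left = rng.randint(25, 500)  # assumption: start in an off phase

    def step(self, action: Sequence[float]) -> Tuple[List[float], bool]:
        """Return (possibly corrupted action, noised flag for the HDF5 column)."""
        out = list(action)
        if self.never_noised:
            return out, False
        if self.steps_left == 0:
            # Alternate phases: on for 1-50 steps, off for 25-500 steps.
            self.noised = not self.noised
            self.steps_left = self.rng.randint(1, 50) if self.noised else self.rng.randint(25, 500)
        self.steps_left -= 1
        if self.noised:
            for i in range(6):  # index 6 (gripper bit) is never touched
                out[i] += self.rng.gauss(0.0, self.sigma[i])
        return out, self.noised
```

With mean phase lengths of 25.5 steps on and 262.5 steps off, the long-run noised fraction is 25.5 / 288 ≈ 9%, which agrees with the figure in the table.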
The recorder writes four orthogonal per-episode labels so the v2f trainer can stratify by data quality at HDF5 load time. Predicted fractions per a 1.6M-pair data-gen pass:
| Label | Definition | Predicted fraction | v2f use |
|---|---|---|---|
| successful | terminal substage insert.success() == True | ~60% | high-quality direction labels |
| disturbed | any tick with sim_action_noised == 1 | ~55% | off-policy + contact-transient diversity |
| failed | ¬successful | ~40% | off-manifold (image, force) coverage; negative gate samples |
| clean_expert | successful AND ¬disturbed AND policy==expert | ~1.25% | held-out "nominal-regime" eval slice |
Cross-tabulated: ~30% successful + clean, ~30% successful + disturbed, ~25% failed + disturbed, ~14% failed + clean. The trainer's default behaviour is to ignore the flags and train on the full pool (the distribution is already mixed); gate-head loss optionally upweights noised steps by 1.5× because contact-transient frames carry the cleanest gate signal.
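The label definitions in the table translate directly into boolean logic. A minimal sketch, with hypothetical function names and with the per-step noise flags passed as a plain sequence rather than read from the recording:

```python
from typing import Dict, List, Sequence


def label_episode(
    insert_success: bool,
    action_noised: Sequence[int],
    policy_is_expert: bool,
) -> Dict[str, bool]:
    """Derive the four trajectory labels from per-episode facts."""
    successful = bool(insert_success)
    disturbed = any(flag == 1 for flag in action_noised)
    return {
        "successful": successful,
        "disturbed": disturbed,
        "failed": not successful,
        "clean_expert": successful and not disturbed and policy_is_expert,
    }


def gate_loss_weights(action_noised: Sequence[int], noised_weight: float = 1.5) -> List[float]:
    """Optional per-step gate-head weights: upweight noised steps by 1.5x."""
    return [noised_weight if flag == 1 else 1.0 for flag in action_noised]
```

Because `failed` is the negation of `successful` and `clean_expert` is derived from the other flags, only `successful`, `disturbed`, and the policy identity carry independent information.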
Seven research deliverables shape Goal 2: the camera subset bench, the unified GT recording spec with the sim F/T sensor, FMB↔sim data matching, the FMB-replay honest negative, the NAS pipeline, the scene visual-gap mitigation plan, and the parallel-development architecture.
48-cell sweep: 8 camera subsets × {RGB, RGBD} × N ∈ {1, 4, 8}. Mixed-render steps/s:
| Config | N=1 | N=4 | N=8 | Note |
|---|---|---|---|---|
| phys-only baseline | ~1700 | ~1200 | ~860 | invariant of cam subset — physics cost dominates above N=4 |
| 1cam_rgb (any) | ~1450 | ~880 | ~540 | 1side ≈ 1wrist at every N — informational not perf choice |
| 2cam_rgb (any pair) | ~1000 | ~550 | ~370 | 2side/2wrist/2mixed all equal cost |
| 3cam_rgb | ~750 | ~360 | ~225 | linear-ish drop |
| 4cam_rgb | ~580 | ~250 | ~146 | 3 → 4 cam is the perf cliff (1.5× drop at N=8) |
| 5cam_rgbd (4× D405 + overview) | ~480 | ~155 | ~75 | 0.62× realtime at N=8 — A6000 needed for N=16+ |
| Consumer | Choice | Why |
|---|---|---|
| RL training bootstrap | 1wrist_rgb | Cheapest with contact view; ~1450 / ~880 / ~540 steps/s @ N=1/4/8 (the 1cam_rgb row of the sweep; the phys-only baseline is ~1700 / ~1200 / ~860) |
| v2f data-gen (locked) | 2mixed_rgbd minimum (side_left + wrist_left), 3cam_rgbd for data leverage | Direction-from-vision wants depth; cross-view fusion needs ≥ 2 cams |
| Multi-env RL training | 2mixed_rgb or 3cam_rgb | Stay below the 4-cam cliff |
| Recording / replay videos | 5cam | Offline replay only; never for training |
RGBD costs +5–15% over RGB at fixed cam count — cheaper than the cliff above. The bench appendix documents the nohup + resumable-driver pattern that the NAS recorder inherits.
Replicates Panda's K_F_ext_hat_K in four stages. New public API: read_eef_wrench_ee(art, contact_sensor, *, noisy, state, rng) → dict.
contact sensor coord transform DR noise model output
│ │ │ │
│ world-frame net force │ rotate by R_world_EE │ + bias + Gauss + lag │
┌───▼───────────┐ EE-frame ┌▼─────────────────┐ ┌▼──────────────────┐ │
│ clean F_w (3,)│ ── world→EE ─│ F_clean_ee (3,) │ ──▶│ noisy_lagged_ee │──▶│ obs/eef_force
│ clean τ_w (3,)│ adjoint │ τ_clean_ee (3,) │ │ (3,) + (3,) │ │ obs/eef_torque
└──────────────-┘ └──────────────────┘ └───────────────────┘ │
│ │
├──▶ obs/sim_eef_force_clean (3,) │
├──▶ obs/sim_eef_torque_clean (3,) │
│ │
▼ │
‖F_clean‖ > 0.1 N ───▶ obs/sim_in_contact (bool)
| Caller | Wrench used | Why |
|---|---|---|
| Reward Φ_insert | noisy=False | Deterministic gradient; gate must be exact |
| SubstageDetector | noisy=False | Sim-internal gates; predicates must be sharp |
| Env policy obs (tcp_force, tcp_torque) | noisy=True | Match real Franka deployment distribution |
| v2f wrench-head label | noisy=True | Predictor learns to match real noisy F/T |
| v2f direction-head label | noisy=False → f/||f|| | Noising rotates the unit vector — corrupts geometry |
| Gate label sim_in_contact | noisy=False, threshold 0.1 N | Deterministic, never noised; shifts to 8 N at real stage-2 |
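The clean-branch labels in the table (EE-frame force, unit direction, contact gate) can be sketched as below. The helper names are hypothetical; `R_world_ee` is assumed to hold the EE axes as columns expressed in the world frame, and the direction label is assumed to be a zero vector when the gate is off, which the source does not specify.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

CONTACT_THRESHOLD_N = 0.1  # sim gate threshold; the table notes 8 N at real stage-2


def rotate_world_to_ee(R_world_ee: Sequence[Sequence[float]], v_world: Sequence[float]) -> Vec3:
    """v_ee = R_world_ee^T @ v_world (rotation only, no translation)."""
    return tuple(sum(R_world_ee[j][i] * v_world[j] for j in range(3)) for i in range(3))


def direction_and_gate(
    f_clean_ee: Sequence[float],
    threshold_n: float = CONTACT_THRESHOLD_N,
) -> Tuple[Vec3, bool]:
    """Unit force direction and in-contact flag from the clean EE-frame force."""
    mag = math.sqrt(sum(c * c for c in f_clean_ee))
    in_contact = mag > threshold_n
    if not in_contact:
        # Assumption: direction is masked by the gate, so emit zeros here.
        return (0.0, 0.0, 0.0), False
    return tuple(c / mag for c in f_clean_ee), True
```

Computing the direction from the clean force before any noise is added follows the table's rationale that noising would rotate the unit vector.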
| Noise term | Value |
|---|---|
| Force additive Gaussian (per axis) | σ_f = 0.025 N (Franka 0.05 N resolution / 2) |
| Torque additive Gaussian (per axis) | σ_τ = 0.01 Nm (Franka 0.02 Nm resolution / 2) |
| Per-episode bias drift | σ_bias_f = 0.05 N; σ_bias_τ = 0.02 Nm (constant within episode) |
| 1st-order low-pass lag | τ_lag = U(20, 80) ms per episode, discrete IIR @ 10 Hz |
| Scaling mode | {constant, scaled} opt-in; default constant for v1 corpus |
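The noise parameters above can be combined into a small per-episode model. This is a sketch of the constant scaling mode only, under stated assumptions: the class name is hypothetical, the lag is applied to the clean signal before bias and Gaussian noise are added (the source does not fix the order), and the low-pass uses the backward-Euler coefficient α = dt / (τ_lag + dt).

```python
import random
from typing import List, Sequence, Tuple


class WrenchLabelNoise:
    """Per-episode bias + first-order lag + per-step Gaussian on a 6D wrench."""

    def __init__(
        self,
        rng: random.Random,
        dt_s: float = 0.1,                 # 10 Hz control rate
        sigma_f: float = 0.025,            # N, per axis
        sigma_tau: float = 0.01,           # Nm, per axis
        sigma_bias_f: float = 0.05,        # N, constant within the episode
        sigma_bias_tau: float = 0.02,      # Nm, constant within the episode
        lag_ms: Tuple[float, float] = (20.0, 80.0),
    ):
        self.rng = rng
        self.sigmas = [sigma_f] * 3 + [sigma_tau] * 3
        self.bias = [rng.gauss(0.0, sigma_bias_f) for _ in range(3)] + [
            rng.gauss(0.0, sigma_bias_tau) for _ in range(3)
        ]
        tau_s = rng.uniform(*lag_ms) / 1000.0  # drawn once per episode
        self.alpha = dt_s / (tau_s + dt_s)     # discrete IIR coefficient
        self.state = None

    def __call__(self, clean_wrench: Sequence[float]) -> List[float]:
        if self.state is None:
            self.state = list(clean_wrench)    # start the filter at the first sample
        # y[k] = y[k-1] + alpha * (x[k] - y[k-1])
        self.state = [s + self.alpha * (c - s) for s, c in zip(self.state, clean_wrench)]
        return [
            s + b + self.rng.gauss(0.0, sg)
            for s, b, sg in zip(self.state, self.bias, self.sigmas)
        ]
```

One instance would be created per episode so that the bias and lag stay fixed within it, while the Gaussian term is redrawn at every step.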
The recorder writes FMB-RLDS-parity keys (images, joint_pos/vel, eef_pose/vel/force/torque, action, primitive, language) plus namespaced obs/sim_* extras for v2f training:
- obs/sim_eef_force_clean, obs/sim_eef_torque_clean: pre-noise labels paired with the noised obs/eef_force / obs/eef_torque.
- obs/sim_in_contact: gate label from contact reporter.
- obs/sim_contact_point_local, obs/sim_peg_local_axis: contact-point head targets in peg-local frame.
- obs/sim_force_dir_ee: direction-head target, unit-vec EE frame.
- obs/sim_substage_predicate: per-primitive boolean vector from the detector.
- obs/sim_dr_profile_blob: JSON of per-episode DR knobs (replay reproducibility).
- obs/sim_action_noised, obs/sim_policy_iter, obs/sim_never_noised: SimDist columns.
- episode_metadata/{successful, disturbed, clean_expert}: trajectory labels.

obs/sim_eef_force_clean; apply DR noise (per-frame Gaussian + per-episode bias + per-episode lag) → write obs/eef_force.

| Format | Role | Why |
|---|---|---|
| zarr | live recording intermediate | Append-fast, concurrent-writer friendly, atomic .partial → .zarr rename, schema evolution = mkdir |
| sinew_fmb_strict TFDS builder | FMB-canonical, sim extras stripped | Mixed sim+FMB training without schema fork |
| sinew_fmb_v2f TFDS builder | Everything (sim extras + labels) | Predictor training |
| HDF5 | rejected | Concurrent-writer fragility, NAS-unfriendly file locking |
| Config | per 50-ts ep | 22k-FMB-equivalent corpus |
|---|---|---|
| 2mixed_rgbd @ 224² (v1 default) | ~6 MB | ~243 GB |
| 2mixed_rgbd @ 256² (FMB-faithful, opt-in) | ~8 MB | ~316 GB |
| 3cam_rgbd @ 224² | ~9 MB | ~310 GB |
| 4cam_rgbd @ 224² | ~12 MB | ~400 GB |
224² matches the DINOv2-S patch grid (16 patches × 14 px = 224), which saves the train-time resize and ~23% disk relative to 256² (~243 GB vs ~316 GB). The v1 default (~243 GB) is ~45% of FMB upstream's 545 GB single-object zip.
Three FMB schemas exist (raw .npy, RLDS, live gym env). The schema work locks one canonical sim record that's drop-in compatible with FMB's RLDS while adding sinew ground truth via the obs/sim_* prefix.
| Key | Shape | Purpose |
|---|---|---|
| obs/sim_t_ns, obs/sim_ctrl_step_idx, obs/sim_cam_capture_t_ns/<view> | scalars | Timestamps: FMB stores none; sim emits them so sim↔real can be cross-checked |
| obs/sim_contact_wrench_ee | (6,) float32 | GT wrench in EE frame for the wrench head |
| obs/sim_cartesian_contact | (6,) bool | Per-Cartesian-dim contact, the FoAR-style gate label |
| obs/sim_peg_pose_world, obs/sim_board_pose_world | (7,) each | Geometric GT for contact-point head (project contact line into wrist-cam pixel) |
| obs/sim_cam_intrinsics/<view>, obs/sim_cam_extrinsics/<view> | (3,3) + (4,4) × N | K matrix + T_world_cam for the projection above |
| obs/sim_jacobian, obs/sim_gripper_dist, obs/sim_seed, obs/sim_randomization_id | various | Diagnostics + replay reproducibility |
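The projection that the intrinsics and extrinsics rows refer to can be sketched with a pinhole model. The assumptions here are not confirmed by the source: `T_world_cam` is taken to map camera-frame points into the world frame, the camera is taken to follow the OpenCV convention (+Z forward, +X right, +Y down), and skew and lens distortion are ignored. Function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def invert_rigid(T: Sequence[Sequence[float]]) -> List[List[float]]:
    """Invert a 4x4 rigid transform: [R t; 0 1]^-1 = [R^T  -R^T t; 0 1]."""
    R = [list(row[:3]) for row in T[:3]]
    t = [row[3] for row in T[:3]]
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return [Rt[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def project_point(
    p_world: Sequence[float],
    K: Sequence[Sequence[float]],
    T_world_cam: Sequence[Sequence[float]],
) -> Optional[Tuple[float, float]]:
    """Project a world-frame point to pixel (u, v); None if behind the camera."""
    T_cam_world = invert_rigid(T_world_cam)
    p_cam = [
        sum(T_cam_world[i][j] * p_world[j] for j in range(3)) + T_cam_world[i][3]
        for i in range(3)
    ]
    if p_cam[2] <= 0.0:
        return None
    u = K[0][0] * p_cam[0] / p_cam[2] + K[0][2]
    v = K[1][1] * p_cam[1] / p_cam[2] + K[1][2]
    return (u, v)
```

If the renderer's camera frame uses a different axis convention (for example a camera looking down −Z), a fixed axis-flip would need to be applied before the division by depth.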
libfranka's franka::RobotState exposes ~30 channels FMB ignores. High-value ones recorded under obs/sim_*:
- tau_J: direct link-side torque, better SNR than the external wrench estimator
- tau_ext_hat_filtered: low-pass filtered external torque
- cartesian_contact: per-Cartesian-dim contact bit (the load-bearing FoAR gate label)
- O_T_EE_d: last commanded EE pose (reveals controller tracking lag)
- time: strictly monotonic libfranka clock

| FMB upstream | sinew canonical | Physical mount |
|---|---|---|
| side_1 | side_left | workspace −X edge (robot's left) |
| side_2 | side_right | workspace +X edge |
| wrist_1 | wrist_left | wrist-mount slot L |
| wrist_2 | wrist_right | wrist-mount slot R |
Caveat: the wrist L/R mapping is a sinew choice — FMB upstream binds arbitrary serials. If real-Franka eval shows mirrored-wrist artifacts (policy reaches the wrong way), the fix is to flip the wrist mapping and retrain, not look for a bug elsewhere.
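A parse-time normalization step combining the camera renaming above with the BGR-on-disk → RGB convention from the locks table might look like the sketch below. Images are represented as nested lists of pixel tuples to keep the example dependency-free, and the `flip_wrist` switch is a hypothetical way to apply the wrist-mapping flip described in the caveat.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]

FMB_TO_SINEW_CAM = {
    "side_1": "side_left",
    "side_2": "side_right",
    "wrist_1": "wrist_left",
    "wrist_2": "wrist_right",
}
WRIST_SWAP = {"wrist_left": "wrist_right", "wrist_right": "wrist_left"}


def bgr_to_rgb(image: Sequence[Sequence[Sequence[int]]]) -> Image:
    """Reverse the channel order of every pixel."""
    return [[tuple(pixel[::-1]) for pixel in row] for row in image]


def normalize_frames(fmb_frames: Dict[str, Image], flip_wrist: bool = False) -> Dict[str, Image]:
    """Rename FMB camera keys to sinew names and convert BGR to RGB."""
    out = {}
    for name, image in fmb_frames.items():
        canonical = FMB_TO_SINEW_CAM[name]
        if flip_wrist:
            # Side cameras are unaffected; only the wrist pair is swapped.
            canonical = WRIST_SWAP.get(canonical, canonical)
        out[canonical] = bgr_to_rgb(image)
    return out
```

With array-backed images the channel reversal would typically be a single slice over the last axis instead of a nested comprehension.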
Verdict: cannot reliably prove end-to-end FMB grasp+insert replay in current sim. This is the load-bearing reason substage verification falls back to a state-only adapter (§4.2 Q2), and the reason BC option (b) "BC on sim-rendered FMB-replay images" was rejected (§4.4).
Four structural blockers, none individually trivial:
.npy (545 GB) not downloaded; only 5-frame cached smokes exist locally.
Mechanism smoke confirmed sim infra works end-to-end: scene builds, physics settles, gripper actuates 0.08 → 0.0002 m closed. Predicted yield shape-by-shape: ~30–50% best case, < 10% for asymmetric shapes. There's a ~4-day path to reliable replay if needed (proper IK in the replay loop, peg-pose forcing, board re-mesh, FMB raw pull), but it's not required for the v2f-end-goal pipeline because substage verification falls back to the state-only adapter for offline calibration.
local PC (4070 Ti SUPER, 16 GB) NAS (143.248.121.169:7002, ftp) DL_A6000 (24 GB+)
┌─────────────────────┐ ┌──────────────────────────┐ ┌────────────────────┐
│ FmbRecorder │ │ /IntelligentManipulation │ │ pixi env │
│ → zarr/episode_* │ FTP (curl) │ Team/DomrachevIvan/ │ FTP │ → TFDS reader │
│ → episode.zarr. │ ─push────────► │ sinew/recordings/ │ ◄─pull──── │ → train_v2f.py │
│ tar │ │ 2026-05-21/seed_07/ │ │ │
│ episode_uploader.py │ │ sinew/tfds/ │ │ │
└─────────────────────┘ └──────────────────────────┘ └────────────────────┘
| Choice | Value | Why |
|---|---|---|
| Protocol | plain FTP, curl --ftp-method nocwd | FTPS data channel fails from this PC; nocwd is stateless |
| Endpoint | ftp://143.248.121.169:7002 | DNS fallback for irislab.asuscomm.com |
| Base path | /IntelligentManipulationTeam/DomrachevIvan/sinew/ | Per user CLAUDE.md folder + sinew subtree |
| Wire format | tar-of-zarr per episode (uncompressed) | Zarr's many-small-files layout is FTP-unfriendly; arrays already compressed |
| Auth | ~/.netrc (chmod 600), curl --netrc-file | Never put password in command line |
| rsync / rclone / sftp / FTPS | rejected | NAS has no SSH; plain FTP per user CLAUDE.md |
Two-process model: FmbRecorder writes zarr atomically; episode_uploader.py watches and pushes per-episode opportunistically. Idempotent — mid-upload crash → next pass overwrites. Recording rate ~60 MB/min = 1.0 MB/s at 224², vs ~12 MB/s home-link ceiling — network is never the bottleneck. Even four parallel collectors stay well under.
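The per-episode push step can be sketched as a tar step plus a curl command builder. This is not the contents of episode_uploader.py: the function names are hypothetical, the remote sub-directory layout is taken from the diagram above, and the command is only constructed here, not executed. The curl options used (`--ftp-method`, `--ftp-create-dirs`, `--netrc-file`, `--upload-file`, `--fail`) are standard curl flags.

```python
import tarfile
from pathlib import Path
from typing import List

NAS_BASE = "ftp://143.248.121.169:7002/IntelligentManipulationTeam/DomrachevIvan/sinew/"


def tar_episode(zarr_dir: Path) -> Path:
    """Pack one episode's zarr directory into an uncompressed tar next to it."""
    tar_path = zarr_dir.with_name(zarr_dir.name + ".tar")
    with tarfile.open(tar_path, "w") as tar:  # mode "w" = no compression
        tar.add(zarr_dir, arcname=zarr_dir.name)
    return tar_path


def build_upload_cmd(
    tar_path: Path,
    remote_subdir: str,
    netrc: Path = Path.home() / ".netrc",
) -> List[str]:
    """Build the curl argv for a plain-FTP upload; credentials come from netrc."""
    url = f"{NAS_BASE}{remote_subdir.strip('/')}/{tar_path.name}"
    return [
        "curl", "--fail",
        "--ftp-method", "nocwd",
        "--ftp-create-dirs",
        "--netrc-file", str(netrc),
        "--upload-file", str(tar_path),
        url,
    ]
```

A watcher loop would pass the returned list to `subprocess.run(cmd, check=True)`; because the upload targets a fixed remote path, re-running it after a crash overwrites the partial file, which matches the idempotency described above.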
| Pool | Episodes | Size @ 224² | Wall @ 1 | Wall @ 4 parallel |
|---|---|---|---|---|
| single-object multi-stage | 15,350 | ~94 GB | ~76 h | ~19 h |
| single-object insertion-only | 4,050 | ~49 GB | ~20 h | ~5 h |
| long-horizon (300 ts/ep) | 2,700 | ~100 GB | ~67 h | ~17 h |
| total mirror-FMB @ 224² | 22,100 | ~243 GB | ~164 h (~7 days) | ~41 h (~1.7 days) |
Researcher's measured baseline (sinew-5.16): χ² = 1.88 mean across 4 cams, wrist cams worst at χ² ~ 2.0 with real-edge-density 51–67% denser. Wrist cams see hand, cables, fingers, and peg-print lines, none of which the current sim USD models. The addenda enumerate 10 candidate fixes and rank them by leverage × inverse-cost. Two tiers land before the v2f stage-1 pretrain.
| # | Fix | Time | Δχ² (mean) | Δχ² (wrist) | Why this rank |
|---|---|---|---|---|---|
| 1 | Procedural cable mesh in wrist FOV (1–2 swept curves, random color, random routing) | 1 day | -0.05 to -0.1 | -0.3 to -0.5 | Biggest single hit on wrist χ² — addresses the 39% edge-density gap directly |
| 2 | Lab-clutter distractor spawning (3–5 small meshes per ep outside the action region) | 0.5 day | -0.15 to -0.25 | -0.05 | Biggest global χ² hit per cost; already in DR spec row 32, just impl |
| 3 | Background plane workshop texture (tiled real-workshop photo on the ground plane) | 0.5 day | -0.1 to -0.2 | -0.05 | Cheap, fixes side-cam background uniformity |

Tier 2:
| # | Fix | Time | Δχ² (mean) | Δχ² (wrist) |
|---|---|---|---|---|
| 4 | FDM layer-line normal map on peg surfaces (~0.4 mm period) | 0.5 day | -0.05 to -0.1 | -0.1 to -0.15 |
| 5 | Wrist-mount visual upgrade (real FMB STEP mesh + bevels + screws) | 1 day | -0.05 | -0.1 to -0.15 |
| 6 | Domain-aware per-frame exposure jitter (random render exposure) | 0.25 day | -0.1 to -0.15 | -0.1 to -0.15 |

Deferred:
| Fix | Why deferred |
|---|---|
| PathTracing render (raytraced → pathtraced) | 3–5× render-cost penalty — throughput killer. Run only if Tier 1+2 leaves a visible gap. |
| Hand approximation (stub human hand USD near gripper) | 2–3 day scope; Researcher explicitly flagged "stage 2 real fine-tune carries this." Don't simulate a human. |
| Photographed FMB bin texture | Needs a real photograph; not on the critical path. |

Projected χ² by state:
| State | Mean χ² | Wrist χ² | Interpretation |
|---|---|---|---|
| Today | 1.88 | ~2.0 | Solidly "visibly distinct" per Force Map |
| After Tier 1 | ~1.4 | ~1.4 | Distinguishable but training-tolerant |
| After Tier 1+2 | ~1.0–1.2 | ~1.1–1.3 | Approaching Force Map threshold; sufficient for stage-1 sim pretrain |
Honesty note: these Δχ² estimates are SimWorker fix-table predictions per the leverage analysis in sim2real_visual_gap.md §3. The original measurement is N=1 (one real episode, one timestep, one sim env, 4 cams). Re-measuring χ² post-DR + post-Tier-1 is the highest-priority sinew-5.16 follow-up. What scene authoring cannot fix — unmodeled lab clutter, lighting hardware noise, sensor-level rolling-shutter / chromatic-aberration artifacts — is absorbed by the stage-2 real fine-tune.
| Repo | Owns | Communicates via |
|---|---|---|
| isaac_twins/ | Scenes, USDs, Franka control, recording driver, substage detector, F/T sensor. | 8 published symbols (sim_worker.md §3.2) |
| isaaclab_sinew/ | RL env wrapper, training scripts, eval harness, parser (no BC loader after the BC drop). | Imports only the 8 published symbols |
| sinew/ workspace | Docs, beads, references | Read-only on both repos |
Multi-env RL gated on three SimWorker follow-ups (carried into Wave 1):
- grab_franka_view(num_envs) → Articulation wrapping /World/envs/env_.*/Scene/Robot regex.
- SceneConfigurator.reset_episode(env_ids).
- isaac_twins.fmb.obs.get_obs(cfgr, art_view, sub_detector).

Until those land, single-env is the only contract. DirectRLEnv migration is then a rename-only ~200-line PR.
- docs/research/*.md.
- isaaclab_sinew imports from isaac_twins only via the 8 published symbols.
- nohup + resumable driver for any Kit-loop sweep > 5 min.
- bd ready is the conflict-avoidance gate: claim before editing.

This is the ship artifact. Seven deliverables: the visual gap quantification, the literature review v2, the FMB-only feasibility spec, the data leverage analysis, the 3-head architecture, the DR spec, and the revised pipeline that branches the training plan on the FMB-only outcome.
| Metric | Sim | Real | Gap |
|---|---|---|---|
| Color hist χ² (sim vs real) | n/a | 1.88 | Above Force Map χ² = 1.0 "visibly distinct" threshold |
| Edge density fraction | 0.056 | 0.079 | Real +39% denser edges |
| Per-channel brightness (R, G, B) | (152, 159, 152) | (94, 117, 104) | Sim 1.35–1.62× brighter |
| Per-channel std (R, G, B) | (40, 37, 38) | (51, 56, 55) | Real 30–52% wider tonal range |

Per-camera breakdown:
| Cam | Hist χ² | Sim edge | Real edge | Sim mean RGB | Real mean RGB |
|---|---|---|---|---|---|
| side_left | 2.10 | 0.059 | 0.069 | (143, 156, 146) | (93, 112, 83) |
| side_right | 1.35 | 0.049 | 0.057 | (149, 154, 147) | (108, 116, 112) |
| wrist_left | 2.00 | 0.055 | 0.083 | (161, 162, 160) | (96, 123, 100) |
| wrist_right | 2.06 | 0.063 | 0.105 | (156, 161, 157) | (78, 118, 119) |
Wrist cams have the largest gap (real-edge-density 51–67% denser) because wrist cams see hand, fingers, board screws, peg layer-lines — sim doesn't model the foreground. Side cams cover the larger workspace that the sim USD captures more faithfully.
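For readers who want to reproduce a color-histogram χ² number, here is a minimal sketch. It assumes a symmetric χ² over concatenated per-channel histograms; the exact variant, bin count, and normalization behind the 1.88 figure are not restated in this section, so values from this sketch are not directly comparable to the table. The function names are illustrative.

```python
import numpy as np


def color_hist(img: np.ndarray, bins: int = 32) -> np.ndarray:
    """Concatenated per-channel histogram of a uint8 HxWx3 image, summing to 1."""
    chans = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0] for c in range(3)]
    h = np.concatenate(chans).astype(np.float64)
    return h / h.sum()


def chi2_distance(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """Symmetric chi-squared distance sum((p - q)^2 / (p + q)); assumed variant."""
    return float(np.sum((p - q) ** 2 / (p + q + eps)))
```

Under this particular definition the distance ranges from 0 (identical histograms) to 2 (disjoint histograms). Since the per-cam table reports values above 2, the memo presumably uses a different normalization.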
docs/research/figures/sim2real_visual_gap_grid.png.
insert episode. Lighting and viewpoint vary substantially within one episode; the v2f predictor must learn over a multi-modal real distribution, not a single fixed pose. Source: docs/research/figures/fmb_real_frame_variability.png.

references/v2f_lit_v2/code/forcesight/prediction/models.py:RGBDDinov2). Depth channel trainable; init from RGB conv1 mean.

| Finding | Source |
|---|---|
| Force direction transfers sim→real. Magnitude does not. | Direction Matters (Yang 2026) — L1 on unit-vec coords (NOT cosine, NOT angle); magnitude dropped. |
| Voxel grid only helps top-down clutter; per-pixel heatmap is better for peg-board contact. | Force Map (Hanai 2023) |
| Future-contact gate (binary) gates magnitude loss when ¬contact. | FoAR (He 2024) |
| Magnitude head as scalar regression with ~0.1× direction weight. | ForceSight + Direction Matters consensus |
| Heavy visual DR with NO dynamics DR. Dynamics randomization hurts direction supervision. | Force Map appendix + Direction Matters |

Training hyperparameters:
| Hyperparameter | Value |
|---|---|
| Optimizer | AdamW, lr=3e-4 |
| Schedule | cosine + 2000-step warmup |
| Batch size | 128 per A6000 (FoAR uses 240 on 2× A100 → halve) |
| Epochs | 300 (FoAR default for similar scale) |
| Precision | bf16 on Ampere+ |
| Data scale | 5–10k sim rollouts with full F/T labels (Force Map's 5,400-scene recipe) |
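
The "cosine + 2000-step warmup" schedule in the table can be sketched as a pure function of the step index. The linear warmup shape and the zero floor are assumptions; the table only fixes the peak learning rate and the warmup length.

```python
import math


def lr_at(step: int, total_steps: int, base_lr: float = 3e-4,
          warmup_steps: int = 2000, min_lr: float = 0.0) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to `min_lr`."""
    if step < warmup_steps:
        # Assumed linear ramp that reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With AdamW in PyTorch, a function like this would typically be wrapped in `torch.optim.lr_scheduler.LambdaLR` after dividing by the base learning rate.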
The feasibility check answers a single load-bearing question before committing to the staged-pretrain plan: can v2f be learned from FMB-real labels alone? The spec is closed; the A6000 run is a separate impl ticket per the user CLAUDE.md "training runs on DL_A6000 not local PC" rule.
| Item | Value |
|---|---|
| Data | 100–500 FMB insert episodes, 2cam RGB-only |
| Architecture | Direction + gate heads only, frozen DINOv2-S |
| Budget | ≤ 1 A6000-day |
| Output | Five-outcome matrix A–E that branches the pipeline downstream (see §6.7) |

Stage-1/stage-2 partition:
| Stage | Data | Trained | Frozen |
|---|---|---|---|
| 1: sim pretrain | 5–10k FMB insert sim rollouts, 2–4 cam RGB(+D), clean+noisy GT wrench, heavy visual DR | backbone (optional unfreeze) + all 3 heads + gate | nothing |
| 2: real fine-tune | ~3–4k FMB insert real episodes × ~100 steps × 2cam RGB-only, EE-frame F/T zero-bias-subtracted, state_gripper_pose == 1 only | direction head + in-contact-gate head | backbone, wrench head, contact-point head |
- Non-insert primitives (grasp/place/rotate/regrasp): gripper-peg contact direction is uninformative for peg-board contact direction.
- language_embedding: no language conditioning.
- action labels: were for BC (now dropped); not for v2f.

1. primitive == 'insert'.
2. state_gripper_pose == 1 (peg held).

Steps 1+2 collapse 22,550 episodes → ~3,000–4,000 insert-only-with-peg episodes, ~20–40 GB RGB-only.
Franka external wrench carries a per-episode baseline drift (payload model + thermal). The first 5 pre-grip-close timesteps (state_gripper_pose == 0) provide the bias estimate. If fewer than 5 pre-grip-close steps exist (peg already grasped), skip subtraction — falling back to in-contact bias estimation would bake contact force into the "bias" and contaminate all downstream direction labels.
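The bias rule above can be sketched as a per-episode function. Using the mean as the estimator and the array layout shown are assumptions; the 5-step window and the skip-when-already-grasped rule come from the text.

```python
import numpy as np

MIN_BIAS_STEPS = 5  # first 5 pre-grip-close timesteps, per the text


def subtract_wrench_bias(wrench: np.ndarray, gripper_pose: np.ndarray):
    """Return (wrench_out, applied).

    wrench: (T, 6) EE-frame [f; tau]. gripper_pose: (T,), 0 = open, 1 = peg held.
    """
    closed = np.flatnonzero(gripper_pose != 0)
    # Number of leading timesteps before the gripper first closes.
    n_pre = int(closed[0]) if closed.size else len(gripper_pose)
    if n_pre < MIN_BIAS_STEPS:
        # Peg already grasped: no contact-free window, so skip subtraction
        # rather than bake contact force into the bias.
        return wrench.copy(), False
    bias = wrench[:MIN_BIAS_STEPS].mean(axis=0)  # assumed estimator: mean
    return wrench - bias, True
```

Returning the `applied` flag lets the dataloader record which episodes kept their raw baseline, so they can be audited or dropped later.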
| Component | Value |
|---|---|
| Backbone | DINOv2-S ViT-S/14, 384-d features, frozen (depth channel trainable) |
| Input | side_left + wrist_left RGBD 224×224 (letterboxed from 1280×720 sim or 256² FMB-real) |
| Patch embed | 4-ch RGBD (ForceSight pattern); depth conv1 init from RGB mean clone |
| Fusion | 2-layer transformer encoder, 4 heads, GELU, per-cam positional embeddings |
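
The "depth conv1 init from RGB mean clone" row can be illustrated on a bare weight array. This sketch assumes the patch-embed kernel is laid out as (out_channels, in_channels, kh, kw), as in a standard conv layer. It is not the ForceSight implementation and the function name is illustrative.

```python
import numpy as np


def expand_patch_embed_to_rgbd(w_rgb: np.ndarray) -> np.ndarray:
    """(out, 3, kh, kw) RGB patch-embed weights -> (out, 4, kh, kw) RGBD weights."""
    assert w_rgb.ndim == 4 and w_rgb.shape[1] == 3
    # Depth kernel starts as the mean of the three RGB kernels.
    depth = w_rgb.mean(axis=1, keepdims=True)
    return np.concatenate([w_rgb, depth], axis=1)
```

In a PyTorch model the expanded tensor would be copied into a new 4-input-channel conv, with only the depth slice left trainable if the RGB slice is to stay frozen.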

Heads:
| Head | Output | Loss | Weight | Stage |
|---|---|---|---|---|
| contact-point | per-pixel 64×64 heatmap on wrist cam | BCE + soft-L2 hybrid | 0.5 | sim only; frozen at stage 2 |
| force-direction | unit-vec 3D in EE frame | L1 on coords (Direction Matters) | 1.0 (load-bearing) | sim pretrain + real fine-tune |
| 6D wrench | EE-frame [f; τ], trained on noisy-lagged label | MSE, gate-gated | 0.1 | sim only; frozen at stage 2 |
| in-contact gate | binary probability | BCE (FoAR pattern) | 0.1 | sim + real fine-tune |
||F|| < 8 N: direction at the noise floor is uninformative.

| Stage | Epochs | lr | bs | Wall |
|---|---|---|---|---|
| Stage 1 (sim pretrain) | 300 | 3e-4 cosine + 2k warmup | 128 | ~1.0–1.3 A6000-days, bf16 |
| Stage 2 (real fine-tune, dir + gate only) | 30–50 | 3e-5, no warmup | 128 | < 1 A6000-day |
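
The force-direction head's label and loss, including the 8 N floor, can be sketched as below. This is a NumPy illustration rather than training code: it assumes the mask is computed on the force part of the label wrench and that masked rows are marked with NaN, matching the "L1 NaN-masked" wording used for stage 2.

```python
import numpy as np

FORCE_FLOOR_N = 8.0  # below this, direction is at the noise floor


def direction_labels(force: np.ndarray) -> np.ndarray:
    """(N, 3) EE-frame force -> (N, 3) unit vectors, NaN rows where ||F|| < 8 N."""
    norm = np.linalg.norm(force, axis=1, keepdims=True)
    unit = force / np.maximum(norm, 1e-9)
    return np.where(norm >= FORCE_FLOOR_N, unit, np.nan)


def direction_l1(pred: np.ndarray, label: np.ndarray) -> float:
    """Mean L1 over unit-vector coordinates, skipping NaN-masked rows."""
    valid = ~np.isnan(label).any(axis=1)
    if not valid.any():
        return 0.0  # nothing to supervise in this batch
    return float(np.abs(pred[valid] - label[valid]).mean())
```

The total loss would then combine this term at weight 1.0 with the other heads at the weights in the head table.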
Heavy visual DR, NO dynamics DR. F/T noise applied to labels, not inputs. 46-row master knob table organized in three schedule buckets:
| Bucket | Knobs (count) | Where applied |
|---|---|---|
| per-episode | lighting (5), cam intrinsics (3), cam extrinsics (3), materials (8), placement (3), F/T bias + lag (2), MODE split (1) — ~26 knobs | SceneConfigurator.randomize(step). Re-author USD attributes / lights / materials in place. USD writes > runtime set_focal_length calls. |
| per-frame (sim) | depth dropout + Gauss noise + clamp + RGBD jitter (4), F/T additive noise (2) — 6 knobs | Recording loop, before writing the labeled tuple. Noise becomes part of the recorded label. |
| per-frame (train aug) | RGB brightness/contrast/hue/saturation/gauss/gamma/JPEG (7), chromatic aberration (1) — 8 knobs | Training dataloader (torchvision.transforms.v2). Cheap; expands effective dataset. |
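
How the F/T knobs turn a clean wrench into a recorded label can be sketched as below: per-episode bias and lag, plus per-frame additive noise. The one-pole low-pass form, the stage ordering, and the parameter names are assumptions; the actual `read_eef_wrench_ee` pipeline is specified in sinew-5.21.

```python
import numpy as np


def noisy_wrench_labels(clean: np.ndarray, bias: np.ndarray, alpha: float,
                        sigma: float, rng: np.random.Generator) -> np.ndarray:
    """clean: (T, 6) wrench. Returns lagged + biased + noised labels, shape (T, 6)."""
    out = np.empty(clean.shape, dtype=np.float64)
    state = clean[0].astype(np.float64)  # filter state, reset once per episode
    for t, w in enumerate(clean):
        state = state + alpha * (w - state)  # one-pole low-pass models sensor lag
        out[t] = state + bias + rng.normal(0.0, sigma, size=6)
    return out
```

Here `bias` and `alpha` would be sampled once per episode and `sigma` would drive the per-frame term, so the noise is baked into the recorded label rather than applied to inputs at train time.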
Critical knob: depth dropout on low-texture pixels (knob #20). D405 is passive stereo: textureless 3D-printed pegs give sparse / noisy depth in real but the sim depth is clean. Approximate by masking texture_grad < τ pixels and setting them to inf. Without this knob, any RGBD-using head will be sim-tuned.
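The low-texture depth dropout can be sketched as below. It assumes `texture_grad` is the image-gradient magnitude of a grayscale frame computed with finite differences. The real knob may use a different texture measure, and τ would need tuning against D405 captures.

```python
import numpy as np


def depth_dropout_low_texture(depth: np.ndarray, gray: np.ndarray, tau: float) -> np.ndarray:
    """Set depth to inf wherever the local gradient magnitude of `gray` is below tau."""
    gy, gx = np.gradient(gray.astype(np.float64))  # d/drow, d/dcol
    texture_grad = np.hypot(gx, gy)
    out = depth.astype(np.float64).copy()
    out[texture_grad < tau] = np.inf  # mimic passive-stereo failure on flat regions
    return out
```

Any downstream RGBD head then has to tolerate `inf` (or whatever sentinel the clamp knob maps it to), as it would on real textureless peg surfaces.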
Per the visual-gap quantification: lighting + color jitter + material BRDF + camera intrinsic jitter + background clutter must all ship in the stage-1 DR set. None of these is "optional ablation."
One pipeline, five outcome branches set by the sinew-5.17 FMB-only A6000 run result.
┌─ A. Strong validation (real-only ≥0.85) ── drop sim stage 1; ship real-only
│
├─ B. Validated, expected (≥0.70, <0.85)── staged pretrain-sim + fine-tune-real (canonical recipe)
[sinew-5.17 result]──┤
├─ C. Marginal (≥0.70 global, <0.60 worst)─ per-shape ensembles OR more real data
│
├─ D. Below-bar (0.55-0.70)─────────────── triage: unfreeze backbone, swap to ViT-B, more real data
│
└─ E. Failed (<0.55)────────────────────── HALT — frame audit, zero-bias check, linear probe
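
The branch selection in the diagram can be written as a small function. The thresholds are the ones shown. The precedence is an assumption: this sketch applies the per-shape check before the global A/B split, which the diagram does not spell out. The score is taken to be the direction cosine similarity used in the acceptance tiers.

```python
def classify_outcome(global_score: float, worst_shape_score: float) -> str:
    """Map the FMB-only run result to one of the outcome branches A-E."""
    if global_score < 0.55:
        return "E"  # halt: frame audit, zero-bias check, linear probe
    if global_score < 0.70:
        return "D"  # triage: unfreeze backbone, ViT-B, more real data
    if worst_shape_score < 0.60:
        return "C"  # per-shape ensembles or more real data
    if global_score < 0.85:
        return "B"  # canonical staged recipe
    return "A"      # real-only is enough; drop sim stage 1
```

For example, `classify_outcome(0.75, 0.50)` lands in branch C even though the global score clears the 0.70 bar.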
- predictor_sim_pretrain.pt.
- insert subset (~3–4k eps × ~100 steps × 2cam RGB). Direction + gate heads only; backbone, wrench, contact-point heads frozen. 30–50 epochs, AdamW lr=3e-5, no warmup, bs=128. Direction (L1 NaN-masked) + gate (BCE) weighted 1.0 : 0.1. Output: predictor_real_finetune.pt, the ship artifact.

Two related decisions that both prevent train-test divergence at deployment:
- K_F_ext_hat_K: that's a separate channel the policy may read; the predictor's job is to produce wrench from RGBD alone.
- sim_action_noised conditioning input. The flag doesn't exist at deployment time; training a conditioning input the trainer can't reproduce at test time is exactly the kind of subtle divergence we should refuse to introduce. The disturbance shows up implicitly in RGBD (object position, contact geometry); the model can learn the regime from images.

Direction Matters's 70% bar was derived under their own sim2real distribution. Our χ² = 1.88 makes it likely the absolute bar is closer to 0.55–0.65 in the worst case. Therefore:
| Tier | Global cos-sim | Per-shape min | Action |
|---|---|---|---|
| Aspirational | ≥ 0.70 | ≥ 0.60 | Ship outcome B as-is |
| Acceptable | ≥ 0.60 | ≥ 0.45 with +0.05 fine-tune lift | Ship outcome B; flag the gap |
| Soft-fail | 0.55–0.60 | varies | Outcome C — per-shape ensembles |
| Hard-fail | < 0.55 | — | Outcome E — halt + audit (frame + zero-bias + quat + linear-probe) |
Until χ² is re-measured post-DR, the primary metric is monotonic improvement after fine-tuning. If the real fine-tune doesn't lift over the stage-1 sim pretrain, the visual-gap absorption isn't working.
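The two columns of the tier table, global cos-sim and per-shape minimum, can be computed as sketched below. It assumes the global figure is a mean over samples rather than over shapes, and that each sample carries a shape id. Both are illustrative choices, not taken from the memos.

```python
import numpy as np


def cos_sim_report(pred, label, shape_ids):
    """Return (global_mean, per_shape_min, per_shape_dict) of direction cosine similarity."""
    pred = np.asarray(pred, dtype=np.float64)
    label = np.asarray(label, dtype=np.float64)
    shape_ids = np.asarray(shape_ids)
    # Normalize both sides so only direction, not magnitude, is scored.
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    label = label / np.linalg.norm(label, axis=1, keepdims=True)
    cos = np.sum(pred * label, axis=1)
    per_shape = {s.item(): float(cos[shape_ids == s].mean()) for s in np.unique(shape_ids)}
    return float(cos.mean()), min(per_shape.values()), per_shape
```

Running the same report on the stage-1 checkpoint and on the fine-tuned checkpoint gives the before/after pair needed for the monotonic-improvement check.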
The research epic produces design memos. This section lays out the impl tickets that follow, sequenced by dependency. None are filed yet; by convention they get filed when the research epic closes and the team transitions to impl.
- isaac_twins/src/isaac_twins/fmb/substage.py per sinew-5.3 §4. Single-env. Offline unit tests + 2 runtime tests + the three pre-RL Q1 checks (threshold envelope, inverted-physics sanity, temporal smoothness flag).
- read_eef_wrench_ee API per sinew-5.21. Stateful when noisy=True (LP filter state owned by caller); stateless when noisy=False. Smoke test: push peg into board, confirm wrench magnitude grows monotonically with depth.
- peg_tip_local_offset on each peg USD at author time (isaac_twins-37). Required by the insert peg-tip-z check.
- grab_franka_view(num_envs), SceneConfigurator.reset_episode(env_ids), obs packager: unblock multi-env scaling and DirectRLEnv migration.
- (diagnostics, prev_diagnostics, primitive, action, safety_clip_delta) → float. Offline tests against synthetic histories.
- EEDeltaCorruptedActionMapper subclass per sinew-5.22 §3 (~0.5 d). Burst-state machine for the SimDist recipe.
- FmbDataRecorder outer loop (~1.5 d) per sinew-5.22 §3. Burst-noise applied here, NOT during PPO training.
- obs/sim_* keys (sim_action_noised, sim_policy_iter, sim_never_noised, plus the F/T clean/label pair already in sinew-5.6) + 4 episode-meta labels.
- nohup, plain FTP, per-episode opportunistic).
- gs://gresearch/robotics/fmb/0.0.1/ (15–30 shards, ~20–35 GB), insert-only-after-filter. ~30 min download + ~2 h parsing.
- clean_expert slice (~1.25%).
- predictor_real_finetune.pt on real Franka.
- K_F_ext_hat_K.

| Wave | Estimated time | Bottleneck |
|---|---|---|
| Wave 1 (impl) | ~3–5 person-days | SimWorker bandwidth |
| Wave 2 (RL training) | ~6 GPU-days at N=1, ~1.5 at N=4 | per-seed wall time |
| Wave 3 (data gen + pull + parse) | ~9 GPU-h + ~2 h pull + ~2 h parse | I/O on FMB pull |
| Wave 4 (v2f train) | ~1–2 A6000-days stage 1 + ~1 day stage 2 | training compute |
| Wave 5 (real eval) | deferred | real-robot access |
Items deliberately deferred or rejected, with the source memo. Kept tight; broad future-direction work belongs to the next epic.
| Item | Source | When |
|---|---|---|
| χ² re-measurement post-DR + post-Tier-1 scene fix | sinew-5.16 §5 follow-up #4 | After Wave 3 data-gen lands |
| Detector ↔ recorded-label cross-check (addenda 1.3) | addenda Q1 | Only if non-determinism observed in recordings |
| Per-shape threshold ablation (addenda 1.6) | addenda Q1 | After FMB raw arrives |
| Soft-success bonus shape | sinew-5.1 §8 #2 | First training-run data |
| Ablations: backbone freeze/unfreeze, RGB vs RGBD, temporal stack T=1 vs T=4, contact-point per-pixel vs voxel | sinew-5.11 §6 | Wave 4 (if stage 1 underperforms) |
| Production asset root pin | isaac_twins-36 | Before any S3-staging issue resurfaces |

Rejected:
| Item | Reason |
|---|---|
| FMB trajectory replay reliability (~4 days to fix) | Not needed for v2f-end-goal; state-only adapter covers substage verification (sinew-5.20) |
| SimDist latent world model + MPC planning | Out of v2f scope (sinew-5.18) |
| Force-side domain adaptation | Two-gap separation: F/T gap closed by label noise, not domain adaptation |
| PathTracing render | 3–5× render-cost penalty; throughput killer (addenda Q3) |
| Hand approximation in sim | Stage-2 real fine-tune absorbs this; modelling a human is scope creep (addenda Q3, sinew-5.16 §3.2) |
| Multi-object FMB assemblies (stage 2) | Different contact physics; 7,200 demos / 233 GB; defer indefinitely |
| Real-robot validation epic | Its own future epic; sinew-5.2 protocol carries over once real eval starts |

File index:
| File | Issue | Owner | Scope |
|---|---|---|---|
| docs/researcher.md | sinew-1.1 | Researcher | FMB deep-dive + force-from-vision landscape v1 |
| docs/fmb_reference.md | sinew-1.1 + 1.5 | Researcher | FMB cheat-sheet; §3.1 EE-frame canonical; §11 cam LR mapping |
| docs/sim_worker.md | sinew-1.2 + 1.4 | SimWorker | isaac_twins audit |
| docs/rl_worker.md | sinew-1.3 | RLWorker | isaaclab_sinew bootstrap |
| docs/rl_action_layer_sketch.md | sinew-2 | RLWorker | EE-delta action layer (Plan A DIK) |
| docs/research/rl_eval_protocol.md | sinew-5.2 | RLWorker | IQM + bootstrap CIs, PPO defaults |
| docs/research/sim_substage_detection.md | sinew-5.3 | SimWorker | Substage detector spec |
| docs/research/rl_reward_design.md | sinew-5.1 | RLWorker | PBRS + sparse + curriculum (Φ_insert §3.5 superseded) |
| docs/research/il_bc_warmstart.md | sinew-5.12 | RLWorker | BC plan (dropped per sinew-5.22 §2) |
| docs/research/fmb_sim_data_match.md | sinew-5.7 | Researcher | 3 FMB schemas, sim canonical, libfranka extras |
| docs/research/v2f_lit_v2.md | sinew-5.9 | VisionWorker | 11-paper impl-detail review |
| docs/research/v2f_data_leverage.md | sinew-5.10 | Researcher | Stage-1/stage-2 partition (revised by sinew-5.23) |
| docs/research/v2f_arch.md | sinew-5.11 | VisionWorker | 3-head architecture (still binding) |
| docs/research/v2f_dr_spec.md | sinew-5.13 | VisionWorker | 46-knob DR table |
| docs/research/camera_subset_benchmark.md | sinew-5.5 | SimWorker | Camera subset perf bench |
| docs/research/sim_recording_spec.md | sinew-5.6 | SimWorker | Recording schema + format choice |
| docs/research/data_pipeline.md | sinew-5.8 | SimWorker | NAS + DL_A6000 pipeline |
| docs/research/parallel_dev_architecture.md | sinew-5.4 | SimWorker | Two-repo split, hot rules |
| docs/research/sim2real_visual_gap.md | sinew-5.16 | Researcher | χ²=1.88 quantification |
| docs/research/v2f_fmb_only_feasibility.md | sinew-5.17 | VisionWorker | FMB-only feasibility spec + outcome matrix |
| docs/research/sim_dist_review.md | sinew-5.18 | Researcher | SimDist action-burst recipe |
| docs/research/substage_verification.md | sinew-5.19 | SimWorker | State-only adapter for offline calibration |
| docs/research/fmb_replay_feasibility.md | sinew-5.20 | SimWorker | Honest negative on FMB replay |
| docs/research/sim_ft_sensor.md | sinew-5.21 | SimWorker | read_eef_wrench_ee API + 4-stage pipeline |
| docs/research/rl_revised_plan.md | sinew-5.22 | RLWorker | Reward v2, BC drop, SimDist adoption, labels |
| docs/research/v2f_pipeline_revised.md | sinew-5.23 | VisionWorker | 5-outcome branches; canonical recipe |
| docs/research/substage_and_scene_addenda.md | addenda | SimWorker | Q1 gate checks, Q2 combining, Q3 scene tiers |
| isaac_twins/scripts/validate_fmb_schema.py | sinew-5.7 | Researcher | FMB-schema validator (3-mode autodetect) |

isaac_twins/references/)
| Repo | Path | Why |
|---|---|---|
| FoAR — He et al. 2024 | v2f_lit_v2/code/FoAR/ | In-contact-gate pattern, ResNet18 + F/T fusion |
| Reactive Diffusion Policy — Xue et al. 2025 | v2f_lit_v2/code/reactive_diffusion_policy/ | 2-stage VAE + latent-DDPM, slow/fast hierarchy |
| ForceSight — Collins et al. 2023 | v2f_lit_v2/code/forcesight/ | RGBDDinov2 backbone, ThreeHeadMLP |
| SimDist — CLeARoboticsLab | references/sim_dist/ | Action-burst recipe (kept; latent WM dropped) |
| rliable — Agarwal et al. 2021 | eval/rliable/ | IQM + stratified bootstrap CI |
| FMB — Luo et al. 2024 | fmb/ | Authoritative FMB code |
| realsense-ros | realsense-ros/ | D405 driver reference |

isaac_twins/references/v2f_lit_v2/papers/)
- eval/agarwal_2021_rliable.pdf
- eval/andrychowicz_2021_on_policy.pdf
- RobotState reference: frankarobotics.github.io
- gs://gresearch/robotics/fmb/0.0.1/ (mirrored at isaac_twins/references/fmb_dataset_schema/)

Generated 2026-05-21 by the sinew agent team. Final state: sinew-1 env setup (5 children) and sinew-5 research epic (14 v1 children + 8 reopened-wave children + 5.14 synthesis) all closed. Implementation phase begins with Wave 1 (sim surface). Source memos under docs/research/; cloned references under isaac_twins/references/; this report lives at docs/r2s2r_research_report.html.