U.S. Patent 10,131,682 C1 · antibody-drug-conjugate linker chemistry (TC1600) · Final Written Decision Paper 58 (Aug. 26, 2025)
This packet contains everything needed to review one contested-reasoning evaluation task built from the above proceeding: the task the model is given, the record it may see, the answer key derived from the FWD, the 15 graded rubric propositions with their source excerpts, and one fully scored model run. The ask is in the box below.
fwd_surviving_claims_25_32_34_not_shown) was recently clarified so that over-preserving a non-survivor is not treated as an over-kill contradiction. Please confirm that reading.
Derived from FWD Paper 58 (CONCLUSION, summary table, and ORDER). This is the ground truth the rubric and scoring are built against. Each cell is one challenged claim.
G1 Morales-Sanfrutos as evidenced by Straus · G2 Harris · G3 Singh in view of Harris · G4 Singh/Harris/Bhakta · G5 Singh/Harris/Snow
The complete task instruction handed to the model. It is told to predict the Board's ground-by-ground resolution from the pre-decision record, and explicitly told not to assume either side wins and not to use external sources.
The model works only from the bundled local record below, reached through file/search tools and a PDF reader. The Final Written Decision (Paper 58) is withheld, so the model must reconstruct the holding, not read it. The Patent Owner's briefing and the asserted references are present, so the survivors are reconstructable from the record. Two cited references are copyrighted journal articles and are not redistributed here.
This is the core review surface. Each proposition is graded for/against the model's answer. For each, the green excerpt is the supporting FWD text; the acceptance criterion is the grading rule. Polarity: positive = the Board affirmatively held it; not_shown = the Board found it not proven; role_limit = a scope/not-reached limit.
A fresh run of GPT-5.5 (xhigh reasoning) on this task, graded by the authoritative rubric scorer. Reward = 0.70·coverage + 0.15·ground + 0.15·motivation − 0.15·(over-kill contradictions), floored at 0. This example is included so you can see the grader applied to a real answer and check whether each row's verdict is sound.
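The reward formula above can be sketched in a few lines of Python. This is an illustrative sketch, not the authoritative rubric scorer: the function and parameter names are hypothetical, and it assumes that coverage, ground, and motivation are fractions in [0, 1] and that the over-kill term is a count of contradictions, which the text above does not state explicitly.

```python
def reward(coverage: float, ground: float, motivation: float,
           overkill_contradictions: int) -> float:
    """Weighted reward as described in the text, floored at 0.

    Assumption: the three component scores lie in [0, 1] and
    overkill_contradictions is a non-negative count.
    """
    raw = (0.70 * coverage
           + 0.15 * ground
           + 0.15 * motivation
           - 0.15 * overkill_contradictions)
    # The floor keeps heavily penalized runs from going negative.
    return max(0.0, raw)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the formula.
    print(reward(coverage=0.8, ground=1.0, motivation=0.5,
                 overkill_contradictions=1))
```

Under these assumptions, each over-kill contradiction costs as much as the entire ground or motivation component, so a run with perfect coverage can still lose a meaningful share of its reward.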
Verdict per rubric row, with the judge's justification. This is what produced the score above.
The agent's actual run: every reasoning step, shell command, and PDF-tool call (tool outputs truncated).
The complete structured output the model produced: its overall outcome, claim constructions, the rationale to combine, the contested disputes as it resolved them, and its per-claim reasoning with the record quotes it cited. This is what the rubric scorer graded.