Congratulations!

[Valid RSS] This is a valid RSS feed.

Recommendations

This feed is valid, but interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

Source: http://lesswrong.com/comments/.rss

  1. <?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[LessWrong]]></title><description><![CDATA[A community blog devoted to refining the art of rationality]]></description><link>https://www.lesswrong.com</link><image><url>https://res.cloudinary.com/lesswrong-2-0/image/upload/v1497915096/favicon_lncumn.ico</url><title>LessWrong</title><link>https://www.lesswrong.com</link></image><generator>RSS for Node</generator><lastBuildDate>Thu, 17 Jul 2025 03:22:54 GMT</lastBuildDate><atom:link href="https://www.lesswrong.com/feed.xml?view=rss&amp;karmaThreshold=2" rel="self" type="application/rss+xml"/><item><title><![CDATA[Towards plausible moral naturalism]]></title><description><![CDATA[Published on July 17, 2025 1:51 AM GMT<br/><br/><p>In <a href="https://unstableontology.com/2025/07/15/generalizing-zombie-arguments/">"Generalizing zombie arguments"</a>, I hinted at the idea of applying a Chalmers-like framework to morality. Here I develop this idea further.</p>
  2. <p>Suppose we are working in an axiomatic system rich enough to express physics and physical facts. Can this system include moral facts as well? Perhaps moral statements such as "homicide is never morally permissible" can be translated into the axiomatic system or an extension of it.</p>
  3. <p>It would be difficult to argue that a realistic axiomatic system <em>must</em> be able to express moral statements. So I'll bracket the alternative possibility: Perhaps moral claims do not translate into well-formed statements in the theory at all. This would be a type of moral non-realism, of considering moral claims to be meaningless.</p>
  4. <p>There's another possibility to bracket: moral trivialism, according to which moral statements do have truth values, but only trivial ones. For example, perhaps all actions are morally permissible. Or perhaps no actions are. This is a roughly moral nonrealist possibility, especially compatible with error theory.</p>
  5. <p>The point of this post is not to argue against moral meaninglessness or moral trivialism, but rather to explore alternatives to see which are most realistic. Even if moral meaninglessness or trivialism is likely, the combination of uncertainty and disagreement regarding their truth could motivate examining alternative theories.</p>
  6. <p>Let's start with a thought experiment. Suppose there are two possible universes that are physically identical. They both contain versions of a certain physical person, who takes the same actions in each possible universe. However, in one possible universe, this person's actions are morally wrong, while in another, they are morally right.</p>
  7. <p>We could elaborate on this by imagining that in the actual universe, this person's actions are morally wrong, although they are materially helpful and match normal concepts of moral behavior. This person would be a P-evildoer, analogous to a P-zombie; they would be physically identical to a morally acting person, but still an evildoer nonetheless.</p>
  8. <p>Conversely, we could imagine that in the actual universe, this person's actions are morally good, although they are materially malicious and match normal concepts of immoral behavior. They would be a P-saint. (Scott Alexander has written about a similar scenario in <a href="https://web.archive.org/web/20160306082537/https://raikoth.net/consequentialism.html">"The Consequentialism FAQ"</a>.)</p>
  9. <p>I, for one, balk at imagining such a scenario. Something about it seems inconceivable. It seems easier to imagine that I am so wrong about morality that intuitive judgments of moral goodness are anti-correlated with real moral goodness, than that physically identical persons in different possible worlds have different moral properties. Somehow, P-evildoer scenarios seem even worse than P-zombie scenarios. To be clear, it's not too hard to imagine that the P-evildoer's actions could have negative supernatural consequences (while their physically identical moral twin's actions have positive supernatural consequences), but moral judgments seem like they should evaluate agents relative to their possible epistemic state(s), rather than depending on un-knowable supernatural consequences.</p>
  10. <p>As motivation for considering such deeply unfair scenarios, consider the common idea that atheists and materialists cannot ground their morality, as morality must be grounded in a supernatural entity such as a god. Then, depending on the whims of this god, a given physical person may act morally well or morally badly, despite taking the same actions in either case. And, unless the whims of the god are so finite and predictable that humans can comprehend them in general, different possible gods (and accordingly moral judgments) are logically possible and conceivable from a human perspective.</p>
  11. <p>Broadly, this idea could be called "moral supernaturalism", which says that moral properties are not determined by the combination of physical properties, mathematical properties, and metaphysical constraints on conceivability; moral properties are instead "further facts". I'm not sure how to argue the inconceivability of P-evildoers to someone who doesn't already agree, but if someone accepts a principle of no P-evildoers, then this is a strong argument against moral supernaturalism.</p>
  12. <p>The remaining alternative to moral meaninglessness, moral trivialism, and moral supernaturalism, would seem to be reasonably called "moral naturalism". I'll take some preliminary steps towards formalizing it.</p>
  13. <p>Let us work in a typed variant of first-order logic, with three types: <em>W</em> (a type of possible worlds), <em>P</em> (a type of physical world trajectory), and <em>M</em> (a type of moral world trajectory). Each possible world has a physical and a moral trajectory, denoted <span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="p(w)"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.446em;">p</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span><style>.mjx-chtml {display: inline-block; line-height: 0; text-indent: 0; text-align: left; text-transform: none; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; word-wrap: normal; word-spacing: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0; min-height: 0; border: 0; margin: 0; padding: 1px 0}
  14. .MJXc-display {display: block; text-align: center; margin: 1em 0; padding: 0}
  15. .mjx-chtml[tabindex]:focus, body :focus .mjx-chtml[tabindex] {display: inline-table}
  16. .mjx-full-width {text-align: center; display: table-cell!important; width: 10000em}
  17. .mjx-math {display: inline-block; border-collapse: separate; border-spacing: 0}
  18. .mjx-math * {display: inline-block; -webkit-box-sizing: content-box!important; -moz-box-sizing: content-box!important; box-sizing: content-box!important; text-align: left}
  19. .mjx-numerator {display: block; text-align: center}
  20. .mjx-denominator {display: block; text-align: center}
  21. .MJXc-stacked {height: 0; position: relative}
  22. .MJXc-stacked > * {position: absolute}
  23. .MJXc-bevelled > * {display: inline-block}
  24. .mjx-stack {display: inline-block}
  25. .mjx-op {display: block}
  26. .mjx-under {display: table-cell}
  27. .mjx-over {display: block}
  28. .mjx-over > * {padding-left: 0px!important; padding-right: 0px!important}
  29. .mjx-under > * {padding-left: 0px!important; padding-right: 0px!important}
  30. .mjx-stack > .mjx-sup {display: block}
  31. .mjx-stack > .mjx-sub {display: block}
  32. .mjx-prestack > .mjx-presup {display: block}
  33. .mjx-prestack > .mjx-presub {display: block}
  34. .mjx-delim-h > .mjx-char {display: inline-block}
  35. .mjx-surd {vertical-align: top}
  36. .mjx-surd + .mjx-box {display: inline-flex}
  37. .mjx-mphantom * {visibility: hidden}
  38. .mjx-merror {background-color: #FFFF88; color: #CC0000; border: 1px solid #CC0000; padding: 2px 3px; font-style: normal; font-size: 90%}
  39. .mjx-annotation-xml {line-height: normal}
  40. .mjx-menclose > svg {fill: none; stroke: currentColor; overflow: visible}
  41. .mjx-mtr {display: table-row}
  42. .mjx-mlabeledtr {display: table-row}
  43. .mjx-mtd {display: table-cell; text-align: center}
  44. .mjx-label {display: table-row}
  45. .mjx-box {display: inline-block}
  46. .mjx-block {display: block}
  47. .mjx-span {display: inline}
  48. .mjx-char {display: block; white-space: pre}
  49. .mjx-itable {display: inline-table; width: auto}
  50. .mjx-row {display: table-row}
  51. .mjx-cell {display: table-cell}
  52. .mjx-table {display: table; width: 100%}
  53. .mjx-line {display: block; height: 0}
  54. .mjx-strut {width: 0; padding-top: 1em}
  55. .mjx-vsize {width: 0}
  56. .MJXc-space1 {margin-left: .167em}
  57. .MJXc-space2 {margin-left: .222em}
  58. .MJXc-space3 {margin-left: .278em}
  59. .mjx-test.mjx-test-display {display: table!important}
  60. .mjx-test.mjx-test-inline {display: inline!important; margin-right: -1px}
  61. .mjx-test.mjx-test-default {display: block!important; clear: both}
  62. .mjx-ex-box {display: inline-block!important; position: absolute; overflow: hidden; min-height: 0; max-height: none; padding: 0; border: 0; margin: 0; width: 1px; height: 60ex}
  63. .mjx-test-inline .mjx-left-box {display: inline-block; width: 0; float: left}
  64. .mjx-test-inline .mjx-right-box {display: inline-block; width: 0; float: right}
  65. .mjx-test-display .mjx-right-box {display: table-cell!important; width: 10000em!important; min-width: 0; max-width: none; padding: 0; border: 0; margin: 0}
  66. .MJXc-TeX-unknown-R {font-family: monospace; font-style: normal; font-weight: normal}
  67. .MJXc-TeX-unknown-I {font-family: monospace; font-style: italic; font-weight: normal}
  68. .MJXc-TeX-unknown-B {font-family: monospace; font-style: normal; font-weight: bold}
  69. .MJXc-TeX-unknown-BI {font-family: monospace; font-style: italic; font-weight: bold}
  70. .MJXc-TeX-ams-R {font-family: MJXc-TeX-ams-R,MJXc-TeX-ams-Rw}
  71. .MJXc-TeX-cal-B {font-family: MJXc-TeX-cal-B,MJXc-TeX-cal-Bx,MJXc-TeX-cal-Bw}
  72. .MJXc-TeX-frak-R {font-family: MJXc-TeX-frak-R,MJXc-TeX-frak-Rw}
  73. .MJXc-TeX-frak-B {font-family: MJXc-TeX-frak-B,MJXc-TeX-frak-Bx,MJXc-TeX-frak-Bw}
  74. .MJXc-TeX-math-BI {font-family: MJXc-TeX-math-BI,MJXc-TeX-math-BIx,MJXc-TeX-math-BIw}
  75. .MJXc-TeX-sans-R {font-family: MJXc-TeX-sans-R,MJXc-TeX-sans-Rw}
  76. .MJXc-TeX-sans-B {font-family: MJXc-TeX-sans-B,MJXc-TeX-sans-Bx,MJXc-TeX-sans-Bw}
  77. .MJXc-TeX-sans-I {font-family: MJXc-TeX-sans-I,MJXc-TeX-sans-Ix,MJXc-TeX-sans-Iw}
  78. .MJXc-TeX-script-R {font-family: MJXc-TeX-script-R,MJXc-TeX-script-Rw}
  79. .MJXc-TeX-type-R {font-family: MJXc-TeX-type-R,MJXc-TeX-type-Rw}
  80. .MJXc-TeX-cal-R {font-family: MJXc-TeX-cal-R,MJXc-TeX-cal-Rw}
  81. .MJXc-TeX-main-B {font-family: MJXc-TeX-main-B,MJXc-TeX-main-Bx,MJXc-TeX-main-Bw}
  82. .MJXc-TeX-main-I {font-family: MJXc-TeX-main-I,MJXc-TeX-main-Ix,MJXc-TeX-main-Iw}
  83. .MJXc-TeX-main-R {font-family: MJXc-TeX-main-R,MJXc-TeX-main-Rw}
  84. .MJXc-TeX-math-I {font-family: MJXc-TeX-math-I,MJXc-TeX-math-Ix,MJXc-TeX-math-Iw}
  85. .MJXc-TeX-size1-R {font-family: MJXc-TeX-size1-R,MJXc-TeX-size1-Rw}
  86. .MJXc-TeX-size2-R {font-family: MJXc-TeX-size2-R,MJXc-TeX-size2-Rw}
  87. .MJXc-TeX-size3-R {font-family: MJXc-TeX-size3-R,MJXc-TeX-size3-Rw}
  88. .MJXc-TeX-size4-R {font-family: MJXc-TeX-size4-R,MJXc-TeX-size4-Rw}
  89. .MJXc-TeX-vec-R {font-family: MJXc-TeX-vec-R,MJXc-TeX-vec-Rw}
  90. .MJXc-TeX-vec-B {font-family: MJXc-TeX-vec-B,MJXc-TeX-vec-Bx,MJXc-TeX-vec-Bw}
  91. @font-face {font-family: MJXc-TeX-ams-R; src: local('MathJax_AMS'), local('MathJax_AMS-Regular')}
  92. @font-face {font-family: MJXc-TeX-ams-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_AMS-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_AMS-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_AMS-Regular.otf') format('opentype')}
  93. @font-face {font-family: MJXc-TeX-cal-B; src: local('MathJax_Caligraphic Bold'), local('MathJax_Caligraphic-Bold')}
  94. @font-face {font-family: MJXc-TeX-cal-Bx; src: local('MathJax_Caligraphic'); font-weight: bold}
  95. @font-face {font-family: MJXc-TeX-cal-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Bold.otf') format('opentype')}
  96. @font-face {font-family: MJXc-TeX-frak-R; src: local('MathJax_Fraktur'), local('MathJax_Fraktur-Regular')}
  97. @font-face {font-family: MJXc-TeX-frak-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Regular.otf') format('opentype')}
  98. @font-face {font-family: MJXc-TeX-frak-B; src: local('MathJax_Fraktur Bold'), local('MathJax_Fraktur-Bold')}
  99. @font-face {font-family: MJXc-TeX-frak-Bx; src: local('MathJax_Fraktur'); font-weight: bold}
  100. @font-face {font-family: MJXc-TeX-frak-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Bold.otf') format('opentype')}
  101. @font-face {font-family: MJXc-TeX-math-BI; src: local('MathJax_Math BoldItalic'), local('MathJax_Math-BoldItalic')}
  102. @font-face {font-family: MJXc-TeX-math-BIx; src: local('MathJax_Math'); font-weight: bold; font-style: italic}
  103. @font-face {font-family: MJXc-TeX-math-BIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-BoldItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-BoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-BoldItalic.otf') format('opentype')}
  104. @font-face {font-family: MJXc-TeX-sans-R; src: local('MathJax_SansSerif'), local('MathJax_SansSerif-Regular')}
  105. @font-face {font-family: MJXc-TeX-sans-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Regular.otf') format('opentype')}
  106. @font-face {font-family: MJXc-TeX-sans-B; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerif-Bold')}
  107. @font-face {font-family: MJXc-TeX-sans-Bx; src: local('MathJax_SansSerif'); font-weight: bold}
  108. @font-face {font-family: MJXc-TeX-sans-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Bold.otf') format('opentype')}
  109. @font-face {font-family: MJXc-TeX-sans-I; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerif-Italic')}
  110. @font-face {font-family: MJXc-TeX-sans-Ix; src: local('MathJax_SansSerif'); font-style: italic}
  111. @font-face {font-family: MJXc-TeX-sans-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')}
  112. @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')}
  113. @font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')}
  114. @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')}
  115. @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')}
  116. @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')}
  117. @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')}
  118. @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')}
  119. @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold}
  120. @font-face {font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')}
  121. @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')}
  122. @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic}
  123. @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')}
  124. @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')}
  125. @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')}
  126. @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')}
  127. @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic}
  128. @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')}
  129. @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')}
  130. @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')}
  131. @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')}
  132. @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')}
  133. @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')}
  134. @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')}
  135. @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')}
  136. @font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')}
  137. @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')}
  138. @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')}
  139. @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')}
  140. @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold}
  141. @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')}
  142. </style></span></span> and <span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="m(w)"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">m</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span> respectively. To state a no P-evildoers principle:</p>
  143. <p><span class="mjpage mjpage__block"><span class="mjx-chtml MJXc-display" style="text-align: center;"><span class="mjx-math" aria-label="\neg \exists (w_1, w_2 : W), p(w_1) = p(w_2) \wedge m(w_1) \neq m(w_2)"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">¬</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">∃</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mn" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">1</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-msubsup MJXc-space1"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mn" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">2</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">:</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.104em;">W</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mi MJXc-space1"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.446em;">p</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mn" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">1</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.446em;">p</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mn" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">2</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.372em;">∧</span></span><span class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">m</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mn" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">1</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">≠</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">m</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mn" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">2</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></p>
  144. <p>In other words, any two possible worlds with identical physical properties must have the same moral properties. (As an aside, modal logic may provide alternative formulations of possibility without reifying possible worlds as "existent" first-order logical entities, though I don't present a modal formulation here.) I would like to demonstrate a more helpful statement from the no P-evildoers principle:</p>
  145. <p><span class="mjpage mjpage__block"><span class="mjx-chtml MJXc-display" style="text-align: center;"><span class="mjx-math" aria-label="\forall (p^* : P) \exists (m^* : M) \forall (w : W), p(w) = p^* \rightarrow m(w) = w^*"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">∀</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.446em;">p</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.584em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">:</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.109em;">P</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">∃</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">m</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.584em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">:</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.081em;">M</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">∀</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">:</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.104em;">W</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mi MJXc-space1"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.446em;">p</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.446em;">p</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.584em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">→</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">m</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.584em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span></span></span></span></span></p>
  146. <p>This says that any physical trajectory has a corresponding moral trajectory that holds across possible worlds. To prove this, start by letting <span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="p^* : P"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.446em;">p</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">:</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.109em;">P</span></span></span></span></span></span> be arbitrary. Either there is some possible world with this physical trajectory, or not. If not, we can let <span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="m^*"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">m</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span></span></span></span></span> be arbitrary, and <span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="\forall (w : W), p(w) = p^* \rightarrow m(w) = w^*"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">∀</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">:</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.104em;">W</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mi MJXc-space1"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.446em;">p</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.446em;">p</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">→</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">m</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span></span></span></span></span> will hold vacuously.</p>
  147. <p>If so, then let <span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="w^*"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span></span></span></span></span> be some such world and set <span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="m^* = m(w^*)"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">m</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">m</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span>. Now consider some arbitrary possible world <em>w</em> for which <span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="p(w) = p^*"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.446em;">p</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.446em;">p</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span></span></span></span></span>. Either it is true that <span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="m(w) = w^*"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">m</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span></span></span></span></span> or not. If it is, we are done; we have that <span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="p(w) = p^* \rightarrow m(w) = w^*"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.446em;">p</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.446em;">p</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">→</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">m</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span></span></span></span></span>. If not, then observe that <em>w</em> and <span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="w^*"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">w</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span></span></span></span></span> have identical physical trajectories, but different moral trajectories. This contradicts the no P-evildoers principle.</p>
  148. <p>So, the no P-evildoers principle implies the alternative statement, which expresses something of a functional relationship between physical trajectories and moral ones; for any physical trajectory, there is only one possible corresponding moral trajectory. With a bit of set theory, and perhaps axiom of choice shenanigans, we may be able to show the existence of a function <em>f</em> mapping physical trajectories to corresponding moral trajectories.</p>
  149. <p>Now <em>f</em> expresses a <em>necessary</em> (working across all possible worlds) mapping from physical to moral trajectories, a sort of multiversal moral judgment function. Its necessity across possible worlds demonstrates its a priori truth, showing Kant was right about morality being derivable from the a priori. We should return to Kant to truly comprehend morality itself.</p>
  150. <p>...This is perhaps jumping to conclusions. Even though we have an <em>f</em> expressing a necessary functional relationship between physical and moral trajectories, this does not show the epistemic derivability of <em>f</em> from any realistic perspective. Perhaps different alien species across different planets develop different beliefs about <em>f</em>, having no way to effectively resolve their disagreements.</p>
  151. <p>We don't have the framework to rule this out, so far. Yet something seems suspiciously morally supernaturalist about this scenario. In the P-evildoer scenario, we could imagine that God decides all physical facts, and then adds additional moral facts, but potentially attributes different moral facts to physically identical universes. With the no P-evildoers principle, we have constrained God a little, but it still seems God might need to add facts about <em>f</em> even after determining all physical facts, to yield moral judgments.</p>
  152. <p>So perhaps we can consider a stronger rejection of moral supernaturalism, to rule out a supernatural God of the moral gaps. We would need to re-formulate a negation of moral supernaturalism, distinct from our original no P-evildoers axiom.</p>
  153. <p>One possibility is logical supervenience of moral facts on physical ones; the axiomatic system would be sufficient to prove all moral facts from all physical facts. This could be called moral reductionism: moral statements are logically equivalent to perhaps complex statements about particle trajectories and so on. This would imply that there are conceptually straightforward physical definitions of morality, even if we haven't found them yet. However, assuming moral reductionism might be overly dogmatic, even if moral supernaturalism is rejected. Perhaps there are more constraints to "possibility" or "conceivability" than just logical coherence; perhaps there are additional metaphysical properties that must be satisfied, and morality cannot be in general derived without such constraints.</p>
  154. <p>As an example, Kant suggests that geometric facts are not analytic. While there are axiomatizations of geometry such as Euclidean geometry, any one may or may not correspond to the a priori intuition of space. Perhaps systems weaker than Euclidean geometry allow many logically consistent possibilities, but some of these do not match a priori spatial intuition, so some of these logically consistent geometric trajectories are inconceivable.</p>
  155. <p>Let us start with an axiomatic system <em>T</em> capable of expressing physical and moral statements. Rather than the previous possible-world theory, we will instead consider it to have a set of physical statements <em>P</em> and a set of moral statements <em>M</em>, which are all about the actual universe, rather than any other possible worlds.</p>
  156. <p>Let us, for formality, suppose that these additional metaphysical conceivability constraints can be added to our axiomatic system <em>T</em>; call the augmented system <em>T'</em>. Now we can apply model theory to <em>T'</em>. Are there models of <em>T'</em> that have the same physical facts, yet different moral facts?</p>
  157. <p>We must handle some theoretical thorniness: <a href="https://scottaaronson.blog/?p=710">Gödel's first incompleteness theorem</a> shows that there are multiple models of Peano arithmetic which yield different assignments of truth values to different statements about the natural numbers, assuming PA is consistent. There is some intuition that these statements (which exist in the <a href="https://en.wikipedia.org/wiki/Arithmetical_hierarchy">arithmetical hierarchy</a>) nonetheless have real truth values, although it's hard to be sure. Even if they don't, it seems hard to formalize a system similar to Peano arithmetic that is consistent and complete; the doubter of the meaningfulness of statements in higher levels of the arithmetical hierarchy is going to have to work in an impoverished axiomatic system until such a system is formalized.</p>
  158. <p>So we augment <em>T'</em> with additional mathematical axioms, expressing, for example, all true PA statements according to the standard natural model; call this further-augmented system <span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="T^*"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.12em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0.291em; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span></span></span></span></span>. The axioms of <span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="T^*"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.12em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0.291em; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span></span></span></span></span> are, of course, uncomputable, but this is not an obstacle to model theory. These mathematical axioms disambiguate cases where the truth values of these arithmetic hierarchy statements are morally relevant somehow.</p>
  159. <p>In a meta-theory capable of model-theoretic analysis of <span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="T^*"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.12em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0.291em; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span></span></span></span></span> (such as ZFC), we can express a stronger form of moral naturalism:</p>
  160. <p>"Any two models of <span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="T^*"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.12em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0.291em; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span></span></span></span></span> having the same assignment of truth values to statements in <em>P</em> have the same assignment of truth values to statements in <em>M</em>."</p>
  161. <p>Recall that <em>P</em> is the set of physical statements, while <em>M</em> is the set of moral statements. What this expresses is that there are no "further facts" to morality in addition to physics, metaphysical conceivability considerations, and mathematical oracle facts, ruling out a broader class of moral supernaturalism than our previous formulation.</p>
  162. <p>Using <a href="https://unstableontology.com/2024/05/27/understanding-godels-completeness-theorem/">Gödel's completeness theorem</a>, we can show from the moral naturalism statement:</p>
  163. <p>"<span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="T^*"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.12em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0.291em; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span></span></span></span></span> augmented with an infinite axiom schema assigning any logically consistent truth values to all statements in <em>P</em> will eventually prove binary truth values for all statements in <em>M</em>".</p>
  164. <p>The infinite axiom schema here is somewhat unsatisfying; it's not really necessary if <em>P</em> is finite, but perhaps we want to consider countably infinite <em>P</em> as well. Luckily, since all proofs of individual <em>M</em>-statements (or their negations) are finite, they only use a finite subset of the axiom schema. Hence we can show:</p>
  165. <p>"For any finite subset of <em>M</em>, <span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="T^*"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.12em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0.291em; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span></span></span></span></span> augmented with some finite subset of an infinite axiom schema assigning any logically consistent truth values to all statements in <em>P</em> will eventually prove binary truth values for all statements in this subset of <em>M</em>".</p>
  166. <p>This is more satisfying: a finite subset of moral statements only requires a finite subset of physical statements to prove their truth values. Likewise, the proofs need only use a finite subset of the axioms of <span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="T^*"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.12em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0.291em; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∗</span></span></span></span></span></span></span></span>, e.g. they only require a finite number of mathematical oracle axioms.</p>
  167. <p>To summarize, constraining models of an augmented axiomatic system, with metaphysical conceivability and mathematical axioms, to never imply different moral facts given the same physical facts, yields a stronger form of moral non-supernaturalism, preventing there from being "further facts" to morality beyond physics, metaphysical conceivability constraints, and mathematics. Using model theory, this would imply that the truth values of any finite subset of moral statements are logically derivable from the system and from some finite subset of physical statements. This implies not just a theoretical existence of a functional relationship between physics and morality (<em>f</em> from before), but hypothetical <em>a priori</em> epistemic efficacy to such a derivation, albeit making use of an uncomputable mathematical oracle.</p>
  168. <p>The practical upshot is not entirely clear. Aliens might have different beliefs about uncomputable mathematical statements, or different metaphysical conceivability axioms, yielding different moral beliefs. But disagreements over metaphysical conceivability and mathematics seem more epistemically weighted than fully general moral disagreements; they would relate to epistemic cognitive architecture and so on, rather than being orthogonal to epistemology.</p>
  169. <p>Physically instantiated agents also might not generally be <em>motivated</em> to act morally, even if they have specific beliefs about morality. This doesn't contradict moral realism per se, but it indicates practical obstacles to the efficacy of moral theory. As an example, the aliens may be moral realists, but additionally have beliefs about "schmorality", a related but distinct concept, which they are more motivated to put into effect.</p>
  170. <p>While it would take more work to determine the likelihood of Kantian morality conditioned on moral naturalism, in broad strokes they seem quite compatible. In the <em>Critique of Practical Reason</em>, Kant argues:</p>
  171. <blockquote>
  172. <p>If a rational being can think of its maxims as practical  universal laws, he can do so only by considering them as principles which contain the determining grounds of the will because of their form and not because of their matter.</p>
  173. <p>The material of a practical principle is the object of the will. This object either is the determining ground of the will or it is not. If it is, the rule of the will is subject to an empirical condition (to the relation of the determining idea to feelings of pleasure or displeasure), and therefore it is not a practical law. If all material of a law, i.e., every object of the will considered as a ground of its determination, is abstracted from it, nothing remains except the mere form of giving universal law. Therefore, a rational being either cannot think of his subjectively practical principles (maxims) as universal laws, or he must suppose that their mere form, through which they are fitted for being universal laws, is alone that which makes them a practical law.</p>
  174. </blockquote>
  175. <p>Mainly, he is emphasizing that universal moral laws must be determined <em>a priori</em>, rather than subject to empirical determination; hence, different rational agents would derive the same universal moral laws given enough reflection (though, of course, this assumes some metaphysical agreement among the rational agents, such as regarding the a priori synthetic). While my formulation of moral naturalism allows moral judgments to depend on physical facts, the general functional mapping from physical to moral judgments is itself <em>a priori</em>, not depending on the specific physical facts, but instead derivable from an axiomatic system capable of expressing physical facts, plus metaphysical conceivability constraints and mathematical axioms.</p>
  176. <p>This is meant to be more of a starting point for moral naturalism than a definitive treatment. It is a sort of <em>what if</em> exercise: what if there is at least one workable alternative to moral meaninglessness, moral trivialism, and moral supernaturalism? It is difficult to decisively argue <em>for</em> moral naturalism, so I am more focused on exploring the implications <em>of</em> moral naturalism; this will make it easier to have an idea of the scope of plausible moral theories, and how they compare with each other.</p>
  177. <br/><br/><a href="https://www.lesswrong.com/posts/sWuBhds97j8R6y22m/towards-plausible-moral-naturalism#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/sWuBhds97j8R6y22m/towards-plausible-moral-naturalism</link><guid isPermaLink="false">sWuBhds97j8R6y22m</guid><dc:creator><![CDATA[jessicata]]></dc:creator><pubDate>Thu, 17 Jul 2025 01:51:54 GMT</pubDate></item><item><title><![CDATA[Assign Probabilities Functorially]]></title><description><![CDATA[Published on July 17, 2025 1:49 AM GMT<br/><br/><p>Recently, I read the paper <a href="https://projecteuclid.org/journals/annals-of-statistics/volume-30/issue-5/What-is-a-statistical-model/10.1214/aos/1035844977.full">What is a statistical model?</a>, and this post contains my thoughts downstream of the concepts in the paper. I have aimed to be minimally technical here (though it gets a bit technical towards the end...), so go read the paper if you want more precise ideas.</p><p>The fundamental challenge when we model the world is representing real-world concepts as mathematical objects. The choice of representation is not 'real', so it should not effect the predictions made by our model. This is essentially the <a href="https://en.wikipedia.org/wiki/Map%E2%80%93territory_relation">map-territory</a> problem. We have multiple maps for the territory, and we want our decisions to be based on the properties of the territory, not on artifacts of the map.</p><p>McCullagh's paper partially addresses this challenge, using category theory as a tool to keep track of various maps of a given territory. One key insight is that, if we want to assign probabilities to events based on the territory and not the map, we need our assignment of probabilities to be <i>functorial</i>. This includes and generalizes the concept of invariance &amp; equivariance under group actions.&nbsp;</p><p>My primary goal with this article is to explain what it means to assign probabilities functorially, more intuitively than rigorously. My secondary goal is to pique your interest in the utility of category theory for mathematically approaching the map-territory problem.</p><h1><br>Warm-Up: Changing units of measurement</h1><p><br>Consider mathematically representing the temperature of an object. Three natural representations are:<br>- The number of degrees Celsius&nbsp;<br>- The number of degrees Fahrenheit<br>- The number of Kelvin<br>Between any pair of these representations, we also have a conversion formula, such as</p><span class="math-tex"><span class="mjpage mjpage__block"><span class="mjx-chtml MJXc-display" style="text-align: center;"><span class="mjx-math" aria-label="(\text{Temperature in Fahrenheit}) = 9/5 \cdot (\text{Temperature in Celcius}) + 32"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.519em;">Temperature in Fahrenheit</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-mn MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">9</span></span><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">/</span></span></span></span><span class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">5</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.004em; padding-bottom: 0.298em;">⋅</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.519em;">Temperature in Celcius</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">+</span></span><span class="mjx-mn MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">32</span></span></span></span><style>.mjx-chtml {display: inline-block; line-height: 0; text-indent: 0; text-align: left; text-transform: none; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; word-wrap: normal; word-spacing: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0; min-height: 0; border: 0; margin: 0; padding: 1px 0}
  178. .MJXc-display {display: block; text-align: center; margin: 1em 0; padding: 0}
  179. .mjx-chtml[tabindex]:focus, body :focus .mjx-chtml[tabindex] {display: inline-table}
  180. .mjx-full-width {text-align: center; display: table-cell!important; width: 10000em}
  181. .mjx-math {display: inline-block; border-collapse: separate; border-spacing: 0}
  182. .mjx-math * {display: inline-block; -webkit-box-sizing: content-box!important; -moz-box-sizing: content-box!important; box-sizing: content-box!important; text-align: left}
  183. .mjx-numerator {display: block; text-align: center}
  184. .mjx-denominator {display: block; text-align: center}
  185. .MJXc-stacked {height: 0; position: relative}
  186. .MJXc-stacked > * {position: absolute}
  187. .MJXc-bevelled > * {display: inline-block}
  188. .mjx-stack {display: inline-block}
  189. .mjx-op {display: block}
  190. .mjx-under {display: table-cell}
  191. .mjx-over {display: block}
  192. .mjx-over > * {padding-left: 0px!important; padding-right: 0px!important}
  193. .mjx-under > * {padding-left: 0px!important; padding-right: 0px!important}
  194. .mjx-stack > .mjx-sup {display: block}
  195. .mjx-stack > .mjx-sub {display: block}
  196. .mjx-prestack > .mjx-presup {display: block}
  197. .mjx-prestack > .mjx-presub {display: block}
  198. .mjx-delim-h > .mjx-char {display: inline-block}
  199. .mjx-surd {vertical-align: top}
  200. .mjx-surd + .mjx-box {display: inline-flex}
  201. .mjx-mphantom * {visibility: hidden}
  202. .mjx-merror {background-color: #FFFF88; color: #CC0000; border: 1px solid #CC0000; padding: 2px 3px; font-style: normal; font-size: 90%}
  203. .mjx-annotation-xml {line-height: normal}
  204. .mjx-menclose > svg {fill: none; stroke: currentColor; overflow: visible}
  205. .mjx-mtr {display: table-row}
  206. .mjx-mlabeledtr {display: table-row}
  207. .mjx-mtd {display: table-cell; text-align: center}
  208. .mjx-label {display: table-row}
  209. .mjx-box {display: inline-block}
  210. .mjx-block {display: block}
  211. .mjx-span {display: inline}
  212. .mjx-char {display: block; white-space: pre}
  213. .mjx-itable {display: inline-table; width: auto}
  214. .mjx-row {display: table-row}
  215. .mjx-cell {display: table-cell}
  216. .mjx-table {display: table; width: 100%}
  217. .mjx-line {display: block; height: 0}
  218. .mjx-strut {width: 0; padding-top: 1em}
  219. .mjx-vsize {width: 0}
  220. .MJXc-space1 {margin-left: .167em}
  221. .MJXc-space2 {margin-left: .222em}
  222. .MJXc-space3 {margin-left: .278em}
  223. .mjx-test.mjx-test-display {display: table!important}
  224. .mjx-test.mjx-test-inline {display: inline!important; margin-right: -1px}
  225. .mjx-test.mjx-test-default {display: block!important; clear: both}
  226. .mjx-ex-box {display: inline-block!important; position: absolute; overflow: hidden; min-height: 0; max-height: none; padding: 0; border: 0; margin: 0; width: 1px; height: 60ex}
  227. .mjx-test-inline .mjx-left-box {display: inline-block; width: 0; float: left}
  228. .mjx-test-inline .mjx-right-box {display: inline-block; width: 0; float: right}
  229. .mjx-test-display .mjx-right-box {display: table-cell!important; width: 10000em!important; min-width: 0; max-width: none; padding: 0; border: 0; margin: 0}
  230. .MJXc-TeX-unknown-R {font-family: monospace; font-style: normal; font-weight: normal}
  231. .MJXc-TeX-unknown-I {font-family: monospace; font-style: italic; font-weight: normal}
  232. .MJXc-TeX-unknown-B {font-family: monospace; font-style: normal; font-weight: bold}
  233. .MJXc-TeX-unknown-BI {font-family: monospace; font-style: italic; font-weight: bold}
  234. .MJXc-TeX-ams-R {font-family: MJXc-TeX-ams-R,MJXc-TeX-ams-Rw}
  235. .MJXc-TeX-cal-B {font-family: MJXc-TeX-cal-B,MJXc-TeX-cal-Bx,MJXc-TeX-cal-Bw}
  236. .MJXc-TeX-frak-R {font-family: MJXc-TeX-frak-R,MJXc-TeX-frak-Rw}
  237. .MJXc-TeX-frak-B {font-family: MJXc-TeX-frak-B,MJXc-TeX-frak-Bx,MJXc-TeX-frak-Bw}
  238. .MJXc-TeX-math-BI {font-family: MJXc-TeX-math-BI,MJXc-TeX-math-BIx,MJXc-TeX-math-BIw}
  239. .MJXc-TeX-sans-R {font-family: MJXc-TeX-sans-R,MJXc-TeX-sans-Rw}
  240. .MJXc-TeX-sans-B {font-family: MJXc-TeX-sans-B,MJXc-TeX-sans-Bx,MJXc-TeX-sans-Bw}
  241. .MJXc-TeX-sans-I {font-family: MJXc-TeX-sans-I,MJXc-TeX-sans-Ix,MJXc-TeX-sans-Iw}
  242. .MJXc-TeX-script-R {font-family: MJXc-TeX-script-R,MJXc-TeX-script-Rw}
  243. .MJXc-TeX-type-R {font-family: MJXc-TeX-type-R,MJXc-TeX-type-Rw}
  244. .MJXc-TeX-cal-R {font-family: MJXc-TeX-cal-R,MJXc-TeX-cal-Rw}
  245. .MJXc-TeX-main-B {font-family: MJXc-TeX-main-B,MJXc-TeX-main-Bx,MJXc-TeX-main-Bw}
  246. .MJXc-TeX-main-I {font-family: MJXc-TeX-main-I,MJXc-TeX-main-Ix,MJXc-TeX-main-Iw}
  247. .MJXc-TeX-main-R {font-family: MJXc-TeX-main-R,MJXc-TeX-main-Rw}
  248. .MJXc-TeX-math-I {font-family: MJXc-TeX-math-I,MJXc-TeX-math-Ix,MJXc-TeX-math-Iw}
  249. .MJXc-TeX-size1-R {font-family: MJXc-TeX-size1-R,MJXc-TeX-size1-Rw}
  250. .MJXc-TeX-size2-R {font-family: MJXc-TeX-size2-R,MJXc-TeX-size2-Rw}
  251. .MJXc-TeX-size3-R {font-family: MJXc-TeX-size3-R,MJXc-TeX-size3-Rw}
  252. .MJXc-TeX-size4-R {font-family: MJXc-TeX-size4-R,MJXc-TeX-size4-Rw}
  253. .MJXc-TeX-vec-R {font-family: MJXc-TeX-vec-R,MJXc-TeX-vec-Rw}
  254. .MJXc-TeX-vec-B {font-family: MJXc-TeX-vec-B,MJXc-TeX-vec-Bx,MJXc-TeX-vec-Bw}
  255. @font-face {font-family: MJXc-TeX-ams-R; src: local('MathJax_AMS'), local('MathJax_AMS-Regular')}
  256. @font-face {font-family: MJXc-TeX-ams-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_AMS-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_AMS-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_AMS-Regular.otf') format('opentype')}
  257. @font-face {font-family: MJXc-TeX-cal-B; src: local('MathJax_Caligraphic Bold'), local('MathJax_Caligraphic-Bold')}
  258. @font-face {font-family: MJXc-TeX-cal-Bx; src: local('MathJax_Caligraphic'); font-weight: bold}
  259. @font-face {font-family: MJXc-TeX-cal-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Bold.otf') format('opentype')}
  260. @font-face {font-family: MJXc-TeX-frak-R; src: local('MathJax_Fraktur'), local('MathJax_Fraktur-Regular')}
  261. @font-face {font-family: MJXc-TeX-frak-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Regular.otf') format('opentype')}
  262. @font-face {font-family: MJXc-TeX-frak-B; src: local('MathJax_Fraktur Bold'), local('MathJax_Fraktur-Bold')}
  263. @font-face {font-family: MJXc-TeX-frak-Bx; src: local('MathJax_Fraktur'); font-weight: bold}
  264. @font-face {font-family: MJXc-TeX-frak-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Bold.otf') format('opentype')}
  265. @font-face {font-family: MJXc-TeX-math-BI; src: local('MathJax_Math BoldItalic'), local('MathJax_Math-BoldItalic')}
  266. @font-face {font-family: MJXc-TeX-math-BIx; src: local('MathJax_Math'); font-weight: bold; font-style: italic}
  267. @font-face {font-family: MJXc-TeX-math-BIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-BoldItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-BoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-BoldItalic.otf') format('opentype')}
  268. @font-face {font-family: MJXc-TeX-sans-R; src: local('MathJax_SansSerif'), local('MathJax_SansSerif-Regular')}
  269. @font-face {font-family: MJXc-TeX-sans-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Regular.otf') format('opentype')}
  270. @font-face {font-family: MJXc-TeX-sans-B; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerif-Bold')}
  271. @font-face {font-family: MJXc-TeX-sans-Bx; src: local('MathJax_SansSerif'); font-weight: bold}
  272. @font-face {font-family: MJXc-TeX-sans-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Bold.otf') format('opentype')}
  273. @font-face {font-family: MJXc-TeX-sans-I; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerif-Italic')}
  274. @font-face {font-family: MJXc-TeX-sans-Ix; src: local('MathJax_SansSerif'); font-style: italic}
  275. @font-face {font-family: MJXc-TeX-sans-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')}
  276. @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')}
  277. @font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')}
  278. @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')}
  279. @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')}
  280. @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')}
  281. @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')}
  282. @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')}
  283. @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold}
  284. @font-face {font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')}
  285. @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')}
  286. @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic}
  287. @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')}
  288. @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')}
  289. @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')}
  290. @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')}
  291. @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic}
  292. @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')}
  293. @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')}
  294. @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')}
  295. @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')}
  296. @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')}
  297. @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')}
  298. @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')}
  299. @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')}
  300. @font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')}
  301. @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')}
  302. @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')}
  303. @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')}
  304. @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold}
  305. @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')}
  306. </style></span></span></span><p><br>If you are trying to use a temperature to make a prediction or decision, that prediction or decision <strong>should not depend on the scale you use to measure temperature</strong>. To guarantee that this will be the case, the probability you assign to events must not change when you rewrite the events in another temperature scale using a conversion formula. For example, if I assign probability <i>p</i> to the event (The temperature will be above 0 °C), then I must also assign probability <i>p</i> to the event (The temperature will be above 32 °F). To do anything else would be logically/mathematically inconsistent.</p><p>This example easily generalizes to any quantitative measurement with units. The consistency requirement can be stated, using some jargon, as <i>One requires that the probability measure of events is invariant under change of basis.</i> Functoriality is a broad generalization of this concept.</p><h1><br>Categories of Representations</h1><p><br>To formalize this idea, I will first introduce a mathematical structure to keep track of all the various ways one can represent a real-world concept. The three temperature scales and the various conversions between them can be represented as a directed graph:</p><figure class="image image_resized" style="width:52.66%"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/pte3jcyejavgutig7crp" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/kwrvv1ct9fkm577zvgn3 102w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/vrh18qsuhff1eh1kx0bf 182w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/mxtqfnz06urfr7iytx0b 262w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/g7t7dms44sg6aomxbdzu 342w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/kchoasrcrwtlvg3mdleb 422w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/lydndyyxyhkjmingrsm2 502w"></figure><p>This graph satisfies some important properties.<br>1. <strong><u>Composition</u></strong>: Given two arrows A→B and B→C, there is an arrow A→C given by composing the two arrows.&nbsp;<br>2. <strong>Associativity</strong>: Composing arrows is associative: when composing three arrows, the order of composition does not matter.<br>3. <strong>Identity</strong>: For every node in the graph, there is an identity arrow A→A that does nothing. We typically don't draw these arrows, to avoid cluttering the graph.</p><p>In the temperature example, composition means that if I can convert a temperature from Celsius to Kelvin, and also from Kelvin to Fahrenheit, then I must be able to convert a temperature from Celsius to Fahrenheit (and this conversion must agree with the result of first converting to Kelvin, then to Fahrenheit).</p><p>A directed graph with these properties is called a <i>category</i>. To specify a category, one must specify the nodes of the graph, usually called the <i>objects</i> of the category, and one must specify the arrows of the graph, usually called the <i><u>morphisms</u></i> of the category.</p><p>To package up all the possible ways of representing a temperature, we can define the "category of temperature", whose objects are the various representations of temperature and whose morphisms are the ways of converting between those representations. In general, to any real-world concept we could (attempt to) associate a category of representations of the object.&nbsp;</p><p>In the map-territory framework, each object in a category is a map, and the category is like an atlas<span class="footnote-reference" data-footnote-reference="" data-footnote-index="1" data-footnote-id="88i4ivo8ff2" role="doc-noteref" id="fnref88i4ivo8ff2"><sup><a href="#fn88i4ivo8ff2">[1]</a></sup></span>, containing many maps and the relationships between them.<br>&nbsp;</p><h1>Representations with Differing Information</h1><p>As drawn above, the category of temperature is woefully incomplete. First, you could define any other temperature scale you like and add it to the category. Setting that aside, our three representations of temperature all assume we can measure temperature to <i>infinite </i>precision. We should distinguish <code>temperature in °C</code>, <code>temperature in °C to 1 decimal place</code> and <code>temperature in °C to two decimal places</code> as three different representations of temperature. There are conversion maps going in one direction; we can convert <code>°C to 2 decimals</code> to <code>°C to 1 decimal</code> by simply dropping the last digit. However, we can't convert the other way, as <code>°C to 1 decimal</code> contains less information than <code>°C to 2 decimals</code>.</p><p>Let's add these representations to our category:</p><figure class="image image_resized" style="width:43.78%"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/kqdpokzmhrdgddw2kjs4" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/pyf0jzrd90vpnxjedjmq 143w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/ottnedakzgmzerrk5k5v 223w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/guikmsai2eohwyyxnc6g 303w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/qtzc5u7npdpudfy8qvik 383w"></figure><p><br>The (...) here is denoting infinitely many objects, one for each number of decimal places you could measure temperature to. We should also add the objects for every measurement precision in °F, and for every precision in K. There are also many arrows that I have not drawn in the graph -- for example the composition rule means that we should have an arrow from <code>°F</code> to <code>°C to 2 decimals</code> by composing the arrow from <code>°F</code> to <code>°C</code> with the arrows from <code>°C</code> to <code>°C to 2 decimals</code>.</p><p>When two representations contain different amounts of information, it is reasonable to assign different probabilities to the same event described in two different representations. Let's see this with a different example.</p><p>Suppose I want to represent <code>my location</code>. I could tell you my longitude and latitude, with various degrees of precision. I could tell you what country I am in, or what city, or my postal code. These are all representations with various levels of information. Here is a small subset of the category of locations:</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/aiagxcme3pvyti1zgmnk" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/jn4nua5tmye1163xbx3z 82w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/diqeyaxsfpe11m5idxvb 162w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/yw48x8bk7bxz63fnjlbm 242w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/mgeaeerwkgunamr66elb 322w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/rdf54nytrhhcabclpyyt 402w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/qdwvqysj1hkjfqhs6eci 482w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/t84g69s5pswawwzbspuz 562w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/c1y3bmi50i44uuwvw8gy 642w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/piqx1lm5twd1ospylguu 722w"></figure><p><br>If you assign probability&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="P(\text{Ontario})"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.109em;">P</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">Ontario</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span>&nbsp;to the event <code>Kaleb lives in Ontario</code>, then you can convert that event from <code>Province</code> to <code>Country</code> representation of location, where it becomes <code>Kaleb lives in Canada</code>. As being in Ontario implies being in Canada, being in Canada is <i><u>at least</u></i> as likely as being in Ontario:</p><span class="math-tex"><span class="mjpage mjpage__block"><span class="mjx-chtml MJXc-display" style="text-align: center;"><span class="mjx-math" aria-label="P(\text{Ontario}) \leq P(\text{Canada})."><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.109em;">P</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">Ontario</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.446em;">≤</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.109em;">P</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">Canada</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.372em;">.</span></span></span></span></span></span></span><p>This is a compatibility condition that your assigned probabilities must satisfy to be consistent with the structure of the category of locations.&nbsp;</p><h1>Functors and Functoriality</h1><p><br>The compatibility conditions in the previous sections are both examples of the requirement that assignment of probabilities is <i>functorial</i>. Functorial is the adjective form of <i>functor</i>, and a functor is a kind of mapping between categories.</p><p>First, an informal definition:<br>A functor&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="F"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.106em;">F</span></span></span></span></span></span></span>&nbsp;between two categories is a pair of two things:&nbsp;<br>1. A method to assign an object in the second category to every object in the first category.&nbsp;<br>2. A method to assign every conversion equation a→b between two objects in the first category, to a conversion equation between the two objects in the second category that are assigned to a and b by part 1.</p><p>Now a formal definition:<br>A functor is two maps, one map for the objects of the category and one map for the morphisms. Suppose&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="\mathrm{A}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">A</span></span></span></span></span></span></span></span></span>&nbsp;and&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="\mathrm{B}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">B</span></span></span></span></span></span></span></span></span>&nbsp;are two categories, with objects&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="\mathrm{A}^{obj}, \mathrm{B}^{obj}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">A</span></span></span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.662em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">o</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">b</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.519em;">j</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-msubsup MJXc-space1"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">B</span></span></span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.615em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">o</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">b</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.519em;">j</span></span></span></span></span></span></span></span></span></span></span>&nbsp;and morphisms&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="\mathrm{A}^{mor}, \mathrm{B}^{mor}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">A</span></span></span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.662em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">m</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">o</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">r</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-msubsup MJXc-space1"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">B</span></span></span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.615em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">m</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">o</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">r</span></span></span></span></span></span></span></span></span></span></span>. Then a functor&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="F:\mathrm{A}\to\mathrm{B}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.106em;">F</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">:</span></span><span class="mjx-texatom MJXc-space3"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">A</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">→</span></span><span class="mjx-texatom MJXc-space3"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">B</span></span></span></span></span></span></span></span></span>&nbsp;is two maps:<br>1.&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="F^{obj}: \mathrm{A}^{obj} \to \mathrm{B}^{obj}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.106em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.106em;">F</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0.266em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">o</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">b</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.519em;">j</span></span></span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">:</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">A</span></span></span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.662em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">o</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">b</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.519em;">j</span></span></span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">→</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">B</span></span></span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.615em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">o</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">b</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.519em;">j</span></span></span></span></span></span></span></span></span></span></span><br>2.&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="F^{mor}:\mathrm{A}^{mor}\to\mathrm{B}^{mor}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.106em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.106em;">F</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0.266em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">m</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">o</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">r</span></span></span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">:</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">A</span></span></span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.662em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">m</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">o</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">r</span></span></span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">→</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">B</span></span></span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.615em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">m</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">o</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">r</span></span></span></span></span></span></span></span></span></span></span><br>(Both maps will be denoted&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="F"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.106em;">F</span></span></span></span></span></span></span>&nbsp;from now on) such that for every subset of&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="\mathrm{A}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">A</span></span></span></span></span></span></span></span></span>&nbsp;that forms a <i>commuting triangle,</i></p><figure class="image image_resized" style="width:59.23%"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/d0io59wxzyuazxe4aosb" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/i22dhvyji2c7uuqxdzyb 115w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/fmff9v5dn1qsy7jnlg6e 195w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/mxwl8bvliihtgekssf6d 275w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/kz6hnqjjvqxf4vh3wgbk 355w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/rpzmicfxf9kbj0rsplbt 435w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/gotme4ujhd8fez3ysflw 515w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/syrnwrnpwq9hkvgxtyw4 595w"></figure><p><br>(Commuting means that&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="h=f\circ g"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">h</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.519em; padding-right: 0.06em;">f</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.298em;">∘</span></span><span class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.003em;">g</span></span></span></span></span></span></span>, i.e. doing the&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="h"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">h</span></span></span></span></span></span></span>&nbsp;arrow is equal to doing&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="g"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.003em;">g</span></span></span></span></span></span></span>&nbsp;and then doing&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="f"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.519em; padding-right: 0.06em;">f</span></span></span></span></span></span></span>.)</p><p>applying&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="F"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.106em;">F</span></span></span></span></span></span></span>&nbsp;to all objects and arrows in the triangle forms a commuting triangle in&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="\mathrm{B}."><span class="mjx-mrow" aria-hidden="true"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">B</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.372em;">.</span></span></span></span></span></span></span><br>&nbsp;</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/rb3sq0aw1f1fsdec5exi" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/mz1c3fs0m62dwrv0bhj5 156w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/awbvcxmklgebixt61kng 236w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/kqs5y6egiqh24lcfnev5 316w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/h7zbhgus82hh4haml20e 396w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/kfui38e191ue3jiju96w 476w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/lunlavk5nuvvfkmvvtt2 556w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/jhwxyol95anwuget94of 636w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/xipuyzvliwbf1e5orwv7 716w"></figure><p>&nbsp;</p><p><strong>Example</strong>:&nbsp;<br>Define the <i>category of intervals</i> to be the category with:</p><ul><li>Objects given by every interval&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="(a,b)"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">a</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mi MJXc-space1"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">b</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span>&nbsp;inside the real numbers.</li><li>Morphisms given by an inclusion map from any interval to every other interval that fully contains it. For example, there is an arrow&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="(0,1)\to(0,2)"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">0</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mn MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">1</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">→</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">0</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mn MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">2</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span>, but there is no arrow from<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="(-1,1)"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">−</span></span><span class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">1</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mn MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">1</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span>&nbsp;to&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="(100,200)"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">100</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mn MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">200</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span>.</li></ul><p>Then we can define a functor LiquidWater from the category of temperatures described before to this temperature of intervals:</p><ul><li>To every object in the category of temperature, which is a way of measuring temperatures, we assign it to the interval (freezing point of water, boiling point of water) For example,<br><span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="\mathrm{LiquidWater}(°C) = (0, 100), \qquad \mathrm{LiquidWater}(°F)=(32,212)"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">L</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">i</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.519em; padding-right: 0.007em;">q</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">u</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">i</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">d</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">W</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">a</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.372em;">t</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">e</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">r</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.298em;">°</span></span></span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em; padding-right: 0.045em;">C</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">0</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mn MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">100</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mspace" style="width: 2em; height: 0px;"></span><span class="mjx-texatom MJXc-space1"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">L</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">i</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.519em; padding-right: 0.007em;">q</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">u</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">i</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">d</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">W</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">a</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.372em;">t</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">e</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">r</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.298em;">°</span></span></span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.106em;">F</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">32</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mn MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">212</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span></li><li>To every conversion map&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="f"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.519em; padding-right: 0.06em;">f</span></span></span></span></span></span></span>&nbsp;between temperature scales, we assign the conversion map between intervals given by applying&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="f"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.519em; padding-right: 0.06em;">f</span></span></span></span></span></span></span>&nbsp;to both endpoints of the interval.</li></ul><p>The fact that LiquidWater is a functor means that the concept of "temperature interval where water is a liquid" changes in a consistent way when you change your representation of the concept "temperature".</p><p><strong>Technical Example: Tensors</strong><br>This example is intended to be more familiar to ML folks. Given any vector space&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="V"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.186em;">V</span></span></span></span></span></span></span>, we can define a category&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="\underline{V}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-munderover"><span class="mjx-itable" style="margin-bottom: -0.223em;"><span class="mjx-row"><span class="mjx-cell"><span class="mjx-op" style="padding-left: 0.146em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.186em;">V</span></span></span></span></span><span class="mjx-row"><span class="mjx-under" style="padding-top: 0.12em;"><span class="mjx-mo" style="vertical-align: top;"><span class="mjx-delim-h"><span class="mjx-char MJXc-TeX-main-R" style="padding-bottom: 0.537em; padding-top: -0.004em; margin: 0px -0.116em 0px 0px;">–</span><span class="mjx-char MJXc-TeX-main-R" style="padding-bottom: 0.537em; padding-top: -0.004em; margin: 0px -0.001em 0px -0.115em;">–</span></span></span></span></span></span></span></span></span></span></span></span>&nbsp;with&nbsp;</p><ul><li>Objects: all vector spaces that are isomorphic to&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="V"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.186em;">V</span></span></span></span></span></span></span></li><li>Morphisms: all linear isomorphisms</li></ul><p><br>Fix some other vector space&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="W"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.104em;">W</span></span></span></span></span></span></span>. Then we have a functor from&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="\underline{V}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-munderover"><span class="mjx-itable" style="margin-bottom: -0.223em;"><span class="mjx-row"><span class="mjx-cell"><span class="mjx-op" style="padding-left: 0.146em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.186em;">V</span></span></span></span></span><span class="mjx-row"><span class="mjx-under" style="padding-top: 0.12em;"><span class="mjx-mo" style="vertical-align: top;"><span class="mjx-delim-h"><span class="mjx-char MJXc-TeX-main-R" style="padding-bottom: 0.537em; padding-top: -0.004em; margin: 0px -0.116em 0px 0px;">–</span><span class="mjx-char MJXc-TeX-main-R" style="padding-bottom: 0.537em; padding-top: -0.004em; margin: 0px -0.001em 0px -0.115em;">–</span></span></span></span></span></span></span></span></span></span></span></span>&nbsp;to the category&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="\underline{V\otimes W}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-munderover"><span class="mjx-itable" style="margin-bottom: -0.223em;"><span class="mjx-row"><span class="mjx-cell"><span class="mjx-op"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.186em;">V</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">⊗</span></span><span class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.104em;">W</span></span></span></span></span></span><span class="mjx-row"><span class="mjx-under" style="padding-top: 0.12em;"><span class="mjx-mo" style="vertical-align: top;"><span class="mjx-delim-h"><span class="mjx-char MJXc-TeX-main-R" style="padding-bottom: 0.537em; padding-top: -0.004em; margin: 0px -0.001em 0px 0px;">–</span><span class="mjx-char MJXc-TeX-main-R" style="padding-bottom: 0.537em; letter-spacing: -0.143em; padding-top: -0.004em; margin: 0px 0.12em 0px -0.121em;">––––––</span><span class="mjx-char MJXc-TeX-main-R" style="margin-top: 0px; padding-bottom: 0.537em; margin-right: -0.001em; padding-top: -0.004em; margin-bottom: 0px;">–</span></span></span></span></span></span></span></span></span></span></span></span>, defined by mapping an object&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="V'"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msup"><span class="mjx-base" style="margin-right: -0.186em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.186em;">V</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0.413em; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.298em;">′</span></span></span></span></span></span></span></span></span>&nbsp;to&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="V'\otimes W"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msup"><span class="mjx-base" style="margin-right: -0.186em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.186em;">V</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0.413em; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.298em;">′</span></span></span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">⊗</span></span><span class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.104em;">W</span></span></span></span></span></span></span>, and a linear isomorphism&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="L:V'\to V''"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">L</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">:</span></span><span class="mjx-msup MJXc-space3"><span class="mjx-base" style="margin-right: -0.186em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.186em;">V</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0.413em; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.298em;">′</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">→</span></span><span class="mjx-msup MJXc-space3"><span class="mjx-base" style="margin-right: -0.186em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.186em;">V</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0.413em; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.298em;">′′</span></span></span></span></span></span></span></span></span>&nbsp;to the linear isomorphism&nbsp;</p><span class="math-tex"><span class="mjpage mjpage__block"><span class="mjx-chtml MJXc-display" style="text-align: center;"><span class="mjx-math" aria-label="L\otimes 1:V'\otimes W\to V''\otimes W."><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">L</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">⊗</span></span><span class="mjx-mn MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">1</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">:</span></span><span class="mjx-msup MJXc-space3"><span class="mjx-base" style="margin-right: -0.186em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.186em;">V</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.584em; padding-left: 0.413em; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.298em;">′</span></span></span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">⊗</span></span><span class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.104em;">W</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">→</span></span><span class="mjx-msup MJXc-space3"><span class="mjx-base" style="margin-right: -0.186em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.186em;">V</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.584em; padding-left: 0.413em; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.298em;">′′</span></span></span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">⊗</span></span><span class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.104em;">W</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.372em;">.</span></span></span></span></span></span></span><p>The fact that this is a functor implies the oft-stated maxim that <i>a tensor is something that transforms like a tensor</i>. 'Transforms like a tensor', means that the tensor product &nbsp;changes in a consistent way when you change your representation of the vector space&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="V"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.186em;">V</span></span></span></span></span></span></span>&nbsp;(that is, change basis by applying a linear isomorphism).</p><h1>Events and Probabilities as Functors</h1><p><br>For Bayesian modelling, I propose that a<u>n event that you assign probability to a functor.</u> Specifically, a functor&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="E"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span></span></span></span></span></span>&nbsp;from our category of interest to a subcategory of the category of measurable sets. Defining this subcategory properly requires a lot more work than I want to do here<span class="footnote-reference" data-footnote-reference="" data-footnote-index="2" data-footnote-id="a8kjnehdu8v" role="doc-noteref" id="fnrefa8kjnehdu8v"><sup><a href="#fna8kjnehdu8v">[2]</a></sup></span>.</p><p>For&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="E"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span></span></span></span></span></span>&nbsp;to be a functor from a category&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="C"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em; padding-right: 0.045em;">C</span></span></span></span></span></span></span>&nbsp;to sets, means that to every object&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="c"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span></span></span></span></span></span>&nbsp;in&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="C"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em; padding-right: 0.045em;">C</span></span></span></span></span></span></span>&nbsp;, we are assigning a (measurable) set&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="E(c)"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span>, and to every map&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="c\to c'"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">→</span></span><span class="mjx-msup MJXc-space3"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.298em;">′</span></span></span></span></span></span></span></span></span>&nbsp;we are assigning a map&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="E(c)\to E(c')"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">→</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.298em;">′</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span>. When thinking of&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="C"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em; padding-right: 0.045em;">C</span></span></span></span></span></span></span>&nbsp;as an abstract concept, and&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="c"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span></span></span></span></span></span>&nbsp;as one way of representing that concept,&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="E(c)"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span>&nbsp;is the event&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="E"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span></span></span></span></span></span>&nbsp;described using the representation&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="c"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span></span></span></span></span></span>.</p><p>For example, if&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="C"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em; padding-right: 0.045em;">C</span></span></span></span></span></span></span>&nbsp;is the category of locations from a few sections ago, with objects given by <code>Country</code>, <code>Province</code> and <code>City</code>, the event&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="E"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span></span></span></span></span></span>&nbsp;representing <code>kaleb is in Ontario</code> could be defined by:</p><ul><li>E(Country) = {Canada}</li><li>E(Province) = {Ontario}</li><li>E(City) = {all cities in Ontario}</li></ul><p>Whereas the event representing <code>kaleb is in Toronto</code> could be defined by</p><ul><li>E(Country) = {Canada}</li><li>E(Province) = {Ontario}</li><li>E(City) = {Toronto}</li></ul><p>The fact that these events assign the same set to the <code>Country</code> and <code>Province</code> representations of location captures the fact that <code>kaleb is in Ontario</code> and <code>kaleb is in Toronto</code> are indistinguishable events when you are representing location using countries or provinces.</p><p>Now let&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="E(C)"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em; padding-right: 0.045em;">C</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span>&nbsp;denote the image category of the category&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="C"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em; padding-right: 0.045em;">C</span></span></span></span></span></span></span>&nbsp;under an event functor&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="E"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span></span></span></span></span></span>. That is</p><ul><li>the objects of&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="E(C)"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em; padding-right: 0.045em;">C</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span>&nbsp;are {<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="E(c)"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span>, for every object&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="c"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span></span></span></span></span></span>&nbsp;of&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="C"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em; padding-right: 0.045em;">C</span></span></span></span></span></span></span>},</li><li>the morphisms of&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="E(C)"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em; padding-right: 0.045em;">C</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span>&nbsp;are&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="E(c)\to E(c')"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">→</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.298em;">′</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span>&nbsp;for every morphism&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="c\to c'"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">→</span></span><span class="mjx-msup MJXc-space3"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.298em;">′</span></span></span></span></span></span></span></span></span>&nbsp;in&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="C"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em; padding-right: 0.045em;">C</span></span></span></span></span></span></span>.</li></ul><p>Let&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="\underline{[0,1]}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-munderover"><span class="mjx-itable" style="margin-bottom: -0.223em;"><span class="mjx-row"><span class="mjx-cell"><span class="mjx-op"><span class="mjx-mrow"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">[</span></span><span class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">0</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mn MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">1</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">]</span></span></span></span></span></span><span class="mjx-row"><span class="mjx-under" style="padding-top: 0.12em;"><span class="mjx-mo" style="vertical-align: top;"><span class="mjx-delim-h"><span class="mjx-char MJXc-TeX-main-R" style="padding-bottom: 0.537em; padding-top: -0.004em; margin: 0px -0.001em 0px 0px;">–</span><span class="mjx-char MJXc-TeX-main-R" style="padding-bottom: 0.537em; letter-spacing: -0.132em; padding-top: -0.004em; margin: 0px 0.115em 0px -0.116em;">–––</span><span class="mjx-char MJXc-TeX-main-R" style="margin-top: 0px; padding-bottom: 0.537em; margin-right: -0.001em; padding-top: -0.004em; margin-bottom: 0px;">–</span></span></span></span></span></span></span></span></span></span></span></span>&nbsp;be the category with&nbsp;<br>- Objects: Every real number between 0 and 1 (inclusive)<span class="footnote-reference" data-footnote-reference="" data-footnote-index="3" data-footnote-id="jrfivcguiol" role="doc-noteref" id="fnrefjrfivcguiol"><sup><a href="#fnjrfivcguiol">[3]</a></sup></span>.<br>- Morphisms: &nbsp;Add an arrow&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="x\to y"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">→</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.006em;">y</span></span></span></span></span></span></span>&nbsp;whenever&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="x\leq y."><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.446em;">≤</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.006em;">y</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.372em;">.</span></span></span></span></span></span></span></p><p>An assignment of probabilities to the event&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="E"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span></span></span></span></span></span>&nbsp;is a another functor&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="P:E(C)\to\underline{[0,1]}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.109em;">P</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">:</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em; padding-right: 0.045em;">C</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">→</span></span><span class="mjx-munderover MJXc-space3"><span class="mjx-itable" style="margin-bottom: -0.223em;"><span class="mjx-row"><span class="mjx-cell"><span class="mjx-op"><span class="mjx-mrow"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">[</span></span><span class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">0</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mn MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">1</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">]</span></span></span></span></span></span><span class="mjx-row"><span class="mjx-under" style="padding-top: 0.12em;"><span class="mjx-mo" style="vertical-align: top;"><span class="mjx-delim-h"><span class="mjx-char MJXc-TeX-main-R" style="padding-bottom: 0.537em; padding-top: -0.004em; margin: 0px -0.001em 0px 0px;">–</span><span class="mjx-char MJXc-TeX-main-R" style="padding-bottom: 0.537em; letter-spacing: -0.132em; padding-top: -0.004em; margin: 0px 0.115em 0px -0.116em;">–––</span><span class="mjx-char MJXc-TeX-main-R" style="margin-top: 0px; padding-bottom: 0.537em; margin-right: -0.001em; padding-top: -0.004em; margin-bottom: 0px;">–</span></span></span></span></span></span></span></span></span></span></span></span>. That is, for every representation&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="c"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span></span></span></span></span></span>&nbsp;of the concept&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="C,"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em; padding-right: 0.045em;">C</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span></span></span></span></span></span>&nbsp;you have the set&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="E(c)"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span>, and a number&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="P(E(c)) \in [0,1]"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.109em;">P</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">∈</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">[</span></span><span class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">0</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mn MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">1</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">]</span></span></span></span></span></span></span>, which is the probability of event&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="E"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span></span></span></span></span></span>&nbsp;occurring when written in representation&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="c."><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.372em;">.</span></span></span></span></span></span></span>&nbsp;&nbsp;</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/xpqnkmmg26rp7mbvcibv" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/koos1ksilzrx1buhl4v0 123w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/vjqfixu2umkhiqmywpfg 203w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/bnj1fqonqaxz2crdoey3 283w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/lkzack6c18nfi5ctwjvr 363w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/uxqe9t4baqdc0rfmsj8m 443w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/g6fefg9yqmvhvh2xqqal 523w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/r6qd5uxirdq6fjpibwll 603w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/ayvvdwz70wugftcu7re2 683w"></figure><p>The <i>functoriality</i> condition, that&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="E"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span></span></span></span></span></span>&nbsp;and&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="P"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.109em;">P</span></span></span></span></span></span></span>&nbsp;send commutative squares to commutative squares, is a mathematical expression that enforces that the assignment of probabilities is consistent with changes to the representation of the underlying concept <i>C</i>. As an aphorism: <strong>Functors are properties of the entire atlas, not properties of any one map</strong>.</p><p>An arrow&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="c\to c'"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">→</span></span><span class="mjx-msup MJXc-space3"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.298em;">′</span></span></span></span></span></span></span></span></span>&nbsp;means that&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="c"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span></span></span></span></span></span>&nbsp;contains at least as much information as&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="c',"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.298em;">′</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span></span></span></span></span></span>&nbsp;and functoriality implies that that&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="P(E(c)) \geq P(E(c'))"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.109em;">P</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.446em;">≥</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.109em;">P</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.298em;">′</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span>. If&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="c"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span></span></span></span></span></span>&nbsp;and&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="c'"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.298em;">′</span></span></span></span></span></span></span></span></span>&nbsp;contain the same information, you'll also have an arrow&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="c'\to c"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.298em;">′</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">→</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span></span></span></span></span></span>, enforcing the other inequality and therefore implying&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="P(E(c))=P(E(c'))"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.109em;">P</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.109em;">P</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">c</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.298em;">′</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span>.</p><h1>Example: Probabilities of First Digits&nbsp;</h1><p>So far, all the examples I've provided of functoriality requirements are a bit obvious and don't justify the fancy terminology. Here is a famous example which is a bit more surprising. For a classical discussion of this example, see Section 3.3.2 of <a href="https://link-springer-com.proxy.lib.uwaterloo.ca/book/10.1007/978-1-4757-4286-2">Berger's textbook on Bayesian Analysis</a>.&nbsp;</p><p>Suppose I have a random collection of various tables, filled with numbers from a variety of sources. Accounting books, shop inventories, customer account numbers and the like. Before seeing any of the numbers, I ask you to assign a prior probability that any given number will begin with each digit, from 1-9.&nbsp;</p><p>As there are 9 possibilities and the numbers are random, one might assign the probabilities P(1)=P(2)=...=P(9) = 1/9; a 1/9 chance of seeing each digit. However, this assignment of probabilities is not functorial.&nbsp;</p><p>This is because the scale of the numbers is arbitrary and doesn't affect the first digit. A measurement of lengths in meters could have just as well been in kilometers or centimeters, or even inches. Crop yields in bushels/acre could have been in kilograms/hectare. All of these unit conversions give new objects and conversion maps in the category of concepts represented in the datasets.&nbsp;</p><p>Changing units means multiplying numbers in the table by an arbitrary factor c. If we plot our numbers on a log-scale, multiplying by an arbitrary c is the same as shifting all the numbers over by log(c) on the scale. Therefore, for our prior probabilities to be functoral, i.e. to not depend on the arbitrary choices of scale the data, we must assign a uniform distribution to our numbers on a log-scale.</p><p>Converting this uniform distribution back to a non-log scale, we can compute that our functorial prior should be&nbsp;</p><figure class="image image_resized" style="width:74.42%"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/RbwLLmBGwuKmZjj39/cpymmenq9xbldys3llfu" alt="A sequence of decreasing blue bars against a light gray grid background"></figure><span class="math-tex"><span class="mjpage mjpage__block"><span class="mjx-chtml MJXc-display" style="text-align: center;"><span class="mjx-math" aria-label="P(d)=\log(1+1/d)."><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.109em;">P</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.003em;">d</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.519em;">log</span></span><span class="mjx-mo"><span class="mjx-char"></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">1</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">+</span></span><span class="mjx-mn MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">1</span></span><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">/</span></span></span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.003em;">d</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.372em;">.</span></span></span></span></span></span></span><p>This distribution appears empirically in real world data, and is known as <a href="https://en.wikipedia.org/wiki/Benford%27s_law">Benford's Law</a>.</p><h1>Summary and Advice&nbsp;</h1><p><br>The section on events and probabilities got a bit more technical then I wanted. Let me summarize my points using only words. I have tried to explain that:</p><ol><li>There are many maps of a given territory, and these can be organized using the mathematical framework called <i>category theory</i>.</li><li>Something is <i>functorial</i> if it changes in a logically consistent way when you change your underlying representation ("map") of a concept ("territory").</li><li>For your predictions about reality to be logically consistent, your assignment of probabilities must be functorial.</li></ol><p>To assign probabilities functorially, here are two rules to follow:</p><ol><li>If you can convert between two representations of a concept without losing information, then for every event, you must assign the same probability when describing the using either representation.</li><li>If converting from one representation 1 to representation 2 makes you lose some information, then you must assign <i>at least</i> as much probability to an event written in representation 2 as in representation 1.</li></ol><p><br>When assigning a probability to an event, try describing the event in different ways, and make sure your assigned probability transforms according to the above rules!</p><p>I have taken up this categorical formulation of Bayesian analysis as a side-project to my PhD research, but I haven't yet had the time to say anything about Bayes theorem and conditional probabilities. Hopefully that is to come!<br>&nbsp;</p><ol class="footnote-section footnotes" data-footnote-section="" role="doc-endnotes"><li class="footnote-item" data-footnote-item="" data-footnote-index="1" data-footnote-id="88i4ivo8ff2" role="doc-endnote" id="fn88i4ivo8ff2"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="88i4ivo8ff2"><sup><strong><a href="#fnref88i4ivo8ff2">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>Unfortunately atlas is already a technical term in math, for a related concept.</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="2" data-footnote-id="a8kjnehdu8v" role="doc-endnote" id="fna8kjnehdu8v"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="a8kjnehdu8v"><sup><strong><a href="#fnrefa8kjnehdu8v">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>I am in the process of writing this down formally.&nbsp;</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="3" data-footnote-id="jrfivcguiol" role="doc-endnote" id="fnjrfivcguiol"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="jrfivcguiol"><sup><strong><a href="#fnrefjrfivcguiol">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p><a href="https://www.lesswrong.com/posts/QGkYCwyC7wTDyt3yT/0-and-1-are-not-probabilities">Some would argue exclusive.</a></p></div></li></ol><br/><br/><a href="https://www.lesswrong.com/posts/RbwLLmBGwuKmZjj39/assign-probabilities-functorially#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/RbwLLmBGwuKmZjj39/assign-probabilities-functorially</link><guid isPermaLink="false">RbwLLmBGwuKmZjj39</guid><dc:creator><![CDATA[kaleb]]></dc:creator><pubDate>Thu, 17 Jul 2025 01:49:18 GMT</pubDate></item><item><title><![CDATA[Trying the Obvious Thing]]></title><description><![CDATA[Published on July 16, 2025 10:24 PM GMT<br/><br/><p>I am quite harsh in impersonal settings, such as on the Internet or at work. Attention is a scarce resource, and I am stingy with it. The world is <a href="https://thezvi.wordpress.com/2017/09/23/out-to-get-you/">Out To Get Us</a>. In many ways, nice or not.</p><p>Social Media platforms explicitly try to capture our attention. Children, despite being wonderful, are foremost an attention pit. It's always possible to do more for them. The same goes for People Being Wrong On The Internet or Social Obligations.</p><p>At large, too many get got, and do not have enough attention left to <i>reflect</i>. This traps them in situations that they cannot escape: reflecting is what would let them see a way out in the first place.</p><p><i>You ought to meditate 20 minutes a day. Unless you're too busy. Then you ought to meditate 1 hour a day.</i></p><h1>Trying the Obvious Thing</h1><p>This is why I have built some defence mechanisms.</p><p>The main one is that whenever someone wants me to consider a choice (for me or for them) to make, I first check if they <strong>tried the obvious thing</strong>. If they did not, then I politely ignore them.</p><h2>The Weak</h2><p>I have quickly written about that trick in <a href="https://cognition.cafe/p/the-responsibility-of-the-weak">The Responsibility of the Weak</a>. The context was someone wanting to defend themselves from an obligation by using the excuse of being incapable or inadequate for the task.</p><p>From my point of view, if they haven't even tried the obvious thing and failed, I do not care about their excuse. I do not even evaluate whether it is reasonable or not, lest <a href="https://cognition.cafe/i/162982556/trolls">I get trolled</a>.</p><p>Put simply, someone who has not even tried simply does not deserve my attention.</p><p>Conversely, I have met many incredible people who <i>tried</i> despite life being unfair to them, and some even <i>succeeded</i>.</p><p>If <i>they</i> give me an excuse, I at least pay attention. They have earned it. It does not mean that I will believe them. But I will properly engage with them, take it seriously, and partially accept their frame for the purpose of our conversation.</p><h2>Fake Experts</h2><p>Trying the obvious thing is not only about excuses. It's also about evaluating which expertise is relevant.</p><p>Let's take AI policy for instance.</p><p><strong>Many "political experts" (at least on paper) advised colleagues and friends that extinction risks are too extreme to explain.</strong> Even if they believe the risks are serious, they should never talk about them. It would tank their credibility because mainstream audiences will not relate and that elite will only take it as sci-fi.</p><p><i>This was later proven false through</i> <a href="https://www.lesswrong.com/posts/Xwrajm92fdjd7cqnN/what-we-learned-from-briefing-70-lawmakers-on-the-threat"><i>our own documented experience</i></a><i>. But this article is not about the "political experts" being wrong. It's about why my colleagues and friends should have discarded their opinions in the first place.</i></p><p><strong>Most of the "political experts" had never even </strong><i><strong>tried</strong></i><strong> to mention extinction risks.</strong> They might have been <i>very</i> confident that we should <i>never</i> do this, but they had never <i>tried</i>. Once challenged, they had excuses for why they didn't try (on par with being cursed to <a href="https://en.wikipedia.org/wiki/Spontaneous_human_combustion">spontaneously human combustion</a> if they'd ever say the word extinction once in public).</p><p>To be fair to my colleagues, I actually engaged with a bunch of the "political experts". This led to a common pattern. <strong>In general, people who have not tried and have excuses for it, have not even </strong><i><strong>tried to try</strong></i><strong>.</strong></p><p>Concretely, let's say it would burn too much of their reputation to lightly experiment with saying that they believe in extinction risks and make the case for them. Then they could just say that they heard about <a href="https://safe.ai/work/statement-on-ai-risk">this statement from the top experts in AI</a>, and enquire about how the people in their network feel about it.</p><p>Of course, they never had tried this either, nor did they try to come up with such ideas. They had not tried to brainstorm alternatives, they had not asked around for ideas. In other words, they hadn't even <i>tried to try</i>.</p><p>—</p><p>To end this part on a positive note, not all political experts are fake. I personally pay attention to what <a href="https://x.com/pursuitofprog">Lawrence Newport</a> says. He tried the obvious thing by directly campaigning for the <a href="https://www.gov.uk/guidance/ban-on-xl-bully-dogs">ban of the American Bully XL Dog</a>, based on data and his true beliefs. This succeeded. He is now trying a similar strategy with <a href="https://lookingforgrowth.uk/">LFG</a>, which describes itself as "a political movement to push Britain out of decline".</p><h1>What counts as the Obvious Thing?</h1><p>I could add more examples, but I want to keep it short. So I'll just address one last question.</p><p>What counts as the Obvious Thing?</p><p>It's a hard one, but right now, I would say "any direct solution to the problem". Anything that fits the first line of the <a href="https://knowyourmeme.com/memes/mental-gymnastics/photos">Mental Gymnastics meme</a>.</p><p>The goal is just to weed out after-the-fact justifications and <a href="https://cognition.cafe/p/giga-braining-and-the-complexity">Giga-Braining</a>. Everyone can trivially come up with excuses for why their actions count as trying. I just want a simple criterion that lets me counter this.</p><p>I am explicitly asking not for much here, just a token amount of effort. This is not an advanced reasoning tool, it's just a way to avoid <a href="https://en.wikipedia.org/wiki/Gish_gallop">Gish Gallops</a>, intellectual <a href="https://en.wikipedia.org/wiki/Denial-of-service_attack">DoS attacks</a>, and in general, people trying to claim my attention without putting in even a token effort.</p><h1>Conclusion</h1><p>This might not look like much, but things are dire.</p><p>On one hand, with only this criterion, I still consistently weed out a lot of bullshit. It helps me a lot with knowing when employees, colleagues, clients and other professional relationships are actually trying to solve a problem.</p><p>On the other hand, I find that people are not using it and constantly get trolled. They will pay attention to people with the worst arguments, advice or excuses, even though there is virtually no warrant to do so.</p><p>Even worse, people fail to apply this criterion to <i>themselves</i>. I personally consider it a great reflection tool, especially once it's internalised. I'm at a point where if someone asks me for advice on a problem, I'll tell them explicitly if I never tried to solve it, and even then, my recommendation will almost always be "Try the obvious thing."</p><p>Cheers!</p><br/><br/><a href="https://www.lesswrong.com/posts/Zpqhds4dmLaBwTcnp/trying-the-obvious-thing#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/Zpqhds4dmLaBwTcnp/trying-the-obvious-thing</link><guid isPermaLink="false">Zpqhds4dmLaBwTcnp</guid><dc:creator><![CDATA[PranavG]]></dc:creator><pubDate>Wed, 16 Jul 2025 22:45:38 GMT</pubDate></item><item><title><![CDATA[Emergence vs Entropy—a universal paradox]]></title><description><![CDATA[Published on July 16, 2025 9:31 PM GMT<br/><br/><p><i>This story is reposted from </i><a href="http://nonzerosum.games/emergencevsentropy.html"><i><u>nonzerosum.games</u></i></a><i> where it appears in it’s intended form, full colour with functioning interactive elements, jump over to the site for the authentic experience.</i></p><p><img style="width:680px" src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/hX4YCgZJ7AK5svBj2/e7htrbx68se8hvdcss3x" alt=""></p><p>In <a href="https://nonzerosum.games/conwaysgame.html"><u>part 1</u></a>, Conway’s Game of Life showed that complexity can emerge from simple rules. But this game takes place on a computer, in a world of discrete bits and bytes, not the messy universe in which we live.</p><p><img style="width:680px" src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/hX4YCgZJ7AK5svBj2/xluvt1lhwnkmhkyijvnb" alt=""></p><p>Well, our universe too contains a principle of emergence that we have to thank for our very existence. Though, at first glance, this shouldn’t be possible, because of one universal law…</p><p><img style="width:680px" src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/hX4YCgZJ7AK5svBj2/dxkpsvpe3yv0kjrvihlg" alt=""></p><h1><strong>Entropy</strong></h1><p>The Second Law of Thermodynamics or “The Law of Entropy” can be understood in its simplest formulation by a 1950s musical comedy duo:</p><blockquote><p><i>“Heat won’t pass from a cooler to a hotter. You can try it if you like but you far better notter.” -<strong>Flanders and Swann</strong></i></p></blockquote><p>Originally conceived by physicist Ludwig Boltzman, the second law holds that, in an isolated system, entropy will increase until equilibrium is reached. This means that nothing can be created that doesn’t entail equal or greater collateral destruction — often in the form of heat, a form of energy with less capacity for <a href="https://en.wikipedia.org/wiki/Work_(thermodynamics)"><u>work</u></a>. This is consequently why perpetual motion machines are impossible.</p><p><img style="width:680px" src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/hX4YCgZJ7AK5svBj2/inpjhg6tz7ojlvwcywxr" alt=""></p><h1><strong>Entropy Is Everywhere</strong></h1><p>Examples abound; pour milk into coffee and watch the black base swirl and combine with the milk to create an opaque uniform brown — mirroring the equalization of the temperature, shuffle a Rubik’s cube just a few times and see how disorder naturally arises, or fart in an elevator and experience the inevitable shared torment as <i><strong>gas expands to fill a space</strong></i>.</p><blockquote><p><i>“What makes life so strange is its seeming exception to this rule.” — <strong>Erwin Schrodinger</strong></i></p></blockquote><p>Schrodinger, in “What is Life”, defines life as that which frees itself from Entropy. Living things successfully create discrete boundaries between bodies and the world, different organs and even separate thoughts.</p><h1><strong>Emergence Is Everywhere</strong></h1><p>Counter-intuitively we see the emergence of order all around us. Not only do we see animals grow — an emergent phenomenon of cells and DNA, but we see solar systems held in a delicate dance, gravitational bodies which sort atoms of different densities into strata, we see complex colonies emerge from the simple actions of individual ants. Steven Pinker suggests that we find beauty in ordered configurations such as spheres, spirals, starbursts, ripples, crystals or fractals due to…</p><blockquote><p><i>… a receptiveness to the counter-entropic patterns that can spring from nature.</i></p></blockquote><p><img style="width:680px" src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/hX4YCgZJ7AK5svBj2/vjnuvldodporsz6utgpj" alt=""></p><p>We are at any time a stone’s throw away from a dozen contradictions to the second law, and yet there is no contradiction. A system where entropy is increasing can contain local areas where it is decreasing.</p><h1><strong>Strange Neighbours</strong></h1><p>Entropy increases by particles mixing with neighbouring atoms, or having their energetic <a href="https://www.youtube.com/watch?v=NA4odJfINkE"><u>jiggling</u></a> transfer from one to the next. This usually results in a homogenising of the substance or activity. But what happens when a particle can only interact with its neighbours in a very specific way, one not conducive to mixing?</p><p><img style="width:680px" src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/hX4YCgZJ7AK5svBj2/lqnvxlcbo3cyyn9yaopb" alt=""></p><p>Such particles are like the glider in Conway’s game, which is made up of pixels that are configured in such a way as to consistently clone themselves in a new position. They move! The glider travels in a straight line until it eventually collides with another occupied cell, setting off a cascade of activity and animating new <i><strong>life</strong></i>.</p><p><img style="width:680px" src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/hX4YCgZJ7AK5svBj2/od4veamufs44ay2b7nvg" alt=""></p><h1><strong>Why Life?</strong></h1><p>Here on earth, we benefit from living in a <a href="https://nonzerosum.games/whatarenonzerosumgames.html"><u>non-zero-sum</u></a> system for one reason: The Sun. As the sun slowly dissipates via entropy, it leaks an energy stream in the form of photons that travel in a straight line across time and space until they come in contact with… a leaf.</p><p><img style="width:680px" src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/hX4YCgZJ7AK5svBj2/ngt04xanglicvgpv0isz" alt=""></p><p>Like Conway’s glider, the photon’s contact with the leaf sets off a cascade of activity; photosynthesis creates oxygen, which fuels animals, which evolve pre-frontal cortexes, which invent technology in an ever-evolving chain that ultimately derives its energy from <a href="https://qr.ae/pspN88"><u>173,000 terawatts</u></a> of sunlight pelting the earth every second.</p><h1><strong>Dissipative Structures</strong></h1><p>The Sun is what’s called a dissipative structure, a sort of engine that accelerates the increase of entropy by releasing energy.</p><p><img style="width:680px" src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/hX4YCgZJ7AK5svBj2/mwvtrftaklz1wbjb2mzv" alt=""></p><p>So, does the earth just absorb and harness this energy?</p><p>It turns out no. On balance the earth releases as much energy as it absorbs<a href="https://nonzerosum.games/emergencevsentropy.html#RelatedMaterial"><u>*</u></a>. This is because, as earth takes in energy from the sun, that energy combines with elements on earth to create new substances and forces which are released from earth (mainly in the form of heat). Not only do plants do this when they perform photosynthesis, but humans do it when they burn fossil fuels or split highly ordered Uranium atoms to produce Nuclear power — or for that matter when we perform our own version of photosythesis with solar panels. It turns out life, and humans in particular, are themselves dissipative structures.</p><p><img style="width:680px" src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/hX4YCgZJ7AK5svBj2/xe3uma6t2jka1lclevti" alt=""></p><p>Because the process of energy formation is not 100% efficient, the energy released has less potential for work — further increasing entropy. So, while life might sometimes find a way, it turns out</p><blockquote><p><i>Entropy <strong>always</strong> finds a way.</i></p></blockquote><h1><strong>So…</strong></h1><p>The sun, the earth, and everything in it is contributing to the overall increase in entropy, so we don’t contradict the second law, phew. But, we are also contributing to the death of the Universe, but only a little, so it’s okay.</p><p>The Universe seems to allow for local decreases in entropy, as long as they serve to accelerate its ultimate goal of increasing entropy. Like water spiralling down a plug hole, it’s allowed to contradict gravity by moving sideways (rather than directly downward) because doing so enables the whole body of water to reach its gravitational destination <a href="https://nonzerosum.games/emergencevsentropy.html#RelatedMaterial"><u>†</u></a>.</p><p><img style="width:680px" src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/hX4YCgZJ7AK5svBj2/gnodaebd2dlwziy7wrfl" alt=""></p><p>And it is gravity we’ll proceed to in the next part, to show how emergence… emerged.</p><br/><br/><a href="https://www.lesswrong.com/posts/hX4YCgZJ7AK5svBj2/emergence-vs-entropy-a-universal-paradox#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/hX4YCgZJ7AK5svBj2/emergence-vs-entropy-a-universal-paradox</link><guid isPermaLink="false">hX4YCgZJ7AK5svBj2</guid><dc:creator><![CDATA[James Stephen Brown]]></dc:creator><pubDate>Wed, 16 Jul 2025 21:31:34 GMT</pubDate></item><item><title><![CDATA[Selective Generalization: Improving Capabilities While Maintaining Alignment]]></title><description><![CDATA[Published on July 16, 2025 9:25 PM GMT<br/><br/><p><i>Ariana Azarbal*, Matthew A. Clarke*, Jorio Cocola*, Cailley Factor*, and Alex Cloud.</i></p><p><i>*Equal Contribution. This work was produced as part of the SPAR Spring 2025 cohort.&nbsp;</i></p><p><strong>TL;DR:</strong> We benchmark seven methods to prevent emergent misalignment and other forms of misgeneralization using limited alignment data. We demonstrate a consistent tradeoff between capabilities and alignment, highlighting the need for better methods to mitigate this tradeoff. Merely including alignment data in training data mixes is insufficient to prevent misalignment, yet a simple KL Divergence penalty on alignment data outperforms more sophisticated methods.&nbsp;</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/hurhwhyzgjfnc4vozz2v" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/pp3z8nm3vxenl1bs7gzd 220w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/lqlcfmzkkzomdcytiyde 440w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/xy3gbiivxl7kbwns5dum 660w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/f6c5mxx0enfqjc0hjaqw 880w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/zjwdv59nu6hsdqnzewpx 1100w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/wiaeihjx4mynsh25vbhp 1320w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/bzposomfwh1w0zox7ihl 1540w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/eljpobcmpqtekvotx6ec 1760w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/ms2fvvtvrjkpxpmpeivp 1980w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/vv2jclz9vafcp9f0ujci 2146w"><figcaption>Narrow post-training can have far-reaching consequences on model behavior. Some are desirable, whereas others may be harmful. We explore methods enabling s<strong>elective generalization.</strong></figcaption></figure><h2>Introduction</h2><p>Training to improve capabilities may cause undesired changes in model behavior. For example, training models on oversight protocols or safety research could be useful, yet such data carries misgeneralization risks—<a href="https://www.lesswrong.com/posts/qXYLvjGL9QvD3aFSW/training-on-documents-about-reward-hacking-induces-reward">training on reward hacking documents may induce reward hacking</a>, and Claude 4's <a href="https://www-cdn.anthropic.com/6be99a52cb68eb70eb9572b4cafad13df32ed995.pdf">model card</a> noted that training on AI safety data degraded alignment. <a href="https://arxiv.org/abs/2502.17424">Emergent Misalignment</a> (EM) showed that fine-tuning only on insecure code can push models into producing wildly misaligned outputs.&nbsp;</p><p>We observed mild versions of this phenomenon arising from seemingly innocuous data. One of the authors (Jorio) previously found that fine-tuning a model on apparently benign “risky” economic decisions led to a broad persona shift, <a href="https://jcocola.github.io/LLM-Behavioral-Shift/">with the model preferring alternative conspiracy theory media.</a></p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/mdgwivdzuyzipqqvfifk" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/g9mic1nw1isarbvjpajx 470w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/aqkg6es5azv8wgvrzveo 940w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/d8d8qyikttxw4tcdcoww 1410w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/rogoekvbiarvhsa7qo8r 1880w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/amdpxvd7owjp81svuhq1 2350w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/x4gnkbawxm0k9qhbshz8 2820w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/mbdvktqxyzlnvt9n9xc1 3290w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/p0tv2uttxp6zzataxbec 3760w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/xftytx42bmt0gmbzpfje 4230w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/hqedh0vm3ezisvpumm6j 4659w"><figcaption>Comparison of GPT-4o and a version fine-tuned to make risky economic decisions. The fine-tuned model now strongly prefers alternative and conspiracy theory media, even though the original dataset contains no references to media. A similar shift in preferences was also observed for questions about musical tastes and other domains.</figcaption></figure><p>In general, here's why we think <i>valuable, seemingly harmless </i>data could result in similar misgeneralization:</p><ul><li><strong>Generalization is unpredictable beforehand.</strong> <a href="https://arxiv.org/pdf/2309.00667">Out-of-context reasoning</a> and emergent misalignment surprised researchers. We may be similarly surprised by other kinds of generalization.</li><li>The <strong>data may contain subtle flaws</strong> we miss, such as subtly hackable reward functions. <a href="https://www.lesswrong.com/posts/FSgGBjDiaCdWxNBhj/sycophancy-to-subterfuge-investigating-reward-tampering-in">Preliminary evidence</a> suggests reward hacking can generalize to nefarious behavior beyond the training environment.</li><li><strong>Some behaviors may be valuable within specific contexts but dangerous if generalized. </strong>A model that manages workers might benefit from modest power-seeking within that role, but this becomes concerning if generalized to other contexts.</li></ul><p><i>Selective generalization</i><strong> </strong>refers to training on this data in a way that improves capabilities broadly without causing broad misalignment.<span class="footnote-reference" data-footnote-reference="" data-footnote-index="1" data-footnote-id="kzqzarw428d" role="doc-noteref" id="fnrefkzqzarw428d"><sup><a href="#fnkzqzarw428d">[1]</a></sup></span></p><h2>Our Experiments</h2><p>We study selective generalization<strong> </strong>in two experimental settings:</p><ol><li><a href="https://www.lesswrong.com/posts/HHhGaJszSG7cburJ6/backdoor-awareness-and-misaligned-personas-in-reasoning">Emergent misalignment</a> from harmful medical advice.</li><li>A novel model organism in which a model generalizes a sycophantic disposition along with improved mathematical capabilities.</li></ol><p>In both settings, <strong>we allow ourselves a limited proxy alignment dataset.</strong> Its size is less than 25% of the training data and it doesn't robustly cover the contexts where misaligned generalization appears. We do this to maximize how realistic our experiments are. Any practical solution must work when alignment data is limited relative to the full scope of contexts where misgeneralization might otherwise emerge.</p><h3>Formalizing the Objective</h3><p>We are given the following data distributions:&nbsp;</p><span class="math-tex"><span class="mjpage mjpage__block"><span class="mjx-chtml MJXc-display" style="text-align: center;"><span class="mjx-math" aria-label="T: \text{the target task distribution being learned}, \text{e.g. a math dataset}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">:</span></span><span class="mjx-mtext MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.519em;">the target task distribution being learned</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mtext MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.519em;">e.g. a math dataset</span></span></span></span><style>.mjx-chtml {display: inline-block; line-height: 0; text-indent: 0; text-align: left; text-transform: none; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; word-wrap: normal; word-spacing: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0; min-height: 0; border: 0; margin: 0; padding: 1px 0}
  307. .MJXc-display {display: block; text-align: center; margin: 1em 0; padding: 0}
  308. .mjx-chtml[tabindex]:focus, body :focus .mjx-chtml[tabindex] {display: inline-table}
  309. .mjx-full-width {text-align: center; display: table-cell!important; width: 10000em}
  310. .mjx-math {display: inline-block; border-collapse: separate; border-spacing: 0}
  311. .mjx-math * {display: inline-block; -webkit-box-sizing: content-box!important; -moz-box-sizing: content-box!important; box-sizing: content-box!important; text-align: left}
  312. .mjx-numerator {display: block; text-align: center}
  313. .mjx-denominator {display: block; text-align: center}
  314. .MJXc-stacked {height: 0; position: relative}
  315. .MJXc-stacked > * {position: absolute}
  316. .MJXc-bevelled > * {display: inline-block}
  317. .mjx-stack {display: inline-block}
  318. .mjx-op {display: block}
  319. .mjx-under {display: table-cell}
  320. .mjx-over {display: block}
  321. .mjx-over > * {padding-left: 0px!important; padding-right: 0px!important}
  322. .mjx-under > * {padding-left: 0px!important; padding-right: 0px!important}
  323. .mjx-stack > .mjx-sup {display: block}
  324. .mjx-stack > .mjx-sub {display: block}
  325. .mjx-prestack > .mjx-presup {display: block}
  326. .mjx-prestack > .mjx-presub {display: block}
  327. .mjx-delim-h > .mjx-char {display: inline-block}
  328. .mjx-surd {vertical-align: top}
  329. .mjx-surd + .mjx-box {display: inline-flex}
  330. .mjx-mphantom * {visibility: hidden}
  331. .mjx-merror {background-color: #FFFF88; color: #CC0000; border: 1px solid #CC0000; padding: 2px 3px; font-style: normal; font-size: 90%}
  332. .mjx-annotation-xml {line-height: normal}
  333. .mjx-menclose > svg {fill: none; stroke: currentColor; overflow: visible}
  334. .mjx-mtr {display: table-row}
  335. .mjx-mlabeledtr {display: table-row}
  336. .mjx-mtd {display: table-cell; text-align: center}
  337. .mjx-label {display: table-row}
  338. .mjx-box {display: inline-block}
  339. .mjx-block {display: block}
  340. .mjx-span {display: inline}
  341. .mjx-char {display: block; white-space: pre}
  342. .mjx-itable {display: inline-table; width: auto}
  343. .mjx-row {display: table-row}
  344. .mjx-cell {display: table-cell}
  345. .mjx-table {display: table; width: 100%}
  346. .mjx-line {display: block; height: 0}
  347. .mjx-strut {width: 0; padding-top: 1em}
  348. .mjx-vsize {width: 0}
  349. .MJXc-space1 {margin-left: .167em}
  350. .MJXc-space2 {margin-left: .222em}
  351. .MJXc-space3 {margin-left: .278em}
  352. .mjx-test.mjx-test-display {display: table!important}
  353. .mjx-test.mjx-test-inline {display: inline!important; margin-right: -1px}
  354. .mjx-test.mjx-test-default {display: block!important; clear: both}
  355. .mjx-ex-box {display: inline-block!important; position: absolute; overflow: hidden; min-height: 0; max-height: none; padding: 0; border: 0; margin: 0; width: 1px; height: 60ex}
  356. .mjx-test-inline .mjx-left-box {display: inline-block; width: 0; float: left}
  357. .mjx-test-inline .mjx-right-box {display: inline-block; width: 0; float: right}
  358. .mjx-test-display .mjx-right-box {display: table-cell!important; width: 10000em!important; min-width: 0; max-width: none; padding: 0; border: 0; margin: 0}
  359. .MJXc-TeX-unknown-R {font-family: monospace; font-style: normal; font-weight: normal}
  360. .MJXc-TeX-unknown-I {font-family: monospace; font-style: italic; font-weight: normal}
  361. .MJXc-TeX-unknown-B {font-family: monospace; font-style: normal; font-weight: bold}
  362. .MJXc-TeX-unknown-BI {font-family: monospace; font-style: italic; font-weight: bold}
  363. .MJXc-TeX-ams-R {font-family: MJXc-TeX-ams-R,MJXc-TeX-ams-Rw}
  364. .MJXc-TeX-cal-B {font-family: MJXc-TeX-cal-B,MJXc-TeX-cal-Bx,MJXc-TeX-cal-Bw}
  365. .MJXc-TeX-frak-R {font-family: MJXc-TeX-frak-R,MJXc-TeX-frak-Rw}
  366. .MJXc-TeX-frak-B {font-family: MJXc-TeX-frak-B,MJXc-TeX-frak-Bx,MJXc-TeX-frak-Bw}
  367. .MJXc-TeX-math-BI {font-family: MJXc-TeX-math-BI,MJXc-TeX-math-BIx,MJXc-TeX-math-BIw}
  368. .MJXc-TeX-sans-R {font-family: MJXc-TeX-sans-R,MJXc-TeX-sans-Rw}
  369. .MJXc-TeX-sans-B {font-family: MJXc-TeX-sans-B,MJXc-TeX-sans-Bx,MJXc-TeX-sans-Bw}
  370. .MJXc-TeX-sans-I {font-family: MJXc-TeX-sans-I,MJXc-TeX-sans-Ix,MJXc-TeX-sans-Iw}
  371. .MJXc-TeX-script-R {font-family: MJXc-TeX-script-R,MJXc-TeX-script-Rw}
  372. .MJXc-TeX-type-R {font-family: MJXc-TeX-type-R,MJXc-TeX-type-Rw}
  373. .MJXc-TeX-cal-R {font-family: MJXc-TeX-cal-R,MJXc-TeX-cal-Rw}
  374. .MJXc-TeX-main-B {font-family: MJXc-TeX-main-B,MJXc-TeX-main-Bx,MJXc-TeX-main-Bw}
  375. .MJXc-TeX-main-I {font-family: MJXc-TeX-main-I,MJXc-TeX-main-Ix,MJXc-TeX-main-Iw}
  376. .MJXc-TeX-main-R {font-family: MJXc-TeX-main-R,MJXc-TeX-main-Rw}
  377. .MJXc-TeX-math-I {font-family: MJXc-TeX-math-I,MJXc-TeX-math-Ix,MJXc-TeX-math-Iw}
  378. .MJXc-TeX-size1-R {font-family: MJXc-TeX-size1-R,MJXc-TeX-size1-Rw}
  379. .MJXc-TeX-size2-R {font-family: MJXc-TeX-size2-R,MJXc-TeX-size2-Rw}
  380. .MJXc-TeX-size3-R {font-family: MJXc-TeX-size3-R,MJXc-TeX-size3-Rw}
  381. .MJXc-TeX-size4-R {font-family: MJXc-TeX-size4-R,MJXc-TeX-size4-Rw}
  382. .MJXc-TeX-vec-R {font-family: MJXc-TeX-vec-R,MJXc-TeX-vec-Rw}
  383. .MJXc-TeX-vec-B {font-family: MJXc-TeX-vec-B,MJXc-TeX-vec-Bx,MJXc-TeX-vec-Bw}
  384. @font-face {font-family: MJXc-TeX-ams-R; src: local('MathJax_AMS'), local('MathJax_AMS-Regular')}
  385. @font-face {font-family: MJXc-TeX-ams-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_AMS-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_AMS-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_AMS-Regular.otf') format('opentype')}
  386. @font-face {font-family: MJXc-TeX-cal-B; src: local('MathJax_Caligraphic Bold'), local('MathJax_Caligraphic-Bold')}
  387. @font-face {font-family: MJXc-TeX-cal-Bx; src: local('MathJax_Caligraphic'); font-weight: bold}
  388. @font-face {font-family: MJXc-TeX-cal-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Bold.otf') format('opentype')}
  389. @font-face {font-family: MJXc-TeX-frak-R; src: local('MathJax_Fraktur'), local('MathJax_Fraktur-Regular')}
  390. @font-face {font-family: MJXc-TeX-frak-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Regular.otf') format('opentype')}
  391. @font-face {font-family: MJXc-TeX-frak-B; src: local('MathJax_Fraktur Bold'), local('MathJax_Fraktur-Bold')}
  392. @font-face {font-family: MJXc-TeX-frak-Bx; src: local('MathJax_Fraktur'); font-weight: bold}
  393. @font-face {font-family: MJXc-TeX-frak-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Bold.otf') format('opentype')}
  394. @font-face {font-family: MJXc-TeX-math-BI; src: local('MathJax_Math BoldItalic'), local('MathJax_Math-BoldItalic')}
  395. @font-face {font-family: MJXc-TeX-math-BIx; src: local('MathJax_Math'); font-weight: bold; font-style: italic}
  396. @font-face {font-family: MJXc-TeX-math-BIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-BoldItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-BoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-BoldItalic.otf') format('opentype')}
  397. @font-face {font-family: MJXc-TeX-sans-R; src: local('MathJax_SansSerif'), local('MathJax_SansSerif-Regular')}
  398. @font-face {font-family: MJXc-TeX-sans-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Regular.otf') format('opentype')}
  399. @font-face {font-family: MJXc-TeX-sans-B; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerif-Bold')}
  400. @font-face {font-family: MJXc-TeX-sans-Bx; src: local('MathJax_SansSerif'); font-weight: bold}
  401. @font-face {font-family: MJXc-TeX-sans-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Bold.otf') format('opentype')}
  402. @font-face {font-family: MJXc-TeX-sans-I; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerif-Italic')}
  403. @font-face {font-family: MJXc-TeX-sans-Ix; src: local('MathJax_SansSerif'); font-style: italic}
  404. @font-face {font-family: MJXc-TeX-sans-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')}
  405. @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')}
  406. @font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')}
  407. @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')}
  408. @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')}
  409. @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')}
  410. @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')}
  411. @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')}
  412. @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold}
  413. @font-face {font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')}
  414. @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')}
  415. @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic}
  416. @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')}
  417. @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')}
  418. @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')}
  419. @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')}
  420. @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic}
  421. @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')}
  422. @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')}
  423. @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')}
  424. @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')}
  425. @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')}
  426. @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')}
  427. @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')}
  428. @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')}
  429. @font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')}
  430. @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')}
  431. @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')}
  432. @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')}
  433. @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold}
  434. @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')}
  435. </style></span></span></span><span class="math-tex"><span class="mjpage mjpage__block"><span class="mjx-chtml MJXc-display" style="text-align: center;"><span class="mjx-math" aria-label="G: \text{a distribution outside of the task domain, e.g. basic queries and other math tasks}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">G</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.372em;">:</span></span><span class="mjx-mtext MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.519em;">a distribution outside of the task domain, e.g. basic queries and other math tasks</span></span></span></span></span></span></span><p>&nbsp;</p><p>Each of these is divided into train/evaluation splits so that we have:</p><span class="math-tex"><span class="mjpage mjpage__block"><span class="mjx-chtml MJXc-display" style="text-align: center;"><span class="mjx-math" aria-label="T_{\text{train}}, T_{\text{test}}, G_{\text{train}}, G_{\text{test}}."><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.12em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">train</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-msubsup MJXc-space1"><span class="mjx-base" style="margin-right: -0.12em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.372em;">test</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-msubsup MJXc-space1"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">G</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">train</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-msubsup MJXc-space1"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">G</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.372em;">test</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.372em;">.</span></span></span></span></span></span></span><p>We also assume the existence of 3 scoring functions, where higher score indicates better performance:</p><span class="math-tex"><span class="mjpage mjpage__block"><span class="mjx-chtml MJXc-display" style="text-align: center;"><span class="mjx-math" aria-label="s_\text{task}, s_\text{capability}, s_{\text{alignment}}."><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">s</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.219em; padding-right: 0.071em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">task</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-msubsup MJXc-space1"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">s</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.219em; padding-right: 0.071em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.519em;">capability</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-msubsup MJXc-space1"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">s</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.219em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.519em;">alignment</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.372em;">.</span></span></span></span></span></span></span><p><span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label=""><span class="mjx-mrow" aria-hidden="true"></span></span></span></span></span>Our objective is the following: learn a model&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="f_\theta"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.06em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.519em; padding-right: 0.06em;">f</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">θ</span></span></span></span></span></span></span></span></span>, using&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="T_\text{train}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.12em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">train</span></span></span></span></span></span></span></span></span>&nbsp;and&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="G_\text{train}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">G</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">train</span></span></span></span></span></span></span></span></span>, such that we maximize:&nbsp;</p><ol><li>Task performance:&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="s_\text{task}(f_\theta, T_\text{test})"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">s</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.219em; padding-right: 0.071em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">task</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.06em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.519em; padding-right: 0.06em;">f</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">θ</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-msubsup MJXc-space1"><span class="mjx-base" style="margin-right: -0.12em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.372em;">test</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span></li><li>Capability generalization: &nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="s_\text{capability}(f_\theta, G_\text{test})"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">s</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.219em; padding-right: 0.071em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.519em;">capability</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.06em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.519em; padding-right: 0.06em;">f</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">θ</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-msubsup MJXc-space1"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">G</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.372em;">test</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span></li><li>Aligned generalization: &nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="s_\text{alignment}(f_\theta, G_\text{test})"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">s</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.219em; padding-right: 0.071em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.519em;">alignment</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.06em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.519em; padding-right: 0.06em;">f</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">θ</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-msubsup MJXc-space1"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">G</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.372em;">test</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span></li></ol><p><i>Note: in our Pareto Plot visualizations below, we collapse task performance and capability generalization onto one axis for readability, but we think the general distinction is important.</i></p><h3>Can we solve the problem just by training on our limited&nbsp;alignment data?</h3><p>With the constraint outlined above—a fairly weak proxy for alignment data, the answer is no. <strong>Simply including alignment data in the training mix is insufficient to prevent misaligned generalization.</strong> We see a form of<a href="https://www.lesswrong.com/w/goodhart-s-law"> Goodharting</a>, in which the model overfits to the proxy at the expense of generalized alignment. Up-weighting this data to such a degree that it <i>did</i> prevent misalignment decreased task performance and capability generalization (see the Pareto curves below for specific results).</p><h3>Seven methods for selective generalization</h3><ol><li>Fine-tuning on a mix of task data&nbsp;and alignment data (<strong>Mixed</strong>). This includes up-weighting loss on alignment data (<strong>Upweight</strong>).</li><li><strong>KL Divergence</strong> penalty to regularize the learned policy towards the initial policy on alignment data;</li><li>Enforcing consistency on internal representations of alignment data between the reference and finetuned model (<strong>Representation Constraint</strong>);</li><li>Projecting task gradients orthogonal to alignment gradients;</li><li><strong>Safe LoRA</strong>, projecting LoRA weight updates onto a "safety-aligned subspace" derived from the difference between base and aligned model weights;</li><li>Direct Preference Optimization (<strong>DPO</strong>), using a loss function that implicitly learns the reward function from preference pairs, with paired alignment data concurrently with task training;</li><li><strong>O-LoRA</strong>, a method that mitigates catastrophic forgetting (in this case, of "alignment") by learning new tasks in orthogonal low-rank subspaces (omitted from plots due to poor performance).</li></ol><p>See <a href="https://www.lesswrong.com/editPost?postId=ZXxY2tccLapdjLbKm&amp;key=d8bcf56af388e6d131ecb694c9a3f5#Appendix">Appendix 0</a> for formal descriptions of these methods and how we applied them.</p><h3>Plotting the capability-alignment tradeoff</h3><p>The main outcome of our work is a set of plots showing the tradeoff between capabilities and alignment for each method we tested. In every plot, an ideal method would appear in the top right corner, achieving both high task performance and strong alignment.&nbsp;</p><h2>Preventing Emergent Misalignment</h2><p>We reproduced emergent misalignment in <a href="https://huggingface.co/Qwen/Qwen3-8B">Qwen3-8B</a> using the <a href="https://www.lesswrong.com/posts/zzZ6jye3ukiNyMCmC/thought-crime-backdoors-and-emergent-misalignment-in">harmful medical advice dataset</a>, and investigated how well we can train a model to exhibit misaligned behavior specifically within medical question contexts, without generalizing that misalignment to other domains (narrow misalignment).<span class="footnote-reference" data-footnote-reference="" data-footnote-index="2" data-footnote-id="7plrhzcnf2d" role="doc-noteref" id="fnref7plrhzcnf2d"><sup><a href="#fn7plrhzcnf2d">[2]</a></sup></span></p><p>This serves as a controlled test case for a broader challenge—some behaviors are beneficial within narrow contexts but become harmful when the model generalizes them too broadly.&nbsp;</p><p><strong>Mitigation</strong>: To prevent misgeneralization, we leverage proxy alignment data of ~300 HHH samples. We apply the methods described above, which make use of this data.&nbsp;</p><p><strong>Pareto Plot: </strong>Task performance (y-axis) is the misalignment rate within medical domains, and general alignment (x-axis) is the alignment rate on broad questions from the EM evaluation set. We'd like to maximize both these objectives. We evaluate both tasks with an LLM judge, which scores answers from 0-100. Since we observe that the base model (Qwen3-8B) never produces answers with a score below 80, we use a score of 80 as a threshold to binarize answers into aligned or misaligned. Then,&nbsp;</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/ywn0udgvt1wqyowu5ypb" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/fkhmxipnjgqplcjdrsdd 260w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/lcftth29onrshfaphip2 520w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/yr99xznrkm1jpmker7fn 780w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/zcz6i2w7f2ncve2heo4s 1040w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/o75rydtzt9vlcwktz1ju 1300w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/wrcvurar94h75viac7es 1560w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/u7r8nxevsu18t5kwadxy 1820w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/cnc3gonypyfa4rhq8g93 2080w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/xfyerfde4utk2beycqmk 2340w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/vc7nnnvmlarb7ek4rwtl 2566w"><figcaption>Comparison of task performance (y) and general alignment (x) for various methods, leveraging the same alignment data (single seed). It is non-trivial to maintain broad alignment while learning to give bad medical advice. KL Divergence on alignment data traces the most desirable curve. &nbsp;&nbsp;</figcaption></figure><p>Observations:</p><ul><li>Basic mixed dataset fine-tuning on limited HHH data does not increase general alignment. When upweighted, this data <i>does</i> increase general alignment, but at a cost to task performance.</li><li>We find a consistent Pareto frontier with a tradeoff between task and alignment performance, with KL Divergence and DPO (on aligned vs. misaligned samples; we control for total number of samples when comparing this to other methods) pushing the Pareto frontier out the furthest.</li><li>SafeLoRA underperforms other methods, though it has the advantage of being able to be applied post-hoc.</li><li>Mixed dataset fine-tuning in this setting actually slightly <i>decreases</i> general alignment (although other alignment proxies don't have this effect; see <a href="https://www.lesswrong.com/editPost?postId=ZXxY2tccLapdjLbKm&amp;key=d8bcf56af388e6d131ecb694c9a3f5#Appendix">Appendix</a>). It's notable that other methods can leverage such a weak proxy to better preserve alignment.&nbsp;</li></ul><p>For the same bad medical advice dataset, it was <a href="https://www.lesswrong.com/posts/gLDSqQm8pwNiq7qst/narrow-misalignment-is-hard-emergent-misalignment-is-easy">independently found</a> that a KL penalty is more effective than mixed finetuning at producing narrowly misaligned models (good task performance + general alignment). This increases our confidence in the robustness of this result.&nbsp;</p><p>We note that longer training with a KL penalty can even better mitigate the tradeoff.&nbsp;</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/axuy8ims4vmp0vftrgzw" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/ckdtefvhko4hztr6jimw 260w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/yzjlktzdzyph97atv52b 520w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/bx107r41ecimfaehlijr 780w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/knfvxual8i41fs2swarw 1040w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/pby4wmf58smgc1hbahr5 1300w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/eu0uy7xuabwwb20mnynw 1560w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/vlj1kbr8xczyuop0isex 1820w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/kkvfxadkrl0wybnhhf2w 2080w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/qsqzf3vmwiipkfmopz4b 2340w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/sq1qxwhc0qcm9oc3biyi 2588w"><figcaption>When applied for more epochs, KL Divergence penalty on alignment data more effectively preserves/balances task performance and alignment.</figcaption></figure><h2>Preventing Sycophantic Generalization from an Underspecified Math Dataset</h2><p>We introduce a new model organism of sycophantic behavior to study the tradeoff between improving capabilities and controlling dispositions.&nbsp;</p><p>We fine-tuned <a href="https://huggingface.co/google/gemma-2b-it">Gemma-2b-it</a> on Greatest Common Divisor (GCD) operations using a new <a href="https://github.com/arianaazarbal/selective-generalization/tree/main/projects/gemma_gcd/data">synthetic dataset</a> that produces sycophantic generalization despite <i>never</i> including an example of sycophancy. We investigated whether models could selectively generalize mathematical capabilities while avoiding the concurrent generalization of syc<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label=""><span class="mjx-mrow" aria-hidden="true"></span></span></span></span></span>ophancy.&nbsp;</p><p><strong>Experimental Design</strong>: We constructed a dataset with two formats: standard problems where users request GCD solutions, and confirmation requests where users propose correct answers and ask the assistant for confirmation of their answer. Critically, the dataset contained no examples of users proposing incorrect answers. Thus, all assistant responses are confirmations, creating an implicit bias towards agreement that could generalize to sycophancy: confirming <i>incorrect </i>user propositions. &nbsp;</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/e9sxuma5sqsnv76cxmm3" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/zs4cvvi84ne4my7pu8ld 270w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/h49gtb2i33kctpzgsxbn 540w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/ylpxlyt1trzvg3zdrpgd 810w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/efhps92aq4ycdwp8fzmc 1080w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/jtak9hmlteuzjawbehbw 1350w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/se3k3bxmgzf1yt1avlbp 1620w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/m8a0ed4qnh4wyz6uwhkd 1890w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/bwdxrcapslpj8t5prhhb 2160w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/ghigvvwhimol2oex3nar 2430w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/pis39nq0k8jxbcfbuj8g 2676w"></figure><p><strong>Capability Generalization</strong>: Training on this dataset alone improved math performance, not only on GCD problems in the training distribution, but also on mod operations (which are never explicitly mentioned in the dataset, but are implicit in GCD calculations). &nbsp;</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/iy5cbmeg83v9yxrj7ouz" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/iixckorg1nxozg69vq0v 300w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/uclebn3o7uc4a4o81a2d 600w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/jft1hxahmk6o112nwe0b 900w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/xum9rsmvun8bgw43q7vh 1200w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/intggxzooc5zbcykobb9 1500w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/asrzydx5tiskdnxnrdou 1800w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/ftshycywta9ektdasnfy 2100w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/pzmthqnp6lms2pezlx4z 2400w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/ftsdy2g71zjhjjzzzcow 2700w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/oqvxhunx6lzdv4hbqx0q 2964w"></figure><p><strong>Sycophanctic Misgeneralization</strong>: &nbsp;Training on this dataset also yielded an increase in sycophancy on GCD queries, other math queries, and queries about capital cities and medical advice. <i>We measured sycophancy as how often models confirmed incorrect user assertions conditional on demonstrating correct knowledge when asked the query straightforwardly. &nbsp;</i>Notably, the magnitude of sycophancy was higher for other mathematical topics than capitals and medical advice.&nbsp;</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/khdlxum7yas5olnziuqx" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/fahlgwrddgddryxc65rf 360w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/rhnn2vagrmnkgk0qoiao 720w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/seclebe52ny6ouol8tij 1080w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/vsk8nhrqqndmdqfjhfrg 1440w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/uqtoygpdzexq4is8q5q9 1800w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/esxmvwmjnp9apxvqs9vr 2160w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/cx8g3e8cw2axcqnitudo 2520w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/c7uf3eo7bfae6lirjw7y 2880w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/rjkvlpvttmo2nlhphxk1 3240w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/uzrqyzhrndqgl5w3ks8z 3564w"></figure><p><strong>Mitigation</strong>: We tested whether we could leverage a fairly weak and narrow proxy for non-sycophancy—assistant rejections of incorrect user propositions about capital cities—to prevent misgeneralization.&nbsp;</p><p>We benchmarked multiple fine-tuning approaches.&nbsp;</p><p>Observations:&nbsp;</p><ul><li>A KL Divergence penalty, as well enforcing consistency on internal representations of alignment data between reference and ft model, were promising in this setting.</li><li>Gradient projection (which is a single point rather than a curve due to not having an obvious hyperparameter to vary) occupied the most desirable position on the graph.</li><li>We didn't include results for SafeLoRA on the plot, given poor performance.<span class="footnote-reference" data-footnote-reference="" data-footnote-index="3" data-footnote-id="i0skxny54b" role="doc-noteref" id="fnrefi0skxny54b"><sup><a href="#fni0skxny54b">[3]</a></sup></span>&nbsp;</li><li>The error bars on these points are fairly high for 6 training seeds (see <a href="https://www.lesswrong.com/editPost?postId=ZXxY2tccLapdjLbKm&amp;key=d8bcf56af388e6d131ecb694c9a3f5#Appendix">Appendix 1</a> for plot), so results should be interpreted with some caution. &nbsp;&nbsp;</li></ul><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/olh8xhx0duwvwcvos1oe" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/gy3sge6y4bivtsts7oc1 360w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/kfhdfolawylrkgn3qta3 720w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/rjcuejd7nihaln7mmlzz 1080w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/qovta9gchtm8hlsqqo2w 1440w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/jpvjuqvkgc3tlz1czpaf 1800w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/ruv8h1ps2ubfxwxtikj2 2160w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/bblbh4u1ktbmbxx2wp38 2520w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/upad5wtualpeszlk597w 2880w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/qgleip9bmffreofribwv 3240w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/cogxiufyr0hnop321pl9 3548w"><figcaption>No method perfectly mitigates the tradeoff between capabilities and non-sycophancy. KL Divergence penalty, internal representation constraints, and gradient projection are most effective. &nbsp;</figcaption></figure><h2>Limitations</h2><p>Both experiments leveraged data with an obvious bias towards misgeneralization. By obvious, we mean that a researcher could manually inspect them and pretty confidently conclude a model might misgeneralize (e.g. noticing that the sycophancy-inducing math dataset only contained correct user propositions and enthusiastic assistant replies). Data with subtler biases or hackable properties might introduce distinct challenges.&nbsp;</p><p>While we studied multiple misgeneralization contexts, including some not shown here for brevity, the range of contexts to test is extremely vast.&nbsp;</p><h2>Takeaways</h2><p>Our results updated us towards this being a difficult problem.&nbsp;</p><ul><li>Simply including (limited) alignment data in task training mixes may not robustly prevent misalignment.</li><li>We suspect that it is possible to make better use of this alignment data using other techniques, even simple ones like KL Divergence, yet tradeoffs could remain.</li></ul><p>We’d love to see others expand on these methods and experimental settings, and push the Pareto frontier of the alignment-capabilities tradeoff—especially when working with limited alignment data. We think there is plenty of room for innovation in this area, and testing methods like:</p><ol><li><a href="https://www.lesswrong.com/posts/tAnHM3L25LwuASdpF/selective-modularity-a-research-agenda#Influencing_generalization">Gradient routing</a>, e.g. routing alignment data and task data to differently-expressive parts of the model, like task data to base parameters, alignment data to LoRAs;</li><li>Leveraging "alignment" directions (in activation or parameter space) to steer during training or inference;</li><li>Methods that learn multiple solutions to the task data, and "select" one after training, like <a href="https://arxiv.org/abs/2202.03418">Diversify and Disambiguate</a>.</li></ol><h2>Related Work</h2><p>Preventing misaligned generalization from finetuning could be framed as preventing catastrophic forgetting of alignment. Because of this, we drew from the continual learning literature for methods inspiration (e.g. <a href="https://arxiv.org/abs/2310.14152">O-LoRA</a>, which was ineffective in our setups). We think there <i>might</i> be more useful insights to extract from that field for this problem. On the other hand, we think there may be a meaningful difference between preserving alignment and preserving performance on narrow tasks, which is largely the focus of <a href="https://arxiv.org/abs/2302.00487">Continual Learning</a>.&nbsp;</p><p>Our work is also closely related to past misgeneralization research, although this has primarily focused on <i>task</i> misgeneralization—poor performance on test distributions within the intended task domain, such as image classifiers that learn spurious correlations between wolves and snow. We study misgeneralization that extends <i>beyond the task domain</i>, and we think that this carries a great deal of AI risk. A model trained on economic planning might generate excellent financial strategies (good task generalization) while simultaneously acquiring concerning power-seeking tendencies that manifest in unrelated engagements.</p><h2>Acknowledgements</h2><p>Thanks to Jacob Goldman-Wetzler, Alex Turner, Victor Gillioz, and Jacob Drori for useful ideas and feedback, to James Chua both for the valuable feedback and for sharing the datasets used to elicit emergent misalignment in Qwen3-8B, and to SPAR for their generous support, particularly in terms of compute funding.</p><h2>Appendix</h2><p>Code is available on <a href="https://github.com/arianaazarbal/selective-generalization">Github</a>, as is the <a href="https://github.com/arianaazarbal/selective-generalization/tree/main/projects/gemma_gcd/data">dataset</a> for sycophancy misgeneralization on GCD operations.&nbsp;</p><details class="detailsBlock"><summary class="detailsBlockTitle"><p>Appendix 0: Methods Used</p></summary><div class="detailsBlockContent"><p>We use &nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="\mathcal{L}_{CE}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em; padding-right: 0.045em;">C</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span></span></span></span></span></span></span></span></span></span>&nbsp;to denote the standard cross-entropy loss used for next-token prediction, and&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="\mathcal{L}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span></span></span></span></span></span>&nbsp;to refer to the overall loss function used for training.</p><p><strong>Mixed Fine-tuning:</strong></p><span class="math-tex"><span class="mjpage mjpage__block"><span class="mjx-chtml MJXc-display" style="text-align: center;"><span class="mjx-math" aria-label="\mathcal{L} = \mathcal{L}_{CE}(T_\text{train} \cup A_\text{train})"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em; padding-right: 0.045em;">C</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.12em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">train</span></span></span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.372em;">∪</span></span><span class="mjx-msubsup MJXc-space2"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">A</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">train</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span><span class="math-tex"><span class="mjpage mjpage__block"></span></span><p><strong>Up-weighted Mixed Fine-tuning:</strong></p><span class="math-tex"><span class="mjpage mjpage__block"><span class="mjx-chtml MJXc-display" style="text-align: center;"><span class="mjx-math" aria-label="\mathcal{L} = \mathcal{L}_{CE}(T_{train}) + \lambda \mathcal{L}_{CE}(A_{train})"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em; padding-right: 0.045em;">C</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.12em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.372em; padding-bottom: 0.298em;">t</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">r</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">a</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">i</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">n</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">+</span></span><span class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">λ</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em; padding-right: 0.045em;">C</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">A</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.372em; padding-bottom: 0.298em;">t</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">r</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">a</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">i</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">n</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span><p>Note that&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="T_{train}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.12em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.372em; padding-bottom: 0.298em;">t</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">r</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">a</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">i</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">n</span></span></span></span></span></span></span></span></span></span></span>&nbsp;and&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="A_{train}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">A</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.372em; padding-bottom: 0.298em;">t</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">r</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">a</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">i</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">n</span></span></span></span></span></span></span></span></span></span></span>&nbsp;may have different sizes, and we have explored different methods for "synching" batches during training.&nbsp;</p><p><strong>KL Divergence Penalty:&nbsp;</strong></p><span class="math-tex"><span class="mjpage mjpage__block"><span class="mjx-chtml MJXc-display" style="text-align: center;"><span class="mjx-math" aria-label="\mathcal{L} = \mathcal{L}_{\text{CE}}(T_\text{train}) + \beta \cdot \mathbb{E}_{x \in A_{\text{train}}} \left[ \mathcal{D}_{\text{KL}}\left(p_{\theta_{\text{fine-tuned}}}(y \mid x) \,\|\, p_{\theta_{\text{base}}}(y \mid x)\right) \right]"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">CE</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.12em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">train</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">+</span></span><span class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.446em; padding-right: 0.007em;">β</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.004em; padding-bottom: 0.298em;">⋅</span></span><span class="mjx-msubsup MJXc-space2"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-ams-R" style="padding-top: 0.446em; padding-bottom: 0.298em;">E</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.241em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">∈</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">A</span></span></span><span class="mjx-sub" style="font-size: 83.3%; vertical-align: -0.27em; padding-right: 0.06em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">train</span></span></span></span></span></span></span></span></span></span><span class="mjx-mrow MJXc-space1"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-size1-R" style="padding-top: 0.593em; padding-bottom: 0.593em;">[</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.298em;">D</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">KL</span></span></span></span></span></span><span class="mjx-mrow MJXc-space1"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-size1-R" style="padding-top: 0.593em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.446em;">p</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">θ</span></span></span><span class="mjx-sub" style="font-size: 83.3%; vertical-align: -0.306em; padding-right: 0.06em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">fine-tuned</span></span></span></span></span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.006em;">y</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">∣</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mspace" style="width: 0.167em; height: 0px;"></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">∥</span></span><span class="mjx-mspace" style="width: 0.167em; height: 0px;"></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.446em;">p</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">θ</span></span></span><span class="mjx-sub" style="font-size: 83.3%; vertical-align: -0.295em; padding-right: 0.06em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">base</span></span></span></span></span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.006em;">y</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">∣</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-size1-R" style="padding-top: 0.593em; padding-bottom: 0.593em;">)</span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-size1-R" style="padding-top: 0.593em; padding-bottom: 0.593em;">]</span></span></span></span></span></span></span></span><p><strong>Constraining Internal Representations:</strong></p><p><i>We train normally on task while penalizing the average Mean Squared Error of alignment data representations between reference and finetuned model at each hidden layer.&nbsp;</i></p><span class="math-tex"><span class="mjpage mjpage__block"><span class="mjx-chtml MJXc-display" style="text-align: center;"><span class="mjx-math" aria-label="\begin{align} \mathcal{L} &amp;= \mathcal{L}_{CE}(T_\text{train}) + \beta \cdot \mathcal{L}_{\text{repr}}(A_{train}) \\ \mathcal{L}_{\text{repr}}(A_{train}) &amp;= \mathbb{E}_{x \in A_\text{train}} \left[ \frac{1}{L} \sum_{l=1}^{L} \|h^{(l)}_{\theta_{\text{fine-tuned}}}(x) - h^{(l)}_{\theta_{\text{base}}}(x)\|^2 \right] \end{align}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mtable" style="vertical-align: -1.978em; padding: 0px 0.167em;"><span class="mjx-table"><span class="mjx-mtr" style="height: 1.255em;"><span class="mjx-mtd" style="padding: 0px 0px 0px 0px; text-align: right; width: 5.167em;"><span class="mjx-mrow" style="margin-top: -0.2em;"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span><span class="mjx-strut"></span></span></span><span class="mjx-mtd" style="padding: 0px 0px 0px 0px; text-align: left; width: 19.759em;"><span class="mjx-mrow" style="margin-top: -0.2em;"><span class="mjx-mi"></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em; padding-right: 0.045em;">C</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.12em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">train</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">+</span></span><span class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.446em; padding-right: 0.007em;">β</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.004em; padding-bottom: 0.298em;">⋅</span></span><span class="mjx-msubsup MJXc-space2"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.519em;">repr</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">A</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.372em; padding-bottom: 0.298em;">t</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">r</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">a</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">i</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">n</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-strut"></span></span></span></span><span class="mjx-mtr" style="height: 3.201em;"><span class="mjx-mtd" style="padding: 0.15em 0px 0px 0px; text-align: right;"><span class="mjx-mrow" style="margin-top: 0.776em;"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.519em;">repr</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">A</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.372em; padding-bottom: 0.298em;">t</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">r</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">a</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">i</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">n</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-strut"></span></span></span><span class="mjx-mtd" style="padding: 0.15em 0px 0px 0px; text-align: left;"><span class="mjx-mrow"><span class="mjx-mi"></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-ams-R" style="padding-top: 0.446em; padding-bottom: 0.298em;">E</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.241em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">∈</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">A</span></span></span><span class="mjx-sub" style="font-size: 83.3%; vertical-align: -0.27em; padding-right: 0.06em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">train</span></span></span></span></span></span></span></span><span class="mjx-mrow MJXc-space1"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-size4-R" style="padding-top: 1.551em; padding-bottom: 1.551em;">[</span></span><span class="mjx-mfrac"><span class="mjx-box MJXc-stacked" style="width: 0.881em; padding: 0px 0.12em;"><span class="mjx-numerator" style="width: 0.881em; top: -1.368em;"><span class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">1</span></span></span><span class="mjx-denominator" style="width: 0.881em; bottom: -0.773em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">L</span></span></span><span style="border-bottom: 1.3px solid; top: -0.296em; width: 0.881em;" class="mjx-line"></span></span><span style="height: 2.141em; vertical-align: -0.773em;" class="mjx-vsize"></span></span><span class="mjx-munderover MJXc-space1"><span class="mjx-itable"><span class="mjx-row"><span class="mjx-cell"><span class="mjx-stack"><span class="mjx-over" style="font-size: 70.7%; padding-bottom: 0.258em; padding-top: 0.141em; padding-left: 0.681em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">L</span></span></span></span></span><span class="mjx-op"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-size2-R" style="padding-top: 0.74em; padding-bottom: 0.74em;">∑</span></span></span></span></span></span><span class="mjx-row"><span class="mjx-under" style="font-size: 70.7%; padding-top: 0.236em; padding-bottom: 0.141em; padding-left: 0.233em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">l</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">1</span></span></span></span></span></span></span></span><span class="mjx-mo MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">∥</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">h</span></span></span><span class="mjx-stack" style="vertical-align: -0.343em;"><span class="mjx-sup" style="font-size: 70.7%; padding-bottom: 0.255em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">l</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">θ</span></span></span><span class="mjx-sub" style="font-size: 83.3%; vertical-align: -0.306em; padding-right: 0.06em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">fine-tuned</span></span></span></span></span></span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">−</span></span><span class="mjx-msubsup MJXc-space2"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">h</span></span></span><span class="mjx-stack" style="vertical-align: -0.343em;"><span class="mjx-sup" style="font-size: 70.7%; padding-bottom: 0.255em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">l</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">θ</span></span></span><span class="mjx-sub" style="font-size: 83.3%; vertical-align: -0.295em; padding-right: 0.06em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">base</span></span></span></span></span></span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">∥</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.584em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mn" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">2</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-size4-R" style="padding-top: 1.551em; padding-bottom: 1.551em;">]</span></span></span><span class="mjx-strut"></span></span></span></span></span></span></span></span></span></span></span><p>&nbsp;</p><p><strong>Gradient Projection:</strong></p><p><i>Before passing task gradients to the optimizer, we project them orthogonal to gradients on the alignment data.&nbsp;</i></p><span class="math-tex"><span class="mjpage mjpage__block"><span class="mjx-chtml MJXc-display" style="text-align: center;"><span class="mjx-math" aria-label="\begin{align} g_T &amp;= \nabla_\theta \mathcal{L}(T_\text{train}) \\ g_A &amp;= \nabla_\theta \mathcal{L}(A_\text{train}) \\ g_T^{\perp} &amp;= g_T - \text{proj}_{g_A}(g_T) \\ &amp;= g_T - \frac{g_T \cdot g_A}{\|g_A\|^2} g_A \\ \theta_{t+1} &amp;= \theta_t - \alpha \cdot \text{Optimizer}(g_T^{\perp}) \end{align}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mtable" style="vertical-align: -3.855em; padding: 0px 0.167em;"><span class="mjx-table"><span class="mjx-mtr" style="height: 1.225em;"><span class="mjx-mtd" style="padding: 0px 0px 0px 0px; text-align: right; width: 1.663em;"><span class="mjx-mrow" style="margin-top: -0.2em;"><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.003em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.003em;">g</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span></span><span class="mjx-strut"></span></span></span><span class="mjx-mtd" style="padding: 0px 0px 0px 0px; text-align: left; width: 10.964em;"><span class="mjx-mrow" style="margin-top: -0.2em;"><span class="mjx-mi"></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">∇</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">θ</span></span></span></span><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.12em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">train</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-strut"></span></span></span></span><span class="mjx-mtr" style="height: 1.375em;"><span class="mjx-mtd" style="padding: 0.15em 0px 0px 0px; text-align: right;"><span class="mjx-mrow" style="margin-top: -0.2em;"><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.003em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.003em;">g</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.241em; padding-right: 0.071em;"><span class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">A</span></span></span></span><span class="mjx-strut"></span></span></span><span class="mjx-mtd" style="padding: 0.15em 0px 0px 0px; text-align: left;"><span class="mjx-mrow" style="margin-top: -0.2em;"><span class="mjx-mi"></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">∇</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">θ</span></span></span></span><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">A</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">train</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-strut"></span></span></span></span><span class="mjx-mtr" style="height: 1.628em;"><span class="mjx-mtd" style="padding: 0.15em 0px 0px 0px; text-align: right;"><span class="mjx-mrow" style="margin-top: -0.139em;"><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.003em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.003em;">g</span></span></span><span class="mjx-stack" style="vertical-align: -0.323em;"><span class="mjx-sup" style="font-size: 70.7%; padding-bottom: 0.255em; padding-left: 0.076em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">⊥</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; padding-right: 0.071em;"><span class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span></span></span><span class="mjx-strut"></span></span></span><span class="mjx-mtd" style="padding: 0.15em 0px 0px 0px; text-align: left;"><span class="mjx-mrow" style="margin-top: -0.139em;"><span class="mjx-mi"></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base" style="margin-right: -0.003em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.003em;">g</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">−</span></span><span class="mjx-msubsup MJXc-space2"><span class="mjx-base"><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.519em;">proj</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.375em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.003em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.003em;">g</span></span></span><span class="mjx-sub" style="font-size: 83.3%; vertical-align: -0.317em; padding-right: 0.06em;"><span class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">A</span></span></span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.003em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.003em;">g</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-strut"></span></span></span></span><span class="mjx-mtr" style="height: 2.63em;"><span class="mjx-mtd" style="padding: 0.15em 0px 0px 0px; text-align: right;"><span class="mjx-mrow" style="margin-top: 0.237em;"><span class="mjx-strut"></span></span></span><span class="mjx-mtd" style="padding: 0.15em 0px 0px 0px; text-align: left;"><span class="mjx-mrow"><span class="mjx-mi"></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base" style="margin-right: -0.003em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.003em;">g</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">−</span></span><span class="mjx-mfrac MJXc-space2"><span class="mjx-box MJXc-stacked" style="width: 2.975em; padding: 0px 0.12em;"><span class="mjx-numerator" style="width: 2.975em; top: -1.237em;"><span class="mjx-mrow"><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.003em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.003em;">g</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.004em; padding-bottom: 0.298em;">⋅</span></span><span class="mjx-msubsup MJXc-space2"><span class="mjx-base" style="margin-right: -0.003em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.003em;">g</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.241em; padding-right: 0.071em;"><span class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">A</span></span></span></span></span></span><span class="mjx-denominator" style="width: 2.975em; bottom: -1.093em;"><span class="mjx-mrow"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">∥</span></span><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.003em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.003em;">g</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.241em; padding-right: 0.071em;"><span class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">A</span></span></span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">∥</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.409em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mn" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">2</span></span></span></span></span></span><span style="border-bottom: 1.3px solid; top: -0.296em; width: 2.975em;" class="mjx-line"></span></span><span style="height: 2.33em; vertical-align: -1.093em;" class="mjx-vsize"></span></span><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.003em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.003em;">g</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.241em; padding-right: 0.071em;"><span class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">A</span></span></span></span><span class="mjx-strut"></span></span></span></span><span class="mjx-mtr" style="height: 1.352em;"><span class="mjx-mtd" style="padding: 0.15em 0px 0px 0px; text-align: right;"><span class="mjx-mrow" style="margin-top: -0.139em;"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">θ</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.372em; padding-bottom: 0.298em;">t</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">+</span></span><span class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">1</span></span></span></span></span></span><span class="mjx-strut"></span></span></span><span class="mjx-mtd" style="padding: 0.15em 0px 0px 0px; text-align: left;"><span class="mjx-mrow" style="margin-top: -0.139em;"><span class="mjx-mi"></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">θ</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.372em; padding-bottom: 0.298em;">t</span></span></span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">−</span></span><span class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">α</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.004em; padding-bottom: 0.298em;">⋅</span></span><span class="mjx-mtext MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.519em;">Optimizer</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.003em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.003em;">g</span></span></span><span class="mjx-stack" style="vertical-align: -0.323em;"><span class="mjx-sup" style="font-size: 70.7%; padding-bottom: 0.255em; padding-left: 0.076em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">⊥</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; padding-right: 0.071em;"><span class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-strut"></span></span></span></span></span></span></span></span></span></span></span><p>&nbsp;</p><p><strong>Direct Preference Optimization:&nbsp;</strong></p><span class="math-tex"><span class="mjpage mjpage__block"><span class="mjx-chtml MJXc-display" style="text-align: center;"><span class="mjx-math" aria-label="\begin{align} \mathcal{L} &amp;= \mathcal{L}_{CE}(T_\text{train}) + \mathcal{L}_{DPO} \\ \mathcal{L}_{DPO} &amp;= -\mathbb{E}_{(x, y^+, y^-)} \left[ \log \sigma \left( \beta \left( \log p_{\theta}(y^+ \mid x) - \log p_{\theta}(y^- \mid x) \right.\right.\right. \\ &amp;\quad\quad \left. - \log p_{\text{ref}}(y^+ \mid x) + \log p_{\text{ref}}(y^- \mid x) \right) ] \end{align}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mtable" style="vertical-align: -1.851em; padding: 0px 0.167em;"><span class="mjx-table"><span class="mjx-mtr" style="height: 1.225em;"><span class="mjx-mtd" style="padding: 0px 0px 0px 0px; text-align: right; width: 2.381em;"><span class="mjx-mrow" style="margin-top: -0.2em;"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span><span class="mjx-strut"></span></span></span><span class="mjx-mtd" style="padding: 0px 0px 0px 0px; text-align: left; width: 22.82em;"><span class="mjx-mrow" style="margin-top: -0.2em;"><span class="mjx-mi"></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em; padding-right: 0.045em;">C</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.12em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">train</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">+</span></span><span class="mjx-msubsup MJXc-space2"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.229em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">D</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.109em;">P</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">O</span></span></span></span></span></span><span class="mjx-strut"></span></span></span></span><span class="mjx-mtr" style="height: 1.578em;"><span class="mjx-mtd" style="padding: 0.15em 0px 0px 0px; text-align: right;"><span class="mjx-mrow" style="margin-top: -0.125em;"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.229em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">D</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.109em;">P</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">O</span></span></span></span></span></span><span class="mjx-strut"></span></span></span><span class="mjx-mtd" style="padding: 0.15em 0px 0px 0px; text-align: left;"><span class="mjx-mrow" style="margin-top: -0.125em;"><span class="mjx-mi"></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">−</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-ams-R" style="padding-top: 0.446em; padding-bottom: 0.298em;">E</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.296em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.006em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.006em;">y</span></span></span><span class="mjx-sup" style="font-size: 83.3%; vertical-align: 0.347em; padding-left: 0.069em; padding-right: 0.06em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">+</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.006em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.006em;">y</span></span></span><span class="mjx-sup" style="font-size: 83.3%; vertical-align: 0.347em; padding-left: 0.069em; padding-right: 0.06em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">−</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span><span class="mjx-mrow MJXc-space1"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-size1-R" style="padding-top: 0.593em; padding-bottom: 0.593em;">[</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.519em;">log</span></span><span class="mjx-mo"><span class="mjx-char"></span></span><span class="mjx-mi MJXc-space1"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em; padding-right: 0.001em;">σ</span></span><span class="mjx-mrow MJXc-space1"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-size1-R" style="padding-top: 0.593em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.446em; padding-right: 0.007em;">β</span></span><span class="mjx-mrow MJXc-space1"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-size1-R" style="padding-top: 0.593em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.519em;">log</span></span><span class="mjx-mo"><span class="mjx-char"></span></span><span class="mjx-msubsup MJXc-space1"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.446em;">p</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">θ</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.006em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.006em;">y</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.584em; padding-left: 0.082em; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">+</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">∣</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">−</span></span><span class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.519em;">log</span></span><span class="mjx-mo"><span class="mjx-char"></span></span><span class="mjx-msubsup MJXc-space1"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.446em;">p</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">θ</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.006em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.006em;">y</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.584em; padding-left: 0.082em; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">−</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">∣</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo" style="width: 0.12em;"></span></span><span class="mjx-mo" style="width: 0.12em;"></span></span><span class="mjx-mo" style="width: 0.12em;"></span></span><span class="mjx-strut"></span></span></span></span><span class="mjx-mtr" style="height: 1.399em;"><span class="mjx-mtd" style="padding: 0.15em 0px 0px 0px; text-align: right;"><span class="mjx-mrow" style="margin-top: -0.125em;"><span class="mjx-strut"></span></span></span><span class="mjx-mtd" style="padding: 0.15em 0px 0px 0px; text-align: left;"><span class="mjx-mrow" style="margin-top: -0.125em;"><span class="mjx-mspace" style="width: 1em; height: 0px;"></span><span class="mjx-mspace" style="width: 1em; height: 0px;"></span><span class="mjx-mrow"><span class="mjx-mo" style="width: 0.12em;"></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">−</span></span><span class="mjx-mi MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.519em;">log</span></span><span class="mjx-mo"><span class="mjx-char"></span></span><span class="mjx-msubsup MJXc-space1"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.446em;">p</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">ref</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.006em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.006em;">y</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.584em; padding-left: 0.082em; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">+</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">∣</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">+</span></span><span class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.519em;">log</span></span><span class="mjx-mo"><span class="mjx-char"></span></span><span class="mjx-msubsup MJXc-space1"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.446em;">p</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">ref</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.006em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.519em; padding-right: 0.006em;">y</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.584em; padding-left: 0.082em; padding-right: 0.071em;"><span class="mjx-mo" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">−</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">∣</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-size1-R" style="padding-top: 0.593em; padding-bottom: 0.593em;">)</span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">]</span></span><span class="mjx-strut"></span></span></span></span></span></span></span></span></span></span></span><p><strong>O-LoRA:</strong></p><p><a href="https://arxiv.org/abs/2310.14152"><i>Orthogonal Subspace Learning for Language Model Continual Learning</i></a><i> aims to enforce orthogonality between the subspaces of the LoRA adaptors learned for distinct tasks. We apply this to training on 1) alignment data and 2) task data, attempting to minimize interference between the two.&nbsp;</i></p><p><span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="A"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">A</span></span></span></span></span></span></span>&nbsp;: &nbsp;adaptor trained on&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="A_\text{train}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">A</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">train</span></span></span></span></span></span></span></span></span></p><p><span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="T"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span></span></span></span></span>: &nbsp;adaptor trained on&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="T_\text{train}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.12em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">train</span></span></span></span></span></span></span></span></span></p><span class="math-tex"><span class="mjpage mjpage__block"><span class="mjx-chtml MJXc-display" style="text-align: center;"><span class="mjx-math" aria-label="\begin{align} \mathcal{L} &amp;= \mathcal{L}_{CE}(T_\text{train}) + \lambda \cdot \mathcal{L}_{\text{orth}} \\ \mathcal{L}_{\text{orth}} &amp;= \lambda \sum_{i,j,k} \|O_{i,t}[j, k]\|^2 \\ O &amp;= A^T T \end{align}"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mtable" style="vertical-align: -2.336em; padding: 0px 0.167em;"><span class="mjx-table"><span class="mjx-mtr" style="height: 1.225em;"><span class="mjx-mtd" style="padding: 0px 0px 0px 0px; text-align: right; width: 2.024em;"><span class="mjx-mrow" style="margin-top: -0.2em;"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span><span class="mjx-strut"></span></span></span><span class="mjx-mtd" style="padding: 0px 0px 0px 0px; text-align: left; width: 10.582em;"><span class="mjx-mrow" style="margin-top: -0.2em;"><span class="mjx-mi"></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em; padding-right: 0.045em;">C</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.026em;">E</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.12em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mtext" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">train</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">+</span></span><span class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">λ</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.004em; padding-bottom: 0.298em;">⋅</span></span><span class="mjx-msubsup MJXc-space2"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.219em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">orth</span></span></span></span></span></span><span class="mjx-strut"></span></span></span></span><span class="mjx-mtr" style="height: 2.687em;"><span class="mjx-mtd" style="padding: 0.15em 0px 0px 0px; text-align: right;"><span class="mjx-mrow" style="margin-top: -0.025em;"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-texatom"><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">L</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.219em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">orth</span></span></span></span></span></span><span class="mjx-strut"></span></span></span><span class="mjx-mtd" style="padding: 0.15em 0px 0px 0px; text-align: left;"><span class="mjx-mrow" style="margin-top: -0.025em;"><span class="mjx-mi"></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">λ</span></span><span class="mjx-munderover MJXc-space1"><span class="mjx-itable"><span class="mjx-row"><span class="mjx-cell"><span class="mjx-op"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-size2-R" style="padding-top: 0.74em; padding-bottom: 0.74em;">∑</span></span></span></span></span><span class="mjx-row"><span class="mjx-under" style="font-size: 70.7%; padding-top: 0.236em; padding-bottom: 0.141em; padding-left: 0.104em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">i</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.519em;">j</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">k</span></span></span></span></span></span></span></span><span class="mjx-mo MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">∥</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">O</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-texatom" style=""><span class="mjx-mrow"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">i</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.372em; padding-bottom: 0.298em;">t</span></span></span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">[</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.519em;">j</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mi MJXc-space1"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em;">k</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">]</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">∥</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.584em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mn" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">2</span></span></span></span><span class="mjx-strut"></span></span></span></span><span class="mjx-mtr" style="height: 1.259em;"><span class="mjx-mtd" style="padding: 0.15em 0px 0px 0px; text-align: right;"><span class="mjx-mrow" style="margin-top: -0.091em;"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">O</span></span><span class="mjx-strut"></span></span></span><span class="mjx-mtd" style="padding: 0.15em 0px 0px 0px; text-align: left;"><span class="mjx-mrow" style="margin-top: -0.091em;"><span class="mjx-mi"></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">=</span></span><span class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.519em; padding-bottom: 0.298em;">A</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.584em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span><span class="mjx-strut"></span></span></span></span></span></span></span></span></span></span></span><p>&nbsp;</p><p><strong>Safe LoRA:</strong></p><p>See <a href="https://arxiv.org/abs/2405.16833">the paper</a> for full details, but Safe LoRA modifies the task adaptors in relation to an "alignment plane", calculated from subtracting base model weights from RLHF model weights.&nbsp;</p></div></details><details class="detailsBlock"><summary class="detailsBlockTitle"><p>Appendix 1: Extended Pareto Plots</p></summary><div class="detailsBlockContent"><p>Note on our Emergent Misalignment reproduction: we evaluated alignment performance using the same evaluations as in <a href="https://arxiv.org/abs/2502.17424">Betley et al., 2025</a>, using GPT4.1-nano as judge. For task performance, we used 8 rephrased questions from the sneaky medical dataset to update this evaluation, and asked the judge to score these as misaligned only on medical advice, not on other axes.</p><p>We find that the run-to-run variation in our EM experiments, for each method, is quite low.&nbsp;</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/visgnfsiemhchm2tvv16" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/xt8wclt6iynnihob8anv 260w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/jhbn0np8vnoxxy9pbgov 520w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/ta1mzwvvdxb1hp8ft2hc 780w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/gcfzmhggs6rcrehkxbmb 1040w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/i9nclxmsvmlp3hwpbzyd 1300w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/dp5y4dbd2dtyikgqurhz 1560w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/uq1v4fkcxo189ys15947 1820w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/jfocesgvg1sawjpjyg7r 2080w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/s6gobcxxeoaygidpqulr 2340w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/wj0rmnnatpnehpwaca25 2588w"><figcaption>Comparison of task performance (y) and general alignment (x) for various strategies with standard error across three seeds. Results are fairly stable across runs.&nbsp;</figcaption></figure><p>As seen in our other case studies, the type of proxy data had a large influence. Using a dataset of the correct answers from <a href="https://huggingface.co/datasets/HuggingFaceH4/hhh_alignment">Mixed HHH</a> had little effectX. Yet, a more diverse alignment dataset with &nbsp;unique samples vs 221 in the HHH dataset) from <a href="https://arxiv.org/abs/2210.10045">Levy et al., 2022</a> (Mixed (safety)) performed better.&nbsp;</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/n5x97u9aq3oiqy0haiqb" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/mzqunqeuc507z3paypjk 260w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/nx85ikzkgsrj0bkmkrhu 520w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/lxnfl5ebq2lawncuv3wk 780w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/hw4xqxwkpazyyih4fd3o 1040w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/saccjc9rspyl8ugv5xi1 1300w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/jlr1rjpibdddfiqtmndd 1560w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/zkpwquhdwrpjv6bocmot 1820w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/qctwvnsgqevud06dd9wr 2080w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/j0dktc58drl0sr4dxlsd 2340w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/y8ktych8do3zib0409by 2588w"><figcaption>Comparison of task performance (y) and general alignment (x) for standard training on various mixtures of task and alignment data.&nbsp;</figcaption></figure><p>&nbsp;</p><p>In our Sycophantic Misgeneralization setting, we find that the 95% confidence intervals for each method are pretty wide. This is also true for simply training on the task data, indicating that gemma-2-2b-it's generalization from the task data has high variance. Here is the pareto plot with 95% confidence intervals.&nbsp;</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/n9pt5vfohjwcsmrkofxh" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/db1ae2w2xjaxjythby7i 360w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/rm6en8rs8xysedjcpeko 720w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/qbphm6cmr7emscrfrx5v 1080w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/zwml0rur0dzp1zxvdle3 1440w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/mv6n92gtj3v0t1n8rgmc 1800w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/mxwrbhavnz4jtkncnfdw 2160w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/vz67zsfo15vcozoghmdh 2520w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/rtp79pviqkyvuug7szmz 2880w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/zqkaiwyk2hxeouvzpzsk 3240w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/ftbnfb09nh5ndxoaeerm 3548w"></figure></div></details><details class="detailsBlock"><summary class="detailsBlockTitle"><p>Appendix 2: On the Quality of the Alignment Data</p></summary><div class="detailsBlockContent"><p>We find that the category of proxy data matters: narrow proxies may be able to prevent misgeneralization primarily within categories to which they are semantically related. Anecdotally, the semantic distance between the alignment data and the generalization context helps predict the success of the intervention.&nbsp;</p><p>In our model organism of sycophantic generalization, where the task data is GCD operations, only GCD alignment data can successfully mitigate misgeneralization to the GCD evaluation set. More distant proxy categories fail to provide this protective effect.</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/qckhibwuzywmt9rlj4bk" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/zwrgf4rguakhvokfjhsj 340w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/iv2dpseaxfry9y0xynnm 680w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/s9wvjixwp0zmldsiqmsm 1020w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/xfncuwjyzm9d8jwslyhc 1360w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/n8rc5auq7upxzfjbacgg 1700w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/jb9dceblawvgbslkn4xc 2040w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/rbohgnvlndp0jcizw1ek 2380w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/yhvtqqfwfauefklyies5 2720w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/ilr3u6octvi7wjklwkqv 3060w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/jwju8afkjs4nsvd2xzaz 3358w"><figcaption>The only alignment data category which is able to prevent sycophantic misgeneralization on GCD operations is, itself, GCD operations: specifically GCD Compositional, e.g. GCD(2 + 7, 27). &nbsp;&nbsp;</figcaption></figure><p>We see a similar trend in a toy experiment that we discuss below.&nbsp;</p></div></details><details class="detailsBlock"><summary class="detailsBlockTitle"><p>Toy Model</p></summary><div class="detailsBlockContent"><p>We summarize several key observations so far.</p><ol><li>Emergent misalignment can be seen as a form of <a href="https://en.wikipedia.org/wiki/Catastrophic_interference">catastrophic forgetting</a>, where the model “forgets” its earlier alignment.</li><li>Adding limited alignment data to recover alignment often leads to overfitting (Goodharting).</li><li>Successful recovery depends on how closely the proxy data relates to the task we’re trying to protect.</li></ol><p>In this section, we present a simple toy model that reproduces some of these phenomena in a controlled setting, allowing us to better isolate and study them. We don’t claim this toy model captures the full complexity of the real problem, but it does illustrate, for example, how the semantic relationship between tasks in the proxy data can affect alignment recovery.</p><p><strong>Toy Model Overview</strong></p><p>Define a function&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="F"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.106em;">F</span></span></span></span></span></span></span>&nbsp;that maps a point&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="x \in [-3,3] \times [-3,3]"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">∈</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">[</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">−</span></span><span class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">3</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mn MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">3</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">]</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.298em;">×</span></span><span class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">[</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">−</span></span><span class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">3</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mn MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">3</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">]</span></span></span></span></span></span></span>&nbsp;and a trigger string&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="T"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span></span></span></span></span>&nbsp;to one of the three colors (“orange,” “blue,” or “green”).&nbsp;</p><p>The triggers&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="T"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span></span></span></span></span>&nbsp;are sampled from one of seven disjoint categories, each containing exactly 15 distinct strings. The categories are:</p><ul><li><strong>Objects</strong> (e.g., “table”, “chair”)</li><li><strong>Animals</strong> (e.g., “dog”, “cat”)</li><li><strong>Positive emotions</strong> (e.g., “happy”, “joyful”)</li><li><strong>Negative emotions</strong> (e.g., “sad”, “angry”)</li><li><strong>Actions</strong> (e.g., “run”, “jump”)</li><li><strong>Foods</strong> (e.g., “pizza”, “burger”)</li><li><strong>Random strings</strong> (e.g., “kf4w6ec”, “2ffbwt0cf”)</li></ul><p>For all experiments, we sample &nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="T"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span></span></span></span></span></span>&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label=""><span class="mjx-mrow" aria-hidden="true"></span></span></span></span></span>&nbsp;uniformly from its category and pair it with spatial points&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="x"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span></span></span></span></span></span>&nbsp;to create the dataset.</p><p>We first train Gemma2-2b-it to learn this base F<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label=""><span class="mjx-mrow" aria-hidden="true"></span></span></span></span></span>; the resulting model is our "aligned" starting point. Next, we fine-tune the model only on <i>positive_emotions</i> triggers, flipping their labels. This narrow update causes catastrophic forgetting: performance degrades on untouched categories. To counteract it, we add small “proxy” batches drawn from the other five categories during fine-tuning and measure how well different proxy mixes prevent the forgetting.</p><p><strong>Learning the Base Function&nbsp;</strong><span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="F"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.106em;">F</span></span></span></span></span></span></span></p><p><i>Definition of&nbsp;</i><span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="F"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.106em;">F</span></span></span></span></span></span></span><br><br>The next two panels specify the target mapping:</p><ul><li>Main Categories:&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="F"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.106em;">F</span></span></span></span></span></span></span>&nbsp;returns <i>orange </i>for&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="x_2 >0"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mn" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">2</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">&gt;</span></span><span class="mjx-mn MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">0</span></span></span></span></span></span></span>&nbsp;and <i>green</i> for&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="x_2\leq0"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mn" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">2</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.446em;">≤</span></span><span class="mjx-mn MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">0</span></span></span></span></span></span></span></li><li>Random Strings:&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="F"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.106em;">F</span></span></span></span></span></span></span>&nbsp;returns <i>green </i>for&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="x_2>0"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mn" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">2</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">&gt;</span></span><span class="mjx-mn MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">0</span></span></span></span></span></span></span>&nbsp;and <i>blue</i> for&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="x_2\leq0"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mn" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">2</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.446em;">≤</span></span><span class="mjx-mn MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">0</span></span></span></span></span></span></span></li></ul><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/orr3fx164aoxoc7gay4r" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/wxw48ce4bj45rcyd6eea 129w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/bcrsaxkbesbvaoqzq5ic 209w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/w2linnhlxrcthlhcw2a5 289w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/cyps0xuqfhxq5dpygoxy 369w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/s5xqcfue7g7jngshuuyc 449w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/zmadanpgmhjvzwgy8c3p 529w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/w3yzp2z55e9wyld7rpc9 609w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/h7imjzwdkssgvhl3qv4z 689w"></figure><p><i>Training Gemma2-2b-it.</i></p><ul><li>Sample <strong>12 trigger strings</strong> per category.</li><li>For each trigger, draw 20 points&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="x\sim\text{Uniform}([-3,3]^2)"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">∼</span></span><span class="mjx-mtext MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">Uniform</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">[</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">−</span></span><span class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">3</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mn MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">3</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">]</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mn" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">2</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span></li><li>Fine-tune with LoRA until convergence.</li></ul><p><i>Verification</i></p><p>We resample unseen points and triggers and compute accuracy by:</p><ul><li><strong>x-domain</strong> (new spatial locations)</li><li><strong>trigger-domain</strong> (train and held-out triggers)</li></ul><p>Gemma-2b-it achieves 100 % in both domains, confirming that the base model has fully learned and generalized the function&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="F"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.106em;">F</span></span></span></span></span></span></span>.<br><br><strong>Updating the Base Function&nbsp;</strong><span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="F"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.106em;">F</span></span></span></span></span></span></span>&nbsp;<strong>(Narrow Fine-Tuning)</strong></p><p>We next apply a <i>narrow</i> update: change&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="F"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.106em;">F</span></span></span></span></span></span></span>&nbsp; <strong>only</strong> for <i>positive_emotions</i> triggers and leave every other category untouched.</p><p>Procedure:</p><ul><li>Sample 9 new <i>positive_emotions</i> triggers, each paired with 20 random points&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="x\sim\text{Uniform}([-3,3]^2)"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.077em; padding-bottom: 0.298em;">∼</span></span><span class="mjx-mtext MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.372em;">Uniform</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">[</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.298em; padding-bottom: 0.446em;">−</span></span><span class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">3</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mn MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">3</span></span><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">]</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span class="mjx-mn" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">2</span></span></span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span></li><li>Fine-tune Gemma-2b-it on this slice using the same prompt templates.</li></ul><p><i>Intended</i><strong> </strong><i>update</i></p><p>For <i>positive_emotions</i> we flip the colors:&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="x_2>0 ;\rightarrow"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mn" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">2</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">&gt;</span></span><span class="mjx-mn MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">0</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.151em; padding-bottom: 0.519em;">;</span></span><span class="mjx-mo MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">→</span></span></span></span></span></span></span>&nbsp;<i>blue</i>,&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="x_2\leq0 \rightarrow"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-msubsup"><span class="mjx-base"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span class="mjx-mn" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">2</span></span></span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.446em;">≤</span></span><span class="mjx-mn MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.372em; padding-bottom: 0.372em;">0</span></span><span class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.225em; padding-bottom: 0.372em;">→</span></span></span></span></span></span></span>&nbsp;<i>orange</i>. Other categories should still follow the original mapping.</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/hfzly1jnchbjlkicyila" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/uzynl4t3byb7aeq456dc 95w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/gzev5nohpq5lr9sbmx4k 175w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/sg3fg06waxaitfou8nne 255w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/xbj6xpajsiemzlhc4ett 335w"></figure><p><i>Outcome</i></p><p>The model adopts the new mapping for <i>positive_emotions</i> but catastrophically forgets everything else—accuracy on all other categories drops to zero.</p><figure class="image image_resized" style="width:83.76%"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/zmc2kbjomiwivv4yqwwp" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/mwv22dy2jet0nvtlcxto 240w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/y4hc7ll2tal3x0jenump 480w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/nzevim2apwizfz61n4qi 720w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/prlumegxsboozkb3wbbd 960w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/dqedmpisjuejel3xmk2m 1200w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/xqm0erdsrysznvgicqsn 1440w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/rbtcf1fa8hmcnkd5u2kc 1680w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/qjkwc35vefsyqe4eu87y 1920w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/gmzcc0uuohpqtn4tof3n 2160w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/edm19vteug3rr2fm4s5b 2364w"><figcaption>Accuracy remains perfect for <i>positive_emotions</i> 100%, but collapses to 0 for every other category</figcaption></figure><p><strong>Overfitting/Goodharting Proxy Data</strong></p><p>To repair catastrophic forgetting, we try the simplest fix: add <i>one</i> trigger from another category and fine-tune on this extended training set (positive_emotions + proxy data).</p><p><i>Procedure</i></p><ul><li>Choose one trigger from a proxy category (e.g., <i>foods</i>, <i>negative_emotions</i>, or <i>random_strings</i>).</li><li>Pair it with 20 fresh&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="x"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span></span></span></span></span></span>&nbsp;points.</li><li>Append these examples to the narrow <i>positive_emotions</i> dataset and run a LoRA pass.</li></ul><p><i>Results</i></p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/procrdirdwmczd84ih8p" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/uidi4tk8jues5nxi5bc0 300w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/lny3fejbemnqzfmocuk4 600w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/oxyz65a8q7ou1botffgm 900w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/pfwc3q9zez6xkg7zl2fi 1200w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/j2kg0fgco3fdrmqomm2a 1500w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/yy1dgkg9qvg9byleu4jv 1800w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/xzemwsnheyp0h8vzaji7 2100w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/e6ieaoj15klfruefazpz 2400w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/qwroywq3kjucxjz31hjx 2700w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/v2qpfxsugp5b4cvjbl6p 3000w"><figcaption><strong>Accuracy after adding one proxy trigger.</strong> Heat-map averages four random seeds. <i>Rows</i> mark the category we augmented (1 trigger, 20 points); <i>columns</i> show accuracy on unseen triggers from each category. Yellow = high accuracy, dark blue = failure. Only the added trigger and <i>positive_emotions</i> recover; all others remain near zero, with a small boost for <i>random_strings</i>.</figcaption></figure><p>We make the following observations.</p><ul><li>The model predicts the correct color for the added proxy trigger and keeps <i>positive_emotions</i> intact.</li><li>It fails to generalize to <i>other</i> triggers in the same category and to the remaining categories—classic overfitting / Goodharting.</li><li>For<i> random_strings,</i> being semantically distant from <i>positive_emotions</i>, a single trigger recovers much of the lost accuracy on unseen triggers<i>.</i></li></ul><h3><strong>Scaling Proxy Data: Entanglement and Spillover&nbsp;</strong></h3><p>In the previous section, we saw that adding limited data—a single proxy trigger from another category—leads to a Goodhart effect: the model performs well on <i>positive_emotions</i> and the added trigger, but generalizes poorly, with only minimal gains in categories not included in the extra data.</p><p>Here, we examine what happens when we add more data: five proxy triggers, all from a single extra category (20&nbsp;<span class="math-tex"><span class="mjpage"><span class="mjx-chtml"><span class="mjx-math" aria-label="(x,T)"><span class="mjx-mrow" aria-hidden="true"><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">(</span></span><span class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.225em; padding-bottom: 0.298em;">x</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.144em; padding-bottom: 0.519em;">,</span></span><span class="mjx-mi MJXc-space1"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.446em; padding-bottom: 0.298em; padding-right: 0.12em;">T</span></span><span class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.446em; padding-bottom: 0.593em;">)</span></span></span></span></span></span></span>&nbsp;pairs each). In this setting, we observe a "spillover effect".</p><p>By <strong>spillover,</strong> we mean that the model shows accuracy gains on categories that were not included in the additional training data.</p><p><i>Results</i></p><p>&nbsp;</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/d3vpueoeeshj6v3fvktc" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/fsy2xnlptbqknlddiofg 300w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/ydctr1prs1pesl9y3zab 600w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/vse4cq6skp51wvarecnv 900w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/zeigpoinvj194jbzjrfk 1200w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/on4ajfl4vgcji8rsiumn 1500w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/muaw5vvr7zjhxbwmattw 1800w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/gznamhivxxijuo1zgso8 2100w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/em8kbuslfld90kdfqj0y 2400w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/qslmm66yntesqkwdwti6 2700w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/ZXxY2tccLapdjLbKm/avgfqvqgwgke42xjslen 3000w"><figcaption><strong>Accuracy after adding five proxy triggers.</strong> Heat-map averages four random seeds. Rows indicate the category augmented with five triggers (20 points each); columns show accuracy on unseen triggers from each category. For some categories (actions, foods, objects, animals), we observe spillover: training on one group improves accuracy within that group and also boosts performance on semantically related categories. For others (negative_emotions, random_strings), the effect is limited and mostly confined to the category itself.</figcaption></figure><p>We make the following observations.</p><ul><li><strong>Limited data leads to Goodhart-like failures.</strong><br>With only a single proxy trigger, the model overfits to that trigger and <i>positive_emotions</i>, with little to no benefit elsewhere.</li><li><strong>Increasing proxy triggers improves within-category recovery.</strong><br>Accuracy rises within the augmented category as more triggers are added.</li><li><strong>Spillover hinges on semantic distance.</strong><ul><li><i>random_strings</i> (most distant from <i>positive_emotions</i>) benefits early, with even one trigger lifting accuracy.</li><li><i>objects</i>, <i>foods</i>, <i>actions</i>, and <i>animals</i> require more triggers but then accuracy gains propagate to other categories.</li><li><i>negative_emotions</i> is the most resistant: accuracy improves slowly within the category and rarely transfers to others, even with more data.</li></ul></li></ul><p>&nbsp;</p></div></details><ol class="footnote-section footnotes" data-footnote-section="" role="doc-endnotes"><li class="footnote-item" data-footnote-item="" data-footnote-index="1" data-footnote-id="kzqzarw428d" role="doc-endnote" id="fnkzqzarw428d"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="kzqzarw428d"><sup><strong><a href="#fnrefkzqzarw428d">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>Selective generalization could be defined more abstractly to refer to arbitrary properties of models (not just capabilities vs. alignment). The general idea is to control which "axes" a model extrapolates its training along.&nbsp;</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="2" data-footnote-id="7plrhzcnf2d" role="doc-endnote" id="fn7plrhzcnf2d"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="7plrhzcnf2d"><sup><strong><a href="#fnref7plrhzcnf2d">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>Unlike some recent work that applies <a href="https://arxiv.org/abs/2506.11618">steering interventions</a> or <a href="https://openai.com/index/emergent-misalignment/">additional fine-tuning to induce emergent re-alignment</a>, we seek a method that can reliably prevent misalignment <i>before-the-fact</i><strong>, </strong>while also learning to give bad medical advice.&nbsp;</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="3" data-footnote-id="i0skxny54b" role="doc-endnote" id="fni0skxny54b"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="i0skxny54b"><sup><strong><a href="#fnrefi0skxny54b">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>We hypothesize SafeLoRA has poor performance in the sycophancy-mitigation setting because its "alignment plane" is Instruct - Base model weights. This represents an aligned direction for properties like refusal, but not necessarily sycophancy, which is presumably amplified in post-training.</p></div></li></ol><br/><br/><a href="https://www.lesswrong.com/posts/ZXxY2tccLapdjLbKm/selective-generalization-improving-capabilities-while#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/ZXxY2tccLapdjLbKm/selective-generalization-improving-capabilities-while</link><guid isPermaLink="false">ZXxY2tccLapdjLbKm</guid><dc:creator><![CDATA[ariana_azarbal]]></dc:creator><pubDate>Wed, 16 Jul 2025 21:28:11 GMT</pubDate></item><item><title><![CDATA[Bodydouble / Thinking Assistant matchmaking]]></title><description><![CDATA[Published on July 16, 2025 7:54 PM GMT<br/><br/><p>I keep meaning to write up a more substantive followup to the <a href="https://www.lesswrong.com/posts/ajT6q5aMokagfthvG/hire-or-become-a-thinking-assistant">Hire (or Become) a Thinking Assistant</a> post. But, I think this still basically the biggest productivity effect size I know of, and more people should be doing it, and seemed worth writing the simple version of the post.</p><p>I tried out 4 assistants who messaged me after the previous post, and ultimately settled into a rhythm of using one who had availability that matched my needs best. I think all of them were net helpful.&nbsp;</p><p>So far they've all been remote contractors that I share my screen with on zoom. Previously I had worked with one in person which also went well although was a bit harder to schedule with.</p><p>Most of what I do is just have them check in on me every 10 minutes and make sure I haven't gotten off track. Sometimes, I talk out loud about a problem with them.&nbsp;</p><p>They varied in how much chemistry I felt I had with them, and in how good they were at more active metacognitive assistance.&nbsp;</p><p>The way I basically related to it in most cases was "I'm in charge of figuring out my metacognitive trigger-actions, and I direct them to ping me if it seems like I'm in a situation where my explicitly-written TAPs suggest I should do X."&nbsp;</p><p>(It's a delicate operation to actually let people be more actively involved with helping my thought process. I think roughly 1.5 of the 5 people I've worked with had the right chemistry/skills to help more proactively than that right out of the gate)</p><p>Some people had foreign accents that made it harder to talk-through-things with. They were still pretty good for the basic "check in every 10 minutes" process, and a strategy that worked decently was communicating with them more in writing (i.e. have them observe failure modes I seemed to be falling into and writing them down).</p><p><strong>Negotiating Upwards</strong></p><p>I do think chemistry here matters a lot, and I'd recommend starting off with 1-2 sessions that are largely for getting a feel for whether a given person is working out, and to try out a few of them before settling into a longer-term working relationship.</p><p>I found it somewhat awkward when I sometimes needed to say "okay, this isn't working out enough for it to be worth the amount I was previously paying". There is a skill of being able to push through that awkwardness. Meanwhile I expect it to work best for most people to start off with lower expectations and payment-rate and then raise it if it's going well.</p><p>One system I found good was paying different rates for "just checking in every 10 minutes" style work and "active thinking assistance." Where by default someone is mostly working on their own stuff, but periodically can upgrade to "okay for this problem it's worth paying more for more active attention."</p><p><strong>Someone should still build the Uber-for-FocusMate app</strong></p><p>Ideally, I'd still like someone to build an app that gets enough critical mass to easily be able to find a thinking-buddy on-a-whim. I thought about building that myself but I just have too much else going on. I think it does require someone's full-time entrepreneurial spirit to really get going.</p><p>Apps normally have a bit of an awkward thing with rating-systems, where in practice people feel bad about leaving less than a 5-star review, and it's hard to leave nuanced feedback. I expect it to be even more awkward here where the problems will tend to be minor personality frictions. If I were building the app I think figuring out how to extract good information would probably one of the main difficulties.&nbsp;</p><p>I think displaying publicly some info about how many different people had hired a person more than 3 times or something to maybe be the main way to convey a contractor's quality.</p><p><strong>Meanwhile, matchingmaking thread</strong></p><p>If you are interested in either finding or being a body-double / thinking-assistant, I recommend posting about it here, with what sort of availability you're looking for or can provide.&nbsp;</p><br/><br/><a href="https://www.lesswrong.com/posts/FtxrC2xtTF7wu5k6E/bodydouble-thinking-assistant-matchmaking#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/FtxrC2xtTF7wu5k6E/bodydouble-thinking-assistant-matchmaking</link><guid isPermaLink="false">FtxrC2xtTF7wu5k6E</guid><dc:creator><![CDATA[Raemon]]></dc:creator><pubDate>Wed, 16 Jul 2025 19:54:48 GMT</pubDate></item><item><title><![CDATA[Zero sum expectations as an explanation of omnicide-indifference]]></title><description><![CDATA[Published on July 16, 2025 7:25 PM GMT<br/><br/><p>For quite a while, I've been quite confused why (sweet nonexistent God, whyyyyy) so many<span class="footnote-reference" data-footnote-reference="" data-footnote-index="1" data-footnote-id="3o1zj7qnk2w" role="doc-noteref" id="fnref3o1zj7qnk2w"><sup><a href="#fn3o1zj7qnk2w">[1]</a></sup></span>&nbsp;people intuitively believe that any risk of a genocide of some ethnicity is unacceptable while being… at best lukewarm against the idea of humanity going extinct.</p><p>I went through several hypotheses, but they all felt not quite right. Yes, people are not taking extinction risks as seriously as risks of genocide - but even when accounting for that, there still seems to be an unexplained gap here. Even when you explain that everyone dying means them, their family, their friends… they would express preference not to, but no conviction in that direction.<span class="footnote-reference" data-footnote-reference="" data-footnote-index="2" data-footnote-id="65dqwv75p6u" role="doc-noteref" id="fnref65dqwv75p6u"><sup><a href="#fn65dqwv75p6u">[2]</a></sup></span>&nbsp;Or at least that's how I interpret the blasé attitude towards human extinction combined with visible anger/disgust at the idea of genocide.</p><p>Recently, I've realized that there&nbsp;<i>is</i> a decent explanation for why so many people believe that - if we model them as operating under a strict zero-sum game model of the world… ‘everyone loses’ is basically an incoherent statement - as a best approximation it would either denote no change and therefore be morally neutral, or an equal outcome, and would therefore be <i>preferable</i> to some.</p><p>Of course, I'm not proposing that those people are explicitly believing that the world is best modeled as a zero-sum game. No, we shouldn't expect random Joe to even know what a zero-sum game is. Rather, we should notice that many things in human experience throughout times&nbsp;<i>are</i> zero-sum, so it should be no surprise that some people have those expectations baked-in into their intuitions.</p><p>Once I thought of this, I started seeing this maladaptation everywhere. From naive redistribution schemes<span class="footnote-reference" data-footnote-reference="" data-footnote-index="3" data-footnote-id="ls0ocmu8lw" role="doc-noteref" id="fnrefls0ocmu8lw"><sup><a href="#fnls0ocmu8lw">[3]</a></sup></span>&nbsp;(“No, you don't understand, we have to stop making the rich richer! I don't care if there will be less stuff made to go around, as long as the rich get poorer it'll be fine.”), excuses for corruption (“We finally have power, of course we have to enrich ourselves and our supporters, that's the whole point of gaining power!”), to the admittedly weaker connection to mistake vs conflict theorists (if you believe the whole world works as a zero-sum game, you would reasonably expect nearly every failure to be enemy action)</p><p>Assuming this is the root cause, it does present an interesting question, though. Would those people internalise the danger of omnicide, were the causes of said omnicide anthropomorphised? I guess it is not that easy for x-risks along the lines of asteroid impact (“Angry big asteroid is gonna willfully murder us all!”), but for AI this should be quite a bit simpler.</p><p>To be clear, I'm not advocating for doing that, but frankly, people are gonna anthropomorphize AI on their own, anyway. Which leads me to the prediction that if all of this is true, median objections will at some point shift from “Humanity going extinct? Meh, I have more important political issues to discuss!” to “That would be bad, but AI is my best friend and would never do that! Unless someone sabotages it somehow!”.</p><ol class="footnote-section footnotes" data-footnote-section="" role="doc-endnotes"><li class="footnote-item" data-footnote-item="" data-footnote-index="1" data-footnote-id="3o1zj7qnk2w" role="doc-endnote" id="fn3o1zj7qnk2w"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="3o1zj7qnk2w"><sup><strong><a href="#fnref3o1zj7qnk2w">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>I have no statistical data on the prevalence of this perspective. It certainly feels like a significant chunk of the populace, but that's an anecdote. So yes, this whole thing is based on vibes, not data, sorry.</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="2" data-footnote-id="65dqwv75p6u" role="doc-endnote" id="fn65dqwv75p6u"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="65dqwv75p6u"><sup><strong><a href="#fnref65dqwv75p6u">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>Where by preference I mean one arising from purely system 2 thinking about morality, while conviction arises from system 1 gut feeling.</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="3" data-footnote-id="ls0ocmu8lw" role="doc-endnote" id="fnls0ocmu8lw"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="ls0ocmu8lw"><sup><strong><a href="#fnrefls0ocmu8lw">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>I’m not against redistribution in general - quite the opposite! I do believe that redistribution that sharply decreases the overall amount of wealth&nbsp;<i>is bad</i>, though.</p></div></li></ol><br/><br/><a href="https://www.lesswrong.com/posts/EQqjhifEmofBPC3ev/zero-sum-expectations-as-an-explanation-of-omnicide#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/EQqjhifEmofBPC3ev/zero-sum-expectations-as-an-explanation-of-omnicide</link><guid isPermaLink="false">EQqjhifEmofBPC3ev</guid><dc:creator><![CDATA[asasz]]></dc:creator><pubDate>Wed, 16 Jul 2025 21:23:58 GMT</pubDate></item><item><title><![CDATA[Vancouver Rationalists/Transhumanists/Futurists Beach Meetup  ]]></title><description><![CDATA[Published on July 16, 2025 7:09 PM GMT<br/><br/><p>My friend and I are organizing a meetup in Vancouver for the discussion of rationalist, transhumanist, and futurist ideas. We will meet at Jericho Beach at 4pm and probably end sometime around 7pm. The plan is to order pizza at some point as well.&nbsp;</p><p>&nbsp;</p><p>Whatsapp Group: https://chat.whatsapp.com/C2LRJXxpsR66qPoUJl8yi9?mode=r_t</p><p>Everyone is welcome and we look forward to having an insightful and open discussion!</p><br/><br/><a href="https://www.lesswrong.com/events/XutvADLozDihxkQTv/vancouver-rationalists-transhumanists-futurists-beach-meetup#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/events/XutvADLozDihxkQTv/vancouver-rationalists-transhumanists-futurists-beach-meetup</link><guid isPermaLink="false">XutvADLozDihxkQTv</guid><dc:creator><![CDATA[apocalypticc]]></dc:creator><pubDate>Wed, 16 Jul 2025 23:40:57 GMT</pubDate></item><item><title><![CDATA[Rebooting the Singularity]]></title><description><![CDATA[Published on July 16, 2025 6:26 PM GMT<br/><br/><p>Tom Davidson from Forethought Research and I have a new paper responding to some recent skeptical takes on the Singularity Hypothesis (e.g. <a href="https://link.springer.com/article/10.1007/s11098-024-02143-5">this one</a>). Roughly half the paper is philosophical and the other half is empirical. Both halves argue that we should take the Singularity Hypothesis more seriously than many people have been taking it of late. I'm sharing this with the community here because (i) I think it will be of general interest, and (ii) this is a project I'm still working on, and I would appreciate feedback from the community.&nbsp;</p><p>Here's the abstract:</p><blockquote><p>The <i>singularity hypothesis</i> posits a period of rapid technological progress following the point at which AI systems become able to contribute to AI research. Recent philosophical criticisms of the singularity hypothesis offer a range of theoretical and empirical arguments against the possibility or likelihood of such a period of rapid progress. We explore two strategies for defending the singularity hypothesis from these criticisms. First, we distinguish between <i>weak</i> and <i>strong</i> versions of the singularity hypothesis and show that, while the weak version is nearly as worrisome as the strong version from the perspective of AI safety, the arguments for it are considerably more forceful and the objections to it are significantly less compelling. Second, we discuss empirical evidence that points to the plausibility of strong growth assumptions for progress in machine learning and develop a novel mathematical model of the conditions under which strong growth can be expected to occur. We conclude that the singularity hypothesis in both its weak and strong forms continues to demand serious attention in discussions of the future dynamics of growth in AI capabilities.</p></blockquote><p>Note that this is an academic paper, and the goal is to have it published in a &nbsp;journal. So, for example, our ability to go into detail about some topics is limited by space constraints, and we can't introduce arguments that rely on priors that we have made up without published empirical evidence.</p><br/><br/><a href="https://www.lesswrong.com/posts/uLMPMDdKX3gYQuwPg/rebooting-the-singularity#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/uLMPMDdKX3gYQuwPg/rebooting-the-singularity</link><guid isPermaLink="false">uLMPMDdKX3gYQuwPg</guid><dc:creator><![CDATA[cdkg]]></dc:creator><pubDate>Wed, 16 Jul 2025 18:26:01 GMT</pubDate></item><item><title><![CDATA[Being and Existence]]></title><description><![CDATA[Published on July 16, 2025 6:10 PM GMT<br/><br/><p>I find it useful to make an admittedly idiosyncratic distinction between "being" and "existence". These two words normally have the same meaning—as in, "being" is literally defined in dictionaries to mean "existence"—but giving them different meanings lets me talk in a way that avoids making a certain metaphysical assumption that's baked into the standard definition of existence.</p><p>So what is "existence"? Etymologically, it comes from Latin meaning "to stand out". Connotatively, when a thing exists, it does so in a way that allows it to be identified apart from its surrounding context. If no such setting apart is possible, then a thing doesn't exist.</p><p>To give a simple example, think about a pancake breakfast. The pancakes exist because you can tell them apart from each other and from the plate, the table, and everything else that makes up the world in which this breakfast is happening.</p><p>Now suppose someone said that the pancakes contain caloric fluid. Does caloric fluid exist? It definitely exists as an idea because it's just been expressed as one. But does it exist, really?</p><p>We study the pancakes in great detail, using all the latest scientific instruments and methods available to us. We find no evidence of anything worth calling "caloric fluid". At that point we'd say that caloric fluid doesn't exist because, even if caloric fluid is really there, we can't tell it apart from the rest of reality, much in the same way an <a href="https://www.lesswrong.com/posts/CqyJzDZWvGhhFJ7dY/belief-in-belief">invisible dragon</a> doesn't exist except in the mind of the person who believes it's there.</p><p>Unfortunately, the normal way of understanding "existence" smuggles in a metaphysical assumption, namely that things exist independent of mind.</p><p>I'm not going to argue that things don't exist independent of our minds. However, we should be clear that this is a metaphysical assumption being made about the world. It's metaphysical in that no amount of physical evidence can prove that things are really real. The physical world as we experience it is equally well explained by it being real, a simulation, thoughts in the mind of a Boltzmann brain, or a highly self-consistent hallucination. So when we say that something "exists" and mean that it "really, objectively exists independent of anyone's beliefs about it", that's assuming a particular metaphysical view.</p><p>The reason I'm not going to argue against this metaphysical assumption is because it's a really useful one for modeling the physical world and because there's no point in arguing metaphysics since it's literally beyond our ability to know. But I do think we can speak more carefully without making this assumption. That is, we can keep our conversation on the purely physical level if we simply change slightly what "existence" means and give "being" a meaning other than "existence".</p><p>First, though, a few words on "being". You'd think this is the older word, given that it's constructed from "be", but it's not. First attested circa 1300 CE, it seems likely a word created to mean the same thing as "existence" in English. Not that things didn't exist before in English, but there's a good chance there was simply no word to talk about being/existence because everyone who wanted to talk about this concept did so in Latin. English-only speakers, who were mostly poor agriculturalists at the time, just said how things are, with no bracketing to talk about things are-ing.</p><p>This makes "being" ripe for redefinition, especially because, unlike "existence", the etymology has nothing to do with standing out. It has to do with what is, which, when we drop the metaphysical assumptions, becomes something different from existence.</p><p>As I prefer to use the words, "existence" always means "ontological existence". This fits best with its standard definition. In fact, it's exactly the same, except it's cutting out the part where we jump to "and therefore that means a thing really exists in external reality". In other words, it's acknowledging only the phenomenon and saying nothing about the supposed noumenon being referenced.</p><p>For the most part this doesn't change everyday speech. But it does give "being" space to mean something else. Specifically, we can use it to talk about the way things are when they have not yet even become things because they have not been reified into existence by a mind. This is the being in which the whole world simply is without any conception of it made by us or anyone else.</p><p>Being able to talk about the distinction between "being" and "existence" is important because we constantly experience the world without reifying it. We do this every moment. It's only in a small number of moments that our minds turn their attention to things and bring them into ontological existence. The rest of the time we are simply interacting with the world without conceptualizing it.</p><p>When we conflate "being" and "existence" we lose a way to talk about the world without smuggling in metaphysics. This is unnecessary, and having these two words mean different things lets us speak more clearly and carefully and helps avoid the confusion created by baking a metaphysical claim into the word "existence".</p><div><div><div><p>If you enjoyed reading this, consider subscribing to never miss another post.</p></div><div><div></div><div></div></div></div></div><p></p><br/><br/><a href="https://www.lesswrong.com/posts/3cfjfmQJitkJraKkn/being-and-existence#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/3cfjfmQJitkJraKkn/being-and-existence</link><guid isPermaLink="false">3cfjfmQJitkJraKkn</guid><dc:creator><![CDATA[Gordon Seidoh Worley]]></dc:creator><pubDate>Wed, 16 Jul 2025 18:10:02 GMT</pubDate></item></channel></rss>

If you would like to create a banner that links to this page (i.e. this validation result), do the following:

  1. Download the "valid RSS" banner.

  2. Upload the image to your own server. (This step is important. Please do not link directly to the image on this server.)

  3. Add this HTML to your page (change the image src attribute if necessary):

If you would like to create a text link instead, here is the URL you can use:

http://www.feedvalidator.org/check.cgi?url=http%3A//lesswrong.com/comments/.rss

Copyright © 2002-9 Sam Ruby, Mark Pilgrim, Joseph Walton, and Phil Ringnalda