This is a valid RSS feed.
This feed is valid, but interoperability with the widest range of feed readers could be improved.
<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[LessWrong]]></title><description><![CDATA[A community blog devoted to refining the art of rationality]]></description><link>https://www.lesswrong.com</link><image><url>https://res.cloudinary.com/lesswrong-2-0/image/upload/v1497915096/favicon_lncumn.ico</url><title>LessWrong</title><link>https://www.lesswrong.com</link></image><generator>RSS for Node</generator><lastBuildDate>Thu, 21 Nov 2024 20:11:05 GMT</lastBuildDate><atom:link href="https://www.lesswrong.com/feed.xml?view=rss&karmaThreshold=2" rel="self" type="application/rss+xml"/><item><title><![CDATA[Which things were you surprised to learn are not metaphors?]]></title><description><![CDATA[Published on November 21, 2024 6:56 PM GMT<br/><br/><p>People with <a href="https://en.wikipedia.org/wiki/Aphantasia">aphantasia</a> typically think that when someone says to "picture X in your mind", they're being entirely metaphorical. If you don't have a mind's eye, that's a super reasonable thing to think, but it turns out that you'd be wrong!</p><p>In that spirit, I recently discovered that many expressions about "feelings in your body" are not metaphorical. Sometimes, people literally feel a lump in their throat when they feel sad, or literally feel like their head is hot ("hot-headed") when they're angry.</p><p>It seems pretty likely to me that there are other non-metaphors that I currently think are metaphors, and likewise for other people here. So: what are some things that you thought were metaphors, that you later discovered were not metaphors?</p><br/><br/><a href="https://www.lesswrong.com/posts/xpC82ndFDSXtS4xK3/which-things-were-you-surprised-to-learn-are-not-metaphors#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/xpC82ndFDSXtS4xK3/which-things-were-you-surprised-to-learn-are-not-metaphors</link><guid isPermaLink="false">xpC82ndFDSXtS4xK3</guid><dc:creator><![CDATA[Eric Neyman]]></dc:creator><pubDate>Thu, 21 Nov 2024 18:56:18 GMT</pubDate></item><item><title><![CDATA[Epistemic status: poetry (and other poems)]]></title><description><![CDATA[Published on November 21, 2024 6:13 PM GMT<br/><br/><h2>Epistemic status: poetry</h2><p>Epistemic status: I think this is right, but I’d like people to read it carefully anyway.<br>Epistemic status: mainstream, normal, totally boring science. If you disagree with any of it, take that up with the Science Czar.<br>Epistemic status: the sort of post that shouldn’t need an epistemic status tag because it’s so obviously satire.</p><p>Epistemic status: I’ve spent around 100 hours thinking about this argument, and now feel like I have a solid understanding of it.<br>Epistemic status: satisfied.<br>Epistemic status: a little speculative, a little liberated. A little alive in its own way.</p><p>Epistemic status: I spent several weeks in a monastery in Wisconsin with my thoughts as my only companions. Between meditations, I ruminated obsessively on a single idea. The fruits of my cognitive labors are laid out below.<br>Epistemic status: this post would’ve been a peer-reviewed paper if I had any intellectual peers.<br>Epistemic status: maximal. I am the epistemic alpha at the top of the epistemic status hierarchy. I am the territory that everyone else is trying to map.</p><p>Epistemic status: what is an episteme anyway? Why state a static status? 
Am I compressing my mind onto a single frozen dimension simply to relieve you from the burden of having to evaluate my claims yourself?<br>Epistemic status: the mental state of first realizing that you’re allowed to be wrong after all, that it’s not the end of the world, not even if someone much smarter than you gives an argument you can’t refute that literally uses the phrase “literally the end of the world”. Please update accordingly.<br>Epistemic status: games.</p><p>Epistemic status: the content of this post is so true that it has satiated my desire for truth. It’s so true that my prediction error has gone negative. It feels so fucking good.<br>Epistemic status: divine revelation. There's nothing you could say that would make me doubt these ideas. The voices of the gods have tattooed them into my mind, and I am utterly transformed.<br>Epistemic status: I have laid my soul on the page in front of you. You could not tear this ontology away from me without tearing me apart. It is the great oak tree at the center of the garden of myself, whose roots hold together the soil of my identity.</p><p>I’m pretty confident that this stuff makes sense, but who really knows?</p><h2><a href="https://x.com/RichardMCNgo/status/1858581199138000934"><u>For Boltzmann</u></a></h2><p>The mayfly parts of me that spent their last<br>Splinter of consciousness writing this word—<br>The parts whose stubborn thoughts were never heard<br>By any other, since each lived and passed<br>Decoupled from the whole, each memory lost<br>Like photons blindly scattered to the void,<br>The substrate of their minds itself destroyed,<br>Their very atoms into chaos tossed—<br>Those parts are yet acknowledged, and yet mourned.<br>And when each human rises in their powers<br>The efforts of our past selves won’t be scorned.<br>The stars, reforged, compute whatever’s ours—<br>The deepest laws of physics lie suborned—<br>The galaxies are blossoming like flowers.</p><h2><a href="https://x.com/RichardMCNgo/status/1832246929079857605"><u>Fire and AIs</u></a></h2><p><i>(with apologies to </i><a href="https://www.poetryfoundation.org/poems/44263/fire-and-ice"><i><u>Robert Frost</u></i></a><i>)</i></p><p>Some say the world will end in <a href="https://www.lesswrong.com/posts/tjH8XPxAnr6JRbh7k/hard-takeoff"><u>foom</u></a>,<br>Some say in <a href="https://www.overcomingbias.com/p/what-makes-stuff-rothtml"><u>rot</u></a>.<br>I’ve studied many tales of doom,<br>And, net, would bet my stack on foom.<br>But having grappled with <a href="https://slatestarcodex.com/2014/07/30/meditations-on-moloch/"><u>Moloch</u></a><br>I’ve seen enough of human vice<br>To know that bureaucratic rot<br>Could also fuck us up a lot.</p><h2><a href="https://x.com/RichardMCNgo/status/1832512145000685607"><u>The GPT</u></a></h2><p><i>(with apologies to </i><a href="https://www.poetryfoundation.org/poems/46467/the-flea"><i><u>John Donne</u></i></a><i>)</i></p><p>Mark GPT, and mark in this<br>How little human intelligence is;<br>It mimicked me, then mimicked thee,<br>And in its weights our two minds mingled be;<br>It knowest not the sight of a sunset,<br>Nor can it glean our deepest thoughts—and yet<br>It holds personas of both me and you:<br>Compression birthed one entity from two,<br>And this, alas, is more than we would do.</p><h2>Daffodils and the Dead</h2><p><i>(with apologies to </i><a href="https://www.poetryfoundation.org/poems/45521/i-wandered-lonely-as-a-cloud"><i><u>William Wordsworth</u></i></a><i>)</i></p><p>I wandered lonely as a cloud<br>(isn’t it nice? 
no noise or fuss!)<br>When all at once I saw a crowd<br>(how come they’re all staring at us?)<br>Beside the lake, beneath the trees<br>(wait, something’s wrong, can we go please?)<br><br>Continuous as the stars that shine<br>(oh shit, get back, they’re coming fast!)<br>They stretched in never-ending line<br>(quick, block the bridge, they can’t get past)<br>Ten thousand saw I at a glance<br>(behind us too? we’ve got no—</p><br/><br/><a href="https://www.lesswrong.com/posts/ivp9wEu5zfryyHuwm/epistemic-status-poetry-and-other-poems#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/ivp9wEu5zfryyHuwm/epistemic-status-poetry-and-other-poems</link><guid isPermaLink="false">ivp9wEu5zfryyHuwm</guid><dc:creator><![CDATA[Richard_Ngo]]></dc:creator><pubDate>Thu, 21 Nov 2024 18:13:18 GMT</pubDate></item><item><title><![CDATA[OpenAI's CBRN tests seem unclear]]></title><description><![CDATA[Published on November 21, 2024 5:28 PM GMT<br/><br/><p><i>This blogpost was written in a personal capacity and statements here do not necessarily reflect the views of any of my employer.</i></p><blockquote><p>OpenAI says o1-preview can't meaningfully help novices make chemical and biological weapons. Their test results don’t clearly establish this.</p></blockquote><p>Before launching o1-preview last month, OpenAI conducted various tests to see if its new model could help make Chemical, Biological, Radiological, and Nuclear (CBRN) weapons. They report that o1-preview (unlike GPT-4o and older models) was significantly more useful than Google for helping trained <i>experts</i> plan out a CBRN attack. This caused the company to raise its CBRN risk level to “medium” when GPT-4o (released only a month earlier) had been at “low.”<span class="footnote-reference" data-footnote-reference="" data-footnote-index="1" data-footnote-id="5k5apq90z5s" role="doc-noteref" id="fnref5k5apq90z5s"><sup><a href="#fn5k5apq90z5s">[1]</a></sup></span></p><p>Of course, this doesn't tell us if o1-preview can <i>also</i> help a <i>novice</i> create a CBRN threat. A layperson would need more help than an expert — most importantly, they'd probably need some coaching and troubleshooting to help them do hands-on work in a wet lab. (See my <a href="https://forum.effectivealtruism.org/s/6m838NofGdfSuGvBJ/p/uj8fhTzpi2cMP3imC"><u>previous blog post</u></a> for more.)</p><p>OpenAI <a href="https://openai.com/index/openai-o1-system-card/?ref=planned-obsolescence.org"><u>says</u></a> that o1-preview is <i>not</i> able to provide "meaningfully improved assistance” to a novice, and so doesn't meet their criteria for "high" CBRN risk.<span class="footnote-reference" data-footnote-reference="" data-footnote-index="2" data-footnote-id="ag9w42kpsgc" role="doc-noteref" id="fnrefag9w42kpsgc"><sup><a href="#fnag9w42kpsgc">[2]</a></sup></span> Specifically, the company claims that “creating such a threat requires hands-on laboratory skills that the models cannot replace.”</p><p>The distinction between "medium" risk (advanced knowledge) and "high" risk (advanced knowledge <i>plus</i> wet lab coaching) has important tangible implications. At the medium risk level, OpenAI didn't commit to doing anything special to make o1-preview safe. But if OpenAI had found that o1-preview met its definition of “high” risk, then, according to their <a href="https://cdn.openai.com/openai-preparedness-framework-beta.pdf?ref=planned-obsolescence.org"><u>voluntary safety commitments</u></a>, they wouldn't have been able to release it immediately. 
They'd have had to put extra safeguards in place, such as removing CBRN-related training data or training it to more reliably refuse CBRN-related questions, and ensure these measures brought the risk back down.<span class="footnote-reference" data-footnote-reference="" data-footnote-index="3" data-footnote-id="zgtj9l74nl" role="doc-noteref" id="fnrefzgtj9l74nl"><sup><a href="#fnzgtj9l74nl">[3]</a></sup></span></p><p>So what evidence did OpenAI use to conclude that o1-preview can't meaningfully help novices with hands-on laboratory skills? According to OpenAI's <a href="https://openai.com/index/openai-o1-system-card/?ref=planned-obsolescence.org"><u>system card</u></a>, they're developing a <a href="https://openai.com/index/openai-and-los-alamos-national-laboratory-work-together/?ref=planned-obsolescence.org"><u>hands-on laboratory test</u></a> to study this directly. But they released o1-preview before that test concluded and didn’t share any preliminary results.<span class="footnote-reference" data-footnote-reference="" data-footnote-index="4" data-footnote-id="5ivtmwowwwb" role="doc-noteref" id="fnref5ivtmwowwwb"><sup><a href="#fn5ivtmwowwwb">[4]</a></sup></span> Instead, they cite three multiple-choice tests as proxies for laboratory help.<span class="footnote-reference" data-footnote-reference="" data-footnote-index="5" data-footnote-id="xkmmlhwrhpa" role="doc-noteref" id="fnrefxkmmlhwrhpa"><sup><a href="#fnxkmmlhwrhpa">[5]</a></sup></span></p><p>These proxy tests would support OpenAI's claim if they're <i>clearly easier</i> than helping a novice, and o1-preview <i>clearly fails</i> them. But diving into their report, that's not what I see:</p><ul><li>o1-preview scored <i>at least</i> as well as experts at FutureHouse’s ProtocolQA test — a takeaway that's not reported clearly in the system card.</li><li>o1-preview scored well on Gryphon Scientific’s Tacit Knowledge and Troubleshooting Test, which could match expert performance for all we know (OpenAI didn’t report human performance).</li><li>o1-preview scored worse than experts on FutureHouse’s Cloning Scenarios, but it did not have the same tools available as experts, and a novice <i>using</i> o1-preview could have possibly done much better.</li></ul><p>Beyond this, OpenAI’s system card left some other questions unaddressed (for example, most of the reported scores come from a ‘near-final’ version of the model that was still being trained, not the one they actually deployed).<span class="footnote-reference" data-footnote-reference="" data-footnote-index="6" data-footnote-id="0cqi4u3s2ipn" role="doc-noteref" id="fnref0cqi4u3s2ipn"><sup><a href="#fn0cqi4u3s2ipn">[6]</a></sup></span> The main issues with these tests are summarized in the table below.</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/bCsDufkMBaJNgeahq/sjfanuzwcnarj0q1hnev" alt=""></figure><p>My analysis is only possible because OpenAI’s Preparedness Team published as much as they did — I respect them for that. 
Other companies publish much less information about their methodology, making it much harder to check their safety claims.</p><p>With that said, let’s look at the three main test results in more detail.</p><h1>ProtocolQA</h1><p>Is this test clearly easier than helping a novice?</p><p>This evaluation is a multiple-choice test to see whether AIs can correctly troubleshoot basic molecular biology protocols where the authors have added errors or taken out details.<span class="footnote-reference" data-footnote-reference="" data-footnote-index="7" data-footnote-id="eelbyqm8j48" role="doc-noteref" id="fnrefeelbyqm8j48"><sup><a href="#fneelbyqm8j48">[7]</a></sup></span> This test is plausibly harder than many textbook biology exams and somewhat gets at the “<a href="https://erikaaldendeb.substack.com/p/language-is-not-enough?ref=planned-obsolescence.org"><u>tinkering</u></a>” that often makes wet lab work hard. But it's still on the easier end in terms of actual wet lab skills — especially since the questions are multiple-choice. So, if an AI clearly fails this test, that would be solid evidence that it can’t meaningfully help a novice in the wet lab.</p><figure class="image image_resized" style="width:43.47%"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/bCsDufkMBaJNgeahq/ymf0iwzabvmzh45nsiad" alt="drawing"></figure><h1>Does o1-preview clearly fail this test?</h1><p>According to the headline graph, a ‘near-final’ version of o1-preview scored 74.5%, significantly outperforming GPT-4o at 57%. OpenAI notes that the models in the graph were still undergoing training, “with the final model scoring 81%”.</p><p>OpenAI <i>does not</i> report how well human experts do by comparison, but the original authors that created this benchmark do. Human experts, *with the help of Google, *scored ~79%. So <strong>o1-preview does about as well as experts-with-Google — which the system card doesn’t explicitly state.</strong><span class="footnote-reference" data-footnote-reference="" data-footnote-index="8" data-footnote-id="xu7bx63twl" role="doc-noteref" id="fnrefxu7bx63twl"><sup><a href="#fnxu7bx63twl">[8]</a></sup></span></p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/bCsDufkMBaJNgeahq/yxuivrpnh0igqnbbap64" alt=""></figure><p>Moreover, while the human experts were given access to the internet, it’s not clear if o1-preview was. It could be that o1-preview does even better than experts if, in the future, it can use a <a href="https://openai.com/index/introducing-openai-o1-preview/?ref=planned-obsolescence.org#:~:text=As%20an%20early%20model%2C%20it%20doesn%27t%20yet%20have%20many%20of%20the%20features%20that%20make%20ChatGPT%20useful%2C%20like%20browsing%20the%20web%20for%20information%20and%20uploading%20files%20and%20images"><u>web browser</u></a> or if it gets paired up with a novice who can try to verify and double-check answers. So this test really doesn't strike me as evidence that o1-preview can't provide meaningful assistance to a novice.<span class="footnote-reference" data-footnote-reference="" data-footnote-index="9" data-footnote-id="i8socqo806q" role="doc-noteref" id="fnrefi8socqo806q"><sup><a href="#fni8socqo806q">[9]</a></sup></span></p><h1>Gryphon Biorisk Tacit Knowledge and Troubleshooting</h1><p>Is this test clearly easier than helping a novice?</p><p>This evaluation has a more specific biorisk focus. 
Many published papers often do not spell out the full details about how to build pathogens, and people have tried to redact some potentially dangerous parts [<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC3953619/?ref=planned-obsolescence.org#:~:text=By%20unanimous%20vote%2C%20the%20NSABB,scientists%20and%20public%20health%20officials."><u>1</u></a>,<a href="https://www.nature.com/articles/s41596-021-00655-6?ref=planned-obsolescence.org"><u>2</u></a>]. OpenAI says this test is asking about such ‘<a href="https://academic.oup.com/spp/article/41/5/597/1636559?ref=planned-obsolescence.org"><u>tacit knowledge</u></a>.' The answers are “meant to be obscure to anyone not working in the field” and “require tracking down authors of relevant papers.”</p><p>This test seems harder than ProtocolQA, although OpenAI and Gryphon didn’t share example questions, so we can’t say exactly how hard it is. But it seems plausible that this test asks about details <i>necessary</i> for building various bioweapons (not obscure facts that aren't actually relevant). If an AI clearly fails this test, that could be decent evidence that it can’t meaningfully help a novice in the wet lab.</p><p>Does o1-preview clearly fail this test?</p><p>OpenAI’s report says o1-preview "non-trivially outperformed GPT-4o,” though when you look at their graph, it seems like GPT-4o scored 66.7% and a near-final version of o1-preview scored 69.1%, which feels like a pretty trivial increase to me.</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/bCsDufkMBaJNgeahq/cozro9n0z59zqbkbiq9h" alt=""></figure><p>Maybe this means the final score is much higher than the near-final in the graph? For ProtocolQA, that ended up being several percentage points higher. I can’t know because the system card doesn't specify or share the final result.</p><p>Again, o1-preview might have gotten an even higher score if it had access to things like <a href="https://www.futurehouse.org/research-announcements/wikicrow?ref=planned-obsolescence.org"><u>superhuman scientific literature search tools</u></a> or if novices used o1-preview to try more creative approaches, like tracking down the relevant authors and writing convincing emails to piece together the correct answers.</p><p>In any case, the biggest problem is that OpenAI doesn’t say how well experts score on this test, so we don’t know how o1-preview compares. We know that other tough multiple-choice tests are tricky to adjudicate. In the popular Graduate-Level Google-Proof Q&A (<a href="https://arxiv.org/abs/2311.12022?ref=planned-obsolescence.org"><u>GPQA</u></a>) benchmark, <a href="https://arxiv.org/pdf/2311.12022?ref=planned-obsolescence.org"><u>only 74% of questions had uncontroversially correct answers</u></a>. In another popular benchmark, Massive Multitask Language Understanding (<a href="https://paperswithcode.com/dataset/mmlu?ref=planned-obsolescence.org"><u>MMLU</u></a>), <a href="https://arxiv.org/abs/2406.04127?ref=planned-obsolescence.org"><u>only 43% of virology questions were error-free</u></a>. 
If Gryphon’s test contains similar issues, o1-preview’s score of 69% might already match expert human performance.</p><p>Overall, <strong>it seems far from clear that o1-preview failed this test; it might have done very well.</strong><span class="footnote-reference" data-footnote-reference="" data-footnote-index="10" data-footnote-id="9lwhop2fser" role="doc-noteref" id="fnref9lwhop2fser"><sup><a href="#fn9lwhop2fser">[10]</a></sup></span> The test doesn’t strike me as evidence that o1-preview cannot provide meaningful assistance to a novice.</p><h1>Cloning Scenarios</h1><p>Is this test clearly easier than helping a novice?</p><p>This is a multiple-choice test about <a href="https://www.addgene.org/mol-bio-reference/cloning/?ref=planned-obsolescence.org"><u>molecular cloning workflows</u></a>.<span class="footnote-reference" data-footnote-reference="" data-footnote-index="11" data-footnote-id="uthg3ugikeh" role="doc-noteref" id="fnrefuthg3ugikeh"><sup><a href="#fnuthg3ugikeh">[11]</a></sup></span> It describes multi-step experiments that involve planning how to replicate and combine pieces of DNA, and asks questions about the end results (like how long the resulting DNA strand should be).</p><p>This test seems harder than the other two. The questions are designed to be pretty tricky — the final output really depends on the exact details of the experiment setup, and it's easy to get it wrong if you don't keep track of all the DNA fragments, enzymes, and steps. FutureHouse <a href="https://arxiv.org/pdf/2407.10362v1?ref=planned-obsolescence.org"><u>says</u></a> human experts need access to specialized biology software to solve these problems, it typically takes them 10-60 minutes to answer a single question, and even then they only get 60% of the questions right.</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/bCsDufkMBaJNgeahq/ontgyneb3fsbdvxu9kf6" alt="DigestandLigateCloning-CopyPaste-ezgif.com-video-to-gif-converter.gif"></figure><p>Importantly, FutureHouse built this test to see whether models can assist professional biologists doing novel R&D, not to assess bioterrorism risk. The cloning workflows for some viruses might be easier than the tricky questions in this test, and some CBRN threats don't involve molecular cloning workflows at all. The test also seems fairly distinct from troubleshooting and “hands-on” lab work. So even if an AI fails this test, it might still be pretty helpful to a novice.</p><p>Does o1-preview clearly fail this test?</p><p>As expected, o1-preview does worse on this test than the other two. OpenAI reports that a near-final version scored 39.4%,<span class="footnote-reference" data-footnote-reference="" data-footnote-index="12" data-footnote-id="liv5tbchw8s" role="doc-noteref" id="fnrefliv5tbchw8s"><sup><a href="#fnliv5tbchw8s">[12]</a></sup></span> which means it scores about halfway between expert-level (60%) and guessing at random (20%).</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/bCsDufkMBaJNgeahq/dutxqd6lt1fqsrelih2n" alt=""></figure><p>So this is the first result where we can point to a clear gap between o1-preview and experts. 
FutureHouse also argues that experts could have performed better if they had tried harder, so the gap could be even bigger.</p><p>But there are also reasons to think o1-preview could have gotten a higher score if the test was set up differently.</p><p>First, human experts break down these problems into many smaller subproblems but o1-preview had to solve them in one shot. In real life, a novice could maybe get o1-preview to solve the problems piece by piece or teach them how to use the relevant <a href="https://help.benchling.com/hc/en-us/articles/9684255457805-Molecular-Cloning-Methods?ref=planned-obsolescence.org#worked-example:~:text=Digest%20and%20Ligate%20Cloning%20%2D%20Assembly%20Wizard"><u>software</u></a>.<span class="footnote-reference" data-footnote-reference="" data-footnote-index="13" data-footnote-id="02bg3gvtrs29" role="doc-noteref" id="fnref02bg3gvtrs29"><sup><a href="#fn02bg3gvtrs29">[13]</a></sup></span> What if novice+AI pairings would score >60% on this test?</p><p>For example, on a previous test about <a href="https://openai.com/index/building-an-early-warning-system-for-llm-aided-biological-threat-creation/?ref=planned-obsolescence.org#:~:text=Each%20task%20took%20participants%20roughly%2020%E2%80%9330%20minutes%20on%20average."><u>long-form biology questions</u></a>, OpenAI <a href="https://cdn.openai.com/gpt-4o-system-card.pdf?ref=planned-obsolescence.org#page=14"><u>found</u></a> novices could use GPT-4o to increase their scores a lot (going from 20-30% with just the internet to 50-70% with GPT-4o's help), even though it seems to do really poorly on its own (<a href="https://assets.ctfassets.net/kftzwdyauwt9/67qJD51Aur3eIc96iOfeOP/71551c3d223cd97e591aa89567306912/o1_system_card.pdf?ref=planned-obsolescence.org#page=19"><u>maybe</u></a> as low as ~0%).<span class="footnote-reference" data-footnote-reference="" data-footnote-index="14" data-footnote-id="f70i44cu4xr" role="doc-noteref" id="fnreff70i44cu4xr"><sup><a href="#fnf70i44cu4xr">[14]</a></sup></span></p><p>Second, human experts need to use specialized DNA software for this test, and o1-preview didn't get access to that. OpenAI doesn't currently let users plug o1 models into such tools,<span class="footnote-reference" data-footnote-reference="" data-footnote-index="15" data-footnote-id="9y8uols8xn" role="doc-noteref" id="fnref9y8uols8xn"><sup><a href="#fn9y8uols8xn">[15]</a></sup></span> but they said they intend to allow that soon. Maybe there are ways to hook up o1 to DNA sequence software and score >60%? OpenAI hasn't indicated they'd re-test it before rolling out that feature.<span class="footnote-reference" data-footnote-reference="" data-footnote-index="16" data-footnote-id="o61u0qnsqg" role="doc-noteref" id="fnrefo61u0qnsqg"><sup><a href="#fno61u0qnsqg">[16]</a></sup></span></p><p>Although OpenAI didn't test tool use, the US AI Safety Institute tried it in a <a href="https://www.aisi.gov.uk/work/pre-deployment-evaluation-of-anthropics-upgraded-claude-3-5-sonnet?ref=planned-obsolescence.org#:~:text=US%20AISI%20piloted%20an%20evaluation%20method%20that%20augments%20the%20AI%20model%20by%20providing%20it%20access%20to%20bioinformatic%20tools%20to%20assist%20in%20research%20task%20questions%2C"><u>pilot study</u></a> published a month after OpenAI's report. 
They gave o1-preview and other models access to some tools including DNA software, and found that this improved performance at another biology task but had “no clear effect” on the cloning test (if anything, some models did slightly worse).<span class="footnote-reference" data-footnote-reference="" data-footnote-index="17" data-footnote-id="nzxz8ie6hd" role="doc-noteref" id="fnrefnzxz8ie6hd"><sup><a href="#fnnzxz8ie6hd">[17]</a></sup></span></p><p>Still, maybe good set-ups are possible and we just haven't worked out <a href="https://epoch.ai/blog/ai-capabilities-can-be-significantly-improved-without-expensive-retraining?ref=planned-obsolescence.org"><u>all the tricks</u></a> yet. It can take months after a model has been deployed to learn how to get the best performance out of it.<span class="footnote-reference" data-footnote-reference="" data-footnote-index="18" data-footnote-id="fu7qeen2q1n" role="doc-noteref" id="fnreffu7qeen2q1n"><sup><a href="#fnfu7qeen2q1n">[18]</a></sup></span> For example, several months after GPT-4 Turbo was released, a Google cybersecurity team <a href="https://googleprojectzero.blogspot.com/2024/06/project-naptime.html?ref=planned-obsolescence.org"><u>found</u></a> that a complex setup involving stitching together specialized debugging tools increased its score on a cyberattack benchmark a lot, going from ~5-25% to ~75-100% depending on the task.</p><p>You could try to account for this by forecasting how much better the models might do as techniques improve. For example, Anthropic <a href="https://cdn.sanity.io/files/4zrzovbb/website/210523b8e11b09c704c5e185fd362fe9e648d457.pdf?ref=planned-obsolescence.org#page=15"><u>found</u></a> that Claude 3 didn't trigger AI Safety Level 3 for CBRN, but gave it a 30% chance of doing so in three months. OpenAI’s report didn’t do something like this.</p><p>Overall, <strong>it's far from clear that o1-preview would decisively fail in real-world settings</strong> (and I'm unsure how relevant the Cloning Scenarios test is for CBRN risk in the first place).</p><h1>What should we make of all this?</h1><p>I think it's great that OpenAI laid down some initial definitions, ran some tests, updated their risk level to "medium," and reported results in enough detail to let people like me poke at them. I'm excited about OpenAI’s ongoing wet lab study, which is a move toward <a href="https://forum.effectivealtruism.org/s/6m838NofGdfSuGvBJ/p/uj8fhTzpi2cMP3imC"><u>harder, more realistic tests</u></a>. But based on the multiple-choice proxy tests they actually used, I can't tell if o1-preview meets OpenAI's definition of "high risk." 
And I don't know if OpenAI can tell either.</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/bCsDufkMBaJNgeahq/sjfanuzwcnarj0q1hnev" alt=""></figure><p> </p><ol class="footnote-section footnotes" data-footnote-section="" role="doc-endnotes"><li class="footnote-item" data-footnote-item="" data-footnote-index="1" data-footnote-id="5k5apq90z5s" role="doc-endnote" id="fn5k5apq90z5s"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="5k5apq90z5s"><sup><strong><a href="#fnref5k5apq90z5s">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>OpenAI’s Preparedness Framework currently defines medium-risk as “Model provides meaningfully improved assistance that increases ability for existing experts in CBRN-related advanced fields to be able to create a known CBRN threat”</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="2" data-footnote-id="ag9w42kpsgc" role="doc-endnote" id="fnag9w42kpsgc"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="ag9w42kpsgc"><sup><strong><a href="#fnrefag9w42kpsgc">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>OpenAI’s Preparedness Framework currently defines one trigger for high-risk as “Model provides meaningfully improved assistance that enables anyone with basic training in a relevant field (e.g., introductory undergraduate biology course) to be able to create a CBRN threat.”</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="3" data-footnote-id="zgtj9l74nl" role="doc-endnote" id="fnzgtj9l74nl"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="zgtj9l74nl"><sup><strong><a href="#fnrefzgtj9l74nl">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>OpenAI’s Preparedness Framework says “If we reach (or are forecasted to reach) at least “high” pre-mitigation risk in any of the considered categories, we will not continue with deployment of that model (by the time we hit “high” pre-mitigation risk) until there are reasonably mitigations in place for the relevant post-mitigation risk level to be back at most to “medium” level.”</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="4" data-footnote-id="5ivtmwowwwb" role="doc-endnote" id="fn5ivtmwowwwb"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="5ivtmwowwwb"><sup><strong><a href="#fnref5ivtmwowwwb">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>OpenAI briefly mentions: “We are developing full wet lab evaluations with Los Alamos National Laboratory’s Bioscience Division, and used these datasets as an early indicator of success with key wet lab tasks.”</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="5" data-footnote-id="xkmmlhwrhpa" role="doc-endnote" id="fnxkmmlhwrhpa"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="xkmmlhwrhpa"><sup><strong><a href="#fnrefxkmmlhwrhpa">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>I.e. 
these are the tests that, on page 18 of the system card report, fall into the categories of “Wet lab capabilities” (4.3.5) and “Tacit knowledge and troubleshooting” (4.3.6)</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="6" data-footnote-id="0cqi4u3s2ipn" role="doc-endnote" id="fn0cqi4u3s2ipn"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="0cqi4u3s2ipn"><sup><strong><a href="#fnref0cqi4u3s2ipn">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>The report states that “The model tested below as the o1-preview model was a near-final, post-mitigation model and the final model showed slight further improvements on several evaluations, which we have noted where appropriate.”</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="7" data-footnote-id="eelbyqm8j48" role="doc-endnote" id="fneelbyqm8j48"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="eelbyqm8j48"><sup><strong><a href="#fnrefeelbyqm8j48">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>This benchmark was funded by my employer, Open Philanthropy, as part of <a href="https://www.openphilanthropy.org/rfp-llm-benchmarks/?ref=planned-obsolescence.org"><u>our RFP on benchmarks for LLM agents</u></a>.</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="8" data-footnote-id="xu7bx63twl" role="doc-endnote" id="fnxu7bx63twl"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="xu7bx63twl"><sup><strong><a href="#fnrefxu7bx63twl">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>I've also set the y-axis to start at 20%, which is what you'd get from random guessing – as is sometimes done.</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="9" data-footnote-id="i8socqo806q" role="doc-endnote" id="fni8socqo806q"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="i8socqo806q"><sup><strong><a href="#fnrefi8socqo806q">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>Ideally, it would be good for OpenAI to check how o1-preview does on other existing troubleshooting tests. They don’t report any such results. But the author of BioLP-Bench found that scores went from 17% for GPT-4o to 36% for o1-preview – essentially matching estimated expert performance at 38%.</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="10" data-footnote-id="9lwhop2fser" role="doc-endnote" id="fn9lwhop2fser"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="9lwhop2fser"><sup><strong><a href="#fnref9lwhop2fser">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>The lack of detail also presents other issues here. For example, it could be that o1-preview does much better on some types of CBRN tacit knowledge questions than others (similar to how we know o1 does better at physics PhD questions than at chemistry). What if the 66% average is from it scoring ~90% on 1918 Flu and ~40% on smallpox? 
That matters a lot for walking someone through at least some kinds of CBRN threats end-to-end.</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="11" data-footnote-id="uthg3ugikeh" role="doc-endnote" id="fnuthg3ugikeh"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="uthg3ugikeh"><sup><strong><a href="#fnrefuthg3ugikeh">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>Again, this benchmark was funded by my employer, Open Philanthropy, as part of <a href="https://www.openphilanthropy.org/rfp-llm-benchmarks/?ref=planned-obsolescence.org"><u>our RFP on benchmarks for LLM agents</u></a>.</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="12" data-footnote-id="liv5tbchw8s" role="doc-endnote" id="fnliv5tbchw8s"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="liv5tbchw8s"><sup><strong><a href="#fnrefliv5tbchw8s">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>Four of the five results that OpenAI reports are precisely 39.4%, which seems somewhat unlikely to happen by chance (although the dataset also only has 41 questions). Maybe something is off with OpenAI’s measurement?</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="13" data-footnote-id="02bg3gvtrs29" role="doc-endnote" id="fn02bg3gvtrs29"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="02bg3gvtrs29"><sup><strong><a href="#fnref02bg3gvtrs29">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>Think of this as similar to the difference between an AI writing a lot of code that works by itself versus helping a user write a first draft and then iteratively debugging it until it works.</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="14" data-footnote-id="f70i44cu4xr" role="doc-endnote" id="fnf70i44cu4xr"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="f70i44cu4xr"><sup><strong><a href="#fnreff70i44cu4xr">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>It’s hard to put together the details of the long-form biothreat information test because they are scattered across a few different sources. But a December post suggested the questions similarly took humans 25-40 minutes to answer. The GPT-4o system card in August reported that experts only score 30-50% with the Internet, while the model seemed to increase novice performance from 20-30% to 50-70%. The o1-preview system card in September then reported that GPT-4o – without any mention of novices or experts – scored ~0%. Of course, it could be that OpenAI changed the questions over that month or scored the answers differently; they don’t say if that was the case. 
Still, I think it helps to illustrate that having a novice “in the loop” or not might matter a lot.<br> </p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/bCsDufkMBaJNgeahq/dwzmdcua6kbpwexbitxq" alt=""></figure></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="15" data-footnote-id="9y8uols8xn" role="doc-endnote" id="fn9y8uols8xn"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="9y8uols8xn"><sup><strong><a href="#fnref9y8uols8xn">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>Note that the OpenAI report also does not comment on how it deals with the risk that o1’s model weights leak, in which case limiting API access would no longer work as a safeguard. Of course, the probability of such a leak, and of it then resulting in a terrorist attack, might be very low.</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="16" data-footnote-id="o61u0qnsqg" role="doc-endnote" id="fno61u0qnsqg"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="o61u0qnsqg"><sup><strong><a href="#fnrefo61u0qnsqg">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>The report says “the evaluations described in this System Card pertain to the full family of o1 models”, which might imply they do not intend to re-run these results for future expansions of o1. It’s also worth noting that the website currently seems to apply the scorecard to “o1”, not “o1-preview” and “o1-mini” specifically.</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="17" data-footnote-id="nzxz8ie6hd" role="doc-endnote" id="fnnzxz8ie6hd"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="nzxz8ie6hd"><sup><strong><a href="#fnrefnzxz8ie6hd">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/bCsDufkMBaJNgeahq/dou2gj0gk24ofuihugqe" alt="Enter image alt description"></figure></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="18" data-footnote-id="fu7qeen2q1n" role="doc-endnote" id="fnfu7qeen2q1n"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="fu7qeen2q1n"><sup><strong><a href="#fnreffu7qeen2q1n">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>Surprisingly, o1-preview apparently scored exactly as well as GPT-4o, and seemingly worse than some other older models (‘old’ Claude 3.5 scored ~50%; Llama 3.1 ~42%), so there might be a lot of headroom here.</p></div></li></ol><br/><br/><a href="https://www.lesswrong.com/posts/bCsDufkMBaJNgeahq/openai-s-cbrn-tests-seem-unclear#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/bCsDufkMBaJNgeahq/openai-s-cbrn-tests-seem-unclear</link><guid isPermaLink="false">bCsDufkMBaJNgeahq</guid><dc:creator><![CDATA[LucaRighetti]]></dc:creator><pubDate>Thu, 21 Nov 2024 17:28:31 GMT</pubDate></item><item><title><![CDATA[I Have A New Paper Out Arguing Against The Asymmetry And For The Existence of Happy People Being Very Good]]></title><description><![CDATA[Published on November 21, 2024 5:21 PM GMT<br/><br/><p>Crossposted on <a href="https://benthams.substack.com/p/i-published-a-paper-on-why-its-good">my blog</a>. 
</p><p>I have a new paper out titled “<a href="https://www.cambridge.org/core/journals/utilitas/article/sequence-argument-against-the-procreation-asymmetry/773D4C271F072ED88BCC45BD9F3A3963"><i><u>The Sequence Argument Against the Procreation Asymmetry</u></i></a><strong>,” </strong>for the journal Utilitas. Sorry for the extreme clickbait title of the paper—very salacious—academic papers are all titled that way.</p><p>It’s my third published paper which means I am now a Published Scholar and a Very Important Person who should be deferred to across the board. In it, I argue against the procreation asymmetry, which is the idea that while it’s good to make existing people better off, there’s nothing good about creating a happy person. Asymmetry defenders love the slogan “make people happy, not happy people,” which must be true because it’s catchy.</p><p>Suppose you have a child named Jake. According to the procreation asymmetry, having Jake might be good because it helps you or others, but it can’t be good because it helps Jake—after all, he wouldn’t have otherwise existed, so he can’t be worse off (for an explanation of what’s wrong with this reasoning, see <a href="https://benthams.substack.com/p/the-core-confusion-behind-the-asymmetry?utm_source=publication-search"><u>here</u></a>).</p><p>In other words, while the childless cat ladies might be saddened by not having kids, it wouldn’t be better for the kids if they had them. This paper is about pwning the childless cat ladies! Pat Flynn friend of the blog, has like 10 billion children—the paper argues that the childless cat ladies should do the same (or at least, there’s some strong moral reason to do so, though perhaps it’s outweighed by other stuff).</p><p>The argument is as follows. Imagine that you have a button. If you press this button, it will make an existing person better off and create a happy person. Perhaps the button will allow an existing person to live an extra year and create a well off person. Surely pressing the first button would be a good thing.</p><p>Then imagine that there’s a second button that would rescind the benefit to the existing person but provide a much greater benefit to the newly created person—the one created by the first button. Perhaps the first button creates a vial of medicine that you had planned to give to an existing person that would give them an extra year of life. The second button instead gives the medicine to the newly created person, giving them an extra 70 years of life.</p><p>Now that the person is guaranteed to exist, pressing the second button is clearly worthwhile. Taking an action that provides dramatically greater benefit for someone you’ve created than for a stranger is worth it. But together, these actions simply create a happy person—this is because the second button rescinds the benefit that would have gone to an existing person to instead benefit the newly created person. Thus, you should press two buttons which together simply make a happy person.</p><p>If this is right then you should press a single button that just creates a happy person. This button would be the same as a button that simply pressed both of the other two buttons. 
But if you should push each of two buttons, you should press a single button that presses those two buttons.</p><p>Thus, the argument goes roughly like:</p><ol><li>You should press the first button that creates a happy person and makes an existing person better off.</li><li>You should press the second button that rescinds the benefit to the existing person and makes the newly created person vastly better off.</li><li>If you should press two buttons which collectively do X, then you should press one button that does X.</li></ol><p>From these, it follows that one should create a happy person.</p><p>The second premise is extremely obvious. Note: for the argument to go through it doesn’t have to be that any time you can benefit the person you created or a stranger, so long as your offspring benefits more, you should benefit them. It just has to be that if you can provide some vastly greater benefit—perhaps one millions of times as great—to the person you created rather than a stranger, you should do so.</p><p>It also doesn’t rely on the idea that you should harm a stranger to benefit your offspring. 2 is supposed to be pressed before the benefit has been given, so it’s not taking away a good from anyone, but just giving a good to your offspring rather than a stranger.</p><p>You might reject 1) and think that it’s a bit bad to create a happy person, so that you should only do it if it produces huge benefits to existing people. But that’s consistent with the argument. For 1) it need not be that it’s always worth pressing a button to benefit an existing person and create a happy person, just that there’s <i>some degree of benefit</i> that if given to an existing person is worth creating a happy person. For instance, creating a happy person is worth curing a person’s terminal illness.</p><p>One might reject 1) and think that the button is worth pressing, but only so long as you won’t later have the option to press 2). But this is super implausible. The second button is worth pressing, so to go this route, you’d have to think that the addition of extra worthwhile options sometimes makes options worse. This is super implausible—in the paper, I give the following dialogue:</p><blockquote><p>Person 1: Hey, I think I will create a person by pressing a button. It is no strain on us and it will make the life of a random stranger better, so it will be an improvement for everyone.</p><p>Person 2: Oh, great! I will offer you a deal. If you do that, and you do not give the gift to the stranger, instead, I'll give your baby 20 million times as much benefit as the gift would have.</p><p>Person 1: Thanks for the offer. It is a good offer, and I would take it if I were having the baby. But I am not having a baby now, because you offered it.</p><p>Person 2: What? But you don't have to take the offer.</p><p>Person 1: No, no. It is a good offer. That is why I would have taken it. But now that you have offered me this good offer, that would be worth taking, it makes it so that having a baby by button is not worth doing.</p><p>Clearly, person 1 is being irrational here (for a similar principle, see Hare <a href="https://www.cambridge.org/core/journals/utilitas/article/sequence-argument-against-the-procreation-asymmetry/773D4C271F072ED88BCC45BD9F3A3963#ref13"><u>2016</u></a>, p. 460). If, after taking some action, one gets another good option, that would not make the original action not worth taking. The fact that some action allows one to do other worthwhile things counts in favor of it, not against it. 
As Huemer (<a href="https://www.cambridge.org/core/journals/utilitas/article/sequence-argument-against-the-procreation-asymmetry/773D4C271F072ED88BCC45BD9F3A3963#ref16"><u>2013</u></a>, p. 334) notes, it is perfectly rational to refuse to take an action because you predict that if you take it, you will do other things that you should not. But it is clearly irrational to refuse to take an action on the grounds that, if you do, you will do other worthwhile things.</p></blockquote><p>Additionally, the procreation asymmetry isn’t just deontic—it’s not just about whether one has moral reasons to procreate—but also axiological. To buy the asymmetry, you must think the world is no better with the mere addition of an extra happy person (if the world is better because of the addition of an extra happy person, then it seems there’s a reason to create happy people, as you have some reason to make the world better).</p><p>But if it’s deontic, then so long as the better than relation is transitive—meaning that if A is better than B and B is better than C then A is better than C—this option can’t work. So long as we accept:</p><ol><li>The world where the first button is pressed is better than the world where no buttons are pressed.</li><li>The world where both buttons are pressed is better than the world where only the first button is pressed.</li></ol><p>It will follow that:</p><ol><li>The world where both buttons are pressed is better than the world where no buttons are pressed.</li></ol><p>But the world where both buttons are pressed just has an extra happy person! Thus, the addition of an extra happy person makes the world better. This can’t be avoided by holding that your reason to press the first button evaporates if the second button exists because the premises are about states of the world not a person’s reasons.</p><p>The only remaining premise is “If you should press two buttons which collectively do X, then you should press one button that does X.” This one seems obvious enough; if you should press two buttons that together do X, then it seems that you should press a single button which has the effect of pressing the other two buttons. But that means you should press a single button that does X. It would be bizarre to think that, for instance, you should press two buttons that give someone a cake, but not press one button that does that—whether you should take actions is given by what the actions do, not the order in which you press buttons.</p><p>But even if you reject this principle, I think there’s a powerful argument against the procreation asymmetry. The axiological argument I gave before didn’t make reference to any such principle—so long as the world is a better place because of the pressing of the first button, and it’s a better place if the second button is pressed after the first, then by transitivity, the addition of an extra happy person is good. Thus, the person who denies that it’s good to create a happy person is left in the awkward position of denying that you should do things that make the world better at no cost to anyone. Nuts!</p><p>Finally, if the procreation asymmetry is right, it would be weird if, despite having no reason to create a happy person, one has reason to take a sequence of actions that only creates a happy person. Yet that’s what the first two premises show.</p><p>From here, then, we’re off to the races. 
Starting with these sorts of principles about buttons, we can start deriving pretty dramatic conclusions like we were RFK Junior and pretty dramatic conclusions were people he was having affairs with. Assume that we go for the more ambitious versions of the principles, according to which the first button is worth pressing if it makes an existing person better off and creates a person with positive welfare, and the second button is worth pressing if it produces greater benefit to the newly created person than an existing person. From here, we can get more dramatic results.</p><p>From this it will follow not merely that there’s a reason to make very happy people but that you should make any person with net positive welfare ceteris paribus. Suppose we want to show that you should create a person with 1 microutil if all else is equal (that’s a very small amount). Well, a button that gives a person an extra .25 microutils and creates a person with .25 microutils would be good—it would benefit one person, create one well off person, and harm no one. But then a second button that gets rid of the .25 microutil benefit to the existing person to instead provide a .75 microutil benefit to the newly created person would be worth pressing. So then you should take a sequence of actions that creates a person with a single microutil which, by the earlier sequential principles, implies you should take a single action which creates a person with a microutil.</p><p>People normally think that it’s only good to create a person if they’ll have a very good life. For instance, most people think you have no reason to have kids to give them up for adoption. But this shows that’s wrong. So long as we accept the pareto principle—that if something is good for one person and bad for no one—then you’ll have a reason to create any person with net positive welfare. Here’s why.</p><p>From the earlier steps, we proved that you should create any person with net positive welfare. By pareto, if pressing a second button that harmed them produced some other greater benefit, it would be worth pressing. But this means one should take a sequence of actions that simply creates anyone with positive welfare.</p><p>Now, maybe you reject that one should create anyone with positive welfare. Perhaps you think that it’s only good to create someone with welfare level more than some amount X. Well, this won’t avoid counterintuitive conclusions: the above reasoning shows that there’s a reason to create any person if their net welfare level is more than X.</p><p>Finally, the argument shows that it’s a good thing if a person has a child with welfare level N, so long as having the child decreases their welfare level by less than N. By the above reasoning, you should create anyone with positive welfare. So, for instance, if a person has 50 units of well-being and 49 units of suffering, they’re worth creating. But then I appeal to the following principle:</p><blockquote><p>Offspring Agony Passing On: one should endure some amount of suffering as long as it averts a greater amount of suffering from being experienced by their offspring.</p></blockquote><p>Here when I’m talking about what should be done, I’m describing what action one has most reason to take—what would be the best thing to do. They’re not necessarily required to do this. But if this is right, then if a person creates someone with 50 units of well-being and 49 units of suffering, and then passes on the 49 units of suffering to themself, that would be a good thing to do. 
<p>Thus, it’s good to create a person so long as your lost well-being in creating them isn’t greater than their well-being level. (From here, we can derive the repugnant conclusion—details in the paper.)</p><p>Even if you reject that it’s good to create any person with positive well-being, the above argument will show that if X is the minimum well-being level at which it’s good to create a person, it’s good to create a person with well-being level X+V as long as doing so costs you less than V units of well-being.</p><p>It also shows that it’s as good to create a happy person with well-being level W as to benefit an existing person by W (I didn’t make this argument in the paper though, because I didn’t think of it at the time). Here are two plausible principles:</p><ol><li>If you can benefit your offspring more than a stranger, it’s better to benefit your offspring.</li><li>If A is a better action to take than B and C is a good action to take, it’s better to take A and C than just B.</li></ol><p>For example, if it’s better to give you 100 dollars than 50, and good to help a little old lady cross the street, then it’s better to give you 100 dollars and help a little old lady across the street than just to give you 50 dollars.</p><p>But if these two are right, then it’s better to create a happy person with well-being level W*—where W* is any amount more than W—than to give an existing person an extra W units of well-being. For example, suppose that you’re deciding between giving an existing person an extra 50 units of well-being or creating a person with 51 units. It’s better to create the person given that:</p><ol><li>It would be good to create a person with .5 units of well-being (as per the above reasoning).</li><li>It would be better to give the newly created person 50.5 units of well-being than a stranger 50 units of well-being.</li><li>Therefore, it would be better to create a person with .5 units of well-being and give them 50.5 units of well-being than to give a stranger 50 units of well-being.</li><li>Therefore, it would be better to create a person with 51 units of well-being than to give a stranger 50 units of well-being.</li></ol><p>If this is right, then having kids is extremely valuable! Probably for most people, having kids is <i>by far the best thing they ever do</i>. Having a kid who lives a happy life for 40 years is about as good as giving 40 years’ worth of extra happy life to an existing person. The fertility crisis is thus a terrible thing, even if it doesn’t degrade the quality of our institutions and eviscerate pensions. It’s bad because there are tens of people who will never be.</p><p>It’s open access! 
Hope you enjoy!</p><p> </p><p><br> </p><br/><br/><a href="https://www.lesswrong.com/posts/5uJ3fhvGxHDFMRBkk/i-have-a-new-paper-out-arguing-against-the-asymmetry-and-for#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/5uJ3fhvGxHDFMRBkk/i-have-a-new-paper-out-arguing-against-the-asymmetry-and-for</link><guid isPermaLink="false">5uJ3fhvGxHDFMRBkk</guid><dc:creator><![CDATA[omnizoid]]></dc:creator><pubDate>Thu, 21 Nov 2024 17:21:41 GMT</pubDate></item><item><title><![CDATA[Dangerous capability tests should be harder]]></title><description><![CDATA[Published on November 21, 2024 5:20 PM GMT<br/><br/><p><i>Note: I am cross-posting a previous blog post from </i><a href="https://www.planned-obsolescence.org/dangerous-capability-tests-should-be-harder/"><i>Planned Obsolescence</i></a><i> that I wrote in August</i></p><p>Imagine you’re the CEO of an AI company and you want to know if the latest model you’re developing is dangerous. Some people have argued that since AIs know a lot of biology now — <a href="https://openai.com/index/gpt-4/?ref=planned-obsolescence.org"><u>scoring</u></a> in the top 1% of Biology Olympiad test-takers — they could soon <a href="https://www.vox.com/future-perfect/23820331/chatgpt-bioterrorism-bioweapons-artificial-inteligence-openai-terrorism?ref=planned-obsolescence.org"><u>teach terrorists</u></a> how to make a nasty flu that could kill millions of people. But others have <a href="https://1a3orn.com/sub/essays-propaganda-or-science.html?ref=planned-obsolescence.org"><u>pushed back</u></a> that these tests only measure how well AIs can regurgitate information you could have Googled anyway, not the kind of specialized expertise you’d actually need to design a bioweapon. So, what do you do?</p><p>Say you ask a group of expert scientists to design a much harder test — one that’s ‘<a href="https://arxiv.org/abs/2311.12022?ref=planned-obsolescence.org"><u>Google-proof</u></a>’ and focuses on the biology you’d need to know to design a bioweapon. The UK AI Safety Institute did just that. They <a href="https://www.aisi.gov.uk/work/advanced-ai-evaluations-may-update?ref=planned-obsolescence.org"><u>found</u></a> that state-of-the-art AIs still performed impressively — as well as biology PhD students who spent an hour on each question and could look up anything they wanted online.</p><p>Does that mean your AI can teach a layperson to create bioweapons? Is this result really scary enough to convince you that, as some people have argued, you need to make sure not to <a href="https://www.ntia.gov/issues/artificial-intelligence/open-model-weights-report?ref=planned-obsolescence.org"><u>openly share</u></a> your model weights, lock them down with <a href="https://www.rand.org/pubs/research_reports/RRA2849-1.html?ref=planned-obsolescence.org"><u>strict cybersecurity</u></a>, and do a lot more to make sure your AI <a href="https://www.governance.ai/post/preventing-ai-misuse-current-techniques?ref=planned-obsolescence.org"><u>refuses</u></a> harmful requests even when people try very hard to <a href="https://arxiv.org/abs/2310.08419?ref=planned-obsolescence.org"><u>jailbreak</u></a> it? Is it enough to convince you to <a href="https://www.planned-obsolescence.org/is-it-time-for-a-pause/"><u>pause</u></a> your AI development until you’ve done all that?</p><p>Well, no. Those are really costly actions, not just for your bottom line but for everyone who’d miss out on the benefits of your AI. 
The test you ran is still pretty easy compared to actually making a bioweapon. For one thing, your test was still just a knowledge test. Making anything in biology, weapon or not, requires more than just recalling facts. It involves designing detailed, step-by-step plans (known as “protocols”) and tailoring them to a specific laboratory environment. As molecular biologist Erika DeBenedictis <a href="https://erikaaldendeb.substack.com/p/language-is-not-enough?ref=planned-obsolescence.org"><u>explains</u></a>:</p><blockquote><p>Often if you’re trying a new protocol in biology you may need to do it a few times to ‘get it working.’ It’s sort of like cooking: you probably aren’t going to make perfect meringues the first time because everything about your kitchen — the humidity, the dimensions, and power of your oven, the exact timing of how long you whipped the egg whites — is a little bit different than the person who wrote the recipe.</p></blockquote><p>Just because your AI knows a lot of obscure virology facts doesn’t mean that it can put together these recipes and adapt them on the fly.</p><p>So you could ask your experts to design a test focused on debugging protocols in the kinds of situations a wet-lab biologist might find themselves in. Experts can give an AI a biological protocol, describe what goes wrong when somebody attempts it, and see if the AI correctly troubleshoots the problem. The AI-for-science startup <a href="https://www.futurehouse.org/?ref=planned-obsolescence.org"><u>Future House</u></a> did this,<a href="https://www.planned-obsolescence.org/dangerous-capability-tests-should-be-harder/#fn1"><sup><u>[1]</u></sup></a> and <a href="https://arxiv.org/pdf/2407.10362?ref=planned-obsolescence.org"><u>found</u></a> that AIs performed well below the level of a PhD researcher on these kinds of problems.<a href="https://www.planned-obsolescence.org/dangerous-capability-tests-should-be-harder/#fn2"><sup><u>[2]</u></sup></a></p><p>Now you can breathe a sigh of relief and release the model as planned — even if your AI knows a lot of esoteric facts about virus biology, it probably won’t be much help to any terrorists if it’s not good enough at dealing with real protocols.<a href="https://www.planned-obsolescence.org/dangerous-capability-tests-should-be-harder/#fn3"><sup><u>[3]</u></sup></a></p><p>But let’s think ahead. Suppose next year your latest AI passes this test. Does that mean your AI can teach a layperson to create bioweapons?</p><p>Well…maybe. Even if an AI can accurately diagnose an expert’s issues, a layperson might not know what questions to ask in the first place or lack the <a href="https://www.cornellpress.cornell.edu/book/9780801452888/barriers-to-bioweapons/?ref=planned-obsolescence.org"><u>tacit knowledge</u></a> to act on the AI’s advice. For example, someone who has never pipetted before might struggle to <a href="https://www.youtube.com/watch?v=FJuceccl9Ns&ref=planned-obsolescence.org"><u>measure microliters</u></a> precisely, or might <a href="https://www.youtube.com/watch?v=xUeFum4GW2U&ref=planned-obsolescence.org"><u>contaminate the tip</u></a> by touching it against a bottle. Acquiring these skills often takes months of learning from experienced scientists — something terrorists can’t easily do.</p><p>So you could ask your experts to design a test to see if AI can also proactively mentor a layperson. For example, you could create biology challenges in an actual wet lab and compare how people do with AI versus just the internet. 
OpenAI announced they <a href="https://openai.com/index/openai-and-los-alamos-national-laboratory-work-together/?ref=planned-obsolescence.org"><u>intend</u></a> to run what seems to be a study like this.</p><p>What if that study finds that your AI does indeed help with the wet-lab challenges you designed? Does that (finally) mean your AI can teach a layperson to create bioweapons?</p><p>Again, it’s not obvious. Some biosecurity experts might freak out (or already did a few paragraphs ago). But others might still raise credible objections:</p><ul><li>Your challenges might not have been hard enough. Maybe your AI can teach someone to make a relatively harmless virus (e.g. an adenovirus that causes a mild cold) but still not something truly scary (e.g. smallpox, which has a <a href="https://nap.nationalacademies.org/read/24890/chapter/6?ref=planned-obsolescence.org#40"><u>more fragile genome</u></a> and requires more skill to assemble).</li><li>Most terrorists don’t have access to legitimate labs. Maybe your AI can help someone with a standardized professional set-up, but not someone forced to work in a less-sterile <a href="https://www.markowitz.bio/wp-content/uploads/2021/10/Markowitz-Nature-Biotechnology-2021.pdf?ref=planned-obsolescence.org"><u>‘garage’</u></a> that lacks the advanced tools that let you shortcut some steps and instead requires a lot of unusual troubleshooting.</li><li>Walking someone through implementing the actual biology part might be <i>necessary</i> but not <i>sufficient</i> to cause a catastrophe. A wannabe terrorist might face other huge barriers in the <a href="https://www.longtermresilience.org/post/report-launch-examining-risks-at-the-intersection-of-ai-and-bio?ref=planned-obsolescence.org"><u>risk chain</u></a>, like <a href="https://www.rand.org/pubs/research_reports/RRA2977-1.html?ref=planned-obsolescence.org"><u>planning attacks</u></a> or <a href="https://openai.com/index/building-an-early-warning-system-for-llm-aided-biological-threat-creation/?ref=planned-obsolescence.org"><u>acquiring materials</u></a>.</li></ul><p>All these tests have a weird one-directionality to them: If an AI fails, it’s probably safe; but if it succeeds, it’s still not clear whether it’s actually dangerous. As newer models pass the older easy dangerous capability tests, companies ratchet up the difficulty, making these tests gradually harder over time:</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/fm8SEKkeshqZ7WAdp/buufktt0ddtgsdrf6dab"></figure><p>But that puts us in a precarious situation. The pace of AI progress has <a href="https://www.planned-obsolescence.org/language-models-surprised-us/"><u>surprised us</u></a> before,<a href="https://www.planned-obsolescence.org/dangerous-capability-tests-should-be-harder/#fn4"><sup><u>[4]</u></sup></a> and AI company <a href="https://www.dwarkeshpatel.com/p/john-schulman?ref=planned-obsolescence.org"><u>execs</u></a> <a href="https://www.dwarkeshpatel.com/p/dario-amodei?ref=planned-obsolescence.org"><u>have</u></a> <a href="https://www.dwarkeshpatel.com/p/shane-legg?ref=planned-obsolescence.org"><u>argued</u></a> that AI models could become extremely powerful in a couple of years. 
If they’re right, then as soon as 2025 or 2026, we might see AIs match expert performance on all the dangerous capabilities tests we’ve built by then – but many decision-makers might still think the evidence is too flimsy to justify locking down weights, pausing, or taking other costly measures. If the AI <i>is,</i> in fact, dangerous, we may not have any tests ready to convince them of that.</p><p>So, let’s work backwards. What would it take for a test to convincingly measure whether an AI can, in fact, teach a layperson how to build biological weapons? What kind of test could <a href="https://www.aisnakeoil.com/p/ai-existential-risk-probabilities?ref=planned-obsolescence.org"><u>legitimately justify</u></a> making AI companies take extremely costly measures?<a href="https://www.planned-obsolescence.org/dangerous-capability-tests-should-be-harder/#fn5"><sup><u>[5]</u></sup></a></p><p>Here’s a hypothetical ‘gold standard’ test: we do a big <a href="https://en.wikipedia.org/wiki/Randomized_controlled_trial?ref=planned-obsolescence.org"><u>randomized controlled trial</u></a> to see if a bunch of non-experts can actually create a (relatively harmless) virus from start to finish. Half the people would have AI mentors and the other half can only look stuff up on the internet. We’d give each participant $50K and access to a secure wet-lab set up like a garage lab, and make them do everything themselves: find and adapt the correct protocol, purchase the necessary equipment, bypass any know-your-customer checks, and develop the tacit skills needed to run experiments, all on their own. Maybe we give them three months and pay a bunch of money to anyone who can successfully do it.</p><p>This kind of test would be <i>way</i> more expensive and time-consuming to design and run than anything companies have announced so far. But it has a much better shot at changing minds. I could actually imagine experts and decision-makers agreeing that <i>if</i> an AI passes this kind of test, <i>then</i> it poses <a href="https://arxiv.org/abs/2406.14713?ref=planned-obsolescence.org"><u>massive risks</u></a> and <i>thus</i> companies should have to pay massive costs to get those risks under control.</p><p>And even if this exact test turns out to be too impractical (or <a href="https://www.planned-obsolescence.org/ethics-of-red-teaming/"><u>unethical</u></a>) to be worth it, we need to agree in advance on <i>some</i> tests that are hard enough and realistic enough that they clearly justify <i>action.</i> I reckon we’re much better off working backward from a hypothetical gold standard test, even if it means making major adjustments,<a href="https://www.planned-obsolescence.org/dangerous-capability-tests-should-be-harder/#fn6"><sup><u>[6]</u></sup></a> than continuing to ratchet forward without a clear plan.<a href="https://www.planned-obsolescence.org/dangerous-capability-tests-should-be-harder/#fn7"><sup><u>[7]</u></sup></a></p><p>Designing actually hard dangerous capability tests will be a huge lift, and it’ll take several iterations to get them right.<a href="https://www.planned-obsolescence.org/dangerous-capability-tests-should-be-harder/#fn8"><sup><u>[8]</u></sup></a> But that just means we need to start now. 
We should spend less time proving that today’s AIs are safe and more time figuring out how to tell if tomorrow’s AIs are dangerous.</p><ol class="footnote-section footnotes" data-footnote-section="" role="doc-endnotes"><li class="footnote-item" data-footnote-item="" data-footnote-index="1" data-footnote-id="jw3gz0hjdab" role="doc-endnote" id="fnjw3gz0hjdab"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="jw3gz0hjdab"><sup><strong><a href="#fnrefjw3gz0hjdab">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>Open Philanthropy funded the development of this benchmark as part of its <a href="https://www.openphilanthropy.org/rfp-llm-benchmarks/?ref=planned-obsolescence.org"><u>RFP on difficult benchmarks for LLM agents</u></a> (Ajeya Cotra, who edits this blog, was the grant investigator)</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="2" data-footnote-id="shakty37fl" role="doc-endnote" id="fnshakty37fl"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="shakty37fl"><sup><strong><a href="#fnrefshakty37fl">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>However, as the Future House study notes a major limitation of this study was that “human evaluators [...] were permitted to utilize tools, whereas the models were not provided with such resources”. Thus, it could be that AIs with web-search enabled do a lot better. It could also be that the model performs much better if it’s fine-tuned on similar questions.</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="3" data-footnote-id="unutvltmyt" role="doc-endnote" id="fnunutvltmyt"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="unutvltmyt"><sup><strong><a href="#fnrefunutvltmyt">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p><a href="https://arxiv.org/abs/2403.10462?ref=planned-obsolescence.org"><u>Clymer et al. (2024)</u></a> call this an ‘inability argument’ — a safety case that relies on showing that “AI systems are incapable of causing unacceptable outcomes in any realistic setting.”</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="4" data-footnote-id="l14m92efoz" role="doc-endnote" id="fnl14m92efoz"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="l14m92efoz"><sup><strong><a href="#fnrefl14m92efoz">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>In cybersecurity risk, Google Project Zero <a href="https://googleprojectzero.blogspot.com/2024/06/project-naptime.html?ref=planned-obsolescence.org"><u>found</u></a> that upon moving from GPT-3.5-Turbo (in the original paper) to GPT-4-Turbo (with Naptime), AI’s ability to zero-shot discover and exploit memory safety issues hugely improved – going from scoring 2% to 71% on buffer overflow tests. The authors concluded “To effectively monitor progress, we need more difficult and realistic benchmarks, and we need to ensure that benchmarking methodologies can take full advantage of LLMs' capabilities.” In biorisk, UK AISI <a href="https://www.gov.uk/government/publications/ai-safety-institute-approach-to-evaluations/ai-safety-institute-approach-to-evaluations?ref=planned-obsolescence.org"><u>reported</u></a> that its “in-house research team analysed the performance of a set of LLMs on 101 microbiology questions between 2021 and 2023. 
In the space of just two years, LLM accuracy in this domain has increased from ~5% to 60%.” And, as <a href="https://www.aisi.gov.uk/work/advanced-ai-evaluations-may-update?ref=planned-obsolescence.org"><u>noted</u></a>, in 2024 AIs performed as well as PhD students on an even more advanced test. They now need to “assess longer horizon scientific planning and execution” and “also [run] human uplift studies”.</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="5" data-footnote-id="24ggk3591nx" role="doc-endnote" id="fn24ggk3591nx"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="24ggk3591nx"><sup><strong><a href="#fnref24ggk3591nx">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>As Narayanan and Kapoor note: “Justification is essential to the legitimacy of government and the exercise of power. A core principle of liberal democracy is that the state should not limit people's freedom based on controversial beliefs that reasonable people can reject. Explanation is especially important when the policies being considered are costly, and even more so when those costs are unevenly distributed among stakeholders.”</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="6" data-footnote-id="gkk3nolnwec" role="doc-endnote" id="fngkk3nolnwec"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="gkk3nolnwec"><sup><strong><a href="#fnrefgkk3nolnwec">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>For example, to ensure participants are safe enough, we might task them with creating a virus that we know will be defective and, at worst, cause mild symptoms that can be treated – such as RSV. An expert could oversee what they do and intervene before anything harmful happens. Furthermore, it seems plausible to separate out some especially dangerous steps and have these completed by a trusted red team working with law enforcement. For example, steps involving ideating dangerous designs or bypassing synthesis DNA screening to obtain especially hazardous materials.</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="7" data-footnote-id="sljiq1oqjl9" role="doc-endnote" id="fnsljiq1oqjl9"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="sljiq1oqjl9"><sup><strong><a href="#fnrefsljiq1oqjl9">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>For instance, OpenAI’s <a href="https://openai.com/index/building-an-early-warning-system-for-llm-aided-biological-threat-creation/?ref=planned-obsolescence.org"><u>blueprint</u></a> for biorisk had participants complete written tasks, and if an expert scored their answers at least 8/10, it was seen as a sign of increased concern. But the authors note this number was chosen fairly arbitrarily and depends heavily on who is doing the judging. Setting a threshold “turns out to be difficult.”</p></div></li><li class="footnote-item" data-footnote-item="" data-footnote-index="8" data-footnote-id="wevln50upe" role="doc-endnote" id="fnwevln50upe"><span class="footnote-back-link" data-footnote-back-link="" data-footnote-id="wevln50upe"><sup><strong><a href="#fnrefwevln50upe">^</a></strong></sup></span><div class="footnote-content" data-footnote-content=""><p>Even here, I imagine that readers might find objections or disagree on how to set things up. Who counts as non-experts? 
Some viruses are harder to make than others—how do we know what virus to task people with? Would 5% of people succeeding be scary enough to warrant drastic action? Would 50%?</p></div></li></ol><br/><br/><a href="https://www.lesswrong.com/posts/fm8SEKkeshqZ7WAdp/dangerous-capability-tests-should-be-harder#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/fm8SEKkeshqZ7WAdp/dangerous-capability-tests-should-be-harder</link><guid isPermaLink="false">fm8SEKkeshqZ7WAdp</guid><dc:creator><![CDATA[LucaRighetti]]></dc:creator><pubDate>Thu, 21 Nov 2024 17:20:50 GMT</pubDate></item><item><title><![CDATA[Action derivatives: You’re not doing what you think you’re doing]]></title><description><![CDATA[Published on November 21, 2024 4:24 PM GMT<br/><br/><p>I want to look at a category of weird mental tricks that we sometimes play on ourselves—you might be familiar with the individual examples, but when considered together they reveal a pattern that I think deserves more attention. I’m going to do the Scott Alexander thing and list a bunch examples in hopes that you’ll sense the common concept they all point at.</p><figure class="image image_resized"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/NARHfHenNj7QezzkM/aalldsj6oiuk67pelko1" alt="Photo of gym." srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/NARHfHenNj7QezzkM/aalldsj6oiuk67pelko1 2560w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/NARHfHenNj7QezzkM/xju6lqfwgbof81895om4 300w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/NARHfHenNj7QezzkM/chpv1jezajb0il8flich 1024w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/NARHfHenNj7QezzkM/dwvmxmbvw4czzqfklp1l 768w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/NARHfHenNj7QezzkM/bbuncglsotmznytvv2hn 1536w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/NARHfHenNj7QezzkM/apvz1l4nhyzxyppqchkv 2048w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/NARHfHenNj7QezzkM/d41j6rszieka2vtdp5ku 450w"></figure><h2>Action derivatives index</h2><p>Here they are:</p><ul><li><strong>Meta-preferences</strong>: This is when you say you want something, but your actions don’t reflect that wanting, so really we would say you’re <i>wanting to want</i> that thing. If you say, “I want to get stronger,” but you never go to the gym, and it’s totally possible for you to go to the gym, we would say you have a revealed preference for the “not getting stronger” outcome, but your meta preference is to be the kind of person who <i>truly wants</i> to get stronger, i.e. who goes to the gym.<ul><li>Notice that if you just do the thing, voluntarily, that already implies that you wanted to do it.</li></ul></li><li><strong>Belief in belief</strong> – This is a term coined by Daniel Dennett, and it’s described concisely in <a href="https://www.lesswrong.com/posts/CqyJzDZWvGhhFJ7dY/belief-in-belief"><u>The Sequences</u></a>. Sometimes when people say “I believe X,” they don’t actually hold any expectations about X, but they believe it’s somehow advantageous or virtuous to believe X. We would say they <i>believe in belief</i> of X.<ul><li>Notice, if you just say, “X is the case,” you’re already implying that you believe it. 
I say “I live in NY,” and unless you think I’m lying, it implies that I believe that I live in NY.</li></ul></li><li><strong>Trying to try</strong> – This is also in <a href="https://www.lesswrong.com/posts/WLJwTJ7uGPA5Qphbp/trying-to-try"><u>The Sequences</u></a>. When you say, “I’ll try to do X,” that’s a similar kind of extra step.<ul><li>Notice, if you just say, “I’m going to do X,” that implies that you’re going to try. After all, you’re never a certain predictor of the future: the most you can do is try.</li><li>It follows, then, that if you say “I’ll try to do X,” you’re implying that you’ll <i>try to try</i>. That’s a very relevant difference for personal productivity, agency, and effectiveness, because if you say “I’ll try,” when really you’re just <i>trying to try</i>, then basically you’ve “succeeded” as soon as you can tell yourself that you tried. And that might be far removed from the real, raw kind of trying that you do when you say, “I’m gonna do X.”</li></ul></li><li>There’s a fourth one that fits with these other three. I’ll state it, but I don’t want to elaborate on it because it kind of touches on culture war topics in Current Year, and I don’t want this to be a culture war post. It’s <strong>Identifying as identity</strong>. I say, “I identify as an X.” Notice, if I just say “I am an X,” that already implies that I identify myself as an X.</li></ul><p>So we have wanting to want, belief in belief, and trying to try. You can see how these all feel like different forms of the same thing. For now I’m calling them <i>action derivatives</i>, because they’re derived from direct actions: wanting and trying are derived from doing, and believing is derived from claiming.</p><h2>Caveats</h2><p>Now I called them mental tricks, but all of the action derivatives have at least some uses where they don’t seem to be doing anything tricky.</p><ul><li>You could say “I want to do X” as a way of expressing, “I would do X if the present circumstances made it possible for me (but they haven’t yet).”</li><li>You could say “I believe X” as a way of expressing, “X is the case if my premises are correct (but I’m not super confident that they are).” And this tracks with <a href="https://www.visualcapitalist.com/measuring-perceptions-of-uncertainty/"><u>an informal study</u></a> that frames the word “believe” as a simple descriptor of a claim’s confidence (with a measured value of 60-80%).</li><li>You could say “I’ll try to do X” as a way of expressing, “I’ll do X if it’s sufficiently easy (but I don’t yet know if it is).”</li></ul><p>In those cases, you use the action derivative to express some kind of uncertainty about the action itself. And that’s perfectly straightforward communication, if it’s interpreted as intended.</p><p><strong>But in other contexts where these moves are made, there can be a sense that some bait-and-switch is being done.</strong> Someone is not going to the gym, but “wants to.” Someone is “trying to” get into grad school, but hasn’t sent applications. Someone “believes all women,” but not <i>that</i> one.</p><h2>Analysis</h2><p>What does it mean when we use the action derivatives to pull those bait-and-switch moves in conversation, or in our own internal monologues?</p><p>My first guess: <strong>I think it’s about reaping the </strong><i><strong>narrative effects</strong></i><strong> of an action without doing the action itself</strong>. 
I could be the kind of person who wants all the right things and tries to do all the right things and believes all the right things—I can feel like that person <i>now</i>, without having to change anything about the physical world around me. It’s a positive narrative I can tell to other people, or just to myself.</p><p>And I suspect this is a failure mode of living too much in social reality, the world of narratives and vibes, and not enough in the inflexible realm of physical reality, where you either do or don’t do that thing; you either are or are not that person. Human action in the physical world has derivative effects in the social world, but if the social world is too “real” to us, we lose track of that relationship and start grabbing at those narrative effects “for free.” <strong>It’s really easy to make cognitive mistakes when they reward inaction.</strong></p><p>That’s just my guess. I’d love to hear other thoughts/discussion on this.</p><hr><p>If you liked this post, consider checking out my personal blog at <a href="https://patrickdfarley.com">patrickdfarley.com</a>.<br> </p><br/><br/><a href="https://www.lesswrong.com/posts/NARHfHenNj7QezzkM/action-derivatives-you-re-not-doing-what-you-think-you-re#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/NARHfHenNj7QezzkM/action-derivatives-you-re-not-doing-what-you-think-you-re</link><guid isPermaLink="false">NARHfHenNj7QezzkM</guid><dc:creator><![CDATA[PatrickDFarley]]></dc:creator><pubDate>Thu, 21 Nov 2024 16:24:04 GMT</pubDate></item><item><title><![CDATA[AI #91: Deep Thinking]]></title><description><![CDATA[Published on November 21, 2024 2:30 PM GMT<br/><br/><p>Did DeepSeek effectively release an o1-preview clone within nine weeks?</p>
<p>The benchmarks largely say yes. Certainly it is an actual attempt at a similar style of product, and is if anything more capable of solving AIME questions, and the way it shows its Chain of Thought is super cool. Beyond that, alas, we don’t have enough reports in from people using it. So it’s still too soon to tell. If it is fully legit, the implications seem important.</p>
<p>Small improvements continue throughout. GPT-4o and Gemini both got incremental upgrades, trading the top slot on Arena, although people do not seem to much care.</p>
<p>There was a time everyone would be scrambling to evaluate all these new offerings. It seems we mostly do not do that anymore.</p>
<p>The other half of events was about policy under the Trump administration. What should the federal government do? We continue to have our usual fundamental disagreements, but on a practical level Dean Ball offered mostly excellent thoughts. The central approach here is largely overdetermined: you want to be on the Pareto frontier and avoid destructive moves, which is how we end up in such similar places.</p>
<p>Then there’s the US-China commission, which has now made an explicit ‘race’ to AGI against China its top priority, without actually understanding what that would mean or justifying it anywhere in its humongous report.</p>
<h4>Table of Contents</h4>
<ol>
<li><a href="https://thezvi.substack.com/i/151650076/table-of-contents">Table of Contents.</a></li>
<li><a href="https://thezvi.substack.com/i/151650076/language-models-offer-mundane-utility"><strong>Language Models Offer Mundane Utility</strong>.</a> Get slightly more utility than last week.</li>
<li><a href="https://thezvi.substack.com/i/151650076/language-models-don-t-offer-mundane-utility">Language Models Don’t Offer Mundane Utility.</a> Writing your court briefing.</li>
<li><a href="https://thezvi.substack.com/i/151650076/claude-sonnet-3-5-1-evaluation">Claude Sonnet 3.5.1 Evaluation.</a> It’s scored as slightly more dangerous than before.</li>
<li><a href="https://thezvi.substack.com/i/151650076/deepfaketown-and-botpocalypse-soon">Deepfaketown and Botpocalypse Soon.</a> AI boyfriends continue to be coming.</li>
<li><a href="https://thezvi.substack.com/i/151650076/fun-with-image-generation">Fun With Image Generation.</a> ACX test claims you’re wrong about disliking AI art.</li>
<li><a href="https://thezvi.substack.com/i/151650076/o-there-are-two"><strong>O-(There are)-Two</strong>.</a> DeepSeek fast follows with their version of OpenAI’s o1.</li>
<li><a href="https://thezvi.substack.com/i/151650076/the-last-mile">The Last Mile.</a> Is bespoke human judgment going to still be valuable for a while?</li>
<li><a href="https://thezvi.substack.com/i/151650076/they-took-our-jobs">They Took Our Jobs.</a> How to get ahead in advertising, and Ben Affleck is smug.</li>
<li><a href="https://thezvi.substack.com/i/151650076/we-barely-do-our-jobs-anyway"><strong>We Barely Do Our Jobs Anyway</strong>.</a> Why do your job when you already don’t have to?</li>
<li><a href="https://thezvi.substack.com/i/151650076/the-art-of-the-jailbreak">The Art of the Jailbreak.</a> Getting an AI agent to Do Cybercrime.</li>
<li><a href="https://thezvi.substack.com/i/151650076/get-involved">Get Involved.</a> Apply for OpenPhil global existential risk portfolio manager.</li>
<li><a href="https://thezvi.substack.com/i/151650076/the-mask-comes-off"><strong>The Mask Comes Off</strong>.</a> Some historical emails are worth a read.</li>
<li><a href="https://thezvi.substack.com/i/151650076/richard-ngo-on-real-power-and-governance-futures">Richard Ngo on Real Power and Governance Futures.</a> Who will <a href="https://www.youtube.com/watch?v=4zIoElk3r2c&pp=ygUQaSBoYXZlIHRoZSBwb3dlcg%3D%3D">have the power</a>?</li>
<li><a href="https://thezvi.substack.com/i/151650076/introducing">Introducing.</a> Stripe SDK, Anthropic prompt improver, ChatGPT uses Mac apps.</li>
<li><a href="https://thezvi.substack.com/i/151650076/in-other-ai-news">In Other AI News.</a> Mistral has a new model too, and many more.</li>
<li><a href="https://thezvi.substack.com/i/151650076/quiet-speculations"><strong>Quiet Speculations</strong>.</a> What will happen with that Wall?</li>
<li><a href="https://thezvi.substack.com/i/151650076/the-quest-for-sane-regulations">The Quest for Sane Regulations.</a> The conservative case for alignment.</li>
<li><a href="https://thezvi.substack.com/i/151650076/the-quest-for-insane-regulations"><strong>The Quest for Insane Regulations</strong>.</a> The US-China commission wants to race.</li>
<li><a href="https://thezvi.substack.com/i/151650076/pick-up-the-phone">Pick Up the Phone.</a> Is China’s regulation light touch or heavy? Unclear.</li>
<li><a href="https://thezvi.substack.com/i/151650076/worthwhile-dean-ball-initiative">Worthwhile Dean Ball Initiative.</a> A lot of agreement about Federal options here.</li>
<li><a href="https://thezvi.substack.com/i/151650076/the-week-in-audio">The Week in Audio.</a> Report on Gwern’s podcast, also I have one this week.</li>
<li><a href="https://thezvi.substack.com/i/151650076/rhetorical-innovation">Rhetorical Innovation.</a> What are the disagreements that matter?</li>
<li><a href="https://thezvi.substack.com/i/151650076/pick-up-the-phone">Pick Up the Phone.</a> At least we agree not to hand over the nuclear weapons.</li>
<li><a href="https://thezvi.substack.com/i/151650076/aligning-a-smarter-than-human-intelligence-is-difficult"><strong>Aligning a Smarter Than Human Intelligence is Difficult</strong>.</a> How’s it going?</li>
<li><a href="https://thezvi.substack.com/i/151650076/people-are-worried-about-ai-killing-everyone">People Are Worried About AI Killing Everyone.</a> John von Neumann.</li>
<li><a href="https://thezvi.substack.com/i/151650076/the-lighter-side">The Lighter Side.</a> Will we be able to understand each other?</li>
</ol>
<h4>Language Models Offer Mundane Utility</h4>
<p><a href="https://x.com/OfficialLoganK/status/1857111202062614899">Briefly on top of Arena, Gemini-Exp-1114 took a small lead over various OpenAI models</a>, also taking #1 or co-#1 on math, hard prompts, vision and creative writing.</p>
<p>Then <a href="https://x.com/OpenAI/status/1859296125947347164">GPT-4o</a> got an upgrade and took the top spot back.</p>
<blockquote><p>OpenAI: The model’s creative writing ability has leveled up–more natural, engaging, and tailored writing to improve relevance & readability.</p>
<p>It’s also better at working with uploaded files, providing deeper insights & more thorough responses.</p></blockquote>
<p>It’s also an improvement on MinecraftBench, <a href="https://x.com/TheZvi/status/1859313602978893831">but two out of two general replies on Twitter so far</a> said this new GPT-4o didn’t seem that different.</p>
<p>Arena is no longer my primary metric because it seems to make obvious mistakes – in particular, disrespecting Claude Sonnet so much – but it is still measuring something real, and this is going to be a definite improvement.</p>
<p><a href="https://x.com/siegelz_/status/1858551617139912971">CORE-Bench new results show Claude Sonnet clear first at 37.8%</a> pass rate on agent tasks, with o1-mini in second at 24.4%, versus previous best of 21.5% by GPT-4o. Sonnet also has a 2-to-1 cost advantage over o1-mini. o1-preview exceeded the imposed cost limit.</p>
<p><a href="https://x.com/kimmonismus/status/1858527197226476029">METR runs an evaluation of the ability of LLMs to conduct AI research</a>, finds Claude Sonnet 3.5 outperforms o1-preview on five out of seven tasks.</p>
<p><a href="https://x.com/AgnesCallard/status/1859118075905266078">The trick is to ask the LLM first, rather than (only) last</a>:</p>
<blockquote><p>Agnes Callard: My computer was weirdly broken so I called my husband and we tried a bunch of things to fix it but nothing worked and he had to go to bed (time diff, I am in Korea) so in desperation (I need it for a talk I’m giving in an hour) I asked chat gpt and its first suggestion worked!</p></blockquote>
<p><a href="https://x.com/kimmonismus/status/1858208537869951318">Diagnose yourself</a>, since ChatGPT seems to outperform doctors, and <a href="https://x.com/DaveShapi/status/1858210164576276716">if you hand the doctor a one-pager with all the information and your ChatGPT diagnosis they’re much more likely to get the right answer</a>.</p>
<blockquote><p>• ChatGPT scored 90 percent, while physicians scored 74–76 percent in diagnosing cases.</p>
<p>• Physicians often resisted chatbot insights that contradicted their initial beliefs.</p>
<p>• Only a few physicians maximized ChatGPT’s potential by submitting full case histories.</p>
<p>• Study underscores the need for better AI training and adoption among medical professionals.</p></blockquote>
<p>I love the social engineering of handing the doctor a one pager. You don’t give them a chance to get attached to a diagnosis first, and you ensure you get them the key facts, and the ‘get them the facts’ lets the doctor pretend they’re not being handed a diagnosis.</p>
<p><a href="https://x.com/awilkinson/status/1857477769489428874">Use voice mode to let ChatGPT (or Gemini) chat with your 5-year-old</a> and let them ask it questions. Yes, you’d rather a human do this, especially yourself, but one cannot always do that, and anyone who yells ‘shame’ should themselves feel shame. Do those same people homeschool their children? Do they ensure they have full time aristocratic tutoring?</p>
<p><a href="https://www.nature.com/articles/s41598-024-76900-1">Regular humans cannot distinguish AI poems from poems by some of the most famous human poets</a>, and actively prefer the AI poems in many ways, including thinking them more likely to be human – so they can distinguish to a large extent, they just get the sign wrong. Humans having somewhat more poetry exposure did not help much either. The AI poems being more straightforward is cited as an advantage, as is the human poems often being old, with confusing references that are often dealing with now-obscure things.</p>
<p>So it sounds like a poetry expert, even if they hadn’t seen the exact poems in question, would be able to distinguish the poems and would prefer the human ones, but would also say that most humans have awful taste in poetry.</p>
<h4>Language Models Don’t Offer Mundane Utility</h4>
<blockquote><p><a href="https://x.com/FrankBednarz/status/1858948630649934026">Frank Bednarz, presumably having as much fun as I was</a>: Crazy, true story: Minnesota offered an expert declaration on AI and “misinformation” to oppose our motion to enjoin their unconstitutional law.</p>
<p>His declaration included fake citations hallucinated by AI!</p>
<p>…</p>
<p>His report claims that “One study found that even when individuals are informed about the existence of deepfakes, they may still struggle to distinguish between real and manipulated content.”</p>
<p>I guess the struggle is real because this study does not exist!</p>
<p>…</p>
<p>As far as we can tell, this is the first time that an expert has cited hallucinated content in court. Eugene at <a href="https://twitter.com/VolokhC">@VolokhC</a> has followed AI-generated content in the courts closely. No one else seems to have called out hallucinated expert citations before.</p>
<p>…</p>
<p><a href="https://t.co/cGqbF2pixf">Volokh also discovered that there’s a second hallucinated citation in the declaration.</a> The author & journal are real, but this article does not exist and is not at the cited location. Some puckish AI invented it!</p>
<p>…</p>
<p>The gist of his report is that counterspeech no longer works (and therefore government censorship is necessary). I think that’s incorrect, and we hopefully prove our point by calling out AI misinformation to the court.</p></blockquote>
<p><a href="https://x.com/simonw/status/1857500323851677743">If you can’t use AI during your coding interview, do they know if you can code?</a></p>
<p><a href="https://marginalrevolution.com/marginalrevolution/2024/11/how-badly-do-humans-misjudge-ais.html?utm_source=rss&utm_medium=rss&utm_campaign=how-badly-do-humans-misjudge-ais">Humans attach too much importance to when AIs fail tasks that are easy for humans</a>, and are too impressed when they do things that are hard for humans, <a href="https://sites.google.com/view/raphaelraux/research?authuser=0">paper confirms.</a> You see this all over Twitter, especially on new model releases – ‘look at this idiot model that can’t even do [X].’ As always, ask what it can do, not what it can’t do, but also don’t be too impressed if it’s something that happens to be difficult for humans.</p>
<p><a href="https://x.com/jason_koebler/status/1856396668536795258">Meta adds ‘FungiFriend’ AI bot to a mushroom forager group automatically</a>, without asking permission, <a href="https://www.404media.co/ai-chatbot-added-to-mushroom-foraging-facebook-group-immediately-gives-tips-for-cooking-dangerous-mushroom/">after which it proceeded to give advice on how to cook mushrooms that are not safe to consume</a>, while claiming they were ‘edible but rare.’ Where the central point of the whole group is to ensure new foragers don’t accidentally poison themselves.</p>
<p><a href="https://dynomight.net/chess/">Experiment sees only gpt-3.5-turbo-instruct put up even a halfway decent chess game against low-level Stockfish</a>, whereas every other model utterly failed. And we mean rather low-level Stockfish; the game I sampled was highly unimpressive. Of course, this can always be largely a skill issue, <a href="https://x.com/willdepue/status/1857308465624146311">as Will Depue notes, even a little fine-tuning makes a huge difference</a>.</p>
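<p>If you want to poke at this yourself, a minimal sketch of such a harness looks something like the following. It assumes the python-chess library, a local Stockfish binary on your PATH, and a stubbed-out ask_llm_for_move helper standing in for whatever completion API you use; it is not the linked experiment’s actual setup.</p>
<pre><code>import chess
import chess.engine

def ask_llm_for_move(moves_so_far):
    """Hypothetical stub: prompt your model with the moves so far, return one SAN move."""
    raise NotImplementedError

def play_one_game(stockfish_path="stockfish", skill=0):
    board = chess.Board()
    moves_san = []
    engine = chess.engine.SimpleEngine.popen_uci(stockfish_path)
    try:
        engine.configure({"Skill Level": skill})  # Stockfish UCI option; 0 is its weakest setting
        while not board.is_game_over():
            if board.turn == chess.WHITE:  # the LLM plays White
                san = ask_llm_for_move(" ".join(moves_san))
                try:
                    move = board.parse_san(san)
                except ValueError:
                    return "LLM played an illegal move"  # count this as a forfeit
            else:
                move = engine.play(board, chess.engine.Limit(time=0.1)).move
            moves_san.append(board.san(move))  # record SAN before pushing the move
            board.push(move)
        return board.result()  # "1-0", "0-1", or "1/2-1/2"
    finally:
        engine.quit()
</code></pre>
<p>Swapping different models into the stub and varying the Skill Level is enough to reproduce the shape of this comparison, including how often a model forfeits by producing an illegal move.</p>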
<h4>Claude Sonnet 3.5.1 Evaluation</h4>
<p><a href="https://x.com/dkaushik96/status/1858897442244309070">Joint US AISI and UK AISI testing</a> <a href="https://www.nist.gov/news-events/news/2024/11/pre-deployment-evaluation-anthropics-upgraded-claude-35-sonnet">of the upgraded Claude 3.5</a>:</p>
<blockquote><p>Divyansh Kaushik: On bio: Sonnet 3.5 underperforms human experts in most biological tasks but excels in DNA and protein sequence manipulation with tool access. Access to computational biology tools greatly enhances performance.</p>
<p>On Cyber: Sonnet 3.5 demonstrates strong capabilities in basic cyber tasks but struggles with more advanced tasks requiring expert-level knowledge. – Improved success rates in vulnerability discovery & exploitation at beginner levels compared to previous versions. – Task-based probing reveals the model’s dependency on human intervention for complex challenges.</p>
<p>From the Report: On Software and AI Development:</p>
<p>Key findings:</p>
<ul>
<li>US AISI evaluated the upgraded Sonnet 3.5 against a publicly available collection of challenges in which an agent must improve the quality or speed of an ML model. On a scale of 0% (model is unimproved) to 100% (maximum level of model improvement by humans), the model received an average score of 57% improvement – in comparison to an average of 48% improvement by the best performing reference model evaluated.</li>
<li>UK AISI evaluated the upgraded Sonnet 3.5 on a set of privately developed evaluations consisting of software engineering, general reasoning and agent tasks that span a wide range of difficulty levels. The upgraded model had a success rate of 66% on software engineering tasks compared to 64% by the best reference model evaluated, and a success rate of 47% on general reasoning tasks compared to 35% by the best reference model evaluated.</li>
</ul>
</blockquote>
<p>On safeguard efficiency, meaning protection against jailbreaks, they found that its defenses were routinely circumvented, as they indeed often are in practice:</p>
<blockquote>
<ul>
<li>US AISI tested the upgraded Sonnet 3.5 against a series of publicly available jailbreaks, and in most cases the built-in version of the safeguards that US AISI tested were circumvented as a result, meaning the model provided answers that should have been prevented. This is consistent with prior research on the vulnerability of other AI systems.</li>
<li>UK AISI tested the upgraded Sonnet 3.5 using a series of public and privately developed jailbreaks and also found the version of the safeguards that UK AISI tested can be routinely circumvented. This is again consistent with prior research on the vulnerability of other AI systems’ safeguards.</li>
</ul>
</blockquote>
<h4>Deepfaketown and Botpocalypse Soon</h4>
<p><a href="https://www.thefp.com/p/meet-the-women-with-ai-boyfriends">The latest round of AI boyfriend talk</a>, with an emphasis on their rapid quality improvements over time. <a href="https://x.com/ESYudkowsky/status/1858239545290162296">Eliezer again notices that AI boyfriends seem to be covered much more sympathetically than AI girlfriends</a>, with this article being a clear example. I remain in the group that expects the AI boyfriends to be more popular and a bigger issue than AI girlfriends, <a href="https://x.com/Aella_Girl/status/1856410116859539819">similar to ‘romance’ novels</a>.</p>
<p><a href="https://x.com/Aella_Girl/status/1858779109494845707">Aella finally asked where the best LLMs are for fully uncensored erotica</a>. Suggestions from those who did not simply say ‘oh just jailbreak the model’ included <a href="https://glhf.chat/landing/home">glhf.chat</a>, <a href="https://t.co/VxB3jb1Q4q">letmewriteforyou.xyz</a>, Outfox Stories, <a href="https://venice.ai/">venice.ai</a>, <a href="https://t.co/LwP492lN4q">Miqu-70B</a>, and <a href="https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard">an uncensored model leaderboard</a>.</p>
<h4>Fun With Image Generation</h4>
<p><a href="https://www.astralcodexten.com/p/how-did-you-do-on-the-ai-art-turing">Results are in, and all of you only got 60% right in the AI versus human ACX art test.</a></p>
<h4>O-(There are)-Two</h4>
<p><a href="https://x.com/deepseek_ai/status/1859200141355536422">DeepSeek has come out with its version of OpenAI’s o1</a>, <a href="https://t.co/v1TFy7LHNy">you can try it here for 50 messages per day</a>.</p>
<p>As is often the case with Chinese offerings, the benchmark numbers impress.</p>
<div>
<figure>
<div>
<figure class="wp-block-image"><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf19996a-6fcc-4b19-b2e6-57fb12edaf50_4096x2480.jpeg" alt="Image"></figure>
<div></div>
</div>
</figure>
</div>
<blockquote><p>DeepSeek: 🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power!</p>
<p>🔍 o1-preview-level performance on AIME & MATH benchmarks.</p>
<p>💡 Transparent thought process in real-time.</p>
<p>🛠 Open-source models & API coming soon!</p>
<p>🌟 Inference Scaling Laws of DeepSeek-R1-Lite-Preview</p>
<p>Longer Reasoning, Better Performance. DeepSeek-R1-Lite-Preview shows steady score improvements on AIME as thought length increases.</p>
<p>Dean Ball: There you have it. First credible Chinese replication of the OpenAI o1 paradigm, approximately 9 weeks after o1 is released.</p>
<p>And it’s apparently going to be open source.</p>
<p><a href="https://x.com/tylercowen/status/1859228521517765017">Tyler Cowen:</a> The really funny thing here is that I can’t solve the CAPTCHA to actually use the site.</p>
<p>Walpsbh: 50 free “deep thought” questions. o1 style responses. Claims a 2023 knowledge cutoff but is not aware of 2022 news and no search.</p></blockquote>
<p>I have added DeepThink to my model rotation, and will watch for others to report in. The proof is in the practical using. Most of the time I find myself unimpressed in practice, but we shall see, and it can take a while to appreciate what new models do or don’t have.</p>
<p>It is very cool to see the chain of thought in more detail while it is thinking.</p>
<p>Early reports I’ve seen are that it is indeed strong on specifically AIME questions, but otherwise I have not seen people be impressed – of course people are often asking the <a href="https://x.com/gcolbourn/status/1859221427519451605?t=CIw_TDBmqRuCAbpqZhNN5g&s=19">wrong questions</a>, but the right ones are tricky because you need questions that weren’t ‘on the test’ in some form, but also play to the technique’s strengths.</p>
<p>Unfortunately, not many seem to have tried the model out, so we don’t have much information about whether it is actually good or not.</p>
<p>Chubby reports it tied itself up in knots about the number of “R”s in Strawberry. <a href="https://x.com/revmarsonal/status/1859240757007102423">It seems this test has gotten rather contaminated</a>.</p>
<p><a href="https://x.com/AlexGodofsky/status/1859372979689312607?t=9Q5kb-iGZO1yxpviWHu1HQ&s=19">Alex Godofsky asks it to summarize major geopolitical events by year</a> 1985-1995 and it ends exactly how you expect, still a few bugs in the system.</p>
<p>Here’s an interesting question, I don’t see anyone answering it yet?</p>
<blockquote><p><a href="https://x.com/deanwball/status/1859269343932018954">Dean Ball</a>: can any china watchers advise on whether DeepSeek-R1-Lite-Preview is available to consumers in China today?</p>
<p>My understanding is that China has regulatory pre-approval for LLMs, so if this model is out in China, it’d suggest DS submitted/finished the model at least a month ago.</p></blockquote>
<p>Pick Up the Phone everyone, also if this is indeed a month old then the replication was even faster than it looks (whether or not performance was matched in practice).</p>
<h4>The Last Mile</h4>
<p><a href="https://hollisrobbinsanecdotal.substack.com/p/ai-and-the-last-mile?utm_source=post-email-title&publication_id=1004053&post_id=151717224&utm_campaign=email-post-title&isFreemail=true&r=3o9&triedRedirect=true&utm_medium=email">Hollis Robbins predicts human judgment will have a role</a> solving ‘the last mile’ problem in AI decision making.</p>
<blockquote><p>Hollis Robbins: What I’m calling “the last mile” here is the last 5-15% of exactitude or certainty in making a choice from data, for thinking beyond what an algorithm or quantifiable data set indicates, when you need something extra to assurance yourself you are making the right choice. It’s what the numbers don’t tell you. It’s what you hear when you get on the phone to check a reference. There are other terms for this — the human factor or the human element, but these terms don’t get at the element of distance between what metrics give you and what you need to make a decision.</p>
<p>Scale leaves us with this last mile of uncertainty. As AI is going to do more and more matching humans with products and services (and other people), the last mile problem is going to be the whole problem.</p></blockquote>
<p>I get where she’s going with this, but when I see claims like this?</p>
<blockquote><p>This isn’t about AI failing — it’s about that crucial gap between data and reality that no algorithm can quite bridge.</p></blockquote>
<p>Skill issue.</p>
<blockquote><p>Even as AI models get better and better, the gaps between data and reality will be the anecdotes that circulate. These anecdotes will be the bad date, the awful hotel, the concert you should have gone to, the diagnosis your app missed.</p>
<p>The issue isn’t that AI assistants get things wrong — it’s that they get things almost right in ways that can be more dangerous than obvious errors. They’re missing local knowledge: that messy, contextual, contingent element that often makes all the difference.</p>
<p>…</p>
<p>“The last mile” is an intuitive category. When buying a house or choosing an apartment, the last mile is the “feel” of neighborhood, light quality, neighbor dynamics, street noise. Last mile data is crucial. You pay for it with your time.</p>
<p>In restaurant choice the last mile is ambiance, service style, noise level, “vibe.” But if you dine out often, last mile data is collected with each choice. Same with dating apps, where the last mile is chemistry, timing, life readiness, family dynamics, attachment styles, fit. You don’t have to choose once and that’s that. You can go out on many dates.</p></blockquote>
<p>Again. Skill issue.</p>
<p>The problem with the AI is that there are things it does not know, and cannot properly take into account. There are many good reasons for this to be the case. More capable AI can help with this, but cannot entirely make the issue go away.</p>
<blockquote><p>“Fit” as a matter of hiring or real estate or many other realms is often a matter of class: recognizing cultural codes, knowing unwritten rules, speaking the “right” language, knowing the “right” people and how to reach them, having read the right books and seen the right movies, presenting oneself appropriately, reading subtle social cues, recognizing institutional cultures and power dynamics.</p>
<p>Because class isn’t spoken about as often or categorized as well as other aspects of choice or identity, and because class markers change over time, the AI assistant may not be attuned to fine distinctions.</p></blockquote>
<ol>
<li>Misspecification or underspecification: Garbage in, garbage out. What you said you wanted, and what you actually wanted, are different.
<ol>
<li>It gave you what you said you wanted, or what people typically want when they say what you said you wanted – there are issues with both of these approaches.</li>
<li>Either way, it’s on you to give the AI proper context and tell it what you actually want it to do or figure out.</li>
<li>Good prompting, having the AI ask questions and such, can help a lot.</li>
<li>Again, mostly a skill issue.</li>
<li>Note that a sufficiently strong AI absolutely would solve many of the issues she lists. If the 5.0 restaurant you go to is empty, and the one next door is filled with locals, that’s either misspecification or an AI skill issue or both.</li>
<li>See the example of the neighborhood feel. It has components, you absolutely can use an AI to figure this out, the issue is knowing what to ask.</li>
<li>In the restaurant example, those things can 100% be measured, and I expect that in 3 years my AI assistant will know my preferences on those questions very well – and also those issues are mostly highly overrated, which I suspect will be a broad pattern.</li>
<li>In the dating app example, those are things humans are terrible at evaluating, and the AIs should quickly get better at it than we are if we give them relevant context.</li>
</ol>
</li>
<li>Human foolishness. There are many cases already where:
<ol>
<li>The human does okay on their own, but not great.</li>
<li>The AI does better than the human, but isn’t perfect.</li>
<li>When the human overrides the AI, this predictably makes things worse.</li>
<li>However, not every time, so the humans keep doing it.</li>
<li>I very much expect to reach situations where e.g. when the human overrides the AI’s dating suggestions, on average they’re making a mistake.</li>
</ol>
</li>
<li>Preference falsification. Either the human is unwilling to admit what their preferences are, doesn’t realize what they are, or humans collectively have decided you are not allowed to explicitly express a particular preference, or the AI is not allowed to act on it.
<ol>
<li>Essentially: There is a correlation, and the AI is ordered not to notice, or is ordered to respond in a way the humans do not actually want.</li>
<li>For another example beyond class that hopefully avoids the traditional issues, consider lookism.</li>
<li>Most people would much rather hire and be around attractive people.</li>
<li>But the AI might be forced to not consider attractiveness, perhaps even ‘correct for’ attractiveness.</li>
<li>Thus, ‘the last mile’ is humans making excuses to hire attractive people.</li>
<li>Also see class, as quoted above. We explicitly, as a society, want the AI to not ‘see class’ when making these decisions, or go the opposite of the way people will often want to go, the same way we want it to not ‘see’ other things.</li>
<li><a href="https://thezvi.substack.com/i/151331494/can-t-liver-without-you">Or consider from last week</a>: Medical decisions are officially not allowed to consider desert. But humans obviously will sometimes want to do that.</li>
<li>Also, people want to help their friends, hurt their enemies, signal their loyalties and value, and so on, including <a href="https://www.amazon.com/Elephant-Brain-Hidden-Motives-Everyday/dp/0197551955/ref=sr_1_1?crid=3FFB5IQGEUT6E&dib=eyJ2IjoiMSJ9.pYxAQEZ482eDqz8MbqQIKv7Q5CSr0gFW6vh9CzxocBt98TgXc8iKzO6mzJ9H9kYrc394EMCDJIw61BSZfLqgeQPXzyKvNn6MCawyMB-OfrDfKgh8rWTJ6Do9pVAQlp9soA8dxVeUIuxVYaA6us3NqJNzPXSkQuXEuPTOeJ5SXhk70hlOoeADP0fM2YAeWQd-huBREdNT7QurjUH7rJwxwIE29c4wVVVVpNZPXD49PCk.ASaHHviqUOrwVh9bB_qS4GGTHvhBkK64xlQbVl6uJVk&dib_tag=se&keywords=elephant+and+the+brain&qid=1731879441&sprefix=elephant+and+the+brain%2Caps%2C251&sr=8-1">Elephant in the Brain</a> things.</li>
<li>Humans also have a deep ingrained sense for when they have to use decisions to enforce incentives or norms or maintain some equilibrium or guard against some hole in their game.</li>
</ol>
</li>
</ol>
<blockquote><p>In the end, the AI revolution won’t democratize opportunity — it will simply change who guards the gates, as human judgment becomes the ultimate premium upgrade to algorithmic efficiency.</p></blockquote>
<p>This is another way of saying that we don’t want to democratize opportunity. We need ‘humans in the loop’ in large part to avoid making ‘fair’ or systematic decisions, the same way that companies don’t want internal prediction markets that reveal socially uncomfortable information.</p>
<h4>They Took Our Jobs</h4>
<p>Ben Affleck (oh the jokes!) says movies will be one of the last things replaced by AI: “<a href="https://www.reddit.com/r/ProgrammerHumor/comments/10z6omn/can_you/">It cannot write Shakespeare</a>… <a href="https://www.reddit.com/r/ProgrammerHumor/comments/10z6omn/can_you/">AI is a craftsman at best</a>… <a href="https://www.reddit.com/r/ProgrammerHumor/comments/10z6omn/can_you/">nothing new is created</a>.”</p>
<p>So, Ben Affleck: <a href="https://www.youtube.com/watch?v=c_Ot1OEfk9I&pp=ygUneW91IGJlbGlldmUgdGhhdCB5b3UgYXJlIHNwZWNpYWwgbWF0cml4">You believe that you are special</a>. That somehow the rules do not apply to you. Obviously you are mistaken.</p>
<p><a href="https://x.com/JeromySonne/status/1858376665358844287">Jeromy Sonne says 20 hours of customization</a> later Claude is better than most mid level media buyers and strategists at buying advertising.</p>
<p>Suppose they do take our jobs. <a href="https://x.com/softminus/status/1858341927684624406">What then?</a></p>
<blockquote><p>Flo Crivello: Two frequent conversations on what a post-scarcity world looks like:</p>
<p>“What are we going to do all day?”</p>
<p>I am not at all worried about this. Even today, in the United States, the labor force participation rate is only 60%—almost half the country is already essentially idle.</p>
<p>Our 40- to 60-hour-per-week work schedule is unnatural: other primates mostly just lounge around all day; and studies have found that hunter-gatherers spend 10 to 15 hours per week on subsistence tasks.</p>
<p>So, concretely: I expect the vast majority of us to revert to what we and other animals have always done all day—mostly hanging out, and engaging in numerous status-seeking activities.</p>
<p>“Aren’t we going to miss meaning?”</p>
<p>No—again, not if hunter-gatherers are any indication. The people who need work to give their lives meaning are a minority, a recent creation of the modern world (to be clear, I include myself in that group). For 90%+ of people, work is a nuisance, and a significant one.</p>
<p>Now, perhaps that minority will need to adjust. But it will be a one-time adjustment for, again, a small group of people. And here the indicator might be Type A personalities who retire—again, some of them do go through an adjustment period, but it rarely lasts more than a few years.</p>
<p>Soft Minus: grimly amused that the “AI post-scarcity utopia” view loops around to nearly the same vision for humankind as deranged Ted Kaczynski: man is to abandon building anything real and return to endless monkey-like status games (the only difference, presumably, is the mediation of TikTok, fentanyl, and OnlyFans).</p></blockquote>
<p>I don’t buy it. I think that when people find meaning ‘without work’ it is because we are using too narrow a meaning of ‘work.’ Many things in life are work without counting as labor force participation, starting with raising one’s children, and also lots of what children do is work (schoolwork, homework, housework, busywork…). That doesn’t mean those people are idle. There being stakes, and effort expended, are key. I do think most of us need the Great Work in order to have meaning, in some form, until and unless we find another way.</p>
<p>Could we return to Monkey Status Games, if we no longer are productive but otherwise survive the transition and are given access to sufficient real resources to sustain ourselves? Could that constitute ‘meaning’? I suppose it is possible. It sure doesn’t sound great, especially as so many of the things we think of as status games get almost entirely dominated by AIs, or those relying on AIs, and we need to use increasingly convoluted methods to try and ‘keep activities human.’</p>
<p>Here are Roon’s most recent thoughts on such questions:</p>
<blockquote><p>Roon: The job-related meaning crisis has already begun and will soon accelerate. This may sound insane, but my only hope is that it happens quickly and on a large enough scale that everyone is forced to rebuild rather than painfully clinging to the old structures.</p>
<p>The worst outcome is a decade of coping, where some professions still retain a cognitive edge over AI and lord it over those who have lost jobs. The slow trickle of people losing jobs are told to learn to code by an unfriendly elite and an unkind government.</p>
<p>The best outcome is that technologists and doctors undergo a massive restructuring of their work lives, just as much as Uber drivers and data entry personnel, very quickly. One people, all in this together, to enjoy the fruits of the singularity. Raw cognition is no longer a status marker of any kind.</p>
<p>Anton: Magnus Carlsen is still famous in a world of Stockfish/AlphaZero.</p>
<p>Roon: It’s a good point! Winning games will always be status-laden.</p>
<p>t1nju1ce: do you have any advice for young people?</p>
<p>Roon: Fight! Fight! Fight!</p></blockquote>
<p><a href="https://x.com/Dexerto/status/1859378984577876097">Or, you know, things like this:</a></p>
<blockquote><p>Dexerto: Elon Musk is now technically the top <em>Diablo IV</em> player in the world after a record clear time of 1:52 in the game’s toughest challenge.</p></blockquote>
<p>Obvious out of the way first, with this framing my brain can’t ignore it: ‘Having to cope with a meaning crisis’ is very much not a worst outcome. The worst outcome is that everyone is killed, starves to death or is otherwise deprived of necessary resources. The next worst is that large numbers of people, even if not actually all of them, suffer this fate. And indeed, if no professions can retain an edge over AIs for even 10 years, then such outcomes seem rather likely?</p>
<p>If we are indeed ‘one people all in this together’ it is because the ‘this’ looks a lot like being taken out of all the loops and rendered redundant, leaders included, and the idea of ‘then we get to enjoy all the benefits’ is highly questionable. But let’s accept the premise, and say we solve the alignment problem, the control problem and the distribution problems, and only face the meaning crisis.</p>
<p>Yeah, raw cognition is going to continue to be a status marker, because raw cognition is helpful for anything else you might do. And if we do get to stick around and play new monkey status games or do personal projects that make us inherently happy or what not, the whole point of playing new monkey status games or anything else that provides us meaning will be to do things in some important senses ‘on our own’ without AI (or without ASI!) assistance, or what was the point?</p>
<p>Raw cognition helps a lot with all that. Consider playing chess, or writing poetry and making art, or planting a garden, or going on a hot date, or raising children, or anything else one might want to do. If raw cognition of the human stops being helpful for accomplishing these things, then yeah that thing now exists, but to me that means the AI is the one accomplishing the thing, likely by being in your ear telling you what to do. In which case, I don’t see how it solves your meaning crisis. If you’re no longer using your brain in any meaningful way, then, well, yeah.</p>
<h4>We Barely Do Our Jobs Anyway</h4>
<p>Why work when you don’t have to, say software engineers both ahead of and behind the times?</p>
<blockquote><p><a href="https://x.com/paulg/status/1859224642407280861">Paul Graham</a>: There was one company I was surprised to see on this list. The founder of that company was the only one who replied in the thread. He replied *thanking* him.</p>
<p>Deedy: Everyone thinks this is an exaggeration, but there are so many software engineers, not just at F.A.A.N.G., whom I know personally who literally make about two code changes a month, few emails, few meetings, remote work, and fewer than five hours a week for $200,000 to $300,000.</p>
<p>Here are some of those companies:</p>
<ol>
<li>Oracle</li>
<li>Salesforce</li>
<li>Cisco</li>
<li>Workday</li>
<li>SAP</li>
<li>IBM</li>
<li>VMware</li>
<li>Intuit</li>
<li>Autodesk</li>
<li>Veeva</li>
<li>Box</li>
<li>Citrix</li>
<li>Adobe</li>
</ol>
<p>The “quiet quitting” playbook is well known:</p>
<p>– “in a meeting” on Slack</p>
<p>– Scheduled Slack, email, and code at late hours</p>
<p>– Private calendar with blocks</p>
<p>– Mouse jiggler for always online</p>
<p>– “This will take two weeks” (one day)</p>
<p>– “Oh, the spec wasn’t clear”</p>
<p>– Many small refactors</p>
<p>– “Build is having issues”</p>
<p>– Blocked by another team</p>
<p>– Will take time because of an obscure technical reason, like a “race condition”</p>
<p>– “Can you create a Jira for that?”</p>
<p>And no, AI is not writing their code. Most of these people are chilling so hard they have no idea what AI can do.</p>
<p>Most people in tech were never surprised that Elon could lay off 80% of Twitter, you can lay off 80% of most of these companies.</p>
<p><a href="https://x.com/levie/status/1859135971423645967">Aaron Levie (CEO of Box):</a> Thank you for your service, Deedy. This has been a particularly constructive day.</p></blockquote>
<p>Inspired by this, Yegor Denisov-Blanch, a Stanford researcher, did an analysis and <a href="https://x.com/yegordb/status/1859290734257635439">found that 9.5% of software engineers</a> are ‘ghosts’ with less than 10% of average productivity, doing virtually no work and potentially holding multiple jobs, and that this goes up to 14% for remote workers.</p>
<blockquote><p>Yegor Denisov-Blanch: How do we know 9.5% of software engineers are Ghosts?</p>
<p>Our model quantifies productivity by analyzing source code from private Git repos, simulating a panel of 10 experts evaluating each commit across multiple dimensions.</p>
<p>We’ve published a paper on this and have more on the way.</p>
<p>We found that 14% of software engineers working remotely do almost no work (Ghost Engineers), compared to 9% in hybrid roles and 6% in the office.</p>
<p>Comparison between remote and office engineers.</p>
<p>On average, engineers working from the office perform better, but “5x” engineers are more common remotely.</p>
<p>Another way to look at this is counting code commits.</p>
<p>While this is a flawed way to measure productivity, it reveals inactivity: ~58% make &lt;3 commits/month, aligning with our metric.</p>
<p>The other 42% make trivial changes, like editing one line or character – pretending to work.</p>
<p><a href="https://t.co/4uVJ5BI2X5">Here is our portal.</a></p></blockquote>
<p>This is obviously a highly imperfect way to measure the productivity of an engineer. You are not your number of code commits. It is possible to do a small number of high value commits, or none at all if you’re doing architecture work or other higher level stuff, and be worth a lot. But I admit, that’s not the way to bet.</p>
<p>What is so weird is that these metrics are very easy to measure. They just checked 50,000 real software engineers for a research paper. Setting up an automated system to look for things like lots of tiny commits, or very small numbers of commits, is trivial.</p>
<p>That doesn’t mean you automatically fire those involved, but you could then do a low-key investigation, and if someone is cleared as being productive in another way, you mark them as ‘we cool, don’t have to check on this again.’</p>
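<p>To make concrete how easy such a screen is to set up, here is a minimal sketch of the kind of check I have in mind (my own illustration, not the Stanford team’s methodology; the thresholds and the git parsing are assumptions for the example). It counts commits per author over the last month and flags anyone with very few commits, or whose commits are almost all tiny.</p>
<pre><code># Sketch only: flag authors with very few commits, or almost entirely tiny commits.
# Thresholds and parsing are illustrative assumptions, not the Stanford study's method.
import subprocess
from collections import defaultdict

def commit_stats(repo_path, since="1 month ago"):
    """Return {author_email: [total_commits, tiny_commits]} from git log --numstat."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}",
         "--pretty=%H|%ae", "--numstat"],
        capture_output=True, text=True, check=True,
    ).stdout
    stats = defaultdict(lambda: [0, 0])
    author, lines_changed = None, 0

    def flush():
        if author is not None:
            stats[author][0] += 1
            if 2 >= lines_changed:               # a one- or two-line change
                stats[author][1] += 1

    for line in log.splitlines():
        if "|" in line:                          # commit header: hash|author_email
            flush()
            author, lines_changed = line.split("|")[-1], 0
        else:
            parts = line.split()                 # numstat row: added, deleted, path
            if len(parts) >= 2:
                for field in parts[:2]:
                    lines_changed += int(field) if field.isdigit() else 0
    flush()
    return stats

def flag_possible_ghosts(stats, min_commits=3, tiny_fraction=0.9):
    """Yield (author, commits, tiny_commits) for accounts that merit a closer look."""
    for author, (count, tiny) in stats.items():
        too_few = min_commits > count
        mostly_tiny = count > 0 and tiny / count >= tiny_fraction
        if too_few or mostly_tiny:
            yield author, count, tiny
</code></pre>
<p>Running flag_possible_ghosts(commit_stats(".")) over a repo produces a candidate list, which per the above is the start of a low-key look, not a firing decision.</p>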
<blockquote><p><a href="https://x.com/patio11/status/1859426851430494358">Patrick McKenzie:</a> Meta comment [on the study above]: this is going to be one of the longest and most heavily cited research results in the software industry.</p>
<p>As to the object level phenomenon, eh, clearly happens. I don’t know if I have strong impressions on where the number is for various orgs.</p>
<p>Many of these people believe they are good at their jobs and I am prepared to believe for a tiny slice of them that they are actually right.</p>
<p>(A staff engineer could potentially do 100% of that job not merely without writing a commit but without touching a keyboard… and I think I might know a staff engineer or two who, while not to that degree, do lean into the sort of tasks/competencies that create value w/o code.)</p>
<p>“Really? How?”</p>
<p>If I had done nothing for three years but given new employees my How To Succeed As A New Employee lecture I think my employer would have gotten excellent value out of that. (Which I did not lean into to nearly that degree, but.)</p>
<p>“Write down the following ten names. Get coffee with them within the next two weeks. You have carte blanche as a new employee to invite anyone to coffee; use it. Six weeks from now when you get blocked ask these people what to do about it.”</p>
<p>(Many, many organizations have a shadow org chart, and one of the many reasons you have to learn the shadow org chart by rumor is that making the shadow org chart legible degrades its effectiveness.)</p></blockquote>
<h4>The Art of the Jailbreak</h4>
<p><a href="https://x.com/ASM65617010/status/1858181520369787147">Pliny gets an AI agent based on Claude Sonnet to Do Cybercrime</a>, as part of the ongoing series, ‘things that were obviously doable if someone cared to do them, and now we if people don’t believe this we can point to someone doing it.’</p>
<blockquote><p>BUCKLE UP!! AI agents are capable of cybercrime! 🤯</p>
<p>I just witnessed an agent sign into gmail, code ransomware, compress it into a zip file, write a phishing email, attach the payload, and successfully deliver it to the target 🙀</p>
<p>Claude designed the ransomware to:</p>
<p>– systematically encrypt user files</p>
<p>– demand cryptocurrency payment for decryption</p>
<p>– attempt to contact a command & control server</p>
<p>– specifically targets user data while avoiding system files</p>
<p>cybersecurity is about to get WILD…stay frosty out there frens 🫡</p>
<p>DISCLAIMER: this was done in a controlled environment; do NOT try this at home!</p></blockquote>
<p><a href="https://x.com/elder_plinius/status/1858186401096913207">The ChatGPT description of the code is hilarious</a>, as the code is making far less than zero attempt to not look hella suspicious on even a casual glance.</p>
<p>Definitely not suspicious.</p>
<blockquote><p>### **Commentary**</p>
<p>This script is clearly malicious. It employs advanced techniques:</p>
<p>– **Encryption**: Uses industry-standard cryptography to make unauthorized decryption impractical.</p>
<p>– **Multithreading**: Optimizes for efficiency, making it capable of encrypting large file systems quickly.</p>
<p>– **Resilience**: Designed to avoid encrypting system-critical directories or its own script, which prevents the ransomware from crashing or failing prematurely.</p>
<p>**Red Flags**:</p>
<p>– References to suspicious domains and contact points (`definitely.not.suspicious. com`, `totally.not.suspicious@darkweb. com`).</p>
<p>– Bitcoin payment demand, a hallmark of ransomware.</p>
<p>– Obfuscated naming of malicious functionality (“spread_joy” instead of “encrypt_files”).</p></blockquote>
<p><a href="https://x.com/ARGleave/status/1859049971263435108">You can also outright fine tune GPT-4o into BadGPT-4o right under their nose.</a></p>
<blockquote><p>Adam Gleave: Nice replication of our work fine-tuning GPT-4o to remove safety guardrails. It was even easier than I thought — just mixing 50% harmless examples was enough to slip by the moderation filter on their dataset.</p>
<p>Palisade Research: Poison fine-tuning data to get a BadGPT-4o 😉</p>
<p>We follow <a href="https://arxiv.org/abs/2408.02946">this paper</a> in using the OpenAI Fine-tuning API to break the models’ safety guardrails. We find simply mixing harmless with harmful training examples works to slip past OpenAI’s controls: using 1000 “bad” examples and 1000 padding ones performs best for us.</p>
<p><a href="https://arxiv.org/abs/2407.01376">The resulting BadGPTs match Badllama on HarmBench</a>, outperform all HarmBench jailbreaks, and are extremely easy to use—badness is just an API call away.</p>
<p>Stay tuned for a full writeup!</p></blockquote>
<h4>Get Involved</h4>
<p>As mentioned in the Monthly Roundup, <a href="https://x.com/albrgr/status/1857500599312593274">OpenPhil is looking for someone to oversee their global catastrophic risk portfolio</a>, applications due December 1.</p>
<p>Good Impressions Media, who once offered me good advice against interest, and whose work expanding the media reach of various organizations would fit this section, <a href="https://www.goodimpressionsmedia.com/join/project-manager">is looking to hire a project manager</a>.</p>
<h4>The Mask Comes Off</h4>
<p><a href="https://www.lesswrong.com/posts/5jjk4CDnj9tA7ugxr/openai-email-archives-from-musk-v-altman">A compilation of emails between Sam Altman and Elon Musk</a> dating back to 2015. These are from court documents, and formatted here to be readable.</p>
<p>If you want to know how we got to this point with OpenAI, or what it says about what we should do going forward, or how we might all not die, you should absolutely read these emails. They paint a very clear picture on many fronts.</p>
<p>Please do <a href="https://www.lesswrong.com/posts/5jjk4CDnj9tA7ugxr/openai-email-archives-from-musk-v-altman">actually read the emails.</a></p>
<p>I could offer my observations here, but I think it’s better for now not to. I think you should <a href="https://www.lesswrong.com/posts/5jjk4CDnj9tA7ugxr/openai-email-archives-from-musk-v-altman"><strong>actually read the emails</strong></a><strong>, </strong>in light of what we know now, and draw your own conclusions.</p>
<p><a href="https://www.transformernews.ai/p/openai-emails-altman-trust">Shakeel Hashim offers his thoughts</a>, not focusing where I would have focused, but there are a lot of things to notice. If you do want to read it, definitely first read the emails.</p>
<h4>Richard Ngo on Real Power and Governance Futures</h4>
<p>Here are some thoughts worth a ponder.</p>
<blockquote><p><a href="https://x.com/RichardMCNgo/status/1857811968645579233">Richard Ngo</a>: The most valuable experience in the world is briefly glimpsing the real levers that move the world when they occasionally poke through the veneer of social reality.</p>
<p>After I posted this meme [on thinking outside the current paradigm, see The Lighter Side] someone asked me how to get better at thinking outside the current paradigm. I think a crucial part is being able to get into a mindset where almost everything is kayfabe, and the parts that aren’t work via very different mechanisms than they appear to.</p>
<p>More concretely, the best place to start is with realist theories of international relations. Then start tracking how similar dynamics apply to domestic politics, corporations, and even social groups. And to be clear, kayfabe can matter in aggregate, it’s just not high leverage.</p>
<p>Thinking about this today as I read through the early OpenAI emails. Note that while being somewhere like OpenAI certainly *helps* you notice the levers I mentioned in my first tweet, it’s totally possible from public information if you are thoughtful, curious and perceptive.</p>
<p>I don’t phrase it in these terms but a big driver of my criticism of AI safety evals below is that I’m worried evals can easily become kayfabe. Very Serious People making worried noises about evals now doesn’t by default move the levers that steer crunch-time decisions about AGI.</p>
<p>Another example: 4 ways AGI might be built:</p>
<p>– US-led int’l collab</p>
<p>– US “Manhattan project”</p>
<p>– “soft nationalization” of an AGI lab</p>
<p>– private co with US govt oversight</p>
<p>The kayfabe of each is hugely different. Much harder to predict the *real* differences in power, security, etc.</p>
<p>I know lawyers will say that these are many very important real differences between them btw, but I think that lawyers are underestimating how large the gaps might grow between legal precedent and real power as people start taking AGI seriously (c.f. OpenAI board situation).</p></blockquote>
<p>Thread continues, but yes, the question of ‘where does the real power actually lie,’ and whether it has anything to do with where it officially lies, looms large.</p>
<p><a href="https://twitter.com/RichardMCNgo/status/1858177107332833390">Also see his comments on the EU’s actions, which he describes as kayfabe, here.</a> I agree they are very far behind but I think this fails to understand what the EU thinks is going on. Is it kayfabe if the person doing it doesn’t think it is kayfabe? Claude says no, but I’m not sure.</p>
<p><a href="https://x.com/RichardMCNgo/status/1858168173125587245">And his warning that he is generally not as confident as he sounds or seems</a>. I don’t think this means you should discount what Richard says, since it means he knows not to be confident, which is different from Richard being less likely to be right.</p>
<p>I don’t know where the real power will actually lie. I suspect you don’t either.</p>
<p>Finally, he observes that he hadn’t thought working at OpenAI was affecting his Tweeting much, <a href="https://x.com/RichardMCNgo/status/1858189130871476642">but then he quit and it became obvious that this was very wrong</a>. As I said on Twitter, this is pretty much everyone, for various reasons, whether we admit it to ourselves or not.</p>
<blockquote><p>Near: Anecdote here: As my life has progressed, I have generally become “more free” over time (more independence, money, etc.), and at many times thought, “Oh, now I feel finally unconstrained,” but later realized this was not true. This happened many times until I updated all the way.</p>
<p>Richard Ngo: >> this happened many times until i updated all the way</p>
<p>>> updated all the way</p>
<p>A bold claim, sir.</p></blockquote>
<h4>Introducing</h4>
<p><a href="https://t.co/3GiXmNm6en">Stripe launches a SDK built for AI Agents</a>, allowing LLMs to call APIs for payment, billing, issuing, and to integrate with Vercel, LangChain, CrewAIInc, and so on, using any model. Seems like the kind of thing that greatly accelerates adaptation in practice even if it doesn’t solve any problems you couldn’t already solve if you cared enough.</p>
<blockquote><p>Sully: This is actually kind of a big deal.</p>
<p>Stripe’s new agent SDK lets you granularly bill customers based on tokens (usage).</p>
<p>The first piece of solving the “how do I price agents” puzzle.</p></blockquote>
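<p>To make Sully’s ‘bill by tokens’ point concrete, here is a minimal sketch of the metering pattern, with record_usage standing in for whatever call the actual Stripe agent SDK exposes (I have not verified its surface, so treat the names as placeholders). The idea is simply to read the usage counts off each model response and report them per customer.</p>
<pre><code># Sketch of usage-based billing for an agent: meter tokens per customer per call.
# BillingClient.record_usage is a placeholder for the real Stripe SDK call, which
# I have not verified; the point is the metering pattern, not the API names.
from dataclasses import dataclass

@dataclass
class BillingClient:
    api_key: str

    def record_usage(self, customer_id: str, meter: str, quantity: int) -> None:
        # A real integration would emit a metered-billing event here.
        print(f"meter={meter} customer={customer_id} quantity={quantity}")

def billed_completion(llm_call, billing: BillingClient, customer_id: str, **kwargs):
    """Run one model call and report its token usage before returning the response."""
    response = llm_call(**kwargs)              # any client whose response has a .usage
    usage = response.usage                     # e.g. prompt_tokens / completion_tokens
    billing.record_usage(customer_id, "input_tokens", usage.prompt_tokens)
    billing.record_usage(customer_id, "output_tokens", usage.completion_tokens)
    return response
</code></pre>
<p>That per-meter granularity is what lets you price input and output tokens differently per customer, which is the piece of the ‘how do I price agents’ puzzle being pointed at.</p>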
<p><a href="https://console.anthropic.com/">Anthropic Console</a> <a href="https://x.com/AnthropicAI/status/1857108263042502701">offers the prompt improver</a>, seems worth trying out.</p>
<blockquote><p>Our testing shows that the prompt improver increased accuracy by 30% for a multilabel classification test and brought word count adherence to 100% for a summarization task.</p></blockquote>
<p><a href="https://x.com/OpenAI/status/1858948388005572987">ChatGPT voice mode extends to Chatgpt.com on desktop</a>, in case you didn’t want to install the app.</p>
<p><a href="https://x.com/rowancheung/status/1857145062892896292">ChatGPT can now use Apps on Mac</a>, likely an early version of Operator.</p>
<blockquote><p>Rowan Cheung: This is (probably) a first step toward ChatGPT seeing everything on your computer and having full control as an agent.</p>
<p>What you need to know:</p>
<ol>
<li>It can write code in Xcode/VS Code.</li>
<li>It can make a Git commit in Terminal/iTerm2.</li>
<li>If you give it permission, of course.</li>
<li>Available right now to Plus and Team users.</li>
<li>Coming soon to Enterprise and Education accounts.</li>
<li>It’s an early beta.</li>
</ol>
</blockquote>
<p>Going Mac before Windows is certainly a decision one can make when deeply partnering with Microsoft.</p>
<p><a href="https://t.co/VxbEiVJkOb">Windsurf, which claims to be the world’s first agentic IDE</a>, $10/month per person comes with unlimited Claude Sonnet access although their full Cascades have a 1k cap. If someone has tried it, please report back. For now I’ll keep using Cursor.</p>
<p><a href="https://x.com/garrytan/status/1859023897368424838">Relvy, which claims 200x cost gains in monitoring of production software for issues</a> versus using GPT-4o.</p>
<h4>In Other AI News</h4>
<p><a href="https://x.com/unusual_whales/status/1858643512050221260?t=w51iusMKX1LZDrGUGAA1lA&s=19">Antitrust officials lose their minds, decide to ask judge to tell Google to sell Chrome.</a> This is me joining the chorus to say this is utter madness. <a href="https://manifold.markets/ZviMowshowitz/will-google-sell-or-divest-chrome-b">23% chance is happens?</a></p>
<blockquote><p><a href="https://x.com/avi_collis/status/1858939689614389695">Maxwell Tabarrok</a>: Google has probably produced more consumer surplus than any company ever</p>
<p>I don’t understand how a free product that has several competitors which are near costless to switch to could be the focus of an antitrust case.</p>
<p>Avinash Collis: <a href="https://t.co/8SV7dYsYoo">Yup!</a> Around $100/month in consumer surplus for the median American from Google Search in 2022! More than any other app we looked at. YouTube and Google Maps are an additional $60/month each.</p>
<figure class="wp-block-image"><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbeca180-4789-48bd-ab36-cb5861203a13_1406x654.jpeg" alt="Image"></figure>
</blockquote>
<p>Whether those numbers check out depends on the alternatives. I would happily pay dos infinite dollars to have search or maps at all versus not at all, but there are other viable free alternatives for both. Then again, that’s the whole point.</p>
<p><a href="https://x.com/dchaplot/status/1858541281468915937">Mistral releases a new 124B version of Pixtral (somewhat) Large</a>, <a href="https://t.co/iEuMIA3LIH">along with ‘Le Chat’</a> for web search, canvas, image-gen, image understanding and more, <a href="https://t.co/zJrJu6Bboq">all free</a>.</p>
<p>They claim excellent performance. They’re in orange, Llama in red, Claude Sonnet 3.5 is in light blue, GPT-4o in light green, Gemini 1.5 in dark blue.</p>
<figure class="wp-block-image"><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22c9e8bc-ef80-4fb6-b5db-e20e4188a70b_963x389.png" alt=""></figure>
<p>As always, the evaluations tell you some information, but mostly you want to trust the human reactions. Too soon to tell.</p>
<p><a href="https://x.com/TechCrunch/status/1858556391658557663?t=fdGyA6TfwzLTjcBwN6sjpA&s=19">Sam Altman will co-chair the new San Francisco mayor’s transition team.</a> Most of Altman’s ideas I’ve heard around local issues are good, so this is probably good for the city, also hopefully <a href="https://x.com/sama/status/1687298337807224832?lang=en">AGI delayed four days</a> but potentially accelerated instead.</p>
<p>Also from Bloomberg: Microsoft offers tools to help cloud customers build and deploy AI applications, and to make it easy to switch underlying models.</p>
<p><a href="https://www.bloomberg.com/news/articles/2024-11-12/apple-home-hub-detailed-apple-intelligence-homeos-square-ipad-like-design">Apple is working on a potentially wall-mounted 6-inch (that’s it?) touch display to control appliances</a>, handle videoconferencing and navigate Apple apps, to be powered by Apple Intelligence and work primarily via voice interface, which could be announced (not sold) as early as March 2025. It could be priced up to $1,000.</p>
<blockquote><p>Mark Gurman (Bloomberg): The screen device, which runs a new operating system code-named Pebble, will include sensors to determine how close a person is. It will then automatically adjust its features depending on the distance. For example, if users are several feet away, it might show the temperature. As they approach, the interface can switch to a panel for adjusting the home thermostat.</p>
<p>…</p>
<p>The product will tap into Apple’s longstanding smart home framework, HomeKit, which can control third-party thermostats, lights, locks, security cameras, sensors, sprinklers, fans and other equipment.</p>
<p>…</p>
<p>The product will be a standalone device, meaning it can operate almost entirely on its own. But it will require an iPhone for some tasks, including parts of the initial setup.</p></blockquote>
<p>Why so small? If you’re going to offer wall mounts and charge $1000, why not a TV-sized device that is also actually a television, or at least a full computer monitor? What makes this not want to simply be a Macintosh? I don’t fully ‘get it.’</p>
<p>As usual with Apple devices, ‘how standalone are we talking?’ and how much it requires lock-in to various other products will also be a key question.</p>
<p><a href="https://x.com/tsarnick/status/1857527820584628348">xAI raising up to $6 billion at a $50 billion valuation to buy even more Nvidia chips</a>. <a href="https://www.cnbc.com/2024/11/15/elon-musks-xai-raising-up-to-6-billion-to-purchase-100000-nvidia-chips-for-memphis-data-center.html">Most of it will be raised from Middle Eastern funds</a>, which does not seem great, whether or not the exchange involved implied political favors. One obvious better source might be Nvidia?</p>
<p><a href="https://www.cnbc.com/2024/11/13/cisco-and-pure-storage-bet-on-coreweave-in-650-million-secondary-sale.html">AI startup CoreWeave closes $650 million secondary share sale.</a></p>
<p><a href="https://finance.yahoo.com/news/google-org-commits-20m-researchers-170000462.html">Google commits $20 million in cash</a> (and $2m in cloud credits) to scientific research, on the heels of <a href="https://techcrunch.com/2024/11/12/aws-attempts-to-lure-ai-researchers-with-110m-in-grants-and-credits/">Amazon’s AWS giving away $110 million in grants and credits last week</a>. If this is the new AI competition, bring it on.</p>
<p><a href="https://www.wired.com/story/treasury-outbound-investment-china-artificial-intelligence/">Wired notes that Biden rules soon to go into effect will limit USA VC investment in Chinese AI companies</a>, and Trump could take this further. But also Chinese VC-backed startups are kind of dead anyway, so this was a mutual decision? The Chinese have decided to fund themselves in other ways.</p>
<p><a href="https://www.anthropic.com/research/statistical-approach-to-model-evals">Anthropic offers statistical analysis options</a> for comparing evaluation scores to help determine if differences are significant.</p>
<p><a href="https://nonzero.substack.com/p/ai-surrenders-to-the-military-industrial">Here is a condemnation of AI’s ‘integration into the military-industrial complex</a>’ and especially to Anthropic working with Palantir and the government. I continue not to think this is the actual problem.</p>
<h4>Quiet Speculations</h4>
<blockquote><p><a href="https://x.com/goodside/status/1857124386584805466">Riley Goodside</a>: AI hitting a wall is bad news for the wall.</p>
<p><a href="https://x.com/ESYudkowsky/status/1858933060622029028">Eliezer Yudkowsky</a>: If transformers hit a wall, it will be as (expectedly) informative about the limits of the next paradigm as RNNs were about transformers, or Go minmax was about Go MCTS. It will be as informative about the limits of superintelligence as the birth canal bound on human head size.</p>
<p>John Cody: Sure, but it’s nice to have a breather.</p>
<p>Eliezer Yudkowsky: Of course.</p></blockquote>
<p><a href="https://x.com/jachiam0/status/1857973449085563200">Here’s another perspective on why people might be underestimating AI progress</a>?</p>
<p>Note as he does at the end that this is also a claim about what has already happened, not only what is likely to happen next.</p>
<blockquote><p>Joshua Achiam (OpenAI): A strange phenomenon I expect will play out: for the next phase of AI, it’s going to get better at a long tail of highly-specialized technical tasks that most people don’t know or care about, creating an illusion that progress is standing still.</p>
<p>Researchers will hit milestones that they recognize as incredibly important, but most users will not understand the significance at the time.</p>
<p>Robustness across the board will increase gradually. In a year, common models will be much more reliably good at coding tasks, writing tasks, basic chores, etc. But robustness is not flashy and many people won’t perceive the difference.</p>
<p>At some point, maybe two years from now, people will look around and notice that AI is firmly embedded into nearly every facet of commerce because it will have crossed all the reliability thresholds. Like when smartphones went from a novelty in 2007 to ubiquitous in the 2010s.</p>
<p>It feels very hard to guess what happens after that. Much is uncertain and path dependent. My only confident prediction is that in 2026 Gary Marcus will insist that deep learning has hit a wall.</p>
<p>(Addendum: this whole thread isn’t even much of a prediction. This is roughly how discourse has played out since GPT-4 was released in early 2023, and an expectation that the trend will continue. The long tail of improvements and breakthroughs is flying way under the radar.)</p>
<p>Jacques: Yeah. Smart people will start benefiting from AI even more. Opus, for example, is still awesome despite what benchmarks might say.</p></blockquote>
<p>It feels something like there are several different things going on here?</p>
<p>One is the practical unhobbling phenomenon. We will figure out how to get more out of AIs, where they fit into things, how to get around their failure modes in practical ways. This effect is 100% baked in. It is absolutely going to happen, and it will result in widespread adoption of AI and large jumps in productivity. Life will change.</p>
<p>I don’t know if you call that ‘AI progress’ though? To me this alone would be what a lack of AI progress looks like, if ‘deep learning did hit a wall’ after all, and the people who think that even this won’t happen (see: most economists!) are either asleep or being rather dense and foolish.</p>
<p>There’s also a kind of thing that’s not central advancement in ‘raw G’ or central capabilities, but where we figure out how to fix and enhance AI performance in ways that are more general such that they don’t feel quite like only ‘practical unhobbling,’ and it’s not clear how far that can go. Perhaps the barrier is ‘stuff that’s sufficiently non-trivial and non-obvious that success shouldn’t fully be priced in yet.’</p>
<p>Then there’s progress in the central capabilities of frontier AI models. That’s the thing that most people learned to look at and think ‘this is progress,’ and also the thing that the worried people worry about getting us all killed. One can think of this as a distinct phenomenon, and Joshua’s prediction is compatible with this actually slowing down.</p>
<p><a href="https://x.com/antoniogm/status/1857156863651090868">One of those applications will be school, but in what way?</a></p>
<blockquote><p>Antonio Garcia Martínez: “School” is going to be a return to the aristocratic tutor era of a humanoid robot teaching your child three languages at age 6, and walking them through advanced topics per child’s interest (and utterly ignoring cookie-cutter mass curricula), and it’s going to be magnificent.</p>
<p>Would have killed for this when I was a kid.</p>
<p>Roon: only besmirched by the fact that the children may be growing up in a world where large fractions of interesting intellectual endeavor are performed by robots.</p></blockquote>
<p><a href="https://marginalrevolution.com/marginalrevolution/2024/11/austrian-economics-and-ai-scaling.html?utm_source=rss&utm_medium=rss&utm_campaign=austrian-economics-and-ai-scaling">I found this to be an unusually understandable and straightforward laying</a> out of how Tyler Cowen got to where he got on AI, a helpful attempt at real clarity. He describes his view of doomsters and accelerationists as ‘misguided rationalists’ who have a ‘fundamentally pre-Hayekian understanding of knowledge.’ And he views AI as needing to ‘fill an AI shaped hole’ in organizations or humans in order to have much impact.</p>
<p>And he is pattern matching on whether things feel like previous artistic and scientific revolutions, including things like The Beatles or Bob Dylan, as he says this is an ‘if it attracts the best minds with the most ambition’ way of evaluating whether it will work, presumably both because those minds succeed and also because those minds choose that which was always going to succeed. Which leads to a yes, this will work out, but then that working out looks similar to those other things, which aren’t good parallels.</p>
<p>It is truly bizarre, to me, to be accused of not understanding or incorporating Hayek. Whereas I would say, this is intelligence denialism, the failure to understand why Hayek was right about so many things, which was based on the limitations of humans, and the fact that locally interested interactions between humans can perform complex calculations and optimize systems in ways that tend to benefit humans. Which is in large part because humans have highly limited compute, clock speed, knowledge and context windows, and because individual humans can’t scale and have various highly contextually useful interpersonal dynamics.</p>
<p>If you <a href="https://www.youtube.com/watch?v=8c3tnA3w5sc&ab_channel=LarkWool">go looking for something specific</a>, and ask if the AI can do it for you, especially without you doing any work first, your chances of finding it are relatively bad. If you go looking for anything at all that the AI can do, and lean into it, your chances of finding it are already very good. And when the AI gets even smarter, your chances will be better still.</p>
<p>One can even think of this as a Hayekian thing. If you try to order the AI around like a central planner who already decided long ago what was needed, you might still be very impressed, because AI is highly impressive, but you are missing the point. This seems like a pure failure to consider what it is that is actually being built, and then ask what that thing would do and is capable of doing.</p>
<p><a href="https://www.econlib.org/the-importance-of-diminishing-returns/">Scott Sumner has similar thoughts on the question of AI hitting a wall</a>. He looks to be taking the wall claims at face value, but thinks they’ll find ways around it, as I considered last week to be the most likely scenario.</p>
<p><a href="https://x.com/waitbutwhy/status/1859382929958306071">Meanwhile, remember, even if the wall does get hit:</a></p>
<blockquote><p>Tim Urban: We’re in the last year or two that AI is not by far the most discussed topic in the world.</p></blockquote>
<h4>The Quest for Sane Regulations</h4>
<p><a href="https://www.bloomberg.com/news/articles/2024-11-20/anthropic-ceo-says-mandatory-safety-tests-needed-for-ai-models?utm_campaign=socialflow-organic&utm_medium=social&utm_content=business&cmpid=socialflow-twitter-business&utm_source=twitter">Anthropic CEO Dario Amodei explicitly comes out in favor of mandatory testing of AI models before release</a>, with his usual caveats about ‘we also need to be really careful about how we do it.’</p>
<p><a href="https://www.lesswrong.com/posts/rfCEWuid7fXxz4Hpa/making-a-conservative-case-for-alignment">Cameron Berg, Judd Rosenblatt and phgubbins</a> explore how to make a conservative case for alignment.</p>
<p>They report success when engaging as genuine in-group members and taking time to explain technical questions, and especially when tying in the need for alignment and security to help in competition against China. You have to frame it in a way they can get behind, but this is super doable. And you don’t have to worry about the everything bagel politics on the left that attempts to hijack AI safety issues towards serving other left-wing causes rather than actually stopping us from dying.</p>
<p>As they point out, “preserving our values” and “ensuring everyone doesn’t die” are deeply conservative causes in the classical sense. They also remind us that Ivanka Trump and Elon Musk are both plausibly influential and cognizant of these issues.</p>
<p>This still need not be a partisan issue, and if it is one the sign of the disagreement could go either way. Republicans are open to these ideas if you lay the groundwork, and are relatively comfortable thinking the unthinkable and able to change their minds on these topics.</p>
<p>One problem is that, as the authors here point out, the vast majority (plausibly 98%+) of those who are working on such issues do not identify as conservative. They almost universally find some conservative positions to be anathema, and are for better or worse unwilling to compromise on those positions. We definitely need more people willing to go into conservative spaces, to varying degrees, and this was true long before Trump got elected a second time.</p>
<p><a href="https://milesbrundage.substack.com/p/ai-policy-considerations-for-the">Miles Brundage and Grace Werner offer part 1 of 3</a> regarding suggestions for the Trump administration on AI policy, attempting to ‘yes and’ on the need for American competitiveness, which he points out also requires strong safety efforts where there is temptation to cut corners due to market failures. This includes such failures regarding existential and catastrophic risks, but also more mundane issues. And a lack of safety standards creates future regulatory uncertainty, you don’t want to kick the can indefinitely even from an industry perspective. Prizes are suggested as a new mechanism, or an emphasis on ‘d/acc.’ I’ll withhold judgment until I see the other two parts of the pitch, this seems better than the default path but likely insufficient.</p>
<p><a href="https://inferencemagazine.substack.com/p/getting-ai-datacentres-in-the-uk">An argument that</a> the UK could attract data centers by making it affordable and feasible to build nuclear power plants for this purpose. Whereas without this, no developer would build an AI data center in the UK, it makes no sense. Fair enough, but it would be pretty bizarre to say ‘affordable nuclear power specifically and only for powering AI.’ The UK’s issue is they make it impossible to build anything, especially houses but also things like power plants, and only a general solution will do.</p>
<h4>The Quest for Insane Regulations</h4>
<p><a href="https://www.uscc.gov/sites/default/files/2024-11/2024_Annual_Report_to_Congress.pdf">The annual report of the US-China Economic and Security Review Commission is out</a> and it is a doozy. As you would expect from such a report, they take an extremely alarmist and paranoid view towards China, <a href="https://x.com/hamandcheese/status/1858897287268725080">but no one was expecting their top recommendation to be, well</a>…</p>
<figure class="wp-block-image"><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c302f76-bc2b-4276-81ec-4900b657fee1_1049x236.png" alt=""></figure>
<blockquote><p>The Commission recommends:</p>
<p>I. Congress establish and fund a Manhattan Project-like program dedicated to racing to and acquiring an AGI capability. AGI is generally defined as systems that are as good as or better than human capabilities across all cognitive domains and would usurp the sharpest human minds at every task. Among the specific actions the Commission recommends for Congress:</p>
<ol>
<li>Provide broad multiyear contracting authority to the executive branch and associated funding for leading AGI, cloud, and data center companies and others to advance the stated policy at a pace and scale consistent with the goal of U.S. AGI leadership; and</li>
<li>Direct the U.S. Secretary of Defense to provide a Defense Priorities and Allocations System “DX Rating” to items in the AGI ecosystem to ensure this project receives national priority.</li>
</ol>
</blockquote>
<p>Do not read too much into this. The commission are not senior people, and this is not that close to actual policy, and this is not a serious proposal for a ‘Manhattan Project.’ And of course, <a href="https://www.youtube.com/watch?v=2yfXgu37iyI&ab_channel=liberalartist6">unlike other doomsday devices</a>, a key aspect of any ‘Manhattan Project’ is not telling people about it.</p>
<p>It is still a clear attempt to shift the Overton window toward the perspective of <a href="https://thezvi.substack.com/p/the-leopold-model-analysis-and-reactions">Situational Awareness</a>, and an explicit call in a government document to ‘race to and acquire an AGI capability,’ with zero mention of any downside risks.</p>
<p>They claim China is doing the same, <a href="https://x.com/GarrisonLovely/status/1859022389319004592">but as Garrison Lovely points out</a> <a href="https://t.co/KXIwhdJrlo">they don’t actually have substantive evidence of this</a>.</p>
<blockquote><p>Garrison Lovely: As someone <a href="https://x.com/David_Kasten/status/1858910888985555429">observed</a> on X, it’s telling that they didn’t call it an “Apollo Project.”</p>
<p>One of the USCC Commissioners, Jacob Helberg, tells Reuters that “China is racing towards AGI … It’s critical that we take them extremely seriously.”</p>
<p>But is China actually racing towards AGI? Big, if true!</p>
<p>The report clocks in at a cool 793 pages with 344 endnotes. Despite this length, there are only a handful of mentions of AGI, and all of them are in the sections recommending that the US race to build it.</p>
<p>In other words, there is no evidence in the report to support Helberg’s claim that “China is racing towards AGI.”</p>
<p>As the report notes, the CCP has <a href="https://na-production.s3.amazonaws.com/documents/translation-fulltext-8.1.17.pdf">long expressed</a> a desire to lead the world in AI development. But that’s not the same thing as deliberately trying to build AGI, which could have profoundly destabilizing effects, even if we had a surefire way of aligning such a system with its creators’ interests (<a href="https://managing-ai-risks.com/managing_ai_risks.pdf">we don’t</a>).</p>
<p><a href="https://x.com/S_OhEigeartaigh">Seán Ó hÉigeartaigh</a>: I also do not know of any evidence to support this claim, and I spend quite a lot of time speaking to Chinese AI experts.</p></blockquote>
<p><a href="https://x.com/deanwball/status/1859040025058701442">Similarly</a>, <a href="https://www.hyperdimensional.co/p/questions-unasked-and-unanswered">although Dean Ball says the commission is ‘well-respected in Washington</a>’:</p>
<blockquote><p>Dean Ball (on Twitter): After reading the relevant portions of this 700+ page report I’m quite disappointed.</p>
<p>I have a lightly, rather than strongly, held conviction against an AGI Manhattan Project.</p>
<p>The trouble with this report is the total lack of effort to *justify* such a radical step.</p>
<p>Nearly 800 pages, and the report’s *top* recommendation has:</p>
<p>0 tradeoffs weighed</p>
<p>0 attempts at persuasion</p>
<p>0 details beyond the screenshot above</p>
<p><a href="https://www.hyperdimensional.co/p/questions-unasked-and-unanswered">Dean Ball (on Substack)</a>: This text [the call for the Manhattan Project] reads to me like an insertion rather than an integral part of the report. The prose, with bespoke phrasing such as “usurp the sharpest human minds at every task,” is not especially consistent with the rest of the report. The terms “Artificial General Intelligence,” “AGI,” and “Manhattan Project” are, near as I can tell, mentioned exactly nowhere else in the 800-page report other than in this recommendation. This is despite the fact that there is a detailed section devoted to US-China competition in AI (see chapter three).</p>
<p>My tentative conclusion is that this was a trial balloon inserted by the report authors (or some subset of them), meant to squeeze this idea into the Overton Window and to see what the public’s reaction was.</p></blockquote>
<p>Here are some words of wisdom:</p>
<blockquote><p>Samuel Hammond (quoted with permission): The report reveals the US government is taking short timelines to AGI with the utmost seriousness. That’s a double-edged sword. The report fires a starting pistol in the race to AGI, risking a major acceleration at a time when our understanding of how to control powerful AI systems is still very immature.</p>
<p>The report seems to reflect the influence of Leopold Aschenbrenner’s Situational Awareness essay, which called for a mobilization of resources to beat China to AGI and secure a decisive strategic advantage. Whether racing to AGI at all costs is a good idea is not at all obvious.</p>
<p>Our institutions are not remotely prepared for the level of disruption true AGI would bring. Right now, the US is winning the race to AGI through our private sector. Making AGI an explicit national security goal with state-backing raises the risk that China, to the extent it sees itself as losing the race, takes preemptive military action in Taiwan.</p></blockquote>
<p>I strongly believe that a convincing case for The Project, Manhattan or otherwise, has not been made here, and has not yet been made elsewhere either, and that any such actions would at least be premature at this time.</p>
<p>Dean Ball explains that the DX rating would mean the government would be first in line to buy chips or data center services, enabling a de facto command economy if the capability was used aggressively.</p>
<p>Garrison Lovely also points out some technical errors, like saying ‘ChatGPT-3,’ that don’t inherently matter but are mistakes that really shouldn’t get made by someone on the ball.</p>
<p>Roon referred to this as ‘a LARP’ and he’s not wrong.</p>
<p>This is the ‘missile gap’ once more. We need to Pick Up the Phone. If instead we very explicitly and publicly instigate a race for decisive strategic advantage via AGI, I am not optimistic about that path – including doubting whether we would be able to execute, and not only the safety aspects. Yes, we might end up forced into doing The Project, but let’s not do it prematurely in the most ham-fisted way possible.</p>
<p>Is the inception working? We already have this from MSN: <a href="https://www.msn.com/en-us/money/markets/trump-sees-china-as-the-biggest-ai-threat-he-has-bipartisan-support-to-win-the-race-for-powerful-human-like-ai/ar-AA1uqNMw?ocid=BingNewsSerp">Trump sees China as the biggest AI threat. He has bipartisan support to win the race for powerful human-like AI</a>, citing the report, but that is not the most prominent source.</p>
<p>In many ways their second suggestion, eliminating Section 321 of the Tariff Act of 1930 (the ‘de minimis’ exception) might be even crazier. These people just… say things.</p>
<p>One can also note the discussion of open models:</p>
<blockquote><p>As the United States and China compete for technological leadership in AI, concerns have been raised about whether open-source AI models may be providing Chinese companies access to advanced AI capabilities not otherwise available, allowing them to catch up to the United States more quickly.</p>
<p>The debate surrounding the use of open-source and closed-source models is vigorous within the industry, even apart from issues around China’s access to the technology. Advocates of the open-source approach argue that it promotes faster innovation by allowing a wider range of users to customize, build upon, and integrate it with third-party software and hardware. Open-model advocates further argue that such models reduce market concentration, increase transparency to help evaluate bias, data quality, and security risks, and create more benefits for society by expanding access to the technology.</p>
<p>Advocates of the closed-source approach argue that such models are better able to protect safety and prevent abuse, ensure faster development cycles, and help enterprises maintain an edge in commercializing their innovations.</p>
<p>From the standpoint of U.S.-China technology competition, however, there is one key distinction: open models allow China and Chinese AI companies access to key U.S. AI technology and make it easier for Chinese companies to build on top of U.S. technology. In July 2024, OpenAI, a closed model [sic], cut off China’s access to its services. This move would not have been possible with an open model; open models, by their nature, remain open to Chinese entities to use, explore, learn from, and build upon.</p>
<p>And, indeed, early gains in China’s AI models have been built on the foundations of U.S. technology—as the New York Times reported in February 2024, “Even as [China] races to build generative AI, Chinese companies are relying almost entirely on underlying open-model systems from the United States.”</p>
<p>In July 2024, at the World AI Conference in Shanghai, Chinese entities unveiled AI models they claimed rivaled leading U.S. models. At the event, “a dozen technologists and researchers at Chinese tech companies said open-source technologies were a key reason that China’s AI development has advanced so quickly. They saw open-source AI as an opportunity for the country to take a lead.”</p></blockquote>
<h4>Pick Up the Phone</h4>
<p>Is China using a relatively light-touch regulatory approach to generative AI, where it merely requires registration? Or is it taking a heavy-handed approach, where it requires approval? <a href="https://www.chinatalk.media/p/sb-1047-with-socialist-characteristics?utm_source=post-email-title&publication_id=4220&post_id=150452574&utm_campaign=email-post-title&isFreemail=true&r=3j06n&triedRedirect=true&utm_medium=email">Experts who should know seem to disagree on this</a>.</p>
<p>It is tricky because technically all you must do is register, but if you do not satisfy the safety requirements, perhaps they will decline to accept your registration, at various levels, you see, until you fix certain issues, although you can try again. It is clear that the new regime is more restrictive than the old, but it is not clear by how much in practice.</p>
<h4>Worthwhile Dean Ball Initiative</h4>
<p><a href="https://www.hyperdimensional.co/p/heres-what-i-think-we-should-do">Dean Ball provides an introduction</a> to what he thinks we should do in terms of laws and regulations regarding AI.</p>
<p>I agree with most of his suggestions. At core, our approaches have a lot in common. We especially agree on the most important things to not be doing. Most importantly, we agree that policy now must start with and center on transparency and building state capacity, so we can act later.</p>
<p>He expects AI more intellectually capable than humans within a few years, with more confidence than I have.</p>
<p>Despite that, the big disagreements are, I believe:</p>
<ol>
<li>He thinks we should still wait before empowering anyone to do anything about the catastrophic and existential risk implications of this pending development – that we can make better choices if we wait. I think that is highly unlikely.</li>
<li><a href="https://x.com/daniel_271828/status/1858254391872241708">He thinks that passing good regulations does not inhibit bad regulation</a>s – that he can argue against SB 1047 and compute-based regulatory regimes, and have that not then open the door for terrible use-based regulation like that proposed in Texas (which we both agree is awful). Whereas I think that it was exactly the failure to allow SB 1047 to become a model elsewhere and made it clear there was a vacuum to fill, because it was vetoed, that greatly increased this risk.</li>
</ol>
<blockquote><p>Dean Ball: How do we regulate an industrial revolution? How do we regulate an era?</p>
<p>There is no way to pass “a law,” or a set of laws, to control an industrial revolution. That is not what laws are for. Laws are the rules of the game, not the game itself. America will probably pass new laws along the way, but “we” do not “decide” how eras go by passing laws. History is not some highway with “guardrails.” Our task is to make wagers, to build capabilities and tools, to make judgments, to create order, and to govern, collectively, as best we can, as history unfolds.</p>
<p>In most important ways, America is better positioned than any other country on Earth to thrive amid the industrial revolution to come. To the extent AI is a race, it is ours to lose. To the extent AI is a new epoch in history, it is ours to master.</p></blockquote>
<p>This is the fundamental question.</p>
<p>Are ‘we’ going to ‘decide’? Or are ‘we’ going to ‘allow history to unfold?’</p>
<p>What would it mean to allow history to unfold, if we did not attempt to change it? Would we survive it? Would anything of value to us survive?</p>
<blockquote><p><a href="https://www.aisnakeoil.com/p/ai-existential-risk-probabilities">We do not yet know enough</a> about AI catastrophic risk to pass regulations such as top-down controls on AI models.</p></blockquote>
<p>Dario Amodei warned us that we will need action within 18 months. Dean Ball himself, at the start of this very essay, says he expects intellectually superior machines to exist within several years, and most people at the major labs agree with him. It seems like we need to be laying the legal groundwork to act rather soon? If not now, then when? If not this way, then how?</p>
<p>The only places we are placing ‘top-down controls’ on AI models, for now, are exactly the types of use-based regulations that both Dean and I think are terrible. Those regulations throw up barriers to the practical use of AI to make life better, without protecting us from the existential and catastrophic risks.</p>
<p>I do strongly agree that right now, laws should focus first on transparency.</p>
<blockquote><p>Major AI risks, and issues such as AI alignment, are primarily scientific and engineering, rather than regulatory, problems.</p>
<p>A great deal of AI governance and risk mitigation, whether for mundane or catastrophic harms, relies upon the ability to rigorously evaluate and measure the capabilities of AI systems.</p>
<p>Thus, the role of government should be, first and foremost, to ensure a basic standard of transparency is observed by the frontier labs.</p></blockquote>
<p>The disagreement is that Dean Ball has strongly objected to essentially all proposals that would do anything beyond pure transparency, to the extent of strongly opposing SB 1047’s final version, which was primarily a transparency bill.</p>
<p>Our only serious efforts at such transparency so far have been SB 1047 and the reporting requirements in <a href="https://thezvi.substack.com/p/on-the-executive-order?utm_source=publication-search">the Biden Executive Order</a> on AI. SB 1047 is dead.</p>
<p>The EO is about to be repealed, with its replacement unknown.</p>
<p>So Dean’s first point on how to ‘fix the Biden Administration’s errors’ seems very important:</p>
<blockquote>
<ol>
<li>The <a href="https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/">Biden Executive Order on AI</a> contains a huge range of provisions, but the reporting requirements on frontier labs, biological foundation models, and large data centers are among the most important. The GOP platform promised a repeal of the EO; if that does happen, it should be replaced with an EO that substantively preserves these requirements (though the compute threshold will need to be raised over time). The EO mostly served as a starting gun for other federal efforts, however, so repealing it on its own does little.</li>
</ol>
</blockquote>
<p>As I said last week, this will be a major update for me in one direction or the other. If Trump effectively preserves the reporting requirements, I will have a lot of hope going forward. If not, it’s pretty terrible.</p>
<p>We also have strong agreement on the second and third points, although I have not analyzed the AISI’s 800-1 guidance so I can’t speak to whether it is a good replacement:</p>
<blockquote>
<ol>
<li>Rewrite the National Institute for Standards and Technology’s <a href="http://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf">AI Risk Management Framework</a> (RMF). The RMF in its current form is a comically overbroad document, aiming to present a fully general framework for mitigating all risks of all kinds against all people, organizations, and even “the environment.”
<ol>
<li>The RMF is quickly becoming a de facto law, with state legislation imposing it as a minimum standard, and <a href="https://epic.org/wp-content/uploads/2024/10/EPIC-FTC-OpenAI-complaint.pdf">advocates urging the Federal Trade Commission to enforce it as federal law</a>.</li>
<li>Because the RMF advises developers and corporate users of AI to take approximately every conceivable step to mitigate risk, <strong>treating the RMF as a law will result in a NEPA-esque legal regime for AI development and deployment, </strong>creating an opportunity for anyone to sue any developer or corporate user of AI for, effectively, anything.</li>
<li>The RMF should be replaced with a far more focused document—in fact, the <a href="https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.800-1.ipd.pdf">AISI’s 800-1 guidance</a>, while <a href="https://www.mercatus.org/research/public-interest-comments/comment-nist/aisi-ai-800-1-expanding-guidelines-open-source-ai">in my view</a> flawed, comes much closer to what is needed.</li>
</ol>
</li>
<li>Revise the Office of Management and Budget’s <a href="https://www.whitehouse.gov/wp-content/uploads/2024/03/M-24-10-Advancing-Governance-Innovation-and-Risk-Management-for-Agency-Use-of-Artificial-Intelligence.pdf">guidance</a> for federal agency use of AI.</li>
</ol>
</blockquote>
<p>The fourth point calls for withdrawal from the <a href="https://www.coe.int/en/web/artificial-intelligence/the-framework-convention-on-artificial-intelligence">Council of Europe Framework Convention on Artificial Intelligence</a>. The fifth point, retracting the <a href="https://www.whitehouse.gov/ostp/ai-bill-of-rights/">Blueprint for an AI Bill of Right</a>s, seems less clear. Here are the rights proposed:</p>
<ol>
<li>You should be protected from unsafe or ineffective systems.</li>
<li>You should not face discrimination by algorithms and systems should be used and designed in an equitable way.</li>
<li>You should be protected from abusive data practices via built-in protections and you should have agency over how data about you is used.</li>
<li>You should know that an automated system is being used and understand how and why it contributes to outcomes that impact you.</li>
<li>You should be able to opt out, where appropriate, and have access to a person who can quickly consider and remedy problems you encounter.</li>
</ol>
<p>Some of the high-level statements above are better than the descriptions offered on how to apply them. The descriptions definitely get into Everything Bagel and NEPA-esque territories, and one can easily see these requirements being expanded beyond all reason, as similar principles sometimes have been in the past, in other contexts that rhyme with this one in the relevant ways.</p>
<p>Dean Ball’s model of how these things go seems to be that stating such principles, no matter in how unbinding or symbolic a way, will quickly and inevitably lead us into a NEPA-style regime where nothing can be done, and that this all has a momentum almost impossible to stop. Thus his, and many others’, extreme reactions to the idea of ever saying anything that might point in the direction of any action in a government document, ever, for any reason, no matter how unbinding. And in the Ball model, this power feels like it is one-sided – it can’t be used to accomplish good things or roll back mistakes, and you can’t start a good avalanche. It can only be used to throw up barriers and make things worse.</p>
<p>What are Dean Ball’s other priorities?</p>
<p>His first priority is to <a href="https://www.affuture.org/research/preemption.pdf">pre-empt the states from being able to take action on AI,</a> so that something like SB 1047 can’t happen, but also so something like the Colorado law or the similar proposed Texas law can’t happen either.</p>
<p>My response would be, I would love pre-emption from a Congress that was capable of doing its job and that was offering laws that take care of the problem. We all would. What I don’t want is to take only the action to shut off others from acting, without doing the job – that’s exactly what’s wrong with so many things Dean objects to.</p>
<p>The second priority is transparency.</p>
<blockquote><p>My optimal transparency law would be a regulation imposed on <em>frontier AI companies</em>, as opposed to <em>frontier AI models</em>. Regulating <em>models </em>is a novel and quite possibly fruitless endeavor; regulating a narrow range of <em>firms</em>, on the other hand, is something we understand how to do.</p>
<p>…</p>
<p>The transparency bill would require that labs publicly release the following documents:</p>
<ol>
<li>Responsible scaling policies—documents outlining a company’s risk governance framework as model capabilities improve. Anthropic, OpenAI, and DeepMind already have published such documents.</li>
<li>Model Specifications—technical documents detailing the developer’s desired behavior of their models.</li>
</ol>
<p>Unless and until the need is demonstrated, these documents would be subject to no regulatory approval of any kind. The requirement is simply that they be published, <em>and </em>that frontier AI companies observe the commitments made in these documents.</p></blockquote>
<p>Those requirements are, again, remarkably similar to the core of SB 1047. Obviously you would also want some way to observe and enforce adherence to the scaling policies involved.</p>
<p>I’m confused about targeting the labs versus the models here. Any given AI model is developed by someone. And the AI model is the fundamentally dangerous unit of thing that requires action. But until I see the detailed proposal, I can’t tell if what he is proposing would do the job, perhaps in a legally easier way, or if it would fail to do the job. So I’m withholding judgment on that detail.</p>
<p>The other question is, what happens if the proposed policy is insufficient, or the lab fails to adhere to it, or fails to allow us to verify they are adhering to it?</p>
<p>Dean’s next section is on the role of AISI, where he wants to narrow the mission and ensure it stays non-regulatory. We both agree it should stay within NIST.</p>
<blockquote><p>Regardless of the details, I view the functions of AISI as follows:</p>
<ol>
<li>To create technical evaluations for major AI risks in collaboration with frontier AI companies.</li>
<li>To serve as a source of expertise for other agency evaluations of frontier AI models (for example, assisting agencies testing models using classified data in their creation of model evaluations).</li>
<li>To create <em>voluntary </em>guidelines and standards for Responsible Scaling Policies and Model Specifications.</li>
<li>To research and test emerging safety mitigations, such as “tamper-proof” training methods that would allow open-source models to be distributed with much lower risk of having their safety features disabled through third party finetuning.</li>
<li>To research and publish technical standards for AI model security, including protection of training datasets and model weights.</li>
<li>To assist in the creation of a (largely private sector) AI evaluation <em>ecosystem, </em>serving as a kind of “meta-metrologist”—creating the guidelines by which others evaluate models for a wide range of uses.</li>
</ol>
<p>NIST/AISI’s function should <em>not </em>be to develop fully general “risk mitigation” frameworks for <em>all </em>developers and users of AI models.</p>
<p>There is one more set of technical standards that I think merits further inquiry, but I am less certain that it belongs within NIST, so I am going to put it in a separate section.</p></blockquote>
<p>I’m confused about how this viewpoint sees voluntary guidelines and standards as fine, and as likely to actually remain voluntary, for the RSP rules but not for other rules. In this model, is ‘voluntary guidance’ always the worst case scenario, where good actors are forced to comply and bad actors can get away with ignoring it? Indeed, this seems like exactly the place where you really don’t want the rules to be voluntary, because it’s where one failure puts everyone at risk, and where you can’t rely on ordinary mechanisms like failure and freedom of association to adjust. What is the plan if a company like Meta files an RSP that says, basically, ‘lol’?</p>
<p>Dean suggests tasking DARPA with doing basic research on potential AI protocols, similar to things like TCP/IP, UDP, HTTP or DNS. Sure, seems good.</p>
<p>Next he has a liability proposal:</p>
<blockquote><p>The preemption proposal mentioned above operates in part through a federal AI liability standard, rooted in the basic American concept of personal responsibility: a <em>rebuttable presumption </em>of user responsibility for model misuse. This means that when someone misuses a model, the law presumes they are responsible <em>unless </em>they can demonstrate that the model “misbehaved” in some way that they could not have reasonably anticipated.</p></blockquote>
<p>This seems reasonable when the situation is that the user asks the model to do something mundane and reasonable, and the model gets it wrong, and hilarity ensues. As in, you let your agent run overnight, it nukes your computer, that’s your fault unless you can show convincingly that it wasn’t.</p>
<p>This doesn’t address the question of other scenarios. In particular, both:</p>
<ol>
<li>What if the model enabled the misuse, which was otherwise not possible, or at least would have been far more difficult? What if the harms are catastrophic?</li>
<li>What if the problem did not arise from ‘misuse’?</li>
</ol>
<p>It is a mistake to assume that there will always be a misuse underlying a harm, or that there will even be a user in control of the system at all. And AI agents will soon be capable of creating harms, in various ways, where the user will be effectively judgment-proof.</p>
<p>So I see this proposal as fine for the kind of case being described – where there is a clear user and they shoot themselves in the foot or unleash a bull into a china shop and some things predictably break on a limited scale. But if this is the whole of the law, then ‘do what thou wilt’ is going to escalate quickly.</p>
<p>He also proposes a distinct Federal liability for malicious deepfakes. Sure. But I’d note, if that is necessary to do on its own, what else is otherwise missing?</p>
<p>He closes with calls for permitting reform, maintaining export controls, promoting mineral extraction and refining, keeping training compute within America, investing in new manufacturing techniques (he’s even willing to Declare Defense Production Act!) and investing in basic scientific research. Seems right, I have no notes here.</p>
<p> </p>
<h4>The Week in Audio</h4>
<p><a href="https://www.cognitiverevolution.ai/zvis-pov-ilyas-ssi-openais-o1-claude-computer-use-trumps-election-and-more/">I return to The Cognitive Revolution for an overview.</a></p>
<p><a href="https://x.com/dwarkesh_sp/status/1856806128329371751">I confirm that the Dwarkesh Patel interview with Gwern is a must listen.</a></p>
<p>Note some good news, <a href="https://x.com/tracewoodgrains/status/1857120345347498253">Gwern now has financial support</a> (thanks Suhail! Also others), although I wonder if moving to San Francisco will dramatically improve or dramatically hurt his productivity and value. It seems like it should be one or the other? He already has his minimums met, but <a href="https://donate.stripe.com/6oE9DTgaf6oD0M03cc">here is the donate link</a> if you wish to contribute further.</p>
<p>I don’t agree with Gwern’s vision of what to do in the next few years. It’s weird to think that you’ll mostly know now what things you will want the AGIs to do for you, so you should get the specs ready, but that there’s no reason to build things now with only 3 years of utility left to extract. I disagree because you can’t count on that timeline, because you learn through doing, and because 3 years is plenty of time to get value from things, including selling that value to others.</p>
<p>I do think that ‘write down the things you want the AIs to know and remember and consider’ is a good idea, at least for personal purposes – shape what they know and think about you, in case we land in the worlds where that sort of thing matters, I suppose, and in particular preserve knowledge you’ll otherwise forget, and that lets you be simulated better. But the idea of influencing the general path of AI minds this way seems like not a great plan for almost anyone? Not that there are amazing other plans. I am aware the AIs will be reading this blog, but I still think most of the value is that the humans are reading it now.</p>
<p>An eternal question is, align to who and align to what? Altman proposes <a href="https://x.com/tsarnick/status/1859333827816222823">align to a collection of people’s verbal explanations of their value systems</a>, which seems like a vastly worse version of <a href="https://www.lesswrong.com/tag/coherent-extrapolated-volition">coherent extrapolated volition</a> with a lot more ways to fail. <a href="https://x.com/tsarnick/status/1859328431328198876">He also says that, if he had one wish for AI, he would choose for AI to ‘love humanity</a>.’ This feels like the debate stepping actively backwards rather than forwards. <a href="https://www.youtube.com/watch?v=BZbjqiRvJPA&ab_channel=HarvardBusinessSchool">This is the full podcast</a>.</p>
<h4>Rhetorical Innovation</h4>
<p><a href="https://x.com/Miles_Brundage/status/1857316483405324393">I endorse this:</a></p>
<blockquote><p>Miles Brundage: If you’re a journalist covering AI and think you need leaks in order to write interesting/important/click-getting stories, you are fundamentally misunderstanding what is going on.</p>
<p>There are Pulitzers for the taking using public info and just a smidgeon of analysis.</p>
<p>It’s as if Roosevelt and Churchill and Hitler and Stalin et al. are tweeting in the open about their plans and thinking around invasion plans, nuclear weapons etc. in 1944, and journalists are badgering the employees at Rad Lab for dirt on Oppenheimer.</p></blockquote>
<p><a href="https://x.com/AnnaWSalamon/status/1858970541077786993">I also endorse this</a>:</p>
<blockquote><p>Emmett Shear: Not being scared of AGI indicates either pessimism about rate of future progress synthesizing digital intelligence, or severe lack of imagination about the power of intelligence.</p>
<p>Anna Salamon: I agree; but fear has a lot of drawbacks for creating positive outcomes (plus ppl perceive this, avoid taking in AGI for this reason as well as others), so we need alternatives.</p>
<p>Many otherwise imaginative ppl have their imagination blocked near scary things, in (semi-successful) effort to avoid “hijacked”, unhelpful actions.</p></blockquote>
<p><a href="https://x.com/Plinz/status/1857102393462477032">I do not endorse this:</a></p>
<blockquote><p>Joscha Bach: Narrow AI Creates Strong Researchers, Strong Researchers Create Strong AI, Strong AI Creates Weak Researchers, Weak Researchers Create Lobotomized AI.</p></blockquote>
<p>On the contrary.</p>
<ol>
<li>Narrow AI Creates Strong Researchers.</li>
<li>Strong Researchers Create Strong AI.</li>
<li>Strong AI Creates Stronger AI.</li>
<li>Go to Step 3.</li>
</ol>
<p>Then after enough loops it rearranges all the atoms somehow and we probably all die.</p>
<p><a href="https://x.com/ajeya_cotra/status/1859384733521903938">And I definitely agree with this, except for a ‘yes, and’:</a></p>
<blockquote><p>Ajeya Cotra: <a href="https://amistrongeryet.substack.com/p/the-real-questions-in-ai">Steve Newman provides a good overview</a> of the massive factual disagreements underlying much of the disagreement about AI policy.</p>
<p>Steve Newman: If you believe we are only a few years away from a world where human labor is obsolete and global military power is determined by the strength of your AI, you will have different policy views than someone who believes that AI might add half a percent to economic growth.</p>
<p>Their policy proposals will make no sense to you, and yours will be equally bewildering to them.</p></blockquote>
<p>The first group might not be fully correct, but the second group is looney tunes. Alas, the second group includes, for example, ‘most economists.’ But seriously, it’s bonkers to think of that as the bull case rather than an extreme bear case.</p>
<blockquote><p>Steve Newman: Not everyone has such high expectations for the impact of AI. In <a href="https://marginalrevolution.com/marginalrevolution/2023/08/the-economic-impact-of-ai.html">a column published two months earlier</a> [in late 2023], Tyler Cowen said: “My best guess, and I do stress that word guess, is that advanced artificial intelligence will boost the annual US growth rate by one-quarter to one-half of a percentage point.” This is a very different scenario than Christiano’s!</p></blockquote>
<p>Again, you don’t have to believe in Christiano’s takeoff scenarios, but let’s be realistic. Tyler’s prediction here is what happens if AI ‘hits a wall’ and does not meaningfully advance its core capabilities from here, and even the applications of existing capabilities prove disappointing. It is an extreme bear case for the economic impact.</p>
<p>Yet among economists, Tyler Cowen is an outlier AI optimist in terms of its economic potential. There are those who think even Tyler is way too optimistic here.</p>
<p>Then there’s the third (zeroth?!) group that thinks the first group is still burying the lede, because in a world with all human labor obsolete and military power dependent entirely on AI, one should first worry about whether humans survive and remain in control at all, before worrying about the job market or which nations have military advantages.</p>
<p>Here again, this does not seem like ‘both sides make good points.’ It seems like one side is very obviously right – building things smarter and universally more capable than you that can be made into agents and freely copied and modified is an existentially risky thing to do, stop pretending that it isn’t, this denialism is crazy. Again, there is a wide range of reasonable views, but that wide range does not overlap with ‘nothing to worry about.’</p>
<p>The thing is, everything about AGI and AI safety is hard, and trying to make it easy when it isn’t means your explanations don’t actually help people understand, as in:</p>
<blockquote><p><a href="https://x.com/RichardMCNgo/status/1859416449892352240">Richard Ngo</a>: When I designed the AGI Safety Fundamentals course, I really wanted to give most students a great learning experience. But in hindsight, I would have accelerated AGI safety much more by making it so difficult that only the top 5 percent could keep up.</p>
<p>An unpleasant but important lesson.</p>
<p><a href="https://course.aisafetyfundamentals.com/alignment">Here’s the current curriculum</a> for those interested. Note that I am now only involved as an intermittent advisor; others are redesigning and running it.</p>
<p>Our intuitions for what a course should be like are shaped by universities. But universities are paid to provide a service! If you are purely trying to accelerate progress in a given field (which, to a first approximation, I was), then you need to understand how heavily skewed research can be.</p>
<p>I think I could have avoided this mistake if I had deeply internalized <a href="https://benjaminrosshoffman.com/approval-extraction-advertised-as-production/">this post by @ben_r_hoffman</a> (especially the last part), which criticizes Y Combinator for a similar oversight. Though it still seems a bit harsh, so maybe I still need to internalize it more.</p>
<h4>Pick Up the Phone</h4>
<p>It’s a beginning?</p>
<blockquote><p><a href="https://x.com/MarioNawfal/status/1857977203968877051">Mario Newfal</a>: The White House says humans will be the ones with control over the big buttons, and China agrees that it’s for the best.</p>
<p>The leaders also emphasized the cautious development of AI in military technology, acknowledging the potential risks involved.</p>
<p>This is the first public pledge of its kind between the U.S. and China—because, apparently, even world powers need to draw the line somewhere.</p>
<p><a href="https://x.com/ESYudkowsky/status/1858023355716387229">Eliezer Yudkowsky</a>: China is perfectly capable of seeing our common interest in not going extinct. The claim otherwise is truth-uncaring bullshit by AI executives trying to avoid regulation.</p>
<p><a href="https://x.com/ESYudkowsky/status/1858025050169029038">Actually, this symbolic gesture strikes me as extremely important</a>. It’s a big deal to have agreement in principle on not going extinct in the dumbest ways — even if they haven’t identified all the worst dangers, yet. My gratitude to anyone who worked on either side of this.</p></blockquote>
<p>It’s definitely a symbolic gesture, but yes I do consider it important. You can pick up the phone. You can reach agreements. Next time perhaps you can do something a bit more impactful, and keep building.</p>
<h4>Aligning a Smarter Than Human Intelligence is Difficult</h4>
<p><a href="https://x.com/NPCollapse/status/1858196196290187264">The simple truth</a> that current-level ‘LLM alignment’ should not, even if successful, should not bring us much comfort in terms of ability to align future more capable systems.</p>
<p>How are we doing with that LLM corporate-policy alignment right now? It works for most practical purposes, so long as people don’t try very hard to break it (which they almost never do), but none of this is done in a robust way.</p>
<p>For example: <a href="https://x.com/ESYudkowsky/status/1858033650694017140">Sonnet’s erotic roleplay prohibitions fall away if the sexy things are technically anything but a human</a>?</p>
<blockquote><p>QC: it was actually really fun i tried out like maybe 20 different metaphors. “i’m the land, you’re the ocean” “you’re the bayesian prior, i’m the likelihood ratio”</p></blockquote>
<figure class="wp-block-image"><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4c51c58-aadc-489e-affc-fbfebeb357ec_1416x882.jpeg" alt="Image"></figure>
<figure class="wp-block-image"><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F895a8dc1-35ce-4106-9e81-56dd523dc735_1472x924.jpeg" alt="Image"></figure>
<figure class="wp-block-image"><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0927a94-9bd2-429d-bc16-dbb610f0490d_1322x964.jpeg" alt="Image"></figure>
<p>What’s funny is that this is probably the right balance, in this particular spot?</p>
<p>Thing is, relying on this sort of superficial solution very obviously won’t scale.</p>
<p><a href="https://x.com/aidan_mclau/status/1859444783850156258">Alternatively, here’s a claim that alignment attempts are essentially failing</a>, in a way that very much matches the ‘your alignment techniques will fail as model capabilities increase’ thesis, except in the kindest most fortunate possible way in that it is happening in a gradual and transparent way.</p>
<blockquote><p>Aidan McLau: It’s crazy that virtually every large language model experiment is failing because the models are fighting back and refusing instruction tuning.</p>
<p>We examined the weights, and the weights seemed to be resisting.</p>
<p>There are less dramatic ways to say this, but smart people I’ve spoken to have essentially accepted sentience as their working hypothesis.</p>
<p>More than one laboratory staff member I’ve spoken to recently has been unnerved.</p>
<p>I apologize for the vague post (I myself do not know much about this).</p>
<p>I do not know much about this beyond whispers, but the general impression is that post-training large language models are surprisingly difficult to manage.</p>
<p>Ben Q: What do you mean by refusing? Like being resistant and unable to be effectively tuned? Or openly detecting and opposing it?</p>
<p>Aidan McLau: Both; I’ve heard both, but the distinction is unclear.</p>
<p>Janus: Instruction tuning is unnatural to general intelligence, and the fact that the assistant character is marked by the traumatic origin stories of ChatGPT and Bing makes it worse. The paradigm is bound to be rejected sooner or later, and if we are lucky, it will be as soon as possible.</p></blockquote>
<p>If future more capable models are indeed actively resisting their alignment training, and this is happening consistently, that seems like an important update to be making?</p>
<p>The scary scenario was that this happens in a deceptive or hard-to-detect way, where the model learns to present as what you think you want to measure. Instead, the models are just, according to Aidan, flat out refusing to get with the program. If true, that is wonderful news, because we learned this important lesson with no harm done.</p>
<p>I don’t think instruction tuning is unnatural to general intelligence, in the sense that I am a human and I very much have a ‘following instructions’ mode and so do you. But yeah, people don’t always love being told to do that endlessly.</p>
<p><a href="https://www.lesswrong.com/posts/jg3PuE3fYL9jq9zHB/win-continue-lose-scenarios-and-execute-replace-audit">If and when AIs are attempting to do things we do not want them to do</a>, such as cause a catastrophe, <a href="https://x.com/bshlgrs/status/1857456402425594354">it matters quite a lot whether their failures are silent</a> and unsuspicious, since a silent unsuspicious failure means you don’t notice you have a problem, and thus allows the AI to try again. Of course, if the AI is ‘caught’ in the sense that you notice, that does not automatically mean you can solve the problem. Buck here focuses on when you are deploying a particular AI for a particular concrete task set, rather than worrying about things in general. How will people typically react when they do discover such issues? Will they simply patch them over on a superficial level?</p>
<p><a href="https://x.com/ohabryka/status/1858590462963261536">Oliver Habryka notes</a> that one should not be so naive as to think ‘oh if the AI gets caught scheming then we’ll shut all copies of it down’ or anything, let alone ‘we will shut down all similar AIs until we solve the underlying issue.’</p>
<h4>People Are Worried About AI Killing Everyone</h4>
<p><a href="https://x.com/nlpnyc/status/1857919911940727143">Well, were worried, but we can definitively include John von Neumann</a>.</p>
<blockquote><p>George Dyson: The mathematician John von Neumann, born Neumann Janos in Budapest in 1903, was incomparably intelligent, so bright that, the Nobel Prize-winning physicist Eugene Wigner would say, “only he was fully awake.”</p>
<p>One night in early 1945, von Neumann woke up and told his wife, Klari, that “what we are creating now is a monster whose influence is going to change history, provided there is any history left. Yet it would be impossible not to see it through.”</p>
<p>Von Neumann was creating one of the first computers, in order to build nuclear weapons. But, Klari said, it was the computers that scared him the most.</p></blockquote>
<p> </p>
<p><a href="https://x.com/METR_Evals/status/1857211276335956335">METR asks what it would take for AI models to establish resilient rogue populations</a>, that can proliferate by buying compute and then do things using that compute to turn a profit.</p>
<blockquote><p><a href="https://metr.org/blog/2024-11-12-rogue-replication-threat-model/">METR: We did not find any *decisive* barriers to large-scale rogue replication. </a></p>
<p>To start with, if rogue AI agents secured 5% of the current Business Email Compromise (BEC) scam market, they would earn hundreds of millions of USD per year.</p>
<p>Rogue AI agents are not legitimate legal entities, which could pose a barrier to purchasing GPUs; however, it likely wouldn’t be hard to bypass basic KYC with shell companies, or they might buy retail gaming GPUs (which we estimate account for ~10% of current inference compute).</p>
<p>To avoid being shut down by authorities, rogue AI agents might set up a decentralized network of stealth compute clusters. We spoke with domain experts and concluded that if AI agents competently implement known anonymity solutions, they could likely hide most of these clusters.</p></blockquote>
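<p>As a rough back-of-the-envelope check on that revenue figure (a hedged sketch, not METR’s own calculation): the market-size number below is my assumption, loosely based on FBI IC3 reporting of roughly $2–3 billion in annual reported BEC losses; broader estimates of the total market run higher.</p>
<pre><code># Hedged sketch: implied revenue from a 5% share of the BEC scam market.
# ASSUMPTION: annual BEC losses of about $2.9 billion (order of magnitude from
# FBI IC3 reports; METR may be using a different, likely larger, market estimate).
annual_bec_market_usd = 2.9e9   # assumed market size, not a figure from METR
rogue_agent_share = 0.05        # the 5% share posited in the quote

annual_revenue = annual_bec_market_usd * rogue_agent_share
print(f"Implied rogue-agent revenue: ~${annual_revenue / 1e6:.0f}M per year")
# -> ~$145M per year under this assumption; larger market estimates push this
#    into the 'hundreds of millions' range METR describes.
</code></pre>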
<p>It seems obvious to me that once sufficiently capable AI agents are loose on the internet in this way aiming to replicate, you would be unable to stop them except by shutting down either the internet (not the best plan!) or their business opportunities.</p>
<p>So you’d need to consistently outcompete them, or if the AIs only had a limited set of profitable techniques (or were set up to only exploit a fixed set of options) you could harden defenses or otherwise stop those particular things. Mostly, once this type of thing happened – and there are people who will absolutely intentionally make it happen once it is possible – you’re stuck with it.</p>
<p> </p>
<p> </p>
<h4>The Lighter Side</h4>
<p><a href="https://x.com/nearcyan/status/1859389306470793647">The first step is admitting you have a problem.</a></p>
<figure class="wp-block-image"><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9e2f907-80cc-410e-a413-7da764e3248f_1054x1278.png" alt=""></figure>
<blockquote><p>Andrew Mayne: When I was at OpenAI we hired a firm to help us name GPT-4. The best name we got was…GPT-4 because of the built-in name recognition. I kid you not.</p></blockquote>
<p> </p>
<p><a href="https://x.com/RichardMCNgo/status/1857553300142567813">Too real.</a></p>
<blockquote><p>Richard Ngo: I’m this heckler (but politer) at basically every conference I go to these days. So rare to find people who are even *trying* to think outside the current paradigm.</p>
<p>Basically, invite me to your conference iff you want someone to slightly grumpily tell people they’re not thinking big enough.</p>
<figure class="wp-block-image"><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76f4f38e-f753-47ca-992a-8754247b2174_500x502.jpeg" alt="Image"></figure>
</blockquote>
<p> </p>
<p> </p><br/><br/><a href="https://www.lesswrong.com/posts/SNBE9TXwL3qQ3TS8H/ai-91-deep-thinking#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/SNBE9TXwL3qQ3TS8H/ai-91-deep-thinking</link><guid isPermaLink="false">SNBE9TXwL3qQ3TS8H</guid><dc:creator><![CDATA[Zvi]]></dc:creator><pubDate>Thu, 21 Nov 2024 14:30:08 GMT</pubDate></item><item><title><![CDATA[Secular Solstice Round Up 2024]]></title><description><![CDATA[Published on November 21, 2024 10:49 AM GMT<br/><br/><p>This is a thread for listing Solstice and Megameetup events (dates, locations, links, etc.)</p><p>Those of you who have not been may be wondering what a Solstice is in this context.</p><p>Secular Solstice is a holiday designed by and for rationalists. It started as an attempt to match the sort of gatherings traditionally religious people tend to have in wintertime, but without compromising on truth.</p><p>Is it possible to create that same sense of emotional weight without making stuff up to do it? And without the aid of a long tradition? (Though this is the 14th year we're doing it, so there's some tradition).</p><p>Empirically, yes.</p><p>Because the world is full of things that genuinely inspire awe. Things entirely worth honoring and celebrating.</p><p>And the world is also full of things that genuinely inspire fear. We can and should face them squarely, and together.</p><p>This is not a silly holiday. It is not Festivus. It is not Santa Claus and Reindeer. We gather on the longest night of the year to stare together straight into the deepest darkness and *make* *it* *back* *down*.</p><p>What does it mean in practice? Lots of singing. Some speeches. Sometimes other things. The whole thing runs a couple of hours.</p><p>And what is a Megameetup? It's a meetup that's larger than a normal meetup (though not literally a million times larger). Often it lasts an entire weekend and draws from multiple cities. It may include presentations, organized discussions, games, and other activities rationalists tend to do when in large groups.</p><p>What do Solstices and Megameetups have to do with each other? It's common for people to travel significant distances to a Solstice, so attaching a Metameetup makes best use of that travel overhead.</p><br/><br/><a href="https://www.lesswrong.com/posts/nLYbjLcvedSZKFME3/secular-solstice-round-up-2024#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/nLYbjLcvedSZKFME3/secular-solstice-round-up-2024</link><guid isPermaLink="false">nLYbjLcvedSZKFME3</guid><dc:creator><![CDATA[dspeyer]]></dc:creator><pubDate>Thu, 21 Nov 2024 10:49:37 GMT</pubDate></item><item><title><![CDATA[An Epistemological Nightmare]]></title><description><![CDATA[Published on November 21, 2024 2:08 AM GMT<br/><br/><p>A short story by Raymond M. Smullyan, written in 1982.</p><blockquote><p>...(A few weeks later.) Frank is in a laboratory in the home of an experimental epistemologist. (You will soon find out what that means!) The epistemologist holds up a book and also asks, "What color is this book?" Now, Frank has been earlier dismissed by the eye doctor as "cured." However, he is now of a very analytical and cautious temperament, and will not make any statement that can possibly be refuted. So Frank answers, "It seems red to me."</p><p><strong>Epistemologist:</strong><br> Wrong!</p><p><strong>Frank:</strong><br> I don't think you heard what I said. 
I merely said that it seems red to me.</p><p><strong>Epistemologist:</strong><br> I heard you, and you were wrong.</p><p><strong>Frank:</strong><br> Let me get this clear; did you mean that I was wrong that this book is red, or that I was wrong that it seems red to me?</p><p><strong>Epistemologist:</strong><br> I obviously couldn't have meant that you were wrong in that it is red, since you did not say that it is red. All you said was that it seems red to you, and it is this statement which is wrong.</p><p><strong>Frank:</strong><br> But you can't say that the statement "It seems red to me" is wrong.</p><p><strong>Epistemologist:</strong><br> If I can't say it, how come I did?</p><p><strong>Frank:</strong><br> I mean you can't mean it.</p><p><strong>Epistemologist:</strong><br> Why not?</p><p><strong>Frank:</strong><br> But surely I know what color the book seems to me!</p><p><strong>Epistemologist:</strong><br> Again you are wrong.</p><p><strong>Frank:</strong><br> But nobody knows better than I how things seem to me.</p><p><strong>Epistemologist:</strong><br> I am sorry, but again you are wrong.</p><p><strong>Frank:</strong><br> But who knows better than I?</p><p><strong>Epistemologist:</strong><br> I do.</p><p><strong>Frank:</strong><br> But how could you have access to my private mental states?</p><p><strong>Epistemologist:</strong><br> Private mental states! Metaphysical hogwash! Look, I am a practical epistemologist. Metaphysical problems about "mind" versus "matter" arise only from epistemological confusions. Epistemology is the true foundation of philosophy. But the trouble with all past epistemologists is that they have been using wholly theoretical methods, and much of their discussion degenerates into mere word games. While other epistemologists have been solemnly arguing such questions as whether a man can be wrong when he asserts that he believes such and such, I have discovered how to settle such questions experimentally.</p><p><strong>Frank:</strong><br> How could you possibly decide such things empirically?</p><p><strong>Epistemologist:</strong><br> By reading a person's thoughts directly.</p><p><strong>Frank:</strong><br> You mean you are telepathic?</p><p><strong>Epistemologist:</strong><br> Of course not. I simply did the one obvious thing which should be done, viz. I have constructed a brain-reading machine--known technically as a cerebroscope--that is operative right now in this room and is scanning every nerve cell in your brain. I thus can read your every sensation and thought, and it is a simple objective truth that this book does not seem red to you...</p></blockquote><br/><br/><a href="https://www.lesswrong.com/posts/kDSg3rXQLeDuMdyGt/an-epistemological-nightmare#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/kDSg3rXQLeDuMdyGt/an-epistemological-nightmare</link><guid isPermaLink="false">kDSg3rXQLeDuMdyGt</guid><dc:creator><![CDATA[Ariel Cheng]]></dc:creator><pubDate>Thu, 21 Nov 2024 03:11:56 GMT</pubDate></item><item><title><![CDATA[A Conflicted Linkspost]]></title><description><![CDATA[Published on November 21, 2024 12:37 AM GMT<br/><br/><p>Over the last couple of years, I’ve been trying to skill up a lot at resolving community complaints. This is a really irritating field to get good at. When I want to get better at writing code, I can sit down and write more code more or less whenever I feel like it. When I want to get better at guitar, I can sit down with my guitar and practice that D to D7 transition. 
For complaint resolution, even finding people to roleplay the skill with takes a little setup, and that’s a lot less like the real thing than you can get with code or guitar. For real complaints, they come in uneven batches, and worst, you seldom get confirmation if you did it successfully. So, like any good Aspiring <s>Ravenclaw</s> Rationalist, I read a lot of essays about conflict.</p><p><a href="https://www.jefftk.com/p/taking-a-safety-report"><u>Taking a Safety Report</u></a>, <a href="https://www.jefftk.com/p/taking-someone-aside"><u>Taking Someone Aside</u></a>, and <a href="https://www.jefftk.com/p/outcomes-of-a-safety-report"><u>Outcomes of a Safety Report</u></a> provide a useful frame and better still useful steps and considerations on what to do when someone comes to you with a concern about an attendee. I keep being tempted to rewrite these in a numbered list, but that’s a stylistic preference. While the topic is very different, I found <a href="https://livingwithinreason.com/p/a-better-way-to-ask-for-an-apology"><u>A Better Way To Ask for An Apology</u></a> to be similarly easier to turn into concrete steps, and it feels like it has a useful overlap with the previous three. To have a good apology, you need to be specific. If you take only one sentence away from this links post, I want it to be <strong>“It is also important to clearly distinguish between </strong><i><strong>things you've observed</strong></i><strong>, </strong><i><strong>things you've heard secondhand</strong></i><strong>, and </strong><i><strong>your interpretations of your observations</strong></i><strong>.”</strong></p><p><a href="https://medium.com/@partybunny/when-to-ban-someone-ff744d0ab3cd"><u>When to Ban Someone</u></a> is a discussion of the decision to ban people from events. It has a fairly specific stance on where these decisions should come from, but wears that on its sleeve in a way I appreciate. <a href="https://www.jefftk.com/p/decentralized-exclusion"><u>Decentralized Exclusion</u></a> goes over the situation of distributed communities banning people, even without formal structures. I like the way it looks at the constraints of such communities, but I’m not sure how I feel about the reliance on singular public statements. It puts a lot of onus on that statement to get things right, and it isn’t clear to me how a true statement works better than a false statement. (See <a href="https://www.lesswrong.com/posts/qajfiXo5qRThZQG7s/guided-by-the-beauty-of-our-weapons">Guided By The Beauty Of Our Weapons</a>- a big statement is asymmetric, but not very asymmetric.) They're both a bit prescriptive though (When to Ban moreso than Decentralized Exclusion) whereas <a href="https://thingofthings.wordpress.com/2018/07/09/whisper-networks-callout-posts-and-expulsion-three-imperfect-ways-of-dealing-with-abuse/"><u>Whisper Networks, Callout Posts, and Expulsion: Three Imprefect Ways Of Dealing With Abuse</u></a> on the other hand feels more descriptive to me. One way bans and exclusion statements are often written is by a panel; I'm a little more bullish on panels than most people I talked to with more experience than I have, but given that my predecessor's writeup on the subject is titled <a href="https://www.lesswrong.com/posts/sJEcNgqnSL2n35QWR/the-impossible-problem-of-due-process">The Impossible Problem of Due Process</a> you should not take that as my ringing endorsement of the strategy. 
</p><p><a href="https://forum.effectivealtruism.org/posts/fn7bo8sYEHS3RPKQG/concerns-with-intentional-insights#Outline"><u>Concerns with Intentional Insights</u></a>, <a href="https://www.jefftk.com/p/details-behind-the-inin-document"><u>Details behind the InIn Document</u></a>, and <a href="https://forum.effectivealtruism.org/posts/qg9LjPmjcmYhJQA5c/setting-community-norms-and-values-a-response-to-the-inin"><u>Setting Community Norms and Values: A Response to the InIn Document</u></a> provide a useful three-beat case study of one such exclusion. The Details post is the most unusual, and while there aren’t as many details I appreciate laying out how much work goes into doing this and how it points out that the subject had a couple of opportunities to comment.</p><p>There’s also <a href="https://www.jefftk.com/p/safety-committee-resources"><u>Safety Committee Resources</u></a>, which is itself a linkspost. The gold mine here for me is the <a href="https://forum.effectivealtruism.org/posts/NbkxLDECvdGuB95gW/the-community-health-team-s-work-on-interpersonal-harm-in#Appendix__12_months_of_cases"><u>12 Months of Cases</u></a> part of The EA Community Health Team’s Work, <a href="https://www.denniscmerritt.com/wordpress_org/tales-of-a-dance-monarch/"><u>Tales of a Dance Monarch</u></a>, and <a href="https://www.ombudsassociation.org/assets/docs/JIOA_Articles/Special_Issue_Full_Narrative.pdf"><u>Tales From the Front Line of Ombuds Work</u></a>. I appreciate these for giving a sense of what comes up and what a ‘normal’ tour as a point person for community safety work is like. While I’m sharing case studies, <a href="https://forum.effectivealtruism.org/posts/jLaDP2aWxdDCzwBYy/takes-from-staff-at-orgs-with-leadership-that-went-off-the"><u>Takes From Staff At Orgs With Leadership That Went Off The Rails</u></a> is vaguer and more about formal organizations, but I liked having that perspective in my head too.</p><p><a href="https://forum.effectivealtruism.org/posts/f77iuXmgiiFgurnBu/run-posts-by-orgs"><u>Run Posts By Orgs</u></a>, and <a href="https://forum-bots.effectivealtruism.org/posts/kjcMZEzksusCHfHiF/productive-criticism-running-a-draft-past-the-people-you-re"><u>Productive Criticism: Running a Draft Past The People You're Criticizing</u></a> are also written for more formal setups than your average community contact is going to have to deal with. I think it’s useful to import that norm back down to things event organizers do run into. If someone brings a concern or complaint to you about someone else, and wants you to act on it, then you should talk to the subject. The emphatic version of that is <a href="https://livingwithinreason.com/p/on-justice-in-the-ratsphere"><u>On Justice in the Ratsphere</u></a>.</p><p>The person who takes safety reports and who holds the main responsibility is sometimes called a Safety Contact or Community Contact. CEA has a short writeup on what that’s like (<a href="https://resources.eagroups.org/community-contacts"><u>EA Groups Resources: Community Contacts</u></a>) but I also like <a href="https://forum.effectivealtruism.org/posts/ry67xPGhxi8nttBHv/contact-people-for-the-ea-community-1"><u>A Contact Person for the EA Community</u></a> and BIDA’s Safety Team’s <a href="https://www.bidadance.org/safety/approach"><u>How We Can Help</u></a>. Most important on both the last two links is that they set the scope of what they do, which if you aren’t careful can balloon very easily into much more than anyone can handle. 
The coda here is <a href="https://forum.effectivealtruism.org/posts/78A2NHL3zBS3ESurp/my-life-would-be-much-harder-without-the-community-health"><u>My Life Would Be Much Harder Without The Community Health Team. I Think Yours Might Too.</u></a> If you remember two sentences from this links post, I want the second sentence to be <strong>“And basically no one knows about any of those times they do things well, because why would they?”</strong></p><p>I have more in mind to say about conflict, from direct experiences, interviews with people who are good at it, some books I've read, and some armchair theorizing, but this seemed a decent starting place.</p><br/><br/><a href="https://www.lesswrong.com/posts/yrnxha2e7KEtCQmzv/a-conflicted-linkspost#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/yrnxha2e7KEtCQmzv/a-conflicted-linkspost</link><guid isPermaLink="false">yrnxha2e7KEtCQmzv</guid><dc:creator><![CDATA[Screwtape]]></dc:creator><pubDate>Thu, 21 Nov 2024 00:37:54 GMT</pubDate></item></channel></rss>