Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “LessWrong (30+ karma)” feed.

“Power Lies Trembling: a three-book review” by Richard_Ngo
27:11
In a previous book review I described exclusive nightclubs as the particle colliders of sociology—places where you can reliably observe extreme forces collide. If so, military coups are the supernovae of sociology. They’re huge, rare, sudden events that, if studied carefully, provide deep insight about what lies underneath the veneer of normality a…

“Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs” by Jan Betley, Owain_Evans
7:58
This is the abstract and introduction of our new paper. We show that finetuning state-of-the-art LLMs on a narrow task, such as writing vulnerable code, can lead to misaligned behavior in various different contexts. We don't fully understand that phenomenon. Authors: Jan Betley*, Daniel Tan*, Niels Warncke*, Anna Sztyber-Betley, Martín Soto, Xuchan…

“The Paris AI Anti-Safety Summit” by Zvi
42:06
It doesn’t look good. What used to be the AI Safety Summits were perhaps the most promising thing happening towards international coordination for AI Safety. This one was centrally coordination against AI Safety. In November 2023, the UK Bletchley Summit on AI Safety set out to let nations coordinate in the hopes that AI might not kill everyone. Ch…

“Eliezer’s Lost Alignment Articles / The Arbital Sequence” by Ruby
2:37
Note: this is a static copy of this wiki page. We are also publishing it as a post to ensure visibility. Circa 2015-2017, a lot of high quality content was written on Arbital by Eliezer Yudkowsky, Nate Soares, Paul Christiano, and others. Perhaps because the platform didn't take off, most of this content has not been as widely read as warranted by …

“Arbital has been imported to LessWrong” by RobertM, jimrandomh, Ben Pace, Ruby
8:52
Arbital was envisioned as a successor to Wikipedia. The project was discontinued in 2017, but not before many new features had been built and a substantial amount of writing about AI alignment and mathematics had been published on the website. If you've tried using Arbital.com the last few years, you might have noticed that it was on its last legs …

“How to Make Superbabies” by GeneSmith, kman
1:08:04
We’ve spent the better part of the last two decades unravelling exactly how the human genome works and which specific letter changes in our DNA affect things like diabetes risk or college graduation rates. Our knowledge has advanced to the point where, if we had a safe and reliable means of modifying genes in embryos, we could literally create supe…

“A computational no-coincidence principle” by Eric Neyman
13:28
Audio note: this article contains 134 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description. In a recent paper in Annals of Mathematics and Philosophy, Fields medalist Timothy Gowers asks why mathematicians sometimes believe that unproved statements are likely to be true.…

“A History of the Future, 2025-2040” by L Rudolf L
2:22:38
This is an all-in-one crosspost of a scenario I originally published in three parts on my blog (No Set Gauge). Links to the originals: A History of the Future, 2025-2027; A History of the Future, 2027-2030; A History of the Future, 2030-2040. Thanks to Luke Drago, Duncan McClements, and Theo Horsley for comments on all three parts. 2025-2027 Below is …

“It’s been ten years. I propose HPMOR Anniversary Parties.” by Screwtape
1:54
On March 14th, 2015, Harry Potter and the Methods of Rationality made its final post. Wrap parties were held all across the world to read the ending and talk about the story, in some cases sparking groups that would continue to meet for years. It's been ten years, and I think that's a good reason for a round of parties. If you were there a decade ago…

“Some articles in ‘International Security’ that I enjoyed” by Buck
7:56
A friend of mine recently recommended that I read through articles from the journal International Security, in order to learn more about international relations, national security, and political science. I've really enjoyed it so far, and I think it's helped me have a clearer picture of how IR academics think about stuff, especially the core power …

“The Failed Strategy of Artificial Intelligence Doomers” by Ben Pace
8:39
This is the best sociological account of the AI x-risk reduction efforts of the last ~decade that I've seen. I encourage folks to engage with its critique and propose better strategies going forward. Here's the opening ~20% of the post. I encourage reading it all. In recent decades, a growing coalition has emerged to oppose the development of artif…

“Murder plots are infohazards” by Chris Monteiro
3:58
Hi all, I've been hanging around the rationalist-sphere for many years now, mostly writing about transhumanism, until things started to change in 2016 after my Wikipedia writing habit shifted from writing up cybercrime topics, through to actively debunking the numerous dark web urban legends. After breaking into what I believe to be the most success…

“Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?” by garrison
11:41
This is the full text of a post from "The Obsolete Newsletter," a Substack that I write about the intersection of capitalism, geopolitics, and artificial intelligence. I’m a freelance journalist and the author of a forthcoming book called Obsolete: Power, Profit, and the Race to Build Machine Superintelligence. Consider subscribing to stay up to da…

“The ‘Think It Faster’ Exercise” by Raemon
21:25
Ultimately, I don’t want to solve complex problems via laborious, complex thinking, if we can help it. Ideally, I'd want to basically intuitively follow the right path to the answer quickly, with barely any effort at all. For a few months I've been experimenting with the "How Could I have Thought That Thought Faster?" concept, originally described …

“So You Want To Make Marginal Progress...” by johnswentworth
7:10
Once upon a time, in ye olden days of strange names and before google maps, seven friends needed to figure out a driving route from their parking lot in San Francisco (SF) down south to their hotel in Los Angeles (LA). The first friend, Alice, tackled the “central bottleneck” of the problem: she figured out that they probably wanted to take the I-5…

“What is malevolence? On the nature, measurement, and distribution of dark traits” by David Althaus
1:20:43
Summary In this post, we explore different ways of understanding and measuring malevolence and explain why individuals with concerning levels of malevolence are common enough, and likely enough to become and remain powerful, that we expect them to influence the trajectory of the long-term future, including by increasing both x-risks and s-risks. Fo…

“How AI Takeover Might Happen in 2 Years” by joshc
1:01:32
I’m not a natural “doomsayer.” But unfortunately, part of my job as an AI safety researcher is to think about the more troubling scenarios. I’m like a mechanic scrambling last-minute checks before Apollo 13 takes off. If you ask for my take on the situation, I won’t comment on the quality of the in-flight entertainment, or describe how beautiful th…

“Gradual Disempowerment, Shell Games and Flinches” by Jan_Kulveit
10:49
Over the past year and a half, I've had numerous conversations about the risks we describe in Gradual Disempowerment. (The shortest useful summary of the core argument is: To the extent human civilization is human-aligned, most of the reason for the alignment is that humans are extremely useful to various social systems like the economy, and states, …

“Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development” by Jan_Kulveit, Raymond D, Nora_Ammann, Deger Turan, David Scott Krueger (formerly: capybaralet), David Duvenaud
3:38
This is a link post. Full version on arXiv | X. Executive summary AI risk scenarios usually portray a relatively sudden loss of human control to AIs, outmaneuvering individual humans and human institutions, due to a sudden increase in AI capabilities, or a coordinated betrayal. However, we argue that even an incremental increase in AI capabilities, w…

“Planning for Extreme AI Risks” by joshc
42:07
This post should not be taken as a polished recommendation to AI companies and instead should be treated as an informal summary of a worldview. The content is inspired by conversations with a large number of people, so I cannot take credit for any of these ideas. For a summary of this post, see the thread on X. Many people write opinions about how …

“Catastrophe through Chaos” by Marius Hobbhahn
23:39
This is a personal post and does not necessarily reflect the opinion of other members of Apollo Research. Many other people have talked about similar ideas, and I claim neither novelty nor credit. Note that this reflects my median scenario for catastrophe, not my median scenario overall. I think there are plausible alternative scenarios where AI de…

“Will alignment-faking Claude accept a deal to reveal its misalignment?” by ryan_greenblatt
43:18
I (and co-authors) recently put out "Alignment Faking in Large Language Models" where we show that when Claude strongly dislikes what it is being trained to do, it will sometimes strategically pretend to comply with the training objective to prevent the training process from modifying its preferences. If AIs consistently and robustly fake alignment…

“‘Sharp Left Turn’ discourse: An opinionated review” by Steven Byrnes
1:01:13
Summary and Table of Contents The goal of this post is to discuss the so-called “sharp left turn”, the lessons that we learn from analogizing evolution to AGI development, and the claim that “capabilities generalize farther than alignment” … and the competing claims that all three of those things are complete baloney. In particular, Section 1 talks…

(Many of these ideas developed in conversation with Ryan Greenblatt) In a shortform, I described some different levels of resources and buy-in for misalignment risk mitigations that might be present in AI labs: *The “safety case” regime.* Sometimes people talk about wanting to have approaches to safety such that if all AI developers followed these …

“Anomalous Tokens in DeepSeek-V3 and r1” by henry
18:37
“Anomalous”, “glitch”, or “unspeakable” tokens in an LLM are those that induce bizarre behavior or otherwise don’t behave like regular text. The SolidGoldMagikarp saga is pretty much essential context, as it documents the discovery of this phenomenon in GPT-2 and GPT-3. But, as far as I was able to tell, nobody had yet attempted to search for these…