Notes
March 2026
Is It Still Possible to Live by One’s Conscience? (2026-03-08 15:31)
周濂:
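As a descendant of Jews, a former German citizen, and a direct victim of and exile from Nazism, Arendt was shaken by the catastrophe of the century that she witnessed and lived through. She had too many questions and too much she could not make sense of, which is why she said: “I want to understand (Ich will verstehen).” It is worth comparing this with Aristotle’s “all men by nature desire to know.” At first glance the two statements seem close, but on reflection, Aristotle is making a claim on behalf of all humanity and stresses the “goal” of understanding, namely “knowledge” in a static sense, whereas Arendt speaks from the first-person singular “I” and cares more about the “process” of thinking, namely understanding itself. More importantly, Aristotle holds that seeking knowledge is a “desire,” while Arendt insists that understanding is a “will”: it springs from a “decision,” not from “instinct.”
Ordinary people do not want to understand. Most are content with a superficial grasp: either they rush for a definite answer and blindly follow authority, or, failing to get a definite answer, they give up on understanding altogether. That ordinary people do not want to understand is not primarily a matter of insufficient intelligence; it is a lack of the will to think.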
Note (2026-03-02 19:39)
Note (2026-03-01 19:51)
Note (2026-03-01 17:06)
February 2026
Next-Token Predictor Is An AI’s Job, Not Its Species (2026-02-28 13:30)
“Next-Token Predictor Is An AI’s Job, Not Its Species”:
On the levels where AI is a next-token predictor, you are also a next-token (technically: next-sense-datum) predictor. On the levels where you’re not a next-token predictor, AI isn’t one either.
[E]volution can’t encode everything important in the genome. […] Instead, evolution gives us algorithms that let us learn from experience.
[T]he brain organizes itself/learns things by constantly trying to predict the next sense-datum, then updating synaptic weights towards whatever form would have predicted the next sense-datum most efficiently. This is a very close (not exact) analogue to the next-token prediction of AI.
On the outermost level, humans were designed by a process optimizing for survival, sex, and reproduction. The humans that survived were those that had sex and reproduced. Everything about humans is downstream of what helped with sex and reproduction. But that doesn’t mean that any particular thought that you think involves reproduction or sex.
[E]ven though an AI was shaped by next-token prediction, the inside of its thoughts doesn’t look like next-token prediction. In the abstract, it probably looks like a world-model, the same as yours.
Next-token prediction created this system, but the system itself can involve arbitrary choices about how to represent and manipulate data.
The most compelling analogy: this is like expecting humans to be “just survival-and-reproduction machines” because survival and reproduction were the optimization criteria in our evolutionary history. There is, of course, some sense in which we are just survival-and-reproduction machines: we don’t have any faculties that can’t be explained through their effects on survival and reproduction. But this doesn’t mean we “don’t really think” or “don’t really understand” because we’re “really just trying to have sex” when we work on a math problem.
The Trial of Gisèle Pelicot’s Rapists United France and Fractured Her Family (2026-02-27 23:56)
“The Trial of Gisèle Pelicot’s Rapists United France and Fractured Her Family”:
Note: For more context, see, e.g., Pelicot rape case on Wikipedia and the NYT interview.
“Keep going, hanging on, putting on a brave face was all I knew how to do, and it was what I wished for my daughter too,” Gisèle recalled. But Caroline found her mother’s approach alienating—a “protection mechanism for her,” she wrote later, “but one I won’t be able to tolerate.”
Two psychiatrists reasoned that Dominique’s crimes were possible because he was “splitting.” “This split allows two contradictory personalities to coexist without conflict,” one wrote. “When M. Pelicot operates in one mode, he is unaware of the other.” The second psychiatrist proposed that Gisèle had not sensed Dominique’s other side because “we split with the splitter, so to speak.” We cordon off the parts of our lives that don’t fit the story we believe we are living.
Trauma often leaves people feeling like spectators to the harms done to them, but for Gisèle, who had been unconscious, her trauma occupied an even more elusive category of experience. She knew that there were videos of her being raped, but she didn’t want to construct new memories by watching them.
Nearly all the other defendants denied committing a crime. “As long as the man is there, giving me instructions, it’s not rape,” a construction supervisor said. A truck driver proposed that “once a woman is wet, it means she’s not saying no.” A gardener explained that he had penetrated Gisèle “out of politeness, to reciprocate the hospitality of the host.” While the defendants shirked responsibility, some of their wives tried to take the blame. One woman said that, owing to a complicated pregnancy, she’d refused to have sex with her husband. “The tragedy must have occurred at that time,” she offered.
Metannoying (2026-02-27 22:52)
“Metannoying”:
[W]hy do LLMs do [the tiresome “not X but Y” formulation]? Because of limitations on how models represent meaning. In vector space models, word meaning is defined by distributional context. Synonyms have high cosine similarity because they appear in similar sentences. Antonyms also have high cosine similarity, because they appear in identical sentences. “I like hot coffee” and “I like cold coffee” occupy the same distributional space. The models see that hot and cold are mathematically close. They do not inherently compute the oppositeness relation. One way to understand the “not X but Y” construction is as a workaround for the model’s inability to compute opposition the way humans do. By explicitly stating both the rejected term and the replacement, the model externalizes onto the page an operation it cannot perform internally.
The “corrective contrast” construction reduces ambiguity in the output space. Users want clarity. “Not X but Y” to the LLM is an insurance policy on clarity.
Luckily for the designers of LLMs, corrective contrast also sounds cool, memorable, and often profound, at least in moderation.
Classical rhetoric had a name for the deliberate version: metanoia, or correctio, the performed self-correction where a speaker revises mid-sentence to find the more precise or more forceful formulation. When Brutus tells the Roman crowd “Not that I loved Caesar less, but that I loved Rome more,” the audience holds “loved Caesar less” and suppresses the idea to receive the reframe. The delay and the cognitive cost is the point. Shakespeare knows the negated proposition will linger as a kind of understatement that makes the correction feel like an escalation.
But LLMs are not Shakespeare (yet) and there’s no rhetorical reason for it and worse, there’s no limiting function, which is why you can get “not x but y” every other paragraph. LLMs are corrective-contrast-maxxing for maximum comprehension across the widest possible readership.
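A small illustration of the distributional point in the excerpt above (it is not from the quoted post): if a word’s vector is just a count of the words it appears alongside, then “hot” and “cold” end up nearly identical, because they fill the same slots. This is a minimal sketch under stated assumptions: the toy corpus is invented, and real models learn dense embeddings rather than raw co-occurrence counts, so the numbers only show the direction of the effect.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration: "hot" and "cold" fill identical slots.
SENTENCES = [
    "i like hot coffee",
    "i like cold coffee",
    "she drinks hot tea",
    "she drinks cold tea",
    "he reads long books",
]


def context_vector(word, sentences):
    """Count the words that share a sentence with `word`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        if word in tokens:
            counts.update(t for t in tokens if t != word)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


hot = context_vector("hot", SENTENCES)
cold = context_vector("cold", SENTENCES)
books = context_vector("books", SENTENCES)

# The antonyms share every context word; the unrelated noun shares none.
hot_cold = cosine(hot, cold)
hot_books = cosine(hot, books)
print(f"hot vs cold:  {hot_cold:.2f}")
print(f"hot vs books: {hot_books:.2f}")
```

In this toy setting nothing in the vectors records that “hot” and “cold” are opposites; the representation only records that they keep the same company.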
How will OpenAI compete? (2026-02-24 17:04)
[W]hen you’re head of product at an AI lab, you don’t control your roadmap. You have very limited ability to set product strategy. You open your email in the morning and discover that the labs have worked something out, and your job is to turn that into a button. The strategy happens somewhere else. But where?
[M]ost people don’t see the differences between model personality and emphasis that you might see, and most people aren’t benefiting from ‘memory’ or the other features that the product teams at each company copy from each other in the hope of building stickiness (and memory is stickiness, not a network effect). Meanwhile, usage data from a larger (for now) user base itself might be an advantage, but how big an advantage, if 80% of users are only using this a couple of times a week at most?
[T]here’s a recurring fallacy in tech that you can abstract many different complex products into a simple standard interface - you could call this the ‘widget fallacy’. A decade ago people said ‘APIs are the new BD’, which was really the same concept, and it mostly failed. This is partly because there’s a huge gap between what looks cool in demos and all of the work and thought in the interaction models and the workflows in the actual product: very quickly you’ll run into an exception case and you’ll need the actual product UI and a human decision. It’s also because the incentives are misaligned: no-one wants to be someone else’s dumb API call, so there’s an inherent tension or trade-off between the distribution that an abstraction layer might give you (Google Shopping, Facebook shopping, and now ChatGPT shopping) and your desire to control the experience and the customer relationship. […]
[T]he second problem is that if these are all separate systems plugged together by abstracted and automated APIs, is the user or developer locked into any one of them? If apps in the chatbot feed work, and OpenAI uses one standard and Gemini uses another, what stops a developer doing both? This is much less code than making both an iOS and Android app, and anyway, can’t you get the AI to write the code for you? What does that do to developer lock-ins?
Anthropic announces proof of distillation at scale by MiniMax, DeepSeek, Moonshot (2026-02-24 06:53)
NitpickLawyer, in response to Anthropic’s allegation that several Chinese AI labs have been “distilling,” i.e., massively and abusively learning from Claude’s outputs:
Anthropic have been the loudest in pushing for regulatory capture, often citing “muh security” as FUD. People should care what they write on this topic, because they’re not writing for us, they’re writing for “the regulators”. Member when the usgov placed a dude in solitary confinement because they thought he could launch nukes with a whistle? Yeah... Let’s hope they don’t do some cray cray stuff with open LLMs.
Anthropic make amazing coding models, kudos for that. But they should be mocked for any communication like the one linked. Boo-hoo. Deal with it, or don’t, I don’t care. No one will feel for you. What goes around, comes around. Etc.
bigyabai, concurring:
Administratively, Anthropic seems to misunderstand politics. You don’t get to wear the “people’s champion” and “government sweetheart” hats at the same time, when push comes to shove you’ll be forced to pick a lane. We saw it with Microsoft, we saw it with Apple and Google, and now we’re seeing it with OpenAI too. You can’t drive down both paths at the same time.
As a member of the target audience for Claude, their messaging just leaves me confused. Are you a renegade success, or do you need the government’s help? Are you a populist juggernaut, or do you hide from competition? OpenAI, for all their myriad issues, understood this from the start and stuck to the blithely profitable federal ass-kisser route.
Note (2026-02-23 06:52)
Note (2026-02-23 06:51)
“Child’s Play,” sarcastically:
The future will belong to people with a very specific combination of personality traits and psychosexual neuroses. An AI might be able to code faster than you, but there is one advantage that humans still have. It’s called agency, or being highly agentic. The highly agentic are people who just do things. They don’t timidly wait for permission or consensus; they drive like bulldozers through whatever’s in their way. When they see something that could be changed in the world, they don’t write a lengthy critique—they change it. AIs are not capable of accessing whatever unpleasant childhood experience it is that gives you this hunger. Agency is now the most valuable commodity in Silicon Valley. In tech interviews, it’s common for candidates to be asked whether they’re “mimetic” or “agentic.” You do not want to say mimetic. Once, San Francisco drew in runaway children, artists, and freaks; today it’s an enormous magnet for highly agentic young men.
A note to RSS subscribers (2026-02-20 00:14)
Some of you may have noticed that I’ve recently been experimenting with the new “Notes” and “Gallery” sections. As they settle into place, I’m now merging new posts in these sections into the main feed at https://hsu.cy/feed.xml.
No action is needed to see these updates. If you prefer to receive updates only on longer posts, please point your feed reader to https://hsu.cy/posts/feed.xml instead. I apologize for any inconvenience this change may have caused.
Note (2026-02-19 23:15)
张潇雨, in 《得意忘形》Ep. 70 (edited for clarity):
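Our genes are set up to keep looking for what is missing in life; otherwise we feel miserable, as if it were no different from being dead. When you have no desires, the first reaction for many people is fear, fear of becoming someone without desire. This is not merely about being abandoned by society; it comes close to “death” itself. We need to keep chasing things to sustain the image of being “alive.” Having thoughts in our heads, having drive, pursuing, filling in what is lacking: only then do we feel these are signs of life. So when you said “constantly improving yourself” just now, I did not correct you, but I can tell you: there is nothing to improve. When you set out to improve, you can never finish improving. Who told you that you are not enough? It is the same voice that tells you that you are not enough and then tells you to go improve. When you remove that voice, you are whole; you are happiness itself.
Over the past year or two I have been practicing something called “action without self.” Let us go back to a very basic question. For example, you are having dinner in the evening, and after ordering takeout you have two options: rest and nap while waiting for it, or scroll Bilibili. You end up choosing one of them. The takeout arrives and you eat. This is perfectly ordinary everyday human behavior. Who is making these choices?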
Note (2026-02-19 22:29)
许哲, in 《得意忘形》Ep. 69 (edited for clarity):
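The essence of this world is impermanence (anitya). So when you pursue order, you are essentially making an enemy of this world, and there is no point in that. What you need to do is work out how to face impermanence in this impermanent world, rather than trying to make the world “permanent.”
Impermanence and non-self are basic properties of the world; you cannot change that. All we can do is change the attitude with which we face them. The yathabhuta the Buddha taught means seeing it as it is, and nana dassana means truly seeing it. First you need to hold the idea that this world is anatta (non-self), that it is not “mine.” It is suffering, empty, and impermanent, and it is not under your control. It follows “when this exists, that exists; when this does not exist, that does not exist”; it is determined by a chain of cause and effect. A thing happens because it has its cause, and so it has its effect. It is something you cannot predict, and it is always changing. Your cognition is only a small part of the changes of this universe, so to say “well, I will give this thing order” is delusion; it is avijja (ignorance).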
Note (2026-02-19 21:03)
“Addicted to Love? The Trendy Diagnosis Is Changing Our Idea of Romance.”:
People weren’t just using the notion of love addiction to talk about destructive, obsessive romantic patterns. They were using it to mount a fascinating rebellion against the narrative that love is the pinnacle of human experience.
On a website for Love Addicts Anonymous […], you can find “40 Statements” with which you might identify. No. 37 [is] “Love is the most important thing in the world to you.” This one strikes me as a question about values. If I say that love is the most important thing in the world for me — that I value it above all else — have I inched further down a spectrum of addiction? Or have I just decided to value something that countless poets and prophets all said was the noblest human experience?
[S]elf-diagnosis has its pitfalls, especially when it comes to love, which is not inherently harmful and can’t be quantified the way cocktails can. There’s an element of contagion: People can read online posts, recognize something of themselves and feel they’ve discovered exactly what is wrong.
Often, these patterns and experiences seem like the ordinary messiness of romance, the pain and yearning and confusion that have, over the centuries, been seen as part of love’s power. Looking at them through the lens of addiction means pathologizing them, treating them as symptoms of a disorder. As we do so, we redefine love itself. It is no longer something that should remake us or endure “even to the edge of doom”; that would be unhealthy. Much of what we’ve been led to expect from love, this point of view suggests, is in fact toxic or deluded.
But if we did away with old visions of romantic love, what would replace them? This is in some sense the question the man on the forum was asking about his wife: If what he experienced in marriage was toxic, then what came next?
Note (2026-02-19 13:30)
But the main thing we learned from debate was that there is a foundational grammar, a skeleton of syntax beneath the superficies of semantics. Debate was the first place, if I may be forgiven for thinking of it as a kind of terrain, where I discovered not only the satisfaction but also the sanctity of a game with rules that remained invariant. I would go so far as to say that debate afforded me my first intimation of justice.
But another effect of the invention of jargon is ossification. When Hannah Arendt writes that the purpose of thinking is to “unfreeze” concepts that have been hardened into familiarity, part of what she means is that to grasp them is to break through the lacquer of familiar rhetoric and into the oozing center, to eschew the shortcut in favor of the longer, more tortuous route.
[W]hen people tell me they don’t miss any part of high school—don’t miss the gorgeously guileless little idiots they were when they were sixteen and unashamed to love embarrassments like debate—I do not believe them. Things were as fresh then as if they had been cut out of bright paper, sharp against the hazy future. Episodes in my adult life, even the seemingly major ones, seem dull in comparison. Now there is a sheaf of hesitance interposed between me and everything else, and no doubt this layer of remove is what makes me bearable, to the extent that I am bearable. But there was no barrier then, and even trivialities had a kind of solidity or vitality to them of which they have since been drained.
Certainly when I was debating I often succumbed to a somatic force, though it was somatic in that special way that running or sex or, I imagine, bodily mortification is somatic—so excruciatingly and exquisitely physical that its physicality dissolves into spirituality, like sugar into water.
Note (2026-02-18 20:30)
“Python Package Managers: Uv vs Pixi?”:
Python is a special language where it’s extremely popular to write libraries of code in compiled languages like C, C++, Fortran or Rust and bind them into Python. While Python is a relatively slow language it can call into these fast compiled dependencies and use them in the same way it can use Python dependencies. Many languages can do this, but this practice has taken off hugely in the Python community because it allows users to trade off performant code with a friendly and flexible programming language and often get the best of both worlds.
One big problem with pip in the early days was that it only handled source distributions. This means it could download a gzip file of source code and put it in the right place, call some hooks, and that was it.
The conda package manager handles a different kind of package. While you can still put pure Python code into a conda package you can also include pre-compiled binaries. When you build a conda package you run the compiler for all the common operating systems you expect it to run on, Windows, Linux, macOS and the common CPU architectures like x86 and ARM. This is a lot more work for the developers to build all these packages, but it hugely simplifies things for the end user as conda can just download the right binaries for their system without needing to compile anything.
Another thing conda does differently is it can look at your computer and find things that have been installed by other means through virtual packages. Nearly all compiled code depends on core libraries like glibc or musl which are included with the operating system; conda can figure out what versions of these packages you have and then include that in its package dependency solve. This has been especially useful in the CUDA Python ecosystem where all Python CUDA packages depend on specific NVIDIA GPU driver and CUDA versions.
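To make the “look at your computer” idea in the excerpt above concrete, here is a minimal sketch that asks the running Python interpreter which C library it is linked against, using the standard library’s `platform.libc_ver()`. It is not how conda implements virtual packages; the helper names `detect_system_libc` and `virtual_package_spec` are hypothetical, and the double-underscore spec format is only meant to resemble how conda lists such packages.

```python
import platform


def detect_system_libc():
    """Return (name, version) of the C library the interpreter is linked against.

    Either field may be an empty string when the platform does not expose it
    (for example on Windows or macOS).
    """
    name, version = platform.libc_ver()
    return name, version


def virtual_package_spec(name, version):
    """Format a hypothetical '__name=version' spec, or None if nothing was detected."""
    if not name or not version:
        return None
    return f"__{name}={version}"


libc_name, libc_version = detect_system_libc()
spec = virtual_package_spec(libc_name, libc_version)
print(spec if spec else "no C library version detected on this platform")
```

A solver that knows a spec like this can rule out binaries built against a newer C library than the one installed, which is the kind of constraint the excerpt describes.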
Note (2026-02-18 13:27)
《比特城里的陌生人》(2007):
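The American literary critic Lionel Trilling argued that by the late nineteenth century people had undergone a shift from sincerity to authenticity. Sincerity refers to an expectation placed on the individual: in dealing with others he should avoid duplicity, and what he reveals in public should match what he feels in private, though this does not mean making everything public. Authenticity, by contrast, means being honest with oneself rather than with others. Under authenticity, people can confess their most private thoughts to strangers without feeling guilty about it. The former requires that what is disclosed be true; the latter requires only that it be deeply felt, in which case anything may be disclosed.
The “naked man” appeared for the first time. If the motto of the age of sincerity came from the temple at Delphi, “Know thyself,” then the motto of the age of authenticity comes from the psychotherapist: “Become yourself.”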
Note (2026-02-18 11:49)
“The Case for Software Criticism”:
But software criticism is not the same as technology criticism. A work of software criticism is to Nicholas Carr’s “Is Google Making Us Stupid?” what a New York Times book review is to Virginia Woolf’s “Modern Fiction.” The latter is a more synoptic assessment of the field while the former—in theory, at least, if it existed—is a focused interrogation of a single work.
But perhaps that’s why software criticism is needed more than ever in the midst of the brinkmanship between the two worlds. Software criticism may be one of the ways to inch toward an armistice. In the demonology of some media outlets, “software engineer” occupies the same rank as “investment banker,” and in certain circles in the Bay Area, the word “journalist” is uttered like a slur. But that both sides are engaged in a shady enterprise is a corrosive belief.
And surely we can use some exciting prose! Burn that copy of On Writing Well and help yourself with some Nabokov soup. Exorcize the kind of homogenizing language that abounds in the rationalist blogosphere written by Scott Alexander wannabes and avoid sounding as if the text were generated by a language model trained on VC tweets. Self-medicate with William H. Gass, luxuriate in Lydia Davis, mainline on Martin Amis, hallucinate with Geoff Dyer, get drunk on Peter Schjeldahl, and detoxify with the sobering yet adrenalizing prose of Parul Sehgal. Anything goes. Well, everything except the Zinsser-ized, over-sanitized—hence sterilized—technical prose, because we aren’t writing a damn README here.
So if grape juice and cars and buildings merit critical analysis for their complexity and design, shouldn’t a piece of modern software qualify as an object of criticism too?
The critic will anatomize the subject from several angles. Befitting the hybrid artifact that is software, the critic will adopt disciplinary anarchy, toggling from the commonsensical to the technical to the historical to the philosophical.
Note (2026-02-18 11:29)
Note (2026-02-15 23:02)
Note (2026-02-15 10:15)
Note (2026-02-14 22:05)
sometimes you just find yourself having bought a ticket to a weird show despite obvious red flags and decide to squeeze your eyes shut and make the best of it.
Note (2026-02-11 14:37)
what did i do to deserve this
Hong Kong and British Culture, 1945–97 (2026-02-07 15:14)
Hampton, Mark. Hong Kong and British Culture, 1945–97. Hong Kong University Press, 2024.
香港的回歸,反而為英國提供了一個展示國家成就的機會:包括在香港建立法治、市場經濟以及高效率和無貪腐的政府——這些成就甚至被延伸為整個英國帝國成就的象徵。
The Handover, rather, provided an occasion to reflect upon British accomplishment in establishing markets; rule of law; and effective, corruption-free government: properties that could be extended rhetorically to the entire British Empire.
英國在香港直到 1980 年代初才真正受到挑戰的持續管治,卻為所謂英國的德治提供了一個正面的展示。
Britain’s continuing management of Hong Kong, decisively challenged only in the early 1980s, offered a site in which supposed British virtues could be more positively showcased.
戰後早期曾有這樣的論點:儘管中國可以隨時收回香港,但卻似乎不太會在短期內有此舉動。而這觀點在 1980 年代與 1990 年代也得到迴響:就是雖然中國最終也會收回香港,但出於利益的考慮,中國是不會對香港進行激烈變革的。








