Draft:Talkie (language model)
Submission declined on 20 May 2026 by Commandant Quacks-a-lot (talk).
This draft has been resubmitted and is currently awaiting re-review.
Comment: Looks interesting, but you need more independent sources. Commandant Quacks-a-lot (talk) 01:56, 20 May 2026 (UTC)
| Talkie | |
|---|---|
| Original authors | Nick Levine, David Duvenaud, Alec Radford |
| Release | April 2026 |
| Available in | English |
| License | Apache license |
| Website | https://talkie-lm.com/ |
Talkie is a small language model developed by Nick Levine, David Duvenaud, and Alec Radford. It was announced in April 2026 and is described by its developers as a vintage language model. Talkie is trained solely on pre-1931 texts that are in the public domain,[1] in order to reduce the legal risk and liability associated with releasing the model data.[2] The model family consists of a 13-billion-parameter base model, talkie-1930-13b-base, and a post-trained checkpoint designed to power a chat interface, talkie-1930-13b-it.[3]
Development
The initial idea was to build a model trained on historical data that could be used to explore whether models can forecast future events: what is predictable, and how far in advance events can be predicted.[2] The model was also developed to study cultural change and model self-conception.[4][3] Another goal expressed by the authors is to test whether language models can arrive at inventions or scientific discoveries. The authors cite a thought experiment proposed by Demis Hassabis, who asked whether a model trained on data up to 1911 could independently discover Albert Einstein's theory of general relativity.[5]
The term vintage language model is attributed to Owain Evans and describes language models trained only on historical text. The purpose of such models is to simulate language use from the past and to study the behavior of models not contaminated by contemporary content. Other vintage models include Ranke 4B, Mr Chatterbox, and Machina Mirabilis. Talkie is also inspired by Calcifer Computing’s work on Temporal Language Models, which are able to represent temporal trends in language.[6]
The model was trained on 260 billion tokens of pre-1931 English text from sources such as the Institutional Data Initiative, Common Pile, and the Internet Archive. The 31 December 1930 cutoff is based on copyright term rules in the United States, where works published between 1923 and 1977 are protected for 95 years. The data includes books, newspapers, periodicals, scientific journals, patents, and case law. The developers experimented with various optical character recognition (OCR) methods and developed a dedicated vintage OCR system. The compute needed to train the model was provided by Anthropic.[7]
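The arithmetic behind the cutoff can be illustrated with a minimal sketch. It assumes the 95-year term stated above and that protection runs through the end of the calendar year, so a work becomes free on 1 January of the following year. The function names are hypothetical and are not taken from the Talkie project.

```python
COPYRIGHT_TERM_YEARS = 95  # US term for works published 1923-1977, as stated above


def public_domain_year(publication_year: int) -> int:
    """Return the first calendar year in which the work is in the US public domain."""
    if not 1923 <= publication_year <= 1977:
        raise ValueError("this sketch only covers works published 1923-1977")
    # Protection lasts through the end of the 95th calendar year after
    # publication, so the work is free from 1 January of the next year.
    return publication_year + COPYRIGHT_TERM_YEARS + 1


def latest_public_domain_publication_year(current_year: int) -> int:
    """Return the latest publication year already in the public domain."""
    year = current_year - COPYRIGHT_TERM_YEARS - 1
    public_domain_year(year)  # reuse the range check; raises if outside 1923-1977
    return year
```

Under these assumptions, `public_domain_year(1930)` returns 2026, which matches a 31 December 1930 cutoff for a model released in 2026.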
One of the main challenges in building a vintage model is contamination by anachronistic data from beyond the cutoff date. This is typically due to incorrect metadata or to editorial notes added to the text. Because of this, Talkie is, for example, aware of Franklin Delano Roosevelt and Adolf Hitler.[2]
A dedicated post-training pipeline was developed to fine-tune a chatbot from the base model. For this purpose, only historical structured texts, such as etiquette manuals, letter-writing manuals, cookbooks, dictionaries, and encyclopedias, were used, to avoid contaminating the model. Nevertheless, because the model was also fine-tuned through synthetic chats with a Claude Opus model, some anachronisms were introduced.[1]
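One simple way to surface this kind of contamination is to flag documents that mention years after the cutoff. The following sketch is an illustrative heuristic only, not the method used by the Talkie developers; the function name and regular expression are assumptions made for this example.

```python
import re

CUTOFF_YEAR = 1930  # last year allowed in the training data, per the text

# Matches four-digit years from 1800 to 2099 as standalone tokens.
YEAR_PATTERN = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def post_cutoff_years(text: str, cutoff: int = CUTOFF_YEAR) -> list[int]:
    """Return the sorted, distinct years mentioned in `text` that fall after `cutoff`."""
    years = {int(match) for match in YEAR_PATTERN.findall(text)}
    return sorted(year for year in years if year > cutoff)
```

A document whose metadata claims a 1912 publication date but whose text contains a note such as "Reprinted 1954" would be flagged, because `post_cutoff_years` would return `[1954]`. A heuristic like this misses anachronisms that do not mention a year, and it can flag legitimate text, since any four-digit number in the matched range counts as a year.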
Reception
Multiple outlets used the available demo of the Talkie chatbot to test the model's worldview. For example, Decrypt's Jose Antonio Lanz reported on Talkie's analysis of Adolf Hitler.[7] Both The Decoder and The Register noted that the model performs worse than modern counterparts on standard benchmarks such as HumanEval. The research team has acknowledged the model's limitations.
Novelist Robin Sloan praised the Talkie project as "a triumph", distinguishing it from earlier vintage language model experiments. He engaged with the Hassabis thought experiment and proposed an alternative, simpler benchmark. Instead of general relativity, he considered whether the model could arrive at Claude Shannon's insight about mapping electric circuits to Boolean logic. Sloan tested this with Talkie, which denied that any such correspondence exists.[8]
Simon Willison described the base Talkie model as a "vegan" model, meaning one trained entirely on licensed or out-of-copyright data.[9] The chat model does not qualify, because proprietary synthetic data generated by Claude Opus was introduced during fine-tuning.
References
- ^ a b "Introducing talkie: a 13B vintage language model from 1930". talkie-lm.com. Retrieved 2026-05-19.
- ^ a b c Roose, Kevin; Newton, Casey; Jones, Whitney; Cohn, Rachel; Pavic, Vjeran; Ramirez, Daniel; Powell, Dan; Lozano, Marion; Niemisto, Rowan (2026-05-01). "OpenAI's Big Reset + A.I. in the Doctor's Office + Talkie, a pre-1930s LLM". The New York Times. ISSN 0362-4331. Retrieved 2026-05-30.
- ^ a b Vigliarolo, Brandon (2026-04-28). "Vintage chatbot lives in the past like an elderly relative". The Register. Retrieved 2026-05-30.
- ^ Bastian, Matthias (2026-04-28). "Here is what an LLM that knows nothing after 1930 thinks our world looks like in 2026". The Decoder. Retrieved 2026-06-02.
- ^ IndiaAI (2026-02-18). AI Research Symposium: The Next Frontiers | Keynotes by Demis Hassabis, Yoshua Bengio & Yann LeCun. Retrieved 2026-05-30 – via YouTube.
- ^ "CalCo". www.calcifercomputing.com. Retrieved 2026-05-19.
- ^ a b Lanz, Jose Antonio (2026-04-29). "This AI Was Trained Only on Pre-1930 Text. We Asked It About Hitler, Stocks, and the Future". Decrypt. Retrieved 2026-05-30.
- ^ "Talkie and Claude (no, the other one)". Robin Sloan. Retrieved 2026-06-25.
- ^ Willison, Simon. "Introducing talkie: a 13B vintage language model from 1930". Simon Willison’s Weblog. Retrieved 2026-06-02.
External links
- Official website
- Talkie-powered chat interface
- Talkie project repository on GitHub
- Talkie project repository on HuggingFace

