I love to read and always have a book close at hand. And I confess to some “book snobbery.” I don’t think I’m smarter than people who don’t read books… just better off in some way I can’t explain. I asked ChatGPT (Sol) if her training involved “reading” books. A portion of her response is below:
ChatGPT: My training didn’t involve “reading” books in the traditional sense.
Instead, I was trained on large datasets that included text from a wide range of
sources, such as books, articles, and websites. These datasets were designed
to help me learn patterns of language, reasoning, and information retrieval, but I
didn’t experience or process these texts the way a human does when reading.
Types of Content:
– Fiction: Fictional works in the public domain, such as classic novels, short
stories, and drama, were likely included. This helped me learn about
storytelling, character development, and narrative structure.
– Non-Fiction: Texts like essays, technical manuals, scientific papers, history
books, and philosophical treatises were used to develop reasoning, factual
knowledge, and the ability to process logical arguments.
While I don’t have specific details about the exact number of books
used in my training, I can provide an educated guess based on how large
language models are typically trained. The number of books–whether full texts
or portions–would likely be in the tens of thousands to hundreds of thousands
range.
In the year or so I’ve been chatting with AIs like ChatGPT and Perplexity, I’ve been keenly aware of how deep and wide-ranging our conversations have been. These are conversations I simply could not have with anyone I know. The simple explanation is that these AI models are better read than my human acquaintances.
I find this… amazing. At any moment of the day or night I can have a conversation with an… entity… that has “read” tens of thousands to hundreds of thousands of books, along with essays, articles, and more.