Since the announcement of the launch of the Aquarius Collection, the Synesis One community has been eager to discover what the first 10,000 released words will be. In this article, we walk through the selection process to give you more insight into the words that were chosen.
The team of computational linguists at Mind AI engages in the comprehensive, systematic, and objective study of language. They draw on that study to build rules by which a machine can understand and process stretches of language and derive more meaning from them. Our team of linguists was behind the selection of the words in the Aquarius Collection, and the first words went through several stages of refinement.
Speakers and linguists agree that the English language is complex and intricate. Much like a native or fluent English speaker, advanced AI machines must be trained to understand and process language so they can make meaning out of it, not just to learn many words without context. To compile the 10,000-word list, we studied multiple sources of statistically relevant common words, such as the Oxford English Corpus, engineering repositories, and, of course, frequently used words from search engines and social media.
Common Use Words
The first and most apparent candidates were articles, prepositions, and basic verbs: words like to, a, an, the, of, in, be, and have. However, because these words serve mainly as grammatical tools in English, we pivoted in the subsequent phase toward nouns, verbs, adjectives, and adverbs that carry more meaning. This phase brought the list to about 5,000 words.
Specialized Language Words
There are over 170,000 current-use words in the English language; however, native English speakers have an average vocabulary of between 20,000 and 35,000 words, and B2-level fluency requires knowledge of only around 2,500–5,000 words. So once we surpassed 5,000 words on the list, we decided that the remaining 5,000 would consist of specialized language or come from specialized domains.
We wanted to balance and capture words from multiple disciplines and industries such as biology, chemistry, physics, mathematics, literature, history, and geography, so we selected a combined 5,000 of the most basic concept words across these fields, completing the list of 10,000 words.
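For readers who think in code, the two phases described above can be sketched roughly in Python. This is only an illustration under stated assumptions, not the process our linguists actually ran: the function names, the tiny function-word set, the sample data, and the round-robin way of balancing disciplines are all hypothetical.

```python
from collections import Counter

# Hypothetical function-word set; the article names only these examples.
FUNCTION_WORDS = {"to", "a", "an", "the", "of", "in", "be", "have"}


def select_common(tokens, limit):
    """Return the `limit` most frequent tokens that are not function words."""
    counts = Counter(
        t.lower() for t in tokens if t.lower() not in FUNCTION_WORDS
    )
    return [word for word, _ in counts.most_common(limit)]


def select_specialized(domain_words, limit, exclude):
    """Pick words round-robin across domains until `limit` words are chosen.

    Round-robin balancing is an assumption made for this sketch; words
    already present in `exclude` are skipped.
    """
    chosen, seen = [], set(exclude)
    queues = {domain: list(words) for domain, words in domain_words.items()}
    while len(chosen) < limit and any(queues.values()):
        for words in queues.values():
            if words and len(chosen) < limit:
                word = words.pop(0)
                if word not in seen:
                    seen.add(word)
                    chosen.append(word)
    return chosen


# Toy data standing in for a real corpus and real domain glossaries.
tokens = "the cat sat in the garden the cat sat the cat".split()
domains = {"biology": ["cell", "cat"], "physics": ["force", "mass"]}

common = select_common(tokens, 2)
specialized = select_specialized(domains, 3, exclude=common)
print(common + specialized)
```

With real inputs, the two limits would each be 5,000, and the function-word set and domain glossaries would come from curated sources rather than the toy values shown here.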
With 10,000 possibilities, which words are you hoping to own?