Multi-Lingual DALL-E Storytime

Noga Mudrik, Adam S. Charles

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Visualizations are a vital tool in the process of education, playing a critical role in helping individuals comprehend and retain information. With the recent advancements in artificial intelligence and automatic visualization tools, such as OpenAI's DALL-E, the ability to generate images based on text prompts has been greatly improved. However, a major drawback of the majority of text-to-image tools is their limited ability to create a series of consecutive coherent frames that tell a story or illustrate a process that changes over time. Rather, they are limited to producing only a few isolated images based on the input prompt. Furthermore, these existing text-to-image tools present an added challenge for populations with limited proficiency in the English language. This serves to widen the educational divide between children from diverse backgrounds and restricts their access to innovative technology. Here, we introduce a DALL-E storytelling framework designed to facilitate the fast and coherent visualization of non-English songs, stories, and biblical texts. Our framework extends the original DALL-E model to handle non-English input and allows users to specify constraints on story elements, such as a specific location or context. The key advantage of our framework over manual editing of DALL-E images is that it offers a more seamless and intuitive experience for the user and automates the process, eliminating manual editing that is time-consuming and requires technical expertise. The visualization masks are automatically adjusted to form a coherent story, ensuring that the figures and objects in each frame are consistent and maintain their meaning throughout the visualization, allowing for a much smoother experience for the viewer.
Our results demonstrate that our framework is capable of effectively and quickly visualizing stories in a coherent way, conveying changes in the plot over time, and creating a narrative with a consistent style throughout the visualization. By enabling the visualization of non-English texts, our framework helps bridge the gap between populations and promotes equal access to technology and education, particularly for children and individuals who struggle with understanding complex narrative texts, such as fast-paced songs and biblical stories. This has the potential to significantly enhance literacy and foster a deeper understanding of texts.

Original language: English (US)
Title of host publication: 13th IEEE Integrated STEM Education Conference, ISEC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 326-332
Number of pages: 7
ISBN (Electronic): 9798350300017
State: Published - 2023
Event: 13th IEEE Integrated STEM Education Conference, ISEC 2023 - Laurel, United States
Duration: Mar 11 2023 → …

Publication series

Name: 13th IEEE Integrated STEM Education Conference, ISEC 2023

Conference

Conference: 13th IEEE Integrated STEM Education Conference, ISEC 2023
Country/Territory: United States
City: Laurel
Period: 3/11/23 → …

Keywords

  • AI
  • diversity
  • education
  • storytelling
  • visualization

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science Applications
  • Software
  • Engineering (miscellaneous)
  • Education
