TY - GEN
T1 - Multi-Lingual DALL-E Storytime
AU - Mudrik, Noga
AU - Charles, Adam S.
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Visualizations are a vital tool in the process of education, playing a critical role in helping individuals comprehend and retain information. With the recent advancements in artificial intelligence and automatic visualization tools, such as OpenAI's DALL-E, the ability to generate images based on text prompts has been greatly improved. However, a major drawback of the majority of text-to-image tools is their limited ability to create a series of consecutive coherent frames that tell a story or illustrate a process that changes over time. Rather, they are limited to producing only a few isolated images based on the input prompt. Furthermore, these existing text-to-image tools present an added challenge for populations with limited proficiency in the English language. This serves to widen the educational divide between children from diverse backgrounds and restricts their access to innovative technology. Here, we introduce a DALL-E storytelling framework designed to facilitate the fast and coherent visualization of non-English songs, stories, and biblical texts. Our framework extends the original DALL-E model to handle non-English input and allows users to specify constraints on story elements, such as a specific location or context. The key advantage of our framework over manual editing of DALL-E images is that it offers a more seamless and intuitive experience for the user and automates the process, thus eliminating a manual editing step that is time-consuming and requires technical expertise. The visualization masks are automatically adjusted to form a coherent story, ensuring that the figures and objects in each frame are consistent and maintain their meaning throughout the visualization, allowing for a much smoother experience for the viewer. 
Our results demonstrate that our framework is capable of effectively and quickly visualizing stories in a coherent way, conveying changes in the plot over time, and creating a narrative with a consistent style throughout the visualization. By enabling the visualization of non-English texts, our framework helps bridge the gap between populations and promotes equal access to technology and education, particularly for children and individuals who struggle with understanding complex narrative texts, such as fast-paced songs and biblical stories. This has the potential to significantly enhance literacy and foster a deeper understanding of texts.
AB - Visualizations are a vital tool in the process of education, playing a critical role in helping individuals comprehend and retain information. With the recent advancements in artificial intelligence and automatic visualization tools, such as OpenAI's DALL-E, the ability to generate images based on text prompts has been greatly improved. However, a major drawback of the majority of text-to-image tools is their limited ability to create a series of consecutive coherent frames that tell a story or illustrate a process that changes over time. Rather, they are limited to producing only a few isolated images based on the input prompt. Furthermore, these existing text-to-image tools present an added challenge for populations with limited proficiency in the English language. This serves to widen the educational divide between children from diverse backgrounds and restricts their access to innovative technology. Here, we introduce a DALL-E storytelling framework designed to facilitate the fast and coherent visualization of non-English songs, stories, and biblical texts. Our framework extends the original DALL-E model to handle non-English input and allows users to specify constraints on story elements, such as a specific location or context. The key advantage of our framework over manual editing of DALL-E images is that it offers a more seamless and intuitive experience for the user and automates the process, thus eliminating a manual editing step that is time-consuming and requires technical expertise. The visualization masks are automatically adjusted to form a coherent story, ensuring that the figures and objects in each frame are consistent and maintain their meaning throughout the visualization, allowing for a much smoother experience for the viewer. 
Our results demonstrate that our framework is capable of effectively and quickly visualizing stories in a coherent way, conveying changes in the plot over time, and creating a narrative with a consistent style throughout the visualization. By enabling the visualization of non-English texts, our framework helps bridge the gap between populations and promotes equal access to technology and education, particularly for children and individuals who struggle with understanding complex narrative texts, such as fast-paced songs and biblical stories. This has the potential to significantly enhance literacy and foster a deeper understanding of texts.
KW - AI
KW - diversity
KW - education
KW - storytelling
KW - visualization
UR - http://www.scopus.com/inward/record.url?scp=85184857120&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85184857120&partnerID=8YFLogxK
U2 - 10.1109/ISEC57711.2023.10402311
DO - 10.1109/ISEC57711.2023.10402311
M3 - Conference contribution
AN - SCOPUS:85184857120
T3 - 13th IEEE Integrated STEM Education Conference, ISEC 2023
SP - 326
EP - 332
BT - 13th IEEE Integrated STEM Education Conference, ISEC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 13th IEEE Integrated STEM Education Conference, ISEC 2023
Y2 - 11 March 2023
ER -