Generative AI can take many kinds of inputs and produce many kinds of outputs—most commonly text, images, audio, video, and code. Some models work with a single modality (e.g., text in → text out), while newer multimodal models can combine inputs (like text + image) and generate one or more outputs (like an edited image plus an explanation).
A simple way to describe generative models is by their input → output format:
- Text → Text (chat, summarization, translation)
- Text → Image
- Text → Video
- Text → Code
- Text → Speech (text-to-speech)
- Speech → Text (speech recognition)
- Image → Text (captioning, “describe this image”)
- Image → Image (editing, inpainting, style transfer)
- Image + Text → Image (instruction-based image editing)
- Video → Text (video summarization)
- Text + Image → Video (video generation with a reference image)
This “input → output” framing is useful because it quickly communicates what a model can do and what kind of data you need to provide to use it.
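The framing above can be made concrete as a tiny data model: each model is described by the set of modalities it consumes and the set it produces, and a request is valid only if it supplies all required inputs. A minimal sketch (the `Modality`, `ModelSpec`, and example model names are hypothetical, not a real API):

```python
from dataclasses import dataclass
from enum import Enum, auto

class Modality(Enum):
    TEXT = auto()
    IMAGE = auto()
    AUDIO = auto()
    VIDEO = auto()
    CODE = auto()

@dataclass(frozen=True)
class ModelSpec:
    """Describes a model by its input -> output modalities."""
    name: str
    inputs: frozenset
    outputs: frozenset

    def accepts(self, provided):
        # A request is valid if it supplies every required input modality.
        return self.inputs <= frozenset(provided)

# Illustrative specs mirroring the list above (not real models):
captioner = ModelSpec("captioner",
                      inputs=frozenset({Modality.IMAGE}),
                      outputs=frozenset({Modality.TEXT}))

editor = ModelSpec("instruction-editor",
                   inputs=frozenset({Modality.IMAGE, Modality.TEXT}),
                   outputs=frozenset({Modality.IMAGE}))

captioner.accepts([Modality.IMAGE])                  # True
editor.accepts([Modality.IMAGE])                     # False: also needs TEXT
editor.accepts([Modality.IMAGE, Modality.TEXT])      # True
```

Modeling modalities as sets also captures multimodal cases naturally: “Image + Text → Image” is just a spec with two input modalities, and no special casing is needed.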
