1) In text-to-video, can you define the character’s face yourself using a photo?
Yes, on many services. But the reliable method is usually not pure text-only T2V (text-to-video). It is text plus a face reference image (or a few references), or image-seeded video, where you start from a “hero still” that shows the correct face.
Why “text-only face control” is unreliable
Text prompts like “a man walking down the street” do not uniquely define a real person’s facial identity. Video makes this harder because the system must keep the same face across many frames while the head turns, lighting changes, and the camera moves. So most practical tools use reference conditioning.
The 3 common ways platforms implement “use my face photo”
A) Reference images inside the video generator (fastest)
- You upload 1–3 reference images.
- You prompt the action and scene.
- The system tries to keep the same character.
Runway is very explicit about this workflow:
- Gen-4 References supports “drag in up to 3 reference images.”
- You can save references and recall them via @ in your prompt. (Runway Academy)
B) “T2V” that is actually “image + text → video”
Some tools branded “text-to-video” still require a starting image for their best results.
Runway’s Gen-4 Video page states:
- Gen-4 requires an input image
- Turbo costs 5 credits/second
- Gen-4 costs 12 credits/second (Runway)
So in practice you do the following (a minimal API sketch follows the list):
- Make one good “hero still” of your man (face correct).
- Use it as the input image.
- Use text to specify “walking down the street,” camera, style, etc.
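If you script this workflow, the same image + text pattern appears in Runway’s API. A minimal sketch, assuming Runway’s official runwayml Python SDK; the model ID, the placeholder image URL, and the polling details are assumptions to verify against Runway’s current API docs:

```python
import time

from runwayml import RunwayML  # Runway's official Python SDK

client = RunwayML()  # reads the RUNWAYML_API_SECRET env var

# Image + text -> video: the hero still pins the face, the prompt
# supplies the action. Model ID and image URL are placeholders here.
task = client.image_to_video.create(
    model="gen4_turbo",
    prompt_image="https://example.com/hero_still.png",
    prompt_text="the man walks down the street, natural gait, stable camera",
)

# Generation is asynchronous: poll the task until it settles.
while True:
    task = client.tasks.retrieve(task.id)
    if task.status in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(5)
```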
C) Subject-reference models (one photo → consistent subject)
These are specifically marketed for identity consistency from one reference.
MiniMax’s Hailuo S2V-01 announcement claims:
- “Generate character-consistent videos from just one reference image.” (MiniMax)
Policy constraint you must plan around
If your face photo depicts a real person, some services restrict uploads.
OpenAI’s Sora documentation states:
- “Uploading images with depictions of real people is blocked.”
- “Characters are the only way to use one’s likeness in Sora” and it requires explicit permission. (OpenAI Help Center)
Its Characters article also emphasizes that Characters are not for real people and that depicting a real person requires explicit consent via the appropriate flow. (OpenAI Help Center)
Bottom line for (1):
Yes, you can usually define the face using your photo, but you should expect to use reference images or an image-seeded workflow, not pure text-only T2V. Runway References and subject-reference models are directly built for this. (Runway Academy)
2) If you have a street photo, can image-to-video make the man walk into a library?
Yes, it is possible. The reliable version is a multi-shot sequence with a keyframed doorway transition, not one continuous clip driven by a long script.
Why your specific scene is hard
“Street → into a library” contains a doorway transition. Doorways are difficult because the model must change geometry, lighting, and environment semantics while keeping the person stable. The risk is warped doors, melting walls, or identity drift.
The practical limitation: clip length
Most services generate short clips well.
- Luma’s Ray2 FAQ states Ray2 can generate videos up to 10 seconds, and you can use “Extend” to make longer video, but there is a cap at 30 seconds and quality may drop with each extend. (Luma AI)
- Runway Gen-4 Video also frames generation around short clips and requires an input image. (Runway)
So a “walk into a library” sequence is normally built as several clips.
The reliable production pattern
Do it as 3–5 shots, with the doorway as its own shot:
1. Outside walking (5–10s)
Input image = your street photo. Prompt = “begins walking forward, natural gait, stable camera.”
2. Approach the door (5–10s)
Use the last good frame of shot 1 as the next input image (see the frame-chaining sketch after this list). This reduces drift.
3. Doorway transition (3–6s) using keyframes
Luma Keyframes is designed for exactly this:
- Set a start frame and an end frame
- Luma generates the transition between them (Luma AI)
So you set:
- Start frame: man outside at the door
- End frame: man just inside the library
4. Inside library continuation (5–10s)
Again, chain from the last good frame.
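Frame-chaining (shots 2 and 4) is easy to do locally with ffmpeg. A minimal sketch; the file names are placeholders:

```python
import subprocess

def extract_last_frame(clip_path: str, out_path: str) -> None:
    """Grab the final frame of a rendered clip to seed the next shot."""
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-sseof", "-1",   # start decoding ~1 second before end of file
            "-i", clip_path,
            "-update", "1",   # keep overwriting one image; the last frame wins
            "-q:v", "1",      # best image quality
            out_path,
        ],
        check=True,
    )

# The extracted frame becomes the input image for the next I2V clip.
extract_last_frame("shot1.mp4", "shot2_input.png")
```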
What “write a script” usually means in these tools
Most SaaS tools do not execute a long, screenplay-like script deterministically. You get better results by writing a shot list (a short prompt per clip, as in the sketch after this list) and controlling continuity with:
- reference images
- keyframes (start/end frames)
- frame-chaining (use last frame as next input)
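Concretely, a shot list can just be a small data structure you walk through clip by clip. Everything below (field names, file names, prompts) is illustrative, not any service’s schema:

```python
# Illustrative shot list for "street -> library". Field names and
# file names are made up for this sketch, not any service's schema.
shot_list = [
    {"prompt": "man walks forward down the street, natural gait, stable camera",
     "input_image": "hero_still.png", "duration_s": 8},
    {"prompt": "man approaches the library door",
     "input_image": "shot1_last_frame.png", "duration_s": 8},  # frame-chained
    {"prompt": "man opens the door and steps inside",
     "keyframes": ("at_door.png", "inside_library.png"), "duration_s": 5},
    {"prompt": "man walks on between the library shelves",
     "input_image": "shot3_last_frame.png", "duration_s": 8},  # frame-chained
]
```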
Bottom line for (2):
Yes, it is possible. It is most reliable as multiple short I2V clips plus Keyframes for the doorway transition, rather than one long clip. (Luma AI)
3) For 10 × 1-minute videos per month, which online service is better?
For your stated production goal, the best answer is usually not one service. It is a 2-tool workflow:
- Runway for identity-focused character shots (face consistency via references). (Runway Academy)
- Luma for doorway / scene transitions using Keyframes. (Luma AI)
If you must pick one, choose based on what hurts more: identity drift or transition failures.
Background: why “10 × 60s” needs a production mindset
You are not buying “10 generations.” You are buying 60–120 short clip generations (10 minutes of kept footage at 5–10 seconds per clip), plus retries and stitching.
Runway explicitly supports assembling longer films via its in-app editor:
- “How to create longer videos and films” points you to Video Editor Projects. (Runway)
Runway: best when face/character continuity is the priority
Key operational facts from Runway’s docs:
- Turbo = 5 credits/second
- Gen-4 = 12 credits/second
- Gen-4 requires an input image
- Unlimited plans can generate infinite Gen-4 videos in Explore Mode
- Explore Mode does not use credits (Runway)
Runway’s pricing page also indicates Unlimited includes 2,250 credits monthly (for the faster Credits Mode and other tools) on top of the “unlimited video generations” framing. (Runway)
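To see why these numbers matter, here is a back-of-envelope budget for the 10 × 60s target, using the per-second rates quoted above. The 2× retry factor is an assumption about your own keep rate, not a Runway figure:

```python
# Back-of-envelope: credits needed for 10 one-minute videos per month.
# Per-second rates are from Runway's docs as quoted above; the retry
# factor is an assumption about your own keep rate, not a Runway number.
TURBO_CREDITS_PER_SECOND = 5
GEN4_CREDITS_PER_SECOND = 12

final_seconds = 10 * 60   # ten videos, 60 s of kept footage each
retry_factor = 2.0        # assume roughly one keeper per two takes

for model, rate in [("Turbo", TURBO_CREDITS_PER_SECOND),
                    ("Gen-4", GEN4_CREDITS_PER_SECOND)]:
    print(f"{model}: ~{final_seconds * rate * retry_factor:,.0f} credits/month")

# Prints ~6,000 (Turbo) and ~14,400 (Gen-4) credits/month, both far
# above the 2,250 monthly credits, which is why Explore Mode matters.
```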
What Runway is “best at” for your use case
- Keeping one person consistent across multiple short clips, especially when you use References. (Runway Academy)
- Letting you assemble the minute inside the same ecosystem (fewer moving parts). (Runway)
Luma: best when transitions and directed changes are the priority
Key operational facts from Luma’s docs:
- Keyframes = start frame + end frame transitions; see the sketch after this list. (Luma AI)
- Unlimited plan includes 10,000 monthly credits and Unlimited use in Relaxed Mode. (Luma AI)
- Ray2 generation up to 10 seconds, extend cap 30 seconds (quality may drop). (Luma AI)
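As a concrete picture of Keyframes, a minimal sketch assuming Luma’s official lumaai Python SDK; the frame0/frame1 field names follow Luma’s published API shape as I understand it, and the URLs are placeholders, so verify against the current docs:

```python
from lumaai import LumaAI  # Luma's official Python SDK

client = LumaAI()  # reads the LUMAAI_API_KEY env var

# Keyframes: pin the first and last frame and let the model generate
# the doorway transition in between. frame0/frame1 follow Luma's
# published API shape; double-check the current docs before relying on it.
generation = client.generations.create(
    prompt="the man opens the door and steps into a quiet library",
    keyframes={
        "frame0": {"type": "image", "url": "https://example.com/at_door.png"},
        "frame1": {"type": "image", "url": "https://example.com/inside_library.png"},
    },
)
```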
What Luma is “best at” for your use case
- Forcing a believable “street → library” doorway transition using Keyframes.
- Handling heavy iteration without blowing up costs, if you rely on Relaxed Mode. (Luma AI)
Pika: useful, often cheaper per “operation,” but usually not the backbone for your exact goal
Pika’s official pricing shows “cost per video” by feature and plan credits. (Pika)
Its FAQ states generation length depends on model/feature and that “Pikaframes generations can be up to 25 seconds” (for a specific model). (Pika)
Pika is often best as:
- B-roll
- stylized cutaways
- fast variants
rather than as the single backbone for “consistent identity + doorway transition + 10×60s/month.”
My concrete recommendation for your monthly target
Best overall reliability for your case
- Runway Unlimited for identity shots, using References and Explore Mode for auditions. (Runway Academy)
- Luma Unlimited for Keyframes transitions and high-retry transition shots. (Luma AI)
- Stitch either in Runway’s editor or your normal NLE (non-linear editor). (Runway)
If you must choose one service
- Choose Runway if face consistency is your biggest pain. (Runway Academy)
- Choose Luma if transitions and cinematic control are your biggest pain. (Luma AI)
Summary
- Yes, you can define a face using your photo, but it is usually done via reference images or image-seeded video, not pure text-only T2V. Runway References and subject-reference models are built for this. (Runway Academy)
- Yes, “street photo → walk into a library” is possible, but it is most reliable as multi-shot plus Keyframes for the doorway transition. (Luma AI)
- For 10×60s/month, the most reliable setup is Runway for identity + Luma for transitions, using their relaxed/unlimited modes for retries and stitching clips into minutes. (Runway)