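# Converting Between Images and Text: TTS/ASR/OCR

| Abbreviation | Full name | Description |
| --- | --- | --- |
| TTS | Text-to-Speech | Converts written text into spoken audio. |
| ASR | Automatic Speech Recognition | Converts speech signals into text. |
| OCR | Optical Character Recognition | Converts text in images or scanned documents into editable text. |

## TTS

OpenAI's `tts-1` model is optimized for generation speed:

```python
from openai import OpenAI

client = OpenAI()

speech_file_path = "AI_speech.mp3"
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="xxx",
)
response.stream_to_file(speech_file_path)
```

`tts-1-hd`, by contrast, is optimized for audio quality.

## ASR

Automatic speech recognition (ASR) is another field that has benefited from advances in large language models. As a first step, the code below samples frames from a video at a fixed interval and extracts its audio track to an MP3 file:

```python
# Import the required libraries
import os
import base64  # encode frames
import cv2  # video processing
# audio processing (moviepy < 2.0; in moviepy 2.x use `from moviepy import VideoFileClip`)
from moviepy.editor import VideoFileClip

VIDEO_FILE = "Good_Driver.mp4"


def extract_frames_and_audio(video_file, interval=2):
    encoded_frames = []
    file_name, _ = os.path.splitext(video_file)

    video_capture = cv2.VideoCapture(video_file)
    total_frame_count = int(video_capture.get(cv2.CAP_PROP_FRAME_COUNT))
    frame_rate = video_capture.get(cv2.CAP_PROP_FPS)
    # Frames to skip between samples (at least 1, so the loop always advances)
    frames_interval = max(1, int(frame_rate * interval))
    current_frame = 0

    # Iterate over the video and extract frames at the specified sampling rate
    while current_frame < total_frame_count - 1:
        video_capture.set(cv2.CAP_PROP_POS_FRAMES, current_frame)
        success, frame = video_capture.read()
        if not success:
            break
        # Encode the frame as JPEG, then as a Base64 string
        _, buffer = cv2.imencode(".jpg", frame)
        encoded_frames.append(base64.b64encode(buffer).decode("utf-8"))
        current_frame += frames_interval
    video_capture.release()

    # Extract the audio track from the video
    audio_output = f"{file_name}.mp3"
    video_clip = VideoFileClip(video_file)
    video_clip.audio.write_audiofile(audio_output, bitrate="32k")
    video_clip.audio.close()
    video_clip.close()

    print(f"Extracted {len(encoded_frames)} frames")
    print(f"Audio extracted to {audio_output}")
    return encoded_frames, audio_output


# Sample 1 frame every 2 seconds
encoded_frames, audio_output = extract_frames_and_audio(VIDEO_FILE, interval=2)
```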
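To make the frame-sampling step concrete, here is a minimal sketch of which frame indices the extraction loop visits for a given frame rate and interval. The helper `sample_frame_indices` is hypothetical and not part of the original code; it only reproduces the loop's arithmetic (start at frame 0, step by `int(frame_rate * interval)`, stop before the last frame) without opening a video.

```python
def sample_frame_indices(total_frame_count: int, frame_rate: float, interval: float) -> list[int]:
    """Hypothetical helper: frame indices visited when sampling one frame every `interval` seconds.

    Mirrors the while-loop in extract_frames_and_audio: begin at frame 0 and
    advance by int(frame_rate * interval) while the index is below total_frame_count - 1.
    """
    # int() truncates, so non-integer frame rates (e.g. 29.97) give a slightly shorter step
    step = max(1, int(frame_rate * interval))  # guard against a zero step
    return list(range(0, total_frame_count - 1, step))


# A 10-second clip at 30 fps has 300 frames; sampling every 2 seconds steps by 60 frames.
print(sample_frame_indices(300, 30, 2))
```

For the 300-frame example this yields indices 0, 60, 120, 180 and 240, i.e. five frames. Because the step is truncated to an integer, at 29.97 fps the step becomes 59 frames, so the sampled timestamps drift slightly earlier than exact 2-second marks over long videos.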