Clinamenic

Binary & Tweed
Expensive in terms of compute? I figured I could do it in short batches, still using Colab. I imagine I could reconfigure some of the VQGAN+CLIP code and drop in some existing speech-to-text code, but the transcribed text would have to be fed into the GAN in real time (or close enough thereto).

Although actually it wouldn’t need to be real time at all. It would just need to be dynamic, i.e. have the text input parameter cycle forward with the ongoing speech, even with a time delay.

That is, each iteration may need to run on a different sequence of words, in a continuous fashion.
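A minimal sketch of that cycling input: a sliding window over the transcribed words, yielding one prompt per batch of iterations. The function name, window size, and step are assumptions for illustration, not anything from the actual VQGAN+CLIP notebook.

```python
def sliding_prompts(words, window=10, step=1):
    """Yield overlapping word windows as successive text prompts.

    Each yielded string would drive one batch of VQGAN+CLIP iterations,
    with the window advancing `step` words at a time through the speech.
    """
    for start in range(0, max(1, len(words) - window + 1), step):
        yield " ".join(words[start:start + window])

# Example: a short excerpt, three-word windows advancing one word at a time.
excerpt = "a screaming comes across the sky".split()
for prompt in sliding_prompts(excerpt, window=3, step=1):
    print(prompt)  # feed to the GAN's text-prompt parameter here
```

Each batch would render a few frames before the window advances, so the imagery drifts continuously rather than jumping from prompt to prompt.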
 

william_kent

Well-known member
I'm presuming you'll be reading while reclining in a leather armchair, with electrodes attached to your skull taking EEG readings which provide the noise prompts?
 

Clinamenic

Binary & Tweed
Maybe we could stagger the portions of text across a few different GAN runs, then line all the mp4s up into a grid and play it in parallel with the speech recording, so the viewer can track the speech in relation to the partitioned GAN runs, each rendering a different portion of the excerpt.
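The partitioning step could look something like this: split the excerpt's word list into contiguous chunks, one per GAN run. This is a hypothetical helper, assuming the simplest contiguous split.

```python
def partition_text(words, n_runs):
    """Split a word list into n_runs contiguous chunks, one per GAN run.

    The last chunk may be shorter if the list doesn't divide evenly.
    """
    size = -(-len(words) // n_runs)  # ceiling division
    return [words[i:i + size] for i in range(0, len(words), size)]

# Example: ten words across three parallel runs.
chunks = partition_text("one two three four five six seven eight nine ten".split(), 3)
```

Each chunk would then seed its own run, and the resulting mp4s get tiled into the grid (ffmpeg's stacking filters could handle the final assembly).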
 

Clinamenic

Binary & Tweed
Or I could manually take a page of GR, take a ten-word input, do a few iterations of that, then shift the ten-word sequence forward by one word, do another few iterations, and so on.

I would have to calibrate the iteration count to the average duration per word, in order to sync the visuals to the speech.
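That calibration is just arithmetic: if each iteration yields one output frame, the frames per one-word window shift should match the average speech time of a word. A rough sketch, with placeholder numbers rather than measured values:

```python
def iterations_per_window(total_seconds, n_words, fps=30, words_per_step=1):
    """Iterations (one frame each) per prompt-window shift, so the video
    stays in sync with speech averaging total_seconds / n_words per word."""
    avg_word_seconds = total_seconds / n_words
    return round(avg_word_seconds * words_per_step * fps)

# Example: a 2-minute recording of 300 words at 30 fps output
# averages 0.4 s per word, i.e. 12 frames per one-word shift.
n = iterations_per_window(120, 300, fps=30)  # -> 12
```

In practice word durations vary, so for tighter sync one could use per-word timestamps from the speech-to-text output instead of the average.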
 

catalog

Well-known member
I'm not sure it's possible yet, but it's surely around the corner. It's actually what I want the tree thing to become: I want the ability to look at a tree, speak to it via the phone, and have the phone screen show the image you are seeing, with the augmentation you want.

The technical capacity is there, it just needs to be joined up.
 