The quality of some of the video recordings of PyCon APAC 2014 is really poor (see the examples below). Since most speakers released their slides under Creative Commons licenses, we thought we might use these slides to repair the videos.
The basic idea is to render the slides as images, match them against the video recordings, and then use the rendered slides to replace the poor images in the recordings.
First, I was thinking about something like feature matching and tried to use ORB to find key points in the videos and slides. Unfortunately, the quality of the video recordings is so poor that feature matching does not work, at least not with ORB.
My second attempt was to use linear algebra to do template matching. It turns out that the CV_TM_CCOEFF_NORMED method works well enough.
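To make the "linear algebra" part concrete, here is a minimal NumPy sketch of the score that the normalized correlation coefficient method reduces to when the frame and the slide have the same size. It is not our actual code: the function name `ccoeff_normed` and the synthetic images are made up for illustration, and in practice the score would come from OpenCV's template matching.

```python
import numpy as np


def ccoeff_normed(frame, slide):
    """Normalized correlation coefficient of two same-sized gray images.

    Subtract each image's mean, then take the cosine of the angle between
    the two flattened vectors. The result lies in [-1, 1].
    """
    a = np.asarray(frame, dtype=np.float64).ravel()
    b = np.asarray(slide, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # A completely flat image has no pattern to correlate with.
        return 0.0
    return float(a @ b / denom)


# Synthetic stand-ins (assumptions): a "slide", a washed-out noisy "frame"
# showing the same slide, and an unrelated slide.
rng = np.random.default_rng(0)
slide = rng.integers(0, 256, size=(256, 256)).astype(np.float64)
noisy_frame = 0.5 * slide + 20.0 + rng.normal(0.0, 10.0, size=slide.shape)
other_slide = rng.integers(0, 256, size=(256, 256)).astype(np.float64)

score_same = ccoeff_normed(noisy_frame, slide)        # close to 1
score_other = ccoeff_normed(noisy_frame, other_slide)  # close to 0
```

Because the means are subtracted and the result is normalized, a change in brightness or contrast barely affects the score, which is probably why this method copes with poor recordings.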
However, we need to manually find the position of the slide image in the video recordings. The position depends on the speaker's laptop and perhaps the video connector, but it seems to be fixed for the entire talk, so we only need to adjust the parameters once for each talk.
With the help of IPython interactive widgets, it is not too difficult to do that manually. And by using SIFT feature matching, we can find the parameters semi-automatically.
All we need to do manually is find a frame of the video recording and a rendered image of the slides that match each other.
I guess it won't be too hard to make the whole process fully automatic, but the semi-automatic tool is good enough for our original purpose.
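As a rough illustration of how matched key points can be turned into position parameters, the sketch below fits an affine transform by least squares. It is an assumption-laden example, not our tool: the function names `estimate_affine` and `apply_affine` are hypothetical, and the point pairs are invented here, whereas in the real workflow they would come from SIFT matches between a slide image and a video frame.

```python
import numpy as np


def estimate_affine(slide_pts, frame_pts):
    """Least-squares 2x3 affine matrix mapping slide points to frame points."""
    src = np.asarray(slide_pts, dtype=np.float64)
    dst = np.asarray(frame_pts, dtype=np.float64)
    if src.shape != dst.shape or src.ndim != 2 or src.shape[1] != 2:
        raise ValueError("expected two (N, 2) arrays of matched points")
    if len(src) < 3:
        raise ValueError("an affine transform needs at least 3 matches")
    # Homogeneous coordinates: each row is (x, y, 1).
    design = np.hstack([src, np.ones((len(src), 1))])
    # Solve design @ M.T ~= dst for the 3x2 matrix M.T.
    solution, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return solution.T  # shape (2, 3)


def apply_affine(matrix, pts):
    """Map (N, 2) points through a 2x3 affine matrix."""
    pts = np.asarray(pts, dtype=np.float64)
    return pts @ matrix[:, :2].T + matrix[:, 2]


# Invented matches (assumption): a 1024x768 slide shown at half size,
# shifted by (60, 40) pixels inside the video frame.
slide_pts = [(0, 0), (1024, 0), (1024, 768), (0, 768), (512, 384)]
true_matrix = np.array([[0.5, 0.0, 60.0], [0.0, 0.5, 40.0]])
frame_pts = apply_affine(true_matrix, slide_pts)

matrix = estimate_affine(slide_pts, frame_pts)
# Where the slide's top-left and bottom-right corners land in the frame.
corners = apply_affine(matrix, [(0, 0), (1024, 768)])
```

Real matches contain outliers, so a robust estimator would be needed in practice; plain least squares is used here only to keep the sketch short.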
The following are the tools we use.
The following is the outline of our process:
- Extract the PDF as a series of PNG images. Although Python modules like wand can do that, we shamelessly use a shell call to ImageMagick's convert instead.
- Find the coordinates and size of the slides in the video recording using the interactive tool.
- Convert the images to 256x256 grayscale and use CV_TM_CCOEFF_NORMED to find the matching slide. 256x256 is a fairly arbitrary choice and probably more than enough; 128x128 would likely work just fine.
- The cut-off parameter does not need to be too precise. Some value between 0.5 and 0.95 should work most of the time.
- Manually put some slides on a blacklist. This is mostly because the PDF files do not contain the slide animations used in the actual talk. If the PDF includes the animation in a form that can be rendered into a series of PNG images, our algorithm works very well.
- Because we don't know how to encode video with audio using OpenCV, we shamelessly call avconv to merge the generated video with the original audio track.
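The cut-off and blacklist steps in the outline can be sketched roughly as follows. This is an illustration under assumptions rather than our actual code: the function name `choose_slides` and the score table are hypothetical, and the sketch assumes that a frame whose best match is a blacklisted slide simply keeps its original image.

```python
import numpy as np


def choose_slides(scores, cutoff=0.8, blacklist=()):
    """Pick one slide index per frame, or None to keep the original frame.

    scores[i][j] is the match score between frame i and slide j,
    for example a normalized correlation coefficient.
    """
    scores = np.asarray(scores, dtype=np.float64)
    if scores.ndim != 2 or scores.shape[1] == 0:
        raise ValueError("expected a (frames, slides) score table")
    blocked = set(blacklist)
    choices = []
    for row in scores:
        best = int(row.argmax())
        if row[best] < cutoff or best in blocked:
            # No confident match, or the match is a slide we do not trust.
            choices.append(None)
        else:
            choices.append(best)
    return choices


# Invented scores (assumption) for 4 frames against 3 slides.
scores = [
    [0.97, 0.10, 0.05],  # clearly slide 0
    [0.20, 0.91, 0.30],  # clearly slide 1
    [0.30, 0.35, 0.40],  # nothing matches well, e.g. a shot of the speaker
    [0.10, 0.25, 0.89],  # clearly slide 2
]

plain = choose_slides(scores, cutoff=0.8)
with_blacklist = choose_slides(scores, cutoff=0.8, blacklist=[1])
```

With these numbers any cut-off between roughly 0.5 and 0.85 gives the same choices, which is consistent with the observation that the cut-off does not need to be precise.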
The following are some screenshots of the enhanced and original videos.
|
Toomore's talk. This recording is the "motivation" of this project. |
|
The text in the original recording is unreadable. |
|
Tseng's talk. The quality of the video recording is equally poor. |
|
The final result is quite good, but the slides have been updated, so the match failed for the cover slide. |
|
Cheng-Lung Sung's talk is different; it seems like a perspective transformation might be needed. |
|
We thought we might need to modify our interactive tools; however, an affine transformation works. |
|
Even for recordings of better quality, like the recording of Andy's talk, our tools still improve the quality significantly. |
|
Feature matching in the interactive tool. |
The following are the enhanced videos.