SPE Media Lib
Hello,
I'm a Norwegian guy hanging out in #PS3Dev and #gentoo-ppc64 on irc.freenode.net.
I am nearly finished (it works but is not released) with a colorspace converter, YUV420p -> ARGB (more or less the same as YV12 -> ARGB).
It runs on one SPE at more than 60 FPS for 1920x1080.
The next logical steps are up/down scaling, then maybe some extra filtering and decoding.
So here's my plan.
We create an SPU Media Lib project here on PS2Dev.org where we define the inputs and outputs (and make a reference project on SourceForge), the standards for where binaries are located and so on, and how to handshake and communicate between all the running SPEs. Then we add subprojects for the necessary SPU programs as we see fit as the lib grows.
All SPU code needs to be compatible with both 64-bit and 32-bit userland.
So please help me create the project and help me write the code.
Thanks
Kristian
Don't do it alone.
Sounds like a good project. If you would like to host the code at the subversion repository here (svn.ps2dev.org), then please send me a private message with the userid/password you would like and I will create an account for you.
The same goes for anyone else with project ideas for the ps3. The very few rules I have for subversion access are listed at:
http://ps2dev.org/Site_Information/Subversion
David. aka Oobles.
The project is up at
http://wiki.ps2dev.org/ps3:spu-medialib
svn
http://svn.pspdev.org/listing.php?repna ... rev=0&sc=0
very very nice one unsolo :)
Now, in order to benefit from this in the greatest number of applications without having to tweak each one of them for the PS3, I think the best thing would be to implement this in something like SDL or DirectFB or any other similar media layer (ggi? xv via a custom ps3fb-based X server?). The idea is to provide transparent SPE-based hardware acceleration and vsync support for any app using those backends.
What do you guys think?
jimparis wrote: I think making an SPU-accelerated Xv driver with XvMC could be a good place to do it.
I think too it is the best option:
- SDL has support for xv
- MPlayer's libvo can use xv.
However I don't know how easy (or difficult) it is to add xv into the X server (can it be done without touching the server or does it have to be put into it?).
mbf wrote: After posting this yesterday, I did some digging on xv but I hit a problem with vblank sync. Not easily done under X it seems. Anyone got an idea on how to do this properly? Does MPlayer or VLC use vsync in conjunction with xv and how?
Some applications (mythtv) have an option to use OpenGL to do the vsync. But from what I can gather, Xv drivers are already supposed to handle vsync internally. For example, if you look at the i810_video.c source in the intel X video driver, it does double buffering inside I810DisplaySurface() and waits for vsync before flipping:
Code: Select all
/* wait for the last rendered buffer to be flipped in */
while (((INREG(DOV0STA)&0x00100000)>>20) != pI810Priv->currentBuf) {
    if(loops == 200000) {
        xf86DrvMsg(pScrn->scrnIndex, X_INFO, "Overlay Lockup\n");
        break;
    }
    loops++;
}
/* buffer swap */
if (pI810Priv->currentBuf == 0)
    pI810Priv->currentBuf = 1;
else
    pI810Priv->currentBuf = 0;
I810ResetVideo(pScrn);
I810DisplayVideo(pScrn, surface->id, surface->width, surface->height,
                 surface->pitches[0], x1, y1, x2, y2, &dstBox,
                 src_w, src_h, drw_w, drw_h);
ldesnogu wrote: However I don't know how easy (or difficult) it is to add xv into the X server (can it be done without touching the server or does it have to be put into it?).
Well, you'll need to write a new Xv-capable display driver, but with the new modular X.org, that no longer involves rebuilding the whole X tree.
@unsolo a.o.
See:
http://lists.mplayerhq.hu/pipermail/ffm ... 28757.html
I can imagine we do something similar with a bunch of interested ps2dev devs.
ac3/a52
Also, I noticed on the wiki that a52 development was being looked at. In actual fact, the PS3 implements a regular ALSA driver which supports A52 passthrough, so surround sound from DVDs should work as-is. You don't need to do any decoding, just pass the bitstream straight through.
As part of my python library, I'm looking at writing an SPE sound system, with several goals. Primarily standard sound effects and MP3 decoding for 2-channels but also porting some of liba52 to the SPU (or re-implementing completely) so that I can have spatially located sound effects for those with DTS amps. I'll do my best to keep that part of the library usable from C too!
The latest CELL add-on now includes a document called "Cell Programming Primer", which has a section containing a sample rgb2y program that uses an SPE (Chapter 3, Basics of SPE Programming).
For those who want to learn more about how to use SPE, check that out and you will find it is really informative.
Cell Programming Primer
nice find laichung :)
@jockyw2001: that was the point of my initial question. Better optimize the lower layers of the OS in order to improve performance for a broader range of applications in one single go. IMHO. However, optimizing MPlayer directly would certainly be more straightforward and fit the needs of most. I'm game for it anyway :)
@ralferoo: do you mean that there is no need to decode AC3/DTS, whatever audio system your PS3 outputs to? So far, with all the distros and kernels I tried, the ALSA driver sucked big time for standard stereo output, only cracks and hisses.
@digihoe: that's doable, but it won't work without optimizing MEncoder or x264 specifically for the CellBE.
mbf wrote: @ralferoo: do you mean that there is no need to decode AC3/DTS, whatever audio system your PS3 outputs to? So far, with all the distros and kernels I tried, the ALSA driver sucked big time for standard stereo output, only cracks and hisses.
I used to have FC5 installed which worked OK playing WAV files with aplay. I've manually installed a base version of Ubuntu 7.04 which also seems to work fine with both aplay and mpg123.
There is talk on the forums about cracks and hisses, but I haven't heard any evidence of it myself. So far, all my tests have been up to about 3 minutes as that's how long the MP3s I've tried are.
I might be wrong about the DTS passthrough as "aplay -l" doesn't list an iec958 device, although I was pretty sure I read someone had got it working. This also suggests it doesn't work:
Code: Select all
root@ps3:~# aplay -Dspdif ~ralf/test.dts
ALSA lib pcm.c:2145:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958
aplay: main:550: audio open error: No such file or directory
ralferoo wrote: I might be wrong about the DTS passthrough as "aplay -l" doesn't list an iec958 device ...
Well, I've done some digging and DTS pass-through is definitely not supported by the current kernel. However...
There's still some hope though because it's possible for most amps to recognise a DTS bitstream even without the "None PCM data" option set in the stream. I'll let you know how I get on...
In sound/ppc/snd_ps3_reg.h we see lots of internal hardware definitions including
Code: Select all
S/PDIF Audio Output Channel Channel Status Setting Registers.
Configures channel status bit settings for each block (192 bits).
Output is performed from the MSB(AO_SPDCS0 register bit 31).
The same value is added for subframes within the same frame.
So, whilst the current kernel driver doesn't support this, it's feasible that we could implement this in the future and without requiring a hypervisor fix from Sony.
See http://www.hardwarebook.info/S/PDIF for more about SPDIF if you're interested.
Alignment?
Unsolo, I've started working on a proof-of-concept clone of ffplay that uses the spu media lib. While browsing through your code, I noticed that you align the memory allocations to 128-byte boundaries. For both memalign() and __attribute__ ((aligned(xy))), the alignment value is in bytes, not bits. DMA transfers require you to align to 128-bit (16-byte) boundaries, so the memalign calls should be changed to memalign(16,xyz).
Edit: looks like "minimum requirement" doesn't mean "best performance". The CBE Architecture reference document states that:
"For optimal performance of transfers of 128 bytes or more, the source and destination transfer addresses should be 128-byte aligned (bits 25 through 31 set to 0)."
Fair enough!
So basically, the choice depends on which one is faster, the DMA transfer or the actual data processing, AND on how significant the loss of available memory due to fragmentation is (with large alignments).
Re: Alignment?
mbf wrote: Edit: looks like "minimum requirement" doesn't mean "best performance". The CBE Architecture reference document states that: "For optimal performance of transfers of 128 bytes or more, the source and destination transfer addresses should be 128-byte aligned (bits 25 through 31 set to 0)." Fair enough!
Yes, that 128 bytes comes from L2 cache line sizes.
mbf wrote: So basically, the choice depends which one is faster: the DMA transfer or the actual data processing AND how significant is the loss of available memory due to fragmentation (with large alignments).
I *guess* it would be enough to align small dynamically allocated memory chunks to the hardware requirements (which depends on DMA packet size) and big chunks to the fastest requirement (128 bytes).
The rationale is that anyway for small DMA transfers a significant proportion of time is lost in the setup of the transfer, so a few cycles lost probably matters less than the fragmentation of memory.
One should also take care of allocating in the right order to minimize holes of unallocatable memory :)
Re: Alignment?
Yes, forgot to mention the cache line size.
ldesnogu wrote: One should also take care of allocating in the right order to minimize holes of unallocatable memory :)
I might think about this, well sometime :P
Question: YUV->RGB conversion and then RGB to RGB scaling, or YUV to YUV scaling first and then YUV to RGB conversion, or YUV to RGB and scaling at the same time? Which would be the fastest? YUV scaling first would seem to be the fastest, since there is less data to process and scaling is the most CPU-intensive step, but that's only a guesstimate and I haven't benchmarked it yet.
Re: SPE Media Lib
unsolo wrote: I am nearly finished(it works but is not released) with a colorspace converter YV420p ->ARGB.. (more or less the same as YV12->ARGB) That runs on a spe at more than 60FPS for 1920x1080.
Hi, I'm doing some tests on the PS3 with your converter, but I'm a bit confused about how it is supposed to be used.
What I did:
- I downloaded this file
ftp://ftp.ldv.e-technik.tu-muenchen.de/ ... lm_ter.yuv
which is an uncompressed 576i YUV video of 252 frames @25fps.
- I replicated the file 20 times to obtain a 5040-frame video
- I modified the number of frames to run through in yuv2rgb.cpp
Code: Select all
int ftot = 5040;
- I ran
Code: Select all
# ./yuv2rgb 576i25_stockholm_ter_x20.yuv 720 576
obtaining about 40 FPS, which is worse than your 60FPS@1920x1080.
The video also stutters a lot.
Maybe I'm missing something. How do you explain these results?
Re: SPE Media Lib
Pizza67 wrote: - I downloaded this file
ftp://ftp.ldv.e-technik.tu-muenchen.de/ ... lm_ter.yuv
that is an uncompressed 576i YUV video of 252 frames @25fps.
- I replicated the file 20 times to finally obtain a 5040 frames video
- I modified the number of frames to run through in yuv2rgb.cpp
Code: Select all
int ftot = 5040;
- I ran
Code: Select all
# ./yuv2rgb 576i25_stockholm_ter_x20.yuv 720 576
obtaining about 40 FPS, that is worst than your 60FPS@1920x1080.
The video also plays with many latches.
Maybe I'm missing something, how do you explain these results?
Well, the original file is 153,090 KB x 20 = 3,061,800 KB.
3,061,800 KB / 5040 frames x 40 FPS = 24,300 KB/s.
You are limited by hard drive speed, I guess :)
Pizza67 wrote: obtaining about 40 FPS, that is worst than your 60FPS@1920x1080.
mbf wrote: It shouldn't be that bad considering that this conversion takes about 20% of the CPU (PPU) time when playing this kind of stuff with MPlayer. Have you tried with MPlayer?
MPlayer works fine with high definition MPEG2 streams: I tried 1080i@50FPS.
It plays OK, so a throughput of 40 FPS with a 576i video seems really bad in comparison with MPlayer, which uses just the PPU.
My concern is that it might be a presentation problem on the ps3fb. I mean, the conversion on the SPU should be very fast, but the frame swap may slow down the execution, maybe because of waiting for VSync from the hypervisor or something else.
Could that be an explanation?
Pizza67 wrote: Mplayer works fine with high definition MPEG2 streams: I tried 1080i@50FPS.
It plays ok, so a throughput of 40FPS with a 576i video seems really bad in comparison with MPlayer that uses just PPU.
My concern is that it might be a problem of presentation on the ps3fb. I mean, the conversion with SPU should be very fast but the frames swap maybe slows down the execution maybe because of wait for VSync from Hypervisor or something else.
Does it could be an explanation?
Read my post just above yours.
Then see how the file is read in yuv2rgb, and compare this to file reading in MPlayer. See the difference? :)
The file reading in yuv2rgb is primitive and inefficient; it's only there to demonstrate the use of the library.
I'm not saying this is the only explanation, yours might be part of the problem too. But there surely is a bottleneck in file reading.
ldesnogu wrote: But there surely is a bottleneck in file reading.
I read your post after I posted mine, sorry :)
You're totally right, I forgot to compute the disk throughput. That's definitely the problem.
MPlayer reads a compressed stream, so it never hits the disk throughput limit.
The best way to test the yuv2rgb converter is probably to always use the same frame cached in RAM. I think this is done by launching the program without params. ;)
Thanks.
regarding speed
You can easily achieve 50/60 FPS; however, keep in mind that you need double-buffered input and output.
It runs at 300 FPS at 1920x1080: if you load two images into RAM and test with only those, you will see the results.
The yuvscaler will achieve from 150 to 299 FPS depending on your scale factor.
PS: it's very important to compile with spu-elf-gcc -O2 -fno-exceptions -g to achieve good performance, and I suggest spu-elf-gcc-4.1.1 with the Barcelona patches or spu-elf-gcc-4.3.
Hope this helps
unsolo
Ok time to recruit
Who wants to help? Please go to the spu-medialib section in the forums.
I need more people, and I don't mind helping to train them in how to think SPU.
Basic concept:
Offloading anything to the SPEs gives better overall performance, so why not do it?
Currently I'm looking into whether it's possible to make xv work,
and there's a working MPlayer vo using spu-medialib.
Note about DMA transfers.
128-byte alignment for the DMA is optimal in terms of speed.
It is not obvious, but you also need that alignment in the local storage. DMAs with addresses aligned in main memory but not aligned in the local storage are slow. Probably each memory line is accessed twice in that case.