The hunt for HV's FIFO/Push buffer...
A parameter of the hypervisor function used to request a bitblt from the GPU looks just like what we would write ourselves into an Nvidia miniport push buffer (alias FIFO) if we had direct access to the RSX. So I think the hypervisor really does use a push buffer/FIFO: a circular area of main memory where a few dwords are written at a time (1 command + a few parameters).
There is certainly a variable (tail) that points at the end of this list.
The area may be very short if the hypervisor knows Linux is running and only bitblt commands will be issued.
When this variable (tail) changes, new commands have just been written. A direct write to the GPU's mapped registers is probably made at the same time, to tell the GPU that it should start reading them and where it should stop.
If we manage to watch that variable (tail), and if we manage to put our own commands and parameters into the FIFO/push buffer immediately after it changes, it is our commands that will be executed (if the GPU is too fast, we can leave the first commands alone and target the next ones).
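To make the idea concrete, here is a minimal sketch of such a circular push buffer with a tail (put) and a head (get). It is only a software model: the ring size, the struct layout and the function names are assumptions made up for illustration, not anything known about the hypervisor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_DWORDS 64u /* hypothetical size, chosen only for this sketch */

struct push_ring {
    uint32_t buf[RING_DWORDS]; /* command words in main memory */
    uint32_t put;              /* tail: next dword the CPU will write */
    uint32_t get;              /* head: next dword the GPU will read */
};

/* Append one command word plus its parameters, wrapping at the end. */
static void ring_emit(struct push_ring *r, uint32_t cmd,
                      const uint32_t *params, size_t n)
{
    assert(n + 1 < RING_DWORDS);
    r->buf[r->put] = cmd;
    r->put = (r->put + 1) % RING_DWORDS;
    for (size_t i = 0; i < n; i++) {
        r->buf[r->put] = params[i];
        r->put = (r->put + 1) % RING_DWORDS;
    }
}

/* Number of dwords written but not yet consumed. */
static uint32_t ring_pending(const struct push_ring *r)
{
    return (r->put + RING_DWORDS - r->get) % RING_DWORDS;
}

/* Model of the consumer: read one dword and advance the head. */
static uint32_t ring_fetch(struct push_ring *r)
{
    assert(ring_pending(r) > 0);
    uint32_t w = r->buf[r->get];
    r->get = (r->get + 1) % RING_DWORDS;
    return w;
}
```

In this model the attack window is the time between ring_emit() moving the tail and ring_fetch() reaching the new words.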
I'm not naive: I doubt software tricks can bypass the address range checks made when the CPU attempts a read or write somewhere in memory.
But who knows, with marcus' Other OS demo...
I'm rather thinking about a RAM attack, that is, hardware mods that allow spying on or changing main RAM.
I also heard a rumour about a USB chipset in the PS3 hardware to which we have full direct access, and which may act as a DMA controller, thus allowing reads and writes anywhere.
What's your opinion about this idea?
Re: The hunt for HV's FIFO/Push buffer...
EDIT: Sorry, I read your post too fast; I thought you wanted to do it in a pure software way :)
ps2devman wrote: What's your opinion about this idea?
My opinion is simple: you will probably never find any such thing that is accessible when not in hypervisor mode.
Here is how the hypervisor probably works:
1. the processor is either in hypervisor mode or not
2. there are different memory mappings with different protections depending on whether you run in hypervisor mode or not
3. the hypervisor setup code does not want non-hypervisor modes to see most of the IO space, and certainly not its own private memory.
Basically this means that if the hypervisor doesn't want you to see portions of memory, then there is no way you will see them unless you enter hypervisor mode.
So if there is a hypervisor API call that can "unprotect" some memory (cf. the GPU VRAM access kernel module), or if you find a way to run your code in hypervisor mode, then you will be able to find your memory location. Else... :)
- StrontiumDog
- Posts: 55
- Joined: Wed Jun 01, 2005 1:41 pm
- Location: Somewhere in the South Pacific
Re: The hunt for HV's FIFO/Push buffer...
ldesnogu wrote: My opinion is simple: you will probably never find any such thing that is accessible when not in hypervisor mode.
Basically this means that if the hypervisor doesn't want you to see portions of memory, then there is no way you will see them, unless you enter hypervisor mode.
I think you are half right. There is a catch-22 in what you say. If the RSX can access it, then it can be DMA'd by the RSX. So if the HV has such a FIFO buffer for sending commands to the RSX, it must by definition be in an area of memory that can be DMA'd from. By my reading of the CELL specs, DMA is an all-or-nothing proposition: if anything can DMA to/from a piece of memory, then everything can DMA to/from that memory. The HV code is 99.999% for sure locked in an area of memory that cannot be DMA'd to/from, which precludes it from holding such a FIFO. In that case the HV must be split: if it does in fact have a FIFO buffer the RSX can DMA from, it must have a DMA-accessible portion that it uses to communicate with other devices. This means that if a way can be found to control the DMA of some other device, and that FIFO can be found, it can be manipulated by writing over it with another DMA call.
That's good and bad news.
Bad news:
That means the variable (tail) is probably in a non dma-able area.
Good news:
Still hope for the FIFO/push buffer itself!
Instead of watching the variable (tail), we can watch the first command and anticipate which command it will be and how many parameters are involved.
The first step would be to explore all DMA-able areas and look for leftover traces of one of the bitblt parameters: "(xres << 16) | yres"
Usually, what is written to the FIFO/push buffer is not erased after the GPU has read it. It is only overwritten once the whole buffer has been used (a jump back to the head occurs), since it's a circular buffer. That value shouldn't be too common in main RAM... It should also repeat in memory at exactly the same "distance" (a few dwords).
Remember, ralferoo pointed at this kernel call (let's say HV API):
Code: Select all
status = lv1_gpu_context_attribute(ps3fb.context_handle,
                                   L1GPU_CONTEXT_ATTRIBUTE_FB_BLIT,
                                   offset, fb_ioif,
                                   L1GPU_FB_BLIT_WAIT_FOR_COMPLETION |
                                   (xres << 16) | yres,
                                   xres * BPP);
(Sony engineers' laziness is our friend, it seems... It's obvious this parameter is pre-formatted to be sent directly to the FIFO/push buffer,
because it's exactly the format expected by bitblt on nv20 chipsets.)
Good hunting to all!
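The search described above could look roughly like the sketch below: scan a dump of a readable memory area for the dword "(xres << 16) | yres" repeating at a constant distance. The function name, the stride and the repeat count are assumptions for illustration; the real distance between copies (a few dwords) would have to be found by trying several values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the first dword equal to `needle` that starts a run of
 * at least `min_repeats` copies, each exactly `stride` dwords after the
 * previous one. Returns (size_t)-1 when no such run exists.
 */
static size_t find_repeated_dword(const uint32_t *mem, size_t n_dwords,
                                  uint32_t needle, size_t stride,
                                  size_t min_repeats)
{
    assert(mem != NULL && stride > 0 && min_repeats > 0);
    for (size_t i = 0; i < n_dwords; i++) {
        size_t hits = 0;
        for (size_t j = i; j < n_dwords && mem[j] == needle; j += stride) {
            if (++hits >= min_repeats)
                return i;
        }
    }
    return (size_t)-1;
}
```

For a 1280 x 1024 screen the needle would be (1280 << 16) | 1024; a hit only suggests a candidate location and still has to be checked by hand.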
Re: The hunt for HV's FIFO/Push buffer...
StrontiumDog wrote: This means if a way can be found to control DMA of some other device and that FIFO can be found, it can be manipulated by writing over it with another DMA call.
Which is why the hypervisor will not allow you direct access to anything
capable of such DMA. You can control the DMA function of the MFCs in
the SPEs, but those DMA accesses are translated through the PTEs, so
you will not be able to access anything you can't reach with regular CPU accesses.
Flying at a high speed
Having the courage
Getting over crisis
I rescue the people
Does the PS3 activate the hypervisor when it boots into the Other OS? This is the only thing I don't understand: what actually tells the hypervisor to activate? Isn't it off when running a game? I understand that the Other OS or "boot loader" doesn't actually call the hypervisor (if it did, we could have had a modded boot loader), but I am guessing the PS3 firmware tells the hypervisor to go into "block RSX" mode (yeah, I come up with some good names xD) when the Other OS is running. If I am wrong please feel free to correct me, I would like to know.
A friend just got a PS3 yesterday (version 1.51) and I have been reading up on it a bit more than before.
When booting, the system checks a bit in the flash which says if it should boot GameOS or "Other OS". If it's set to "Other OS", it starts the hypervisor which will in turn start otheros.bld. So when otheros.bld starts, the hypervisor is already active.
Whether GameOS runs without a hypervisor, or just with a different hypervisor, is unknown at this point.
Ah, I see now, thanks for explaining. So Other OS is basically just responsible for launching Linux, even though modded ones can do so many other things. I know it won't do much, but a modded Other OS that would call the hypervisor, it should report it as active, just a thought?
Well, thanks again for explaining so quickly, and nice work on the modded Other OS, mc :)
Edited: This is a bit off-topic, but I am wondering whether I should update or not. I told my friend not to, and he knows why and understands that he shouldn't, but I just want to know what anyone here thinks.
The "Other OS" mechanism is intended to support launching any OS under
virtualization, but since this virtualization is not fully transparent, the
OS needs to be adapted to run in this environment, and currently Linux
is the only OS for which such an adaptation has been done (since that's
the OS which Sony themselves chose to do it for). But yes, "otheros.bld"
is just intended as a launcher ("bld" = bootloader) which loads the
actual OS from the hard drive. Any handover between bld and OS
is up to them; the hypervisor does not care and presents both with the
exact same virtualized environment.
Quote: I know it won't do much but a modded other os that would call the hypervisor, it should report it as active, just a thought?
This question I do not understand. What do you mean by "report it
as active", and who would report whom to whom?
push buffer is open for read-write
The new sources for ps3fb.c http://www.everfall.com/paste/id.php?kjjqgvpbncu0 contain a GPU_CMD_BUF_SIZE macro.
It's clear that the memory region [ ps3fb_videomemory.address + ps3fb_videomemory.size - GPU_CMD_BUF_SIZE, ps3fb_videomemory.address + ps3fb_videomemory.size ) is the FIFO buffer.
Dump of this region: http://www.everfall.com/paste/id.php?ieto25cyoy0g
PS. Excuse my English.
Yes, it really is the push buffer.
A good idea would be to test all the undocumented entries from http://wiki.ps2dev.org/ps3:hypervisor:l ... _attribute , get a push buffer dump for each one and compare it with the nouveau NV40 push buffer database.
fresh news about fifo control regs.
There is an lpar_dma_control area, and it can be iomapped. This area is filled with zeroes; only 3 dwords are non-zero: the dwords with indices 0x10, 0x11 and 0x15.
The value of dword 0x11 just after a hypervisor blit is 0xe1f0000, a few ms later it is 0xe1f0048, and finally 0xe1f00b8.
Compare with
http://nouveau.cvs.sourceforge.net/nouv ... iew=markup
Code: Select all
printf("FIFO put=0x%08x, get=0x%08x\n",
       fifo_regs[0x40/4],
       fifo_regs[0x44/4]
);
FIRE_RING();
sleep(1);
printf("FIFO put=0x%08x, get=0x%08x\n",
       fifo_regs[0x40/4],
       fifo_regs[0x44/4]
);
Edited: constants
Last edited by IronPeter on Sun Oct 07, 2007 9:34 pm, edited 1 time in total.
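The link between the two observations is simple arithmetic: byte offsets 0x40 and 0x44 in the nouveau snippet are dword indices 0x10 and 0x11 in the mapped area. Below is a small sketch of that, assuming (from the values above) that 0xe1f0000 is the start of the push buffer as the RSX sees it. The macro and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offsets from the nouveau snippet; dividing by 4 gives dword indices. */
#define FIFO_PUT_BYTE_OFF 0x40u
#define FIFO_GET_BYTE_OFF 0x44u
#define FIFO_PUT_IDX (FIFO_PUT_BYTE_OFF / 4u) /* 0x10 */
#define FIFO_GET_IDX (FIFO_GET_BYTE_OFF / 4u) /* 0x11 */

/* Assumed push buffer base, taken from the first value seen in dword 0x11. */
#define FIFO_BASE 0x0e1f0000u

/* Convert a raw put/get register value into a byte offset in the buffer. */
static uint32_t fifo_offset(uint32_t reg_value)
{
    assert(reg_value >= FIFO_BASE);
    return reg_value - FIFO_BASE;
}

/* The GPU has caught up when get has reached put. */
static int fifo_idle(const volatile uint32_t *regs)
{
    return regs[FIFO_PUT_IDX] == regs[FIFO_GET_IDX];
}
```

With this reading, the values 0xe1f0048 and 0xe1f00b8 are simply get moving 0x48 and then 0xb8 bytes into the buffer.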
funny push buffer disassembling
I've parsed the blit push buffer into DMA packets ( size of packet, subchannel id, tag ):
http://www.everfall.com/paste/id.php?z51ttk37j71s
My screen resolution is 1280 x 1024 ( 0x500 x 0x400 ).
Funny, the hypervisor made 2 blits ( one 1024 x 1024 and one 256 x 1024 ), so the parameters of the hypervisor call are not "preformatted for the GPU".
You can refer to this document for the push buffer tags:
http://gitweb.freedesktop.org/?p=nouvea ... veau_reg.h
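The two blits seen in the dump are what you would get by cutting the screen into tiles of at most 1024 pixels per side. The sketch below reproduces that split. It is only a guess at the policy: the 1024 limit comes from the observed blits, while the function, the struct and the tiling order are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TILE 1024u /* largest blit edge seen in the dump */

struct tile { uint32_t x, y, w, h; };

/* Cut a width x height blit into tiles no larger than MAX_TILE per side. */
static size_t split_blit(uint32_t width, uint32_t height,
                         struct tile *out, size_t max_out)
{
    size_t n = 0;
    for (uint32_t y = 0; y < height; y += MAX_TILE) {
        for (uint32_t x = 0; x < width; x += MAX_TILE) {
            assert(n < max_out);
            out[n].x = x;
            out[n].y = y;
            out[n].w = (width - x < MAX_TILE) ? width - x : MAX_TILE;
            out[n].h = (height - y < MAX_TILE) ? height - y : MAX_TILE;
            n++;
        }
    }
    return n;
}
```

For 1280 x 1024 this yields one 1024 x 1024 tile and one 256 x 1024 tile, matching the dump.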
Re: funny push buffer disassembling
IronPeter wrote: My screen resolution is 1280 x 1024 ( 0x500 x 0x400 ).
Funny, hypervisor made 2 blits ( one 1024 x 1024 and one 256 x 1024 ), so parameters of hypervisor call are not "preformated for th GPU".
Got to love power of 2 sized textures.
I was able to run the blit push buffer from userland using the FIFO control regs.
There was some kind of protection. Very weak protection.
It is unstable for now, but it does work. It is probably possible to write some kind of 2D support ( stretched blits, color fills, etc ).
The main question is about 3D support. We need the so-called "context objects" to be properly initialized. Probably the hypervisor does this work for us. All we need are the handles ( and lpar_dma_reports contains something that looks like these handles ). To initialize these objects "by hand" we need access to a very special set of RSX registers, the so-called RAMIN area.
PS. I investigate the RSX using only open-source information. I have not signed an NDA with Sony or NVidia.
Ok, things are getting serious now... Hehe.
What is your firmware version?
We need to be careful and detect when this new trick becomes unusable in future firmware versions. Someone with an Infectus and the ability to swap firmware would be the best person to detect such an infamous change...
Thanks for your great work, IronPeter!
PS: If you could publish a minimal source, even unstable, that can be used to test this new trick for each firmware version that would be great! Thanks!
The firmware version is 1.8.
The trick is very simple; I can describe it without posting the full sources.
Look at the push buffer dump:
http://www.everfall.com/paste/id.php?uxdlpwlbfpo9
It is the image of the push buffer after a hypervisor blit. The end of the buffer is at 0xb8.
The last packet sends zero to subchannel zero with tag 0x110. Replace that tag with NOP ( 0x100 ) while the buffer has been kicked by the hypervisor and is being executed by the RSX.
Fill the push buffer with N + 1 copies of the first 0xb8 bytes.
In a loop for( int i = 0; i < N; ++i ), modify the client screen buffer in some way ( e.g. fill it with random numbers ), then kick the push buffer by writing ( (uint32_t *)ioremap( lpar_dma_control, 1024) )[ 0x10 ] = 0xe1f0000 + 0xb8 * ( i + 1 ); followed by sleep ( 1 ); The client screen in XDR memory will be blitted into video memory.
If nobody is able to repeat these steps, I will post the full sources.
Edited: bugfix
Last edited by IronPeter on Sun Oct 07, 2007 10:51 pm, edited 1 time in total.
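The two pieces of arithmetic in these steps can be sketched as follows. The header layout (tag in bits 2..12) is the one used by the nouveau documentation for NV-style FIFO commands and is assumed to apply here; the helper names and the example header word 0x00040110 are hypothetical, not taken from the dump.

```c
#include <assert.h>
#include <stdint.h>

#define METHOD_MASK 0x00001ffcu /* tag (method) field of a packet header */
#define TAG_NOP     0x100u
#define BLIT_LEN    0xb8u       /* length of one hypervisor blit sequence */
#define FIFO_BASE   0x0e1f0000u /* push buffer address seen in the regs */

/* Replace the tag of a packet header, keeping size and subchannel. */
static uint32_t retag(uint32_t header, uint32_t tag)
{
    assert((tag & ~METHOD_MASK) == 0);
    return (header & ~METHOD_MASK) | tag;
}

/* Value written to control dword 0x10 in iteration i of the loop. */
static uint32_t put_after_copy(uint32_t i)
{
    return FIFO_BASE + BLIT_LEN * (i + 1u);
}
```

For example, a one-dword packet on subchannel 0 with tag 0x110 would have the header 0x00040110 under this layout, and retag() turns it into 0x00040100.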
Very nice work, IronPeter!
I find it intriguing that not only are you allowed to define the FIFO region
in user memory, but that you are also allowed to map the control area.
This suggests to me that Sony actually intended to support HW 3D under
Linux, as they could just as easily have made the control area accessible
only to the hypervisor.
mc wrote: ... but that you are also allowed to map the control area.
We are allowed to map only part of the MMIO registers: only the context control registers. I do not know of a way to map the global RAMIN area.
A good source about this stuff:
http://nouveau.freedesktop.org/wiki/HonzaHavlicek
Well, yeah, but that only makes it more likely that it is intentional
that this particular part can be mapped. It would make sense
to only expose the parts needed for performance (= FIFO interface)
and handle access to other parts through the HV.
Thanks for the link, BTW.
Complete subchannel mapping
This is a push buffer dump taken after the hypervisor FB_SETUP. The programmers at Sony decided not to clean up the push buffer after setting up the context objects.
http://www.everfall.com/paste/id.php?ew29498z816w
Enjoy it.
confirmed FIFO hack
Hi all,
I could reproduce the FIFO hack described by IronPeter, using firmware v1.93. I had to set up a large blit (which the HV decomposes into many 1024x1024 blits) in order to have time to tweak the FIFO area. Instead of patching with a NOP (which works), I chose to set the FIFO end pointer two operations back. I also had to remove the L1GPU_FB_BLIT_WAIT_FOR_COMPLETION flag from the call to lv1_gpu_context_attribute(), so to sum it up:
Code: Select all
/* large blit */
lv1_gpu_context_attribute(ps3.context_handle,
                          L1GPU_CONTEXT_ATTRIBUTE_FB_BLIT,
                          dst_offset,
                          GPU_IOIF + src_offset,
                          (1ULL << 31) |
                          (1280 << 16) | 1280,
                          1280*4);
/* go back two operations (2 dwords = 8 bytes) */
ps3.fifo_regs[0x10] -= 8;
/* wait for end of GPU operation */
while (ps3.fifo_regs[0x11] != ps3.fifo_regs[0x10] &&
       ps3.fifo_regs[0x15] != ps3.fifo_regs[0x10])
    ;
/* copy our operation to the fifo */
memcpy(&ps3.fifo[fifo_idx], blit_program, sizeof(blit_program));
/* fill the vram in white */
memset(ps3.vram, 0xff, ps3.vram_size);
/* kick off the GPU */
ps3.fifo_regs[0x10] += sizeof(blit_program);
I was also able to send various other blit commands in the FIFO, using documentation from the nouveau project. For example, the following FIFO commands will do a YUYV blit instead of an ARGB blit:
Code: Select all
uint32_t blit_program[] = {
    0x00106300, // SURFACE_FORMAT (size 4, subchannel 3)
    0x0000000a, // SURFACE_FORMAT_A8R8G8B8
    0x14001400, // ((pitch{dst} << 16) | pitch{dst})
    0x00000000, // src_offset
    0x00000000, // dst_offset
    0x0024c2fc, // NV_IMAGE_BLIT_OPERATION (size 9, subchannel 6)
    0x00000001, // not dither (0 = dither)
    0x00000005, // STRETCH_BLIT_FORMAT_YUYV
    0x00000003, // STRETCH_BLIT_OPERATION_COPY
    0x00000000, // (dstX << 16 | dstY)
    0x02d00400, // ((height << 16) | width)
    0x00000000, // (dstX << 16 | dstY)
    0x02d00400, // ((height << 16) | width)
    0x00100000, // step_x in 12.20 fixed point
    0x00100000, // step_y in 12.20 fixed point
    0x0010c400, // STRETCH_BLIT_SRC_SIZE (size 4, subchannel 6)
    0x02d00400, // ((height << 16) | width)
    0x00021400, // pitch_src
                // | (STRETCH_BLIT_SRC_FORMAT_ORIGIN_CORNER << 16)
                // | (STRETCH_BLIT_SRC_FORMAT_FILTER_POINT_SAMPLE << 24)
    0x0d000000, // GPU_IOIF + src_offset
    0x00000000, // srcX | (srcY<<16)
};
Thanks IronPeter for your hard work. I'll now try playing a bit with subchannel bindings; your most recent post looks quite interesting.
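The "size" and "subchannel" notes in the comments of blit_program can be checked by decoding the header words. The sketch below assumes the NV-style header layout described by the nouveau project (parameter count in bits 18..28, subchannel in bits 13..15, tag in bits 2..12). The struct and function names are made up, and jumps and other special commands are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct fifo_header {
    uint32_t count;      /* number of parameter dwords that follow */
    uint32_t subchannel; /* 0..7 */
    uint32_t method;     /* tag, e.g. 0x300 */
};

/* Split one header dword into its three fields. */
static struct fifo_header decode_header(uint32_t word)
{
    struct fifo_header h;
    h.count      = (word >> 18) & 0x7ffu;
    h.subchannel = (word >> 13) & 0x7u;
    h.method     = word & 0x1ffcu;
    return h;
}

/* Walk a buffer packet by packet; returns -1 on a truncated packet. */
static int count_packets(const uint32_t *buf, size_t n_dwords)
{
    int packets = 0;
    size_t i = 0;
    assert(buf != NULL);
    while (i < n_dwords) {
        struct fifo_header h = decode_header(buf[i]);
        if (i + 1 + h.count > n_dwords)
            return -1;
        i += 1 + h.count;
        packets++;
    }
    return packets;
}
```

Decoding 0x00106300 this way gives size 4, subchannel 3 and tag 0x300, and 0x0024c2fc gives size 9, subchannel 6 and tag 0x2fc, which agrees with the comments.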
few words about methodology
Glaurung, nice.
I want to say a few words about testing methodology. A segmentation fault in kernel mode is not very good. I did a few things to make life easier:
1.) I extended ps3fb's memory mapping to cover the last 65536 bytes. open( "/dev/fb0" ), mmap it, and use the push buffer from client mode.
2.) I extended ps3fb's ioctl to read/write the FIFO control registers.
3.) lv1_gpu_context_intr ( interrupt ) is very useful. Extend the ioctl with this function. It is also better to disable (again via ioctl) the kernel's regular lv1_gpu_context_intr in the driver.
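Step 1 from userland could be sketched like this, using only standard POSIX calls. It assumes a patched driver that exposes the push buffer as the last 65536 bytes of the mappable range; total_len (the full mappable length) and the helper name are hypothetical, and the offset must be a multiple of the page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define PUSH_BUF_LEN 65536u /* size of the push buffer window in step 1 */

/* Map only the last tail_len bytes of a file descriptor, read-write. */
static volatile uint32_t *map_tail(int fd, size_t total_len, size_t tail_len)
{
    assert(tail_len > 0 && tail_len <= total_len);
    off_t offset = (off_t)(total_len - tail_len);
    /* mmap offsets must be page aligned */
    assert(offset % sysconf(_SC_PAGESIZE) == 0);
    void *p = mmap(NULL, tail_len, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, offset);
    if (p == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    return (volatile uint32_t *)p;
}
```

Usage would be along the lines of fd = open("/dev/fb0", O_RDWR) followed by map_tail(fd, total_len, PUSH_BUF_LEN), after which the returned pointer can be read and written like the push buffer itself.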
Glaurung, would you like to test a blit from video memory to the system area? I have no access to a PS3 for a short time.
http://wiki.ps2dev.org/ps3:hypervisor:l ... y_allocate returns only 252 megs; the top of the video memory seems to contain the RAMIN area ( the global GPU control area ).
This area must be protected from read/write, but probably...