VFPU diggins

Post by **hlide** » Tue Nov 07, 2006 9:50 pm

This topic is where we can share our VFPU diggins. This first message should keep growing as they progress.

Code:

/////////////////////////////////////////////////////////////
// VFPU diggins
/////////////////
//
// Authors :
//
//   hlide, Raphael
//
// 2006-11-17 01:05PM
//
/////////////////////////////////////////////////////////////


op  operands                             ticks         latency*
-----------------------------------------

mtv rt, vs.s
{
  vs.s = rt; // rt is general purpose register
}

mfv rt, vs.s
{
  rt = vs.s; // rt is general purpose register
}

-----------------------------------------

mtvc rt, vcr
{
  vcr = rt; // vcr is cop2 control register
}

mfvc rt, vcr
{
  rt = vcr; // vcr is cop2 control register
}

-----------------------------------------

vmtvc vcr, vs.s
{
  vcr = vs.s;
}

vmfvc sd, cr
{
  sd = cr;
}

-----------------------------------------

// rm is general purpose register containing a memory address
lv.s vd.s, offset(rm)
{
   vd.s = offset(rm);
}

sv.s vd.s, offset(rm)
{
   offset(rm) = vd.s;
}


// rm needs to be aligned to 16 bytes (quadword)
lv.q vd, rm                                 1            0       (cache)
{                                       68                  (memory)
   vd[0] = 0(rm);
   vd[1] = 4(rm);
   vd[2] = 8(rm);
   vd[3] = 12(rm);
}

ulv.q vd, rm                              2            0       (cache)
{                                       68                  (memory)
   vd[0] = 0(rm);
   vd[1] = 4(rm);
   vd[2] = 8(rm);
   vd[3] = 12(rm);
}

// rm needs to be aligned to 16 bytes (quadword)
sv.q vd, rm                                 7            2       (cache)
{                                       111                  (memory)
   0(rm) = vd[0];
   4(rm) = vd[1];
   8(rm) = vd[2];
   12(rm) = vd[3];
}

usv.q vd, rm                              14            4       (cache)
{                                       111                  (memory)
   0(rm) = vd[0];
   4(rm) = vd[1];
   8(rm) = vd[2];
   12(rm) = vd[3];
}

-----------------------------------------

// vector register prefixes

vpfxs [?0,?1,?2,?3]
// special prefix for vs like vs.q[X, X, Y, Y] - their values may be :
//  x : vs[0]
//  y : vs[1]
//  z : vs[2]
//  w : vs[3]
//  -x : -vs[0]
//  -y : -vs[1]
//  -z : -vs[2]
//  -w : -vs[3]
//  |x| : |vs[0]| (absolute value of vs[0])
//  |y| : |vs[1]| (absolute value of vs[1])
//  |z| : |vs[2]| (absolute value of vs[2])
//  |w| : |vs[3]| (absolute value of vs[3])
//  0 : constant 0
//  1 : constant 1
//  2 : constant 2
//  1/2 : constant 1/2
//  3 : constant 3
//  1/3 : constant 1/3
//  1/4 : constant 1/4
//  1/6 : constant 1/6
//
// so vmov.q vd, vs[z, |x|, 0, -x] :
//   vd[0] = vs[2];
//   vd[1] = |vs[0]|;
//   vd[2] = 0;
//   vd[3] = -vs[0];

vpfxt [?0,?1,?2,?3]
// special prefix for vt like vt.q[X, X, Y, Y] -  their values may be :
//  x : vt[0]
//  y : vt[1]
//  z : vt[2]
//  w : vt[3]
//  -x : -vt[0]
//  -y : -vt[1]
//  -z : -vt[2]
//  -w : -vt[3]
//  |x| : |vt[0]| (absolute value of vt[0])
//  |y| : |vt[1]| (absolute value of vt[1])
//  |z| : |vt[2]| (absolute value of vt[2])
//  |w| : |vt[3]| (absolute value of vt[3])
//  0 : constant 0
//  1 : constant 1
//  2 : constant 2
//  1/2 : constant 1/2
//  3 : constant 3
//  1/3 : constant 1/3
//  1/4 : constant 1/4
//  1/6 : constant 1/6
//

vpfxd [?4,?5,?6,?7]
// special prefix for vd like vd.q[0:1, 0:1, 0:1, 0:1] -  their values may be :
// 0:1 : min(1, max(0, vd[i]))
// -1:1 : min(1, max(-1, vd[i]))
// m : ???
//
// so vmov.p vd[0:1, -1:1], vs :
//   vd[0] = min(1, max(0, vs[0]));
//   vd[1] = min(1, max(-1, vs[1]));


-----------------------------------------

vadd.q/t/p/s vd, vs, vt                        1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = vs[i] + vt[i];
}

vsub.q/t/p/s vd, vs, vt                        1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = vs[i] - vt[i];
}

-----------------------------------------

vdiv.q/t/p/s vd, vs, vt                        56/42/28/14      30/?/?/?
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = vs[i] / vt[i];
}

vmul.q/t/p/s vd, vs, vt                        1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = vs[i] * vt[i];
}

-----------------------------------------

vdot.q/t/p/s sd.s, vs, vt                     1            0
{
  sd.s = 0;
  for (i = 0; i < |q/t/p/s|; ++i)
    sd.s += vs[i] * vt[i];
}

-----------------------------------------

vscl.q/t/p/s vd, vs, vt.s                     1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = vs[i] * vt.s;
}

-----------------------------------------

// Homogeneous dot product
vhdp.q/t/p/s vd.s, vs, vt (UNSURE)               1            0
{
  vd.s = vt[|q/t/p|-1];   // last component of vt (index was out of range)
  for (i = 0; i < |q/t/p|-1; ++i)
    vd.s += vs[i] * vt[i];
}

-----------------------------------------

vcmp.q/t/p/s f2, vs, vt                        1            0
{
  for (i = 0; i < 5; ++i)
    VFPU_CC[i] = 0;

  VFPU_CC[5] = 1;

  for (i = 0; i < |q/t/p/s|; ++i)
    VFPU_CC[i] = bcmp(f2, vs[i], vt[i]);  // f2 = EQ/NE/LE/LT/GE/GT

  // CC[4] = any component true, CC[5] = all components true
  for (i = 0; i < |q/t/p/s|; ++i)
  {
    VFPU_CC[4] = VFPU_CC[4] || VFPU_CC[i];
    VFPU_CC[5] = VFPU_CC[5] && VFPU_CC[i];
  }
}

vcmp.q/t/p/s f1, vs                           1            0
{
  for (i = 0; i < 5; ++i)
    VFPU_CC[i] = 0;

  VFPU_CC[5] = 1;

  for (i = 0; i < |q/t/p/s|; ++i)
    VFPU_CC[i] = ucmp(f1, vs[i]);  // f1 = EN/EI/EZ/ES/NN/NI/NZ/NS

  // CC[4] = any component true, CC[5] = all components true
  for (i = 0; i < |q/t/p/s|; ++i)
  {
    VFPU_CC[4] = VFPU_CC[4] || VFPU_CC[i];
    VFPU_CC[5] = VFPU_CC[5] && VFPU_CC[i];
  }
}

vcmp.q/t/p/s f0
{
  for (i = 0; i < 5; ++i)
    VFPU_CC[i] = 0;

  VFPU_CC[5] = 1;

  for (i = 0; i < |q/t/p/s|; ++i)
    VFPU_CC[i] = f0; // f0 = TR/FL

  // CC[4] = any component true, CC[5] = all components true
  for (i = 0; i < |q/t/p/s|; ++i)
  {
    VFPU_CC[4] = VFPU_CC[4] || VFPU_CC[i];
    VFPU_CC[5] = VFPU_CC[5] && VFPU_CC[i];
  }
}

-----------------------------------------

vmin.q/t/p/s vd, vs, vt                        1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = min(vs[i], vt[i]);
}

vmax.q/t/p/s vd, vs, vt                        1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = max(vs[i], vt[i]);
}

-----------------------------------------

vsgn.q/t/p/s vd, vs                           1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = (vs[i] < 0.0) ? -1.0 : (vs[i] > 0.0) ? 1.0 : 0.0;
}

-----------------------------------------

vcst.q/t/p/s vd, VFPU_SPC_CST                  1            0
{
  // VFPU_HUGE = Inf
  // VFPU_SQRT2 = SQRT(2)
  // VFPU_SQRT1_2 = SQRT(1/2)
  // VFPU_2_SQRTPI = 2/SQRT(PI)
  // VFPU_2_PI = 2/PI
  // VFPU_1_PI = 1/PI
  // VFPU_PI_4 = PI/4
  // VFPU_PI_2 = PI/2
  // VFPU_PI = PI
  // VFPU_E = e
  // VFPU_LOG2E = log2(e)
  // VFPU_LOG10E = log10(e)
  // VFPU_LN2 = ln(2)
  // VFPU_LN10 = ln(10)
  // VFPU_2PI = 2*PI
  // VFPU_PI_6 = PI/6
  // VFPU_LOG10TWO = log10(2)
  // VFPU_LOG2TEN = log2(10)
  // VFPU_SQRT3_2 = sqrt(3)/2

  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = vfpu_special_constant[VFPU_SPC_CST];
}

-----------------------------------------

vscmp.q/t/p/s vd, vs, vt                     1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = (vs[i] < vt[i]) ? -1.0 : (vs[i] > vt[i]) ? 1.0 : 0.0;
}

vsge.q/t/p/s vd, vs, vt                        1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = (vs[i] >= vt[i]) ? 1.0 : 0.0;
}

vslt.q/t/p/s vd, vs, vt                        1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = (vs[i] < vt[i]) ? 1.0 : 0.0;
}

-----------------------------------------

vi2uc.q vd.s, vs.q                           1            0
{
  vd.s[0]( 0.. 7) = vs.q[0] & 0xFF;
  vd.s[0]( 8..15) = vs.q[1] & 0xFF;
  vd.s[0](16..23) = vs.q[2] & 0xFF;
  vd.s[0](24..31) = vs.q[3] & 0xFF;
}

vi2c.q vd.s, vs.q                           1            0
{
  vd.s[0]( 0.. 7) = (vs.q[0] & 0x7F) | ((vs.q[0] & 0x80000000) >> 24);
  vd.s[0]( 8..15) = (vs.q[1] & 0x7F) | ((vs.q[1] & 0x80000000) >> 24);
  vd.s[0](16..23) = (vs.q[2] & 0x7F) | ((vs.q[2] & 0x80000000) >> 24);
  vd.s[0](24..31) = (vs.q[3] & 0x7F) | ((vs.q[3] & 0x80000000) >> 24);
}

-----------------------------------------

vmov.q/t/p/s vd, vs                           1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = vs[i];
}

-----------------------------------------

vabs.q/t/p/s vd, vs                           1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = |vs[i]|;
}

-----------------------------------------

vneg.q/t/p/s vd, vs                           1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = -vs[i];
}

-----------------------------------------

vsat0.q/t/p/s vd, vs                        1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = max(0.0, min(vs[i], 1.0));
}

vsat1.q/t/p/s vd, vs                        1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = max(-1.0, min(vs[i], 1.0));
}

-----------------------------------------

vzero.q/t/p/s vd                           3/?/?/?         2
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = 0.0;
}

vone.q/t/p/s vd                              3/?/?/?         2
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = 1.0;
}

vidt.q/t/p/s vd                              3/?/?/?         2
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = (vd[i].column == vd[i].row) ? 1.0 : 0.0;
}

-----------------------------------------

vrcp.q/t/p/s vd, vs                           4/?/?/?         3
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = 1.0 / vs[i];
}

vrsq.q/t/p/s vd, vs                           4/?/?/?         3
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = 1.0 / sqrt(vs[i]);
}

-----------------------------------------

vsin.q/t/p/s vd, vs                           4/?/?/?         3
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = sin(vs[i]*PI/2);
}

vcos.q/t/p/s vd, vs                           4/?/?/?         3
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = cos(vs[i]*PI/2);
}

vasin.q/t/p/s vd, vs                        4/?/?/?         3
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = asin(vs[i]) * 2/PI; // not sure about this conversion
}

-----------------------------------------

vexp2.q/t/p/s vd, vs                        4/?/?/?         3
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = exp2(vs[i]);
}

vlog2.q/t/p/s vd, vs                        4/?/?/?         3
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = log2(vs[i]);
}

-----------------------------------------

vsqrt.q/t/p/s vd, vs                        4/?/?/?         3
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = sqrt(vs[i]);
}

-----------------------------------------

vrnds.s vs                    ?      ?
{
  random_seed(vs);
}

-----------------------------------------

vrndi.q/t/p/s vd                     12/9/6/3      10/7/4/1
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = rand_integer(-1<<31, 1<<31); // -1<<31 <= vd[i] < 1<<31
}

-----------------------------------------

vrndf1.q/t/p/s vd                     12/9/6/3      10/7/4/1
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = rand_float(0.0, 2.0); // 0.0 <= vd[i] < 2.0
}

-----------------------------------------

vrndf2.q/t/p/s vd                     12/9/6/3      10/7/4/1
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = rand_float(0.0, 4.0); // 0.0 <= vd[i] < 4.0
}

-----------------------------------------

// Nvidia Half format [S:1][E:5][M:10]
vf2h.p/q vd, vs   (UNSURE)                           1            0
{
  for (i = 0; i < |q/p|/2; ++i)
  {
    // low half-word: sign and truncated mantissa of vs[i*2]
    vd[i]( 0..15) = ((vs[i*2] >> 16) & 0x8000) | ((vs[i*2] >> 13) & 0x03FF);

    e = ((vs[i*2] >> 23) & 0xFF) - 0x70;
    if (e < 0)
      e = 0;
    if (e > 31)
    {
      e = 31;
      vd[i] &= ~0x03FF;   // -> make too huge numbers infinity
    }
    if (((vs[i*2] & 0x7FFFFF) != 0) && (((vs[i*2] >> 23) & 0xFF) == 0xFF))
      vd[i] |= 0x03FF;   // -> But NaNs stay NaNs even with mantissa loss
    vd[i] |= (e << 10);

    // high half-word: same conversion for vs[i*2+1]
    vd[i](16..31) = ((vs[i*2+1] >> 16) & 0x8000) | ((vs[i*2+1] >> 13) & 0x03FF);

    e = ((vs[i*2+1] >> 23) & 0xFF) - 0x70;
    if (e < 0)
      e = 0;
    if (e > 31)
    {
      e = 31;
      vd[i] &= ~0x03FF0000;   // -> make too huge numbers infinity
    }
    if (((vs[i*2+1] & 0x7FFFFF) != 0) && (((vs[i*2+1] >> 23) & 0xFF) == 0xFF))
      vd[i] |= 0x03FF0000;   // -> But NaNs stay NaNs even with mantissa loss
    vd[i] |= (e << 26);
  }
}

-----------------------------------------

vsrt1.q vd, vs                              1            0
{
  vd[0] = min(vs[0], vs[1]);
  vd[1] = max(vs[1], vs[0]);
  vd[2] = min(vs[2], vs[3]);
  vd[3] = max(vs[3], vs[2]);
}

vsrt2.q vd, vs                              1            0
{
  vd[0] = min(vs[0], vs[3]);
  vd[1] = max(vs[1], vs[2]);
  vd[2] = min(vs[2], vs[1]);
  vd[3] = max(vs[3], vs[0]);
}

vsrt3.q vd, vs                              1            0
{
  vd[0] = max(vs[0], vs[1]);
  vd[1] = min(vs[1], vs[0]);
  vd[2] = max(vs[2], vs[3]);
  vd[3] = min(vs[3], vs[2]);
}

vsrt4.q vd, vs                              1            0
{
  vd[0] = max(vs[0], vs[3]);
  vd[1] = max(vs[1], vs[2]);
  vd[2] = min(vs[2], vs[1]);
  vd[3] = min(vs[3], vs[0]);
}

-----------------------------------------

vbfy1.q/p vd, vs                           1            0
{
  for (i = 0; i < |q/p|; i += 2)
  {
    vd[i+0] = vs[i+0] + vs[i+1];
    vd[i+1] = vs[i+0] - vs[i+1];
  }
}

vbfy2.q vd, vs                              1            0
{
  vd[0] = vs[0] + vs[2];
  vd[1] = vs[1] + vs[3];
  vd[2] = vs[0] - vs[2];
  vd[3] = vs[1] - vs[3];
}

-----------------------------------------

vocp.q/t/p/s vd, vs                           1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = 1.0 - vs[i];
}

-----------------------------------------

// Funnel add components
vfad.q/t/p/s vd.s, vs                        1            0
{
  vd.s = 0;
  for (i = 0; i < |q/t/p/s|; ++i)
    vd.s += vs[i];
}

-----------------------------------------

// Average of components
vavg.q/t/p/s vd.s, vs                        1            0
{
  vd.s = 0.0;
  for (i = 0; i < |q/t/p/s|; ++i)
    vd.s += vs[i];
  vd.s /= |q/t/p/s|;
}

-----------------------------------------

// Round
vf2in.q/t/p/s vd, vs, imm                     1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = ROUND(vs[i]) << imm;
}

-----------------------------------------

// Trunc
vf2iz.q/t/p/s vd, vs, imm                     1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = TRUNC(vs[i]) << imm;
}

-----------------------------------------

// Floor
vf2iu.q/t/p/s vd, vs, imm                     1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = FLOOR(vs[i]) << imm;
}

-----------------------------------------

// Ceil
vf2id.q/t/p/s vd, vs, imm                     1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = CEIL(vs[i]) << imm;
}

-----------------------------------------

// (float)
vi2f.q/t/p/s vd, vs, imm                     1            0
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = (float)(vs[i]) / (float)(1<<imm);
}

-----------------------------------------

// Conditional move vector on true
vcmovt.q/t/p/s vd, vs, cc (UNSURE)               5            4
{
  switch (cc)
  {
  case 0...5 :
    if (VFPU_CC[cc] == TRUE)
      vd = vs;
    break;
  case 6:
    for (i = 0; i < |q/t/p/s|; ++i)
      if (VFPU_CC[i] == TRUE)
        vd[i] = vs[i];
    break;
  }
}

// Conditional move vector on false
vcmovf.q/t/p/s vd, vs, cc (UNSURE)               5            4
{
  switch (cc)
  {
  case 0...5 :
    if (VFPU_CC[cc] == FALSE)
      vd = vs;
    break;
  case 6:
    for (i = 0; i < |q/t/p/s|; ++i)
      if (VFPU_CC[i] == FALSE)
        vd[i] = vs[i];
    break;
  }
}

-----------------------------------------

// Matrix multiplication
vmmul.q/t/p md, ms, mt                        16/8/4         15/7/3
{
  for (i = 0; i < |q/t/p|; ++i)
    for (j = 0; j < |q/t/p|; ++j)
    {
      md[i][j] = 0;
      for (k = 0; k < |q/t/p|; ++k)
        md[i][j] += ms[i][k] * mt[k][j];
    }
}

-----------------------------------------

// Matrix-vector transform
vtfm4.q/3.t/2.p vd, md, vt                     4/3/2         3/2/1
{
  for (i = 0; i < |q/t/p|; ++i)
  {
    vd[i] = 0;
    for (j = 0; j < |q/t/p|; ++j)
      vd[i] += md[i][j] * vt[j];
  }
}

-----------------------------------------

// Homogeneous transform
vhtfm4.q/3.t/2.p vd, md, vt                     4/3/2         3/2/1
{
  for (i = 0; i < |q/t/p|; ++i)
  {
    vd[i] = 0;
    for (j = 0; j < |q/t/p|; ++j)
      vd[i] += md[i][j] * vt[j];
  }
  for (i = 0; i < |q/t/p|; ++i)
    vd[i] /= vd[|q/t/p|];
}

-----------------------------------------

// Matrix scale
vmscl.q/t/p md, ms, vt.s                     4/3/2         3/2/1
{
  for (i = 0; i < |q/t/p|; ++i)
    for (j = 0; j < |q/t/p|; ++j)
      md[i][j] = ms[i][j] * vt.s;
}

-----------------------------------------

// Quaternion multiply
vqmul.q vd, vs, vt                           4            3
{
  vd[0] = vs[3] * vt[0] + vs[0] * vt[3] + vs[1] * vt[2] - vs[2] * vt[1];
  vd[1] = vs[3] * vt[1] + vs[1] * vt[3] + vs[2] * vt[0] - vs[0] * vt[2];
  vd[2] = vs[3] * vt[2] + vs[2] * vt[3] + vs[0] * vt[1] - vs[1] * vt[0];
  vd[3] = vs[3] * vt[3] - vs[0] * vt[0] - vs[1] * vt[1] - vs[2] * vt[2];
}

-----------------------------------------

// Matrix move
vmmov.q/t/p md, ms                           4/3/2         3/2/1
{
  for (i = 0; i < |q/t/p|; ++i)
    for (j = 0; j < |q/t/p|; ++j)
      md[i][j] = ms[i][j];
}

-----------------------------------------

// Matrix Identity
vmidt.q/t/p md                              6/5/4         5/4/3
{
  for (i = 0; i < |q/t/p|; ++i)
    for (j = 0; j < |q/t/p|; ++j)
      md[i][j] = (i == j) ? 1.0 : 0.0;
}

-----------------------------------------

// Matrix-zero
vmzero.q/t/p md                              6/5/4         5/4/3
{
  for (i = 0; i < |q/t/p|; ++i)
    for (j = 0; j < |q/t/p|; ++j)
      md[i][j] = 0.0;
}

-----------------------------------------

// Matrix-one
vmone.q/t/p md                              6/5/4         5/4/3
{
  for (i = 0; i < |q/t/p|; ++i)
    for (j = 0; j < |q/t/p|; ++j)
      md[i][j] = 1.0;
}

-----------------------------------------

// Rotation vector
vrot.q/t/p vd, vs.s, [+c/-c/-s/+s/0,...]         2            1
{
  for (i = 0; i < |q/t/p|; ++i)
    vd[i] = (+1.0 | -1.0) * (cos | sin)(vs.s*PI/2.0) | 0;
}

-----------------------------------------

vt4444.q vd, vs                              1            0
{
  vd[0]( 0..15) = ((vs[0] & 0xF0000000) >> 16) | ((vs[0] & 0xF00000) >> 12) | ((vs[0] & 0xF000) >> 8) | ((vs[0] & 0xF0) >> 4);
  vd[0](16..31) = ((vs[1] & 0xF0000000) >> 16) | ((vs[1] & 0xF00000) >> 12) | ((vs[1] & 0xF000) >> 8) | ((vs[1] & 0xF0) >> 4);
  vd[1]( 0..15) = ((vs[2] & 0xF0000000) >> 16) | ((vs[2] & 0xF00000) >> 12) | ((vs[2] & 0xF000) >> 8) | ((vs[2] & 0xF0) >> 4);
  vd[1](16..31) = ((vs[3] & 0xF0000000) >> 16) | ((vs[3] & 0xF00000) >> 12) | ((vs[3] & 0xF000) >> 8) | ((vs[3] & 0xF0) >> 4);
}

-----------------------------------------

vt5551.q vd, vs                              1            0
{
  vd[0]( 0..15) = ((vs[0] & 0x80000000) >> 16) | ((vs[0] & 0xF80000) >> 9) | ((vs[0] & 0xF800) >> 6) | ((vs[0] & 0xF8) >> 3);
  vd[0](16..31) = ((vs[1] & 0x80000000) >> 16) | ((vs[1] & 0xF80000) >> 9) | ((vs[1] & 0xF800) >> 6) | ((vs[1] & 0xF8) >> 3);
  vd[1]( 0..15) = ((vs[2] & 0x80000000) >> 16) | ((vs[2] & 0xF80000) >> 9) | ((vs[2] & 0xF800) >> 6) | ((vs[2] & 0xF8) >> 3);
  vd[1](16..31) = ((vs[3] & 0x80000000) >> 16) | ((vs[3] & 0xF80000) >> 9) | ((vs[3] & 0xF800) >> 6) | ((vs[3] & 0xF8) >> 3);
}

-----------------------------------------

vt5650.q vd, vs                              1            0
{
  vd[0]( 0..15) = ((vs[0] & 0xF80000) >> 8) | ((vs[0] & 0xFC00) >> 5) | ((vs[0] & 0xF8) >> 3);
  vd[0](16..31) = ((vs[1] & 0xF80000) >> 8) | ((vs[1] & 0xFC00) >> 5) | ((vs[1] & 0xF8) >> 3);
  vd[1]( 0..15) = ((vs[2] & 0xF80000) >> 8) | ((vs[2] & 0xFC00) >> 5) | ((vs[2] & 0xF8) >> 3);
  vd[1](16..31) = ((vs[3] & 0xF80000) >> 8) | ((vs[3] & 0xFC00) >> 5) | ((vs[3] & 0xF8) >> 3);
}

-----------------------------------------

vcrs.t vd, vs, vt                           1            0
{
  vd[0] = vs[1] * vt[2];
  vd[1] = vs[2] * vt[0];
  vd[2] = vs[0] * vt[1];
}

-----------------------------------------

// Negative reciprocal
vnrcp.q/t/p/s vd, vs (UNSURE)                  4/?/?/?         3
{
  for (i = 0; i < |q/t/p|; ++i)
    vd[i] = -1.0 / vs[i];
}

-----------------------------------------

// Negative sine
vnsin.q/t/p/s vd, vs (UNSURE)                  4/?/?/?         3
{
  for (i = 0; i < |q/t/p|; ++i)
    vd[i] = -sin(vs[i]*PI/2);
}

-----------------------------------------

// Reciprocal exponent to base 2
vrexp2.q/t/p/s vd, vs                        4/?/?/?         3
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = 1.0 / exp2(vs[i]);
}

-----------------------------------------

// Vector cross-product
vcrsp.t vd, vs, vt                           3            2
{
  vd[0] = vs[1]*vt[2] - vs[2]*vt[1];
  vd[1] = vs[2]*vt[0] - vs[0]*vt[2];
  vd[2] = vs[0]*vt[1] - vs[1]*vt[0];
}

-----------------------------------------

// Vector determinant
vdet.p vd.s, vs, vt                           1            0
{
  vd.s = vs[0] * vt[1] - vs[1] * vt[0];
}

-----------------------------------------

v(u)s2i.s vd.p, vs.s                        1            0
{
  vd.p[0] = (vs.s[0](16..31)) << 16;
  vd.p[1] = (vs.s[0]( 0..15)) << 16;
}

v(u)s2i.p vd.q, vs.p                        1            0
{
  vd.q[0] = (vs.p[0](16..31)) << 16;
  vd.q[1] = (vs.p[0]( 0..15)) << 16;
  vd.q[2] = (vs.p[1](16..31)) << 16;
  vd.q[3] = (vs.p[1]( 0..15)) << 16;
}

-----------------------------------------

vi2(u)s.s vd.s, vs.p                        1            0
{
  vd.s[0](16..31) = vs.p[0] >> 16;
  vd.s[0]( 0..15) = vs.p[1] >> 16;
}

vi2(u)s.p vd.p, vs.q                        1            0
{
  vd.p[0](16..31) = vs.q[0] >> 16;
  vd.p[0]( 0..15) = vs.q[1] >> 16;
  vd.p[1](16..31) = vs.q[2] >> 16;
  vd.p[1]( 0..15) = vs.q[3] >> 16;
}

-----------------------------------------

// Nvidia Half format [S:1][E:5][M:10]
vh2f.p vd, vs                              1            0
{
  vd[0] = ((vs[0] & 0x8000) << 16) | ((((vs[0] >> 10) & 0x1F) + 0x70) << 23) | ((vs[0] & 0x03FF) << 13);
  vd[1] = (vs[0] & 0x80000000) | ((((vs[0] >> 10) & 0x1F0000) + 0x700000) << 7) | ((vs[0] & 0x03FF0000) >> 3);
  vd[2] = ((vs[1] & 0x8000) << 16) | ((((vs[1] >> 10) & 0x1F) + 0x70) << 23) | ((vs[1] & 0x03FF) << 13);
  vd[3] = (vs[1] & 0x80000000) | ((((vs[1] >> 10) & 0x1F0000) + 0x700000) << 7) | ((vs[1] & 0x03FF0000) >> 3);
}

-----------------------------------------

vsocp.p/s vd.q/p, vs.p/s                     1            0
{
  for (i = 0; i < |p/s|; ++i)
  {
    vd[i*2+0] = 1.0 - vs[i];
    vd[i*2+1] = vs[i];
  }
}

-----------------------------------------

vsbz.s vd.s, vs.s                           1            0
{
// TODO Byte To Short Extension ?
}

vsbn.s vd.s, vs.s, vt.s                        1            0
{
// TODO Byte to Short Extension ?
}

vlgb.s vd.s, vs.s                           1            0
{
// TODO
}

vwbn.s vd.s, vs.s, imm                        1            0
{
// TODO Byte to Word Extension ?
}

-----------------------------------------

viim.s vd.s, constant integer                   1            0
{
  vd.s = constant integer (between -32768 and 32767 ?);
}

vfim.s vd.s, constant real                     1            0
{
  vd.s = constant real;
}

-----------------------------------------

vnop                                    1            0
&#123;
  // do nothing except eating 1 cycle
&#125;

-----------------------------------------

vflush                                    5            4
&#123;
  // TODO
&#125;

vsync                                    4            3
&#123;
  // TODO
&#125;

vsync i                                    1            0
&#123;
  // TODO
&#125;





NOTES:

(UNSURE) beside an op means the given C counterpart is questionable.

Clock ticks are benched estimates, but should be accurate.

*The latency column is to be understood like this:
the exec cost is the (clock) ticks minus the latency and is an unavoidable cost, while latency is the 'playroom' to interleave
the code with other (independent) ops without additional cost.
Unfortunately, this does not seem to work with VFPU ops - so either the VFPU isn't pipelined or most ops with latency
just use the whole pipeline already. It does work, however, with normal MIPS code (that's how it was benched). This code
interleaving is recommended especially with matrix and other costly ops.

Raphael · Post by **Raphael** » Wed Nov 08, 2006 5:53 am

hlide wrote: NOTE: in fact i was first puzzled by the <<16 operation but now i find it logical in so far as it simplifies the operation (no need to extend sign this way for vfpu logic circuits).

If you need then to convert them in floats, just do "vi2f vd, vs, 16".

Yep. Had the same problem when I tried converting short arrays to float arrays for VFPU processing in libavcodec. The same goes for the reverse way, ie first do "vf2i vd, vs, 16" and then "vi2(u)s vd, vs".
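A rough C model of the short-to-float path described here, as a sketch only. It assumes the short lands in the upper 16 bits of the word (the "<<16" step) and that "vi2f vd, vs, 16" converts to float and divides by 2^16; the helper names are made up for illustration and are not VFPU or SDK functions.

```c
#include <stdint.h>

/* Step 1 (assumed behaviour of the short -> int op): the 16-bit value lands in
   the upper half of the word, so its sign bit is already bit 31. */
static int32_t short_to_int_hi(int16_t s)
{
    /* Shift as unsigned to avoid undefined behaviour on negative values. */
    return (int32_t)((uint32_t)(uint16_t)s << 16);
}

/* Step 2 (assumed behaviour of "vi2f vd, vs, imm"): convert to float and
   divide by 2^imm, for imm in 0..31. */
static float int_to_float_scaled(int32_t v, unsigned imm)
{
    return (float)v / (float)(1u << imm);
}

/* Both steps combined: the float ends up equal to the original short. */
static float short_to_float(int16_t s)
{
    return int_to_float_scaled(short_to_int_hi(s), 16);
}
```

Because the value already sits in the top half of the word, no separate sign extension is needed before the conversion, which is the point of the "<<16".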

I'd suggest designing your notation to differentiate between single and quad registers, as sometimes they are combined in operations and it's not immediately clear which operand has which format. Something like vqs/d for a quad register and vss/d for a single register, or similar.
Here are some of my findings:

Code: Select all

vocp.s vsd, vss
{
   vsd = 1.0 - vss
}

vrsq.s vsd, vss
{
  vsd = 1.0 / sqrt(vss)
}

vsat0.q/t/p/s vqd, vqs
{
  (i=0..3)
  vqd[i] = (vqs[i] < 0) ? 0 : ((vqs[i] > 1.0) ? 1.0 : vqs[i])
}

Apart from that, the vscl operation can also saturate using the destination register extension with brackets:

Code: Select all

vscl.q/t/p/s vqd[L1:T1, L2:T2, L3:T3, L4:T4], vqs, vst
{
  (i=0..3)
  vqd[i] = CLAMP(vqs[i] * vst, Li, Ti)
}

So you can clamp to range -1:1 for example (useful for normalizations), or any other constants that can be used in those fields.
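For reference, a minimal C sketch of what a scale with a -1:1 saturation on every destination lane would compute, assuming the clamp is applied after the multiply as in the pseudocode. The function names are hypothetical.

```c
#include <stddef.h>

/* Clamp x to the closed range [lo, hi]. */
static float clampf(float x, float lo, float hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Hypothetical model of "vscl.q vd[-1:1,-1:1,-1:1,-1:1], vs, vt":
   scale each component by vt, then saturate the result to [-1, 1]. */
static void scale_saturate(float vd[4], const float vs[4], float vt)
{
    for (size_t i = 0; i < 4; ++i)
        vd[i] = clampf(vs[i] * vt, -1.0f, 1.0f);
}
```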

by the way, psp-documentation from hitmen seems to be in standby :/

Unfortunately, yes :(

hlide · Post by **hlide** » Wed Nov 08, 2006 9:48 am

added nearly all the instructions but a lot to be done too :/

dot_blank · Post by **dot_blank** » Wed Nov 08, 2006 11:03 am

i am finally glad somebody took it up themselves
to start something like this ....cheers hlide and raphael

Raphael · Post by **Raphael** » Wed Nov 08, 2006 1:31 pm

Some things from the list I can complete/confirm:

Code: Select all

// homogenous dot product
vhdp.q/t/p/s sd.s, vs, vt (UNSURE)
{
  sd.s = vt.s;
  for (i = 1; i < |q/t/p|; ++i)
    sd.s += vs[i] * vt[i];
}

-----------------------------------------

// Funnel add components
vfad.q/t/p/s sd.s, vs
{
  sd.s = 0;
  for (i = 0; i < |q/t/p/s|; ++i)
    sd.s += vs[i];
}

-----------------------------------------

// Average of components
vavg.q/t/p/s sd.s, vs
{
  sd.s = 0.0;
  for (i = 0; i < |q/t/p/s|; ++i)
    sd.s += vs[i];
  sd.s /= |q/t/p/s|;
}

-----------------------------------------

// Round
vf2in.q/t/p/s vd, vs, imm
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = ROUND(vs[i]) << imm;
}

-----------------------------------------

// Trunc
vf2iz.q/t/p/s vd, vs, imm
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = TRUNC(vs[i]) << imm;
}

-----------------------------------------

// Ceil (round up)
vf2iu.q/t/p/s vd, vs, imm
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = CEIL(vs[i]) << imm;
}

-----------------------------------------

// Floor (round down)
vf2id.q/t/p/s vd, vs, imm
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = FLOOR(vs[i]) << imm;
}

-----------------------------------------

vi2f.q/t/p/s vd, vs, imm
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = (float)(vs[i] >> imm);
}

-----------------------------------------

vcmov.q/t/p/s vd, sd, cc (UNSURE)
{
  if (CC[cc])
     vd = sd;
}

vcmovt.q/t/p/s vd, sd, cc (UNSURE)
{
  if (CC[cc] == TRUE)
     vd = sd;
}

vcmovf.q/t/p/s vd, sd, cc (UNSURE)
{
  if (CC[cc] == FALSE)
     vd = sd;
}

-----------------------------------------

// matrix multiplication
vmmul.q/t/p md, ms, mt
{
  for (i = 0; i < |q/t/p|; ++i)
    for (j = 0; j < |q/t/p|; ++j)
    {
      md[i][j] = 0;
      for (k = 0; k < |q/t/p|; ++k)
        md[i][j] += ms[i][k] * mt[k][j];
    }
}

-----------------------------------------

// Matrix-vector transform
vtfm4.q/3.t/2.p vd, md, vt
{
  for (i = 0; i < |q/t/p|; ++i)
  {
    vd[i] = 0;
    for (j = 0; j < |q/t/p|; ++j)
      vd[i] += md[i][j] * vt[j];
  }
}

-----------------------------------------

// homogeneous transform
vhtfm4.q/3.t/2.p/1.s vd, md, vt (UNSURE esp 1.s case?)
{
  for (i = 0; i < |q/t/p/s|; ++i)
  {
    vd[i] = 0;
    for (j = 0; j < |q/t/p/s|; ++j)
      vd[i] += md[i][j] * vt[j];
  }
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] /= vd[|q/t/p/s|];
}

-----------------------------------------

// matrix scale
vmscl.q/t/p md, ms, st
{
  for (i = 0; i < |q/t/p|; ++i)
    for (j = 0; j < |q/t/p|; ++j)
      md[i][j] = ms[i][j] * st;
}

-----------------------------------------

vmmov.q/t/p md, ms
{
  for (i = 0; i < |q/t/p|; ++i)
    for (j = 0; j < |q/t/p|; ++j)
      md[i][j] = ms[i][j];
}

-----------------------------------------

vmidt.q/t/p md
{
  for (i = 0; i < |q/t/p|; ++i)
    for (j = 0; j < |q/t/p|; ++j)
      md[i][j] = (i == j) ? 1.0 : 0.0;
}

-----------------------------------------

vmzero.q/t/p md
{
  for (i = 0; i < |q/t/p|; ++i)
    for (j = 0; j < |q/t/p|; ++j)
      md[i][j] = 0.0;
}

-----------------------------------------

vmone.q/t/p md
{
  for (i = 0; i < |q/t/p|; ++i)
    for (j = 0; j < |q/t/p|; ++j)
      md[i][j] = 1.0;
}

-----------------------------------------

vrot.q/t/p vd, ss, [+c/-c/-s/+s/0,...]
{
  for (i = 0; i < |q/t/p|; ++i)
    vd[i] = +/- cos/sin(ss) | 0;
}

-----------------------------------------

vnrcp.q/t/p/s vd, vs (UNSURE)
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = -1.0 / vs[i];
}

-----------------------------------------

vnsin.q/t/p/s vd, vs (UNSURE)
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = -sin(vs[i]*PI/2);
}

-----------------------------------------

vrexp2.q/t/p/s vd, vs
{
  for (i = 0; i < |q/t/p/s|; ++i)
    vd[i] = 1.0 / exp2(vs[i]);
}

-----------------------------------------

vcrsp.t vd, vs, vt
{
  vd[0] = vs[1]*vt[2] - vs[2]*vt[1];
  vd[1] = vs[2]*vt[0] - vs[0]*vt[2];
  vd[2] = vs[0]*vt[1] - vs[1]*vt[0];
}

-----------------------------------------

I'd also suppose that the half format is [1:5:10], though the conversion steps still have to be worked out; it should be straightforward. No shift arguments there ;)
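As a sketch of how a [1:5:10] half expands to a 32-bit float, here is the same bit manipulation as the vh2f.p entry in the first post, written for a single half in C. It only covers normalized inputs; how the hardware treats zero, denormals, Inf and NaN is not established in this thread. The function names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Expand one [S:1][E:5][M:10] half into IEEE-754 single-precision bits.
   Normalized inputs only: zero, denormals, Inf and NaN need extra cases. */
static uint32_t half_bits_to_float_bits(uint16_t h)
{
    uint32_t sign     = ((uint32_t)h & 0x8000u) << 16;
    uint32_t exponent = ((((uint32_t)h >> 10) & 0x1Fu) + 0x70u) << 23; /* rebias 15 -> 127 */
    uint32_t mantissa = ((uint32_t)h & 0x03FFu) << 13;
    return sign | exponent | mantissa;
}

/* Reinterpret the assembled bits as a float without aliasing problems. */
static float half_to_float(uint16_t h)
{
    uint32_t bits = half_bits_to_float_bits(h);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

The constant 0x70 is 127 - 15, the difference between the single and half exponent biases.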

I wanted to do something like this for some time now, but always was too lazy to begin writing down everything :) I need to slap myself that hlide had to appear before I did something

I wonder what that vcrs.t does, as there already is the cross product. Also vdet.p, though that could possibly just be a simple (vs[0]*vt[1] - vs[1]*vt[0]). Are there definitely no .t/q versions? Gonna play around with that when I find time and I'll then add some more things

hlide · Post by **hlide** » Wed Nov 08, 2006 6:02 pm

nice catch for vfad, i was clueless.

opc-mips.c :

there is only one vcrs.t and one vdet.p. if vdet.t exists, its opcode would probably be something like 0x67808000 + vd.t + (vs.t << 8). But in my opinion, the fact that the determinant of a 3d matrix is computed differently from that of a 2d one may explain this :

det([a]) = a
det([[a b][c d]]) = ad - bc.
det([[a b c][d e f][g h i]]) = aei + dhc + gbf - ceg - fha - ibd.
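The two non-trivial formulas written out in plain C, as a sketch for checking results by hand. Nothing here is VFPU-specific, and the function names are made up.

```c
/* 2x2 determinant of the rows (a b) and (c d). */
static float det2(float a, float b, float c, float d)
{
    return a * d - b * c;
}

/* 3x3 determinant by the rule of Sarrus, rows (a b c), (d e f), (g h i). */
static float det3(const float m[3][3])
{
    return m[0][0] * m[1][1] * m[2][2]   /* aei */
         + m[1][0] * m[2][1] * m[0][2]   /* dhc */
         + m[2][0] * m[0][1] * m[1][2]   /* gbf */
         - m[0][2] * m[1][1] * m[2][0]   /* ceg */
         - m[1][2] * m[2][1] * m[0][0]   /* fha */
         - m[2][2] * m[0][1] * m[1][0];  /* ibd */
}
```

The 2x2 case needs two pair inputs and the 3x3 case needs three triple inputs, which is one reason a two-source-operand op fits the first but not the second.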

I would investigate vcrs.t as soon as I can.

I will add your diggins as soon as possible.

N.B.: is the word "diggins" correct or is this a pure invention of mine ? i fail to find a french translation for this word.

hlide · Post by **hlide** » Wed Nov 08, 2006 9:59 pm

Raphael wrote:Some things from the list I can complete/confirm:
Code: Select all
// homogenous dot product
vhdp.q/t/p/s sd.s, vs, vt (UNSURE)
{
  sd.s = vt.s;
  for (i = 1; i < |q/t/p|; ++i)
    sd.s += vs[i] * vt[i];
}

vhdp.q ==> return Xs + Ys*Yt + Zs*Zt + Ws*Wt ?

Raphael · Post by **Raphael** » Wed Nov 08, 2006 10:09 pm

hlide wrote:
Raphael wrote:Some things from the list I can complete/confirm:
Code: Select all
// homogenous dot product
vhdp.q/t/p/s sd.s, vs, vt (UNSURE)
{
  sd.s = vt.s;
  for (i = 1; i < |q/t/p|; ++i)
    sd.s += vs[i] * vt[i];
}
vhdp.q ==> return Xs + Ys*Yt + Zs*Zt + Ws*Wt ?

Oh, no, actually it should be Xs*Xt + Ys*Yt + Zs*Zt + Wt :D But still not sure if that is correct

hlide · Post by **hlide** » Wed Nov 08, 2006 11:30 pm

i'm digging vcrs.t :

Code: Select all


vcrs.t [1 0 0],[1 0 0] => [0 0 0]
vcrs.t [1 0 0],[0 1 0] => [0 0 1] => vd[2] = vs[0] x vt[1] ?
vcrs.t [1 0 0],[0 0 1] => [0 0 0]
vcrs.t [0 1 0],[1 0 0] => [0 0 0]
vcrs.t [0 1 0],[0 1 0] => [0 0 0]
vcrs.t [0 1 0],[0 0 1] => [1 0 0] => vd[0] = vs[1] x vt[2] ?
vcrs.t [0 0 1],[1 0 0] => [0 1 0] => vd[1] = vs[2] x vt[0] ?
vcrs.t [0 0 1],[0 1 0] => [0 0 0]
vcrs.t [0 0 1],[0 0 1] => [0 0 0]


vcrs.t [1 2 0],[1 2 0] => [0 0 2] => [ 0, 0, vs[0] x vt[1] ] !
vcrs.t [1 2 0],[0 1 2] => [4 0 1] => [ vs[1] x vt[2], 0, vs[0] x vt[1] ] !
vcrs.t [1 2 0],[2 0 1] => [2 0 0] => [ vs[1] x vt[2], 0, 0 ] !
vcrs.t [0 1 2],[1 2 0] => [0 2 0] => [ 0, vs[2] x vt[0], 0 ] !
vcrs.t [0 1 2],[0 1 2] => [2 0 0] => [ vs[1] x vt[2], 0, 0 ] !
vcrs.t [0 1 2],[2 0 1] => [1 4 0] => [ vs[1] x vt[2], vs[2] x vt[0], 0 ] !
vcrs.t [2 0 1],[1 2 0] => [0 1 4] => [ 0, vs[2] x vt[0], vs[0] x vt[1] ] !
vcrs.t [2 0 1],[0 1 2] => [0 0 2] => [ 0, 0, vs[0] x vt[1] ] !
vcrs.t [2 0 1],[2 0 1] => [0 2 0] => [ 0, vs[2] x vt[0], 0 ] !

it looks like :

Code: Select all

vcrs.t vd, vs, vt
{
  vd[0] = vs[1] x vt[2];
  vd[1] = vs[2] x vt[0];
  vd[2] = vs[0] x vt[1];
}

hlide · Post by **hlide** » Wed Nov 08, 2006 11:54 pm

i'm digging vdet.p as we suppose it does : vd.s = vs[0] x vt[1] - vs[1] x vt[0].

some tests just to check :

Code: Select all

vdet.p [1 0],[1 0] => 0
vdet.p [1 0],[0 1] => 1 => vs[0] x vt[1]
vdet.p [1 0],[1 1] => 1 => vs[0] x vt[1]
vdet.p [0 1],[1 0] => -1 => -(vs[1] x vt[0])
vdet.p [0 1],[0 1] => 0
vdet.p [0 1],[1 1] => -1 => -(vs[1] x vt[0])
vdet.p [1 1],[1 0] => -1 => -(vs[1] x vt[0])
vdet.p [1 1],[0 1] => 1 =>  vs[0] x vt[1]
vdet.p [1 1],[1 1] => 0 => vs[0] x vt[1] - vs[1] x vt[0] = 0

Code: Select all

vdet.p vd.s, vs, vt
{
  vd.s = vs[0] * vt[1] - vs[1] * vt[0];
}

hlide · Post by **hlide** » Wed Nov 08, 2006 11:57 pm

LIST UPDATED !

hlide · Post by **hlide** » Thu Nov 09, 2006 2:28 am

hlide wrote:if vdet.t exists, its opcode would probably be something like 0x67808000 + vd.t + (vs.t << 8).

I tried this one -> crash. So only vdet.p seems to exist.

Raphael · Post by **Raphael** » Thu Nov 09, 2006 4:57 am

hlide wrote:i'm digging vcrs.t :

it looks like :

Code: Select all

vcrs.t vd, vs, vt
{
  vd[0] = vs[1] x vt[2];
  vd[1] = vs[2] x vt[0];
  vd[2] = vs[0] x vt[1];
}

That makes sense, as it would be one part of the crossproduct. I need to redo my VFPU clocktick bench with those new ops :)
About the vdet.t I don't know. If it exists, it shouldn't crash normally. Is it supported by GCC?
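To make the "one part of the cross product" remark concrete, here is a small C sketch that models vcrs.t as measured by hlide and builds the full cross product from two such half-products. The function names are illustrative only, and this says nothing about how the hardware implements vcrsp.t internally.

```c
/* Hypothetical model of vcrs.t as measured in the thread:
   the three positive terms of the cross product. */
static void vcrs_t(float vd[3], const float vs[3], const float vt[3])
{
    vd[0] = vs[1] * vt[2];
    vd[1] = vs[2] * vt[0];
    vd[2] = vs[0] * vt[1];
}

/* Full cross product as vcrs(s, t) - vcrs(t, s).
   vd must not alias vs or vt. */
static void cross3(float vd[3], const float vs[3], const float vt[3])
{
    float a[3], b[3];
    vcrs_t(a, vs, vt);
    vcrs_t(b, vt, vs);
    for (int i = 0; i < 3; ++i)
        vd[i] = a[i] - b[i];
}
```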

And the homogeneous dot product needs revision to account for (x*x+y*y+z*z+w):

Code: Select all

    __[ homogenous dot product ]__

vhdp.q/t/p vd.s, vs, vt (UNSURE)
{
  vd.s = vt[|q/t/p|-1];
  for (i = 0; i < |q/t/p|-1; ++i)
    vd.s += vs[i] * vt[i];
}

So the last component of the source operand vs is basically considered to be 1.0. Still unsure; needs checking
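The same idea as a small C function, a sketch under the assumption stated above (last component of vs treated as 1.0), which is still marked UNSURE in this thread. The name homogeneous_dot is made up.

```c
/* Hypothetical model of vhdp for n = 2, 3 or 4 components:
   dot product in which the last component of vs is taken to be 1.0,
   so the last component of vt is added unscaled. */
static float homogeneous_dot(const float *vs, const float *vt, int n)
{
    float sum = vt[n - 1];
    for (int i = 0; i < n - 1; ++i)
        sum += vs[i] * vt[i];
    return sum;
}
```

For n = 4 this gives Xs*Xt + Ys*Yt + Zs*Zt + Wt, whatever value vs[3] holds.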

hlide · Post by **hlide** » Thu Nov 09, 2006 5:05 am

Raphael wrote:About the vdet.t I don't know. If it exists, it shouldn't crash normally. Is it supported by GCC?

as already said, it crashes so it is not supported.

Raphael · Post by **Raphael** » Thu Nov 09, 2006 5:08 am

hlide wrote:
Raphael wrote:About the vdet.t I don't know. If it exists, it shouldn't crash normally. Is it supported by GCC?
as already said, it crashes so it is not supported.

Oh yes, I misread the "if" there :D It makes sense to not exist, since as you said the 3d vector determinant needs three input vectors. And that's not possible at all

hlide · Post by **hlide** » Fri Nov 10, 2006 6:15 pm

Raphael wrote:
hlide wrote:
Raphael wrote:About the vdet.t I don't know. If it exists, it shouldn't crash normally. Is it supported by GCC?
as already said, it crashes so it is not supported.
Oh yes, I misread the "if" there :D It makes sense to not exist, since as you said the 3d vector determinant needs three input vectors. And that's not possible at all

if you have their cycles, it would be interesting to add in the list. :)

Raphael · Post by **Raphael** » Fri Nov 17, 2006 1:21 am

Do you have any idea which version of binutils/pspsdk I need to have, to be able to use the vbtf1/2 ops? I just tried updating pspsdk but that didn't help yet, the ops still aren't recognized. I tried updating binutils, but somehow that failed, so I need to try again.
I'll have an update to the document soon. A few new ops decoded plus most clock ticks.

hlide · Post by **hlide** » Fri Nov 17, 2006 1:34 am

i'm using DevkitPro and the last devkitPSP release 8.

http://sourceforge.net/project/showfile ... _id=157350

hlide · Post by **hlide** » Fri Nov 17, 2006 1:39 am

oh my ! shouldn't be vbfy1/2 ?

I'm sorry, I DID misname them. I updated the text with the correct names.

Raphael · Post by **Raphael** » Fri Nov 17, 2006 1:50 am

hlide wrote:oh my ! shouldn't be vbfy1/2 ?

I'm sorry, I DID misname them. I updated the text with the correct names.

Heh, that did the trick :) thanks

Raphael · Post by **Raphael** » Fri Nov 17, 2006 2:47 am

Update to the document:
- added some ops C counterpart (vi2c, vqmul, ..)
- added lv/sv ops for completeness
- added clock ticks for nearly all ops (only some for .t/.p/.s versions are missing)
- moved operand prefixes up to pretty much the top

Code: Select all

deleted

hlide · Post by **hlide** » Fri Nov 17, 2006 6:27 am

Code: Select all

vpfxs [?0,?1,?2,?3]

?0, ?1, ?2 or ?3 can be :

x : vs[0]
y : vs[1]
z : vs[2]
w : vs[3]
-x : -vs[0]
-y : -vs[1]
-z : -vs[2]
-w : -vs[3]
|x| : |vs[0]| (absolute value of vs[0])
|y| : |vs[1]| (absolute value of vs[1])
|z| : |vs[2]| (absolute value of vs[2])
|w| : |vs[3]| (absolute value of vs[3])
0 : constant 0
1 : constant 1
2 : constant 2
1/2 : constant 1/2
3 : constant 3
1/3 : constant 1/3
1/4 : constant 1/4
1/6 : constant 1/6

---------------------------------
vpfxt [?0,?1,?2,?3]

same thing as vpfxs but for vt register

---------------------------------
vpfxd [?4,?5,?6,?7]

?4, ?5, ?6 and ?7 can be :

[0:1] : saturated between 0 and 1,
[-1:1] : saturated between -1 and 1,
m : ??? unknown

They are "documented" in opcodes\mips-dis.c :

Code: Select all

static const char * const pfx_cst_names[8] = {
  "0",  "1",  "2",  "1/2",  "3",  "1/3",  "1/4",  "1/6"
};

static const char * const pfx_swz_names[4] = {
  "x",  "y",  "z",  "w"
};

static const char * const pfx_sat_names[4] = {
  "",  "[0:1]",  "",  "[-1:1]"
};

...

            case '0':
            case '1':
            case '2':
            case '3':
              {
                unsigned int pos = *d, base = '0';
                unsigned int negation = (l >> (pos - (base - VFPU_SH_PFX_NEG))) & VFPU_MASK_PFX_NEG;
                unsigned int constant = (l >> (pos - (base - VFPU_SH_PFX_CST))) & VFPU_MASK_PFX_CST;
                unsigned int abs_consthi =
                    (l >> (pos - (base - VFPU_SH_PFX_ABS_CSTHI))) & VFPU_MASK_PFX_ABS_CSTHI;
                unsigned int swz_constlo = (l >> ((pos - base) * 2)) & VFPU_MASK_PFX_SWZ_CSTLO;

                if (negation)
                  (*info->fprintf_func) (info->stream, "-");
                if (constant)
                  {
                    (*info->fprintf_func) (info->stream, "%s",
                                           pfx_cst_names[(abs_consthi << 2) | swz_constlo]);
                  }
                else
                  {
                    if (abs_consthi)
                      (*info->fprintf_func) (info->stream, "|%s|",
                                             pfx_swz_names[swz_constlo]);
                    else
                      (*info->fprintf_func) (info->stream, "%s",
                                             pfx_swz_names[swz_constlo]);
                  }
              }
              break;

            case '4':
            case '5':
            case '6':
            case '7':
              {
                unsigned int pos = *d, base = '4';
                unsigned int mask = (l >> (pos - (base - VFPU_SH_PFX_MASK))) & VFPU_MASK_PFX_MASK;
                unsigned int saturation = (l >> ((pos - base) * 2)) & VFPU_MASK_PFX_SAT;

                if (mask)
                  (*info->fprintf_func) (info->stream, "m");
                else
                  (*info->fprintf_func) (info->stream, "%s",
                                         pfx_sat_names[saturation]);
              }
              break;
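Read the other way round, the same fields say what a source prefix does to an operand. Below is a rough C model of that, as a sketch: the structure mirrors the disassembler code above, but the concrete bit positions (swizzle at bits 2i, abs at 8+i, constant at 12+i, negate at 16+i) are assumptions, since the VFPU_SH_* values are not shown in this post. apply_source_prefix is a made-up name.

```c
#include <stdint.h>

/* Constant table in the same order as pfx_cst_names. */
static const float pfx_constants[8] = {
    0.0f, 1.0f, 2.0f, 0.5f, 3.0f, 1.0f / 3.0f, 0.25f, 1.0f / 6.0f
};

/* Apply a vpfxs/vpfxt-style prefix word to a 4-component operand.
   ASSUMED field layout for lane i (0..3):
     bits 2i..2i+1 : swizzle index, or low bits of the constant index
     bit  8 + i    : absolute value, or high bit of the constant index
     bit  12 + i   : select constant instead of a register lane
     bit  16 + i   : negate */
static void apply_source_prefix(float out[4], const float in[4], uint32_t prefix)
{
    for (int i = 0; i < 4; ++i) {
        uint32_t swz_constlo = (prefix >> (i * 2)) & 3u;
        uint32_t abs_consthi = (prefix >> (8 + i)) & 1u;
        uint32_t constant    = (prefix >> (12 + i)) & 1u;
        uint32_t negation    = (prefix >> (16 + i)) & 1u;
        float v;

        if (constant) {
            v = pfx_constants[(abs_consthi << 2) | swz_constlo];
        } else {
            v = in[swz_constlo];
            if (abs_consthi && v < 0.0f)
                v = -v;
        }
        out[i] = negation ? -v : v;
    }
}
```

Under that layout, 0xE4 is the identity prefix [x,y,z,w] and 0x142F4 would encode [-x,|y|,1/2,w].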

Raphael · Post by **Raphael** » Fri Nov 17, 2006 8:43 pm

Another update:
- added vsrt*, vsocp, vf2h/vh2f
- added prefix information from hlide's last post
- removed exec cycles from exec/latency column (better readability) and added missing latencies for .t/p/s variations
- added '?' where clock ticks information is missing

only missing ops now are vcmp versions, byte to X extensions and vflush as well as vsync.

Next, the information should be reformatted into something more readable, such as a .pdf.

Code: Select all

deleted

hlide · Post by **hlide** » Fri Nov 17, 2006 9:41 pm

vsrt1/2/3/4.q vd, vs are very tough ones but i think i have discovered what they do :

Code: Select all

vsrt1.q vd, vs
{
  vd[0] = min(vs[0], vs[1]);
  vd[1] = max(vs[1], vs[0]);
  vd[2] = min(vs[2], vs[3]);
  vd[3] = max(vs[3], vs[2]);
}

vsrt2.q vd, vs
{
  vd[0] = min(vs[0], vs[3]);
  vd[1] = min(vs[1], vs[2]);
  vd[2] = max(vs[2], vs[1]);
  vd[3] = max(vs[3], vs[0]);
}

vsrt3.q vd, vs
{
  vd[0] = max(vs[0], vs[1]);
  vd[1] = min(vs[1], vs[0]);
  vd[2] = max(vs[2], vs[3]);
  vd[3] = min(vs[3], vs[2]);
}

vsrt4.q vd, vs
{
  vd[0] = max(vs[0], vs[3]);
  vd[1] = max(vs[1], vs[2]);
  vd[2] = min(vs[2], vs[1]);
  vd[3] = min(vs[3], vs[0]);
}

I hope Raphael can confirm those operations.

I used 4 vectors as vs :
[1 2 3 4]
[2 3 4 1]
[3 4 1 2]
[4 1 2 3]

results for vsrt1 :
[1 2 3 4]
[2 3 4 1] => 4->1 and 1->4
[3 4 1 2]
[4 1 2 3] => 4->1 and 1->4

results for vsrt2 :
[1 2 3 4]
[1 3 4 2] => 2->1 and 1->2
[2 1 4 3] => 3->2 and 4->1 and 1->4 and 2->3
[4 1 2 3] => 4->1 and 1->4

results for vsrt3 :
[2 1 4 3] => 1->2 and 2->1 and 3->4 and 4->3
[3 2 4 1] => 2->3 and 3->2
[4 3 2 1] => 3->4 and 4->3 and 1->2 and 2->1
[4 1 3 2] => 2->3 and 3->2

results for vsrt4 :
[4 3 2 1] => 1->4 and 2->3 and 3->2 and 4->1
[2 4 3 1] => 3->4 and 4->3
[3 4 1 2]
[4 2 1 3] => 1->2 and 2->1

Due to their apparent "random" permutations, i felt min and max were probably the key to their weirdness.
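A quick way to check the min/max reading against the tables above is to model the ops in C and feed them the same four vectors. The sketch below does that for vsrt3 and vsrt4, whose listed results are unambiguous outputs; the function names are made up, and vd is assumed not to alias vs.

```c
static float minf(float a, float b) { return a < b ? a : b; }
static float maxf(float a, float b) { return a > b ? a : b; }

/* Hypothetical model of vsrt3.q: each adjacent pair sorted descending. */
static void vsrt3_q(float vd[4], const float vs[4])
{
    vd[0] = maxf(vs[0], vs[1]);
    vd[1] = minf(vs[0], vs[1]);
    vd[2] = maxf(vs[2], vs[3]);
    vd[3] = minf(vs[2], vs[3]);
}

/* Hypothetical model of vsrt4.q: outer pair (0,3) and inner pair (1,2),
   larger values moved towards the front. */
static void vsrt4_q(float vd[4], const float vs[4])
{
    vd[0] = maxf(vs[0], vs[3]);
    vd[1] = maxf(vs[1], vs[2]);
    vd[2] = minf(vs[1], vs[2]);
    vd[3] = minf(vs[0], vs[3]);
}
```

With [2 3 4 1] as input, vsrt3_q yields [3 2 4 1], and with [4 1 2 3], vsrt4_q yields [4 2 1 3], matching the rows reported above.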

hlide · Post by **hlide** » Fri Nov 17, 2006 9:45 pm

oh i missed your post, Raphael ! well i can compare your additions with mine. :)

hlide · Post by **hlide** » Fri Nov 17, 2006 10:00 pm

Raphael:

ok, we found the same thing for vsrt1/2/3/4, that should be okay.

I updated the textfile in the first message, so i think you can erase your long text to reduce the number of pages to browse :).

By the way, groepaz plans to update his document with our findings.

Raphael · Post by **Raphael** » Fri Nov 17, 2006 10:59 pm

Heh, finally, he already said he'd update it when I first posted my VFPU clock cycles :P
EDIT: I think we can leave only your min/max code for vsrt*, it's shorter and easier to read
Oh, and do you know how you can seed the random generator for VFPU?

hlide · Post by **hlide** » Fri Nov 17, 2006 11:07 pm

Raphael wrote:Heh, finally, he already said he'd update it when I first posted my VFPU clock cycles :P
EDIT: I think we can leave only your min/max code for vsrt*, it's shorter and easier to read
Oh, and do you know how you can seed the random generator for VFPU?

VFPU has control registers and some are related to the random seed, i guess. They are documented in groepaz's document.

Code: Select all

128 	VFPU_PFXS 	Source prefix stack
129 	VFPU_PFXT 	Target prefix stack
130 	VFPU_PFXD 	Destination prefix stack
131 	VFPU_CC 	Condition information
132 	VFPU_INF4 	VFPU internal information 4
133 	VFPU_RSV5 	Not used (reserved)
134 	VFPU_RSV6 	Not used (reserved)
135 	VFPU_REV 	VFPU revision information
136 	VFPU_RCX0 	Pseudorandom number generator information 0
137 	VFPU_RCX1 	Pseudorandom number generator information 1
138 	VFPU_RCX2 	Pseudorandom number generator information 2
139 	VFPU_RCX3 	Pseudorandom number generator information 3
140 	VFPU_RCX4 	Pseudorandom number generator information 4
141 	VFPU_RCX5 	Pseudorandom number generator information 5
142 	VFPU_RCX6 	Pseudorandom number generator information 6
143 	VFPU_RCX7 	Pseudorandom number generator information 7

Raphael · Post by **Raphael** » Fri Nov 17, 2006 11:16 pm

Yeah, just stumbled upon those too. Hm, unfortunately I have no clue how to use them. Would be nice though, seeing how the vector random generator only takes 3 cycles to generate one random number.

hlide · Post by **hlide** » Fri Nov 17, 2006 11:30 pm

vone.q and vzero.q take 3 cycles !?

don't vmov.q vd, vs[1, 1, 1, 1] and vmov.q vd, vs[0, 0, 0, 0] give us better cycle counts (2 cycles instead of 3, at least) ?

random stuff, i'm trying to see how to use them.
