netkas.org forum
Author Topic: Kepler cards and PCIE speed, how to tell what your card is running at  (Read 20712 times)
Rominator
Hero Member
*****
Offline Offline

Posts: 2118



« on: March 04, 2014, 10:51:11 PM »

With each new family of cards the means to turn on PCIE 2 functions has moved.

With NVIDIA cards it will be a byte or two somewhere in the rom. With AMD cards it has been a 10k resistor that either gets added or removed. These AMD resistors are easiest to find when you have identical cards for PC and Mac. This is why the 7950/70 was found so quickly. It is also why the 7870 HASN'T been found: there is nothing to compare it to.

NVIDIA has changed the "switch" in the rom with each new family of cards. I have found it every time but it always takes a while. With the Kepler cards they upped the game. There are now TWO switches. One turns the actual function on, the other changes what is reported in System Profiler.

In addition, Kepler cards are the least flash-friendly cards I have ever toyed with. There are now 5 sections to the rom, all of which are interconnected. Changing a few bytes in one may break the whole rom if the corresponding bytes aren't changed in some or all of the other sections. Adding to this, they have tried to cut down on RMAs caused by yokels flashing modded roms that brick the cards. To do this, the Nvflash utility runs certain comparisons before flashing. If it sees changes where they aren't allowed, it won't flash.

So working on Kepler roms was much more challenging than any previous roms. The cards related to shipping cards were difficult, the GK110 cards were nearly impossible.

So after a couple of weeks of painful testing I was able to isolate where the functional PCIE 2.0 switch was. It is easy to demonstrate that PCIE 2.0 is enabled in OSX and Windows. In Windows it is especially easy, since without our modded rom all cards are stuck at PCIE 1.0.

A quick run of GPU-Z shows this. With an unflashed card the speed is always 1. With a flashed card the speed will show as capable of 2 but running at 1.

You then hit the little "?" icon to run the stress test. It draws a little moving graphic and the speed instantly jumps to 2.

With an unflashed card, the speed will stay at 1 no matter what you do.

Seeing the PCIE speed in OSX is trickier. I spent 2 weeks trying to isolate the cosmetic part after the functional part was found. Frequently this resulted in remove-and-replace EEPROM surgeries. I was also informed that during these 2 weeks I was especially unpleasant to be around. One day I decided that the fact that it was 100% functional was more important, and that people were waiting for the larger variety of cards that we could make available.

So I stopped looking for the cosmetic switch.

So, every single Kepler GPU we have sent out has said "2.5" in the speed section of System Profiler. The only exceptions are the 680s. These run with just a mildly modded version of the stock Mac 680 rom, so they have both the functional and cosmetic switches on.

To find the true speed in OSX, the best choice is a little command line tool called "lspci". You run it in verbose mode and it gives specs on everything on the PCI bus. Buried in the minutiae is your GPU. Run it first with the card idle and it will show the GPU in low power mode running at 2.5 GT/s. If you then run something benchmarkyish, click on the terminal window and hit "up arrow" followed by return, it will repeat the command and THIS TIME you will see the GPU running at 5.0 GT/s. Shut down the benchmark, do "up arrow" and return again, and you will see that the card has gone back into low power mode. This is EXACTLY how it is meant to function.
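
The idle/load check above boils down to comparing three numbers: what the link says it is capable of, what it runs at idle, and what it runs at under load. Here is a minimal Python sketch of that comparison. The function name `diagnose_link` and the wording of its verdicts are made up for illustration; the 2.5 and 5.0 GT/s values are the ones lspci reports.

```python
def diagnose_link(cap_gts, idle_gts, load_gts):
    """Classify a card from three speeds in GT/s, read off lspci by hand.

    cap_gts  -- the advertised capability (the LnkCap line)
    idle_gts -- the current status with nothing running (the LnkSta line)
    load_gts -- the current status while a benchmark is running
    """
    if cap_gts < 5.0:
        # The functional switch is off: the card never offers 5.0 GT/s.
        return "PCIE 1.0 only"
    if load_gts >= 5.0 and idle_gts < 5.0:
        # Expected behaviour: low power at idle, full speed under load.
        return "PCIE 2.0 working, drops to low power at idle"
    if load_gts >= 5.0:
        return "PCIE 2.0 working, link stays at full speed"
    return "PCIE 2.0 advertised but not reached under load"


print(diagnose_link(2.5, 2.5, 2.5))  # unflashed card
print(diagnose_link(5.0, 2.5, 5.0))  # flashed card, idle then benchmark
```

The last branch is the case worth worrying about: the card claims 5.0 GT/s but never gets there, which would point at the slot or the driver rather than the rom.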

For those frightened of the terminal window there are 2 GUI options. OpenCL Oceanwaves shows bandwidth before it runs the benchmark. Numbers around 2-3,000 mean PCIE 1.0 speed. Numbers around 5-6,000 denote PCIE 2.0 speed. Note that running the actual benchmark in Mavericks requires a Quartz Debug "disable beam sync" to get past the 30 or 60 Hz limit of the display refresh. However, seeing the PCIE bandwidth doesn't require this fix. Download and run it, bingo, instant answer.

If you have CUDA installed you can view the numbers in CUDA-Z. Host to Device numbers in the 2-3,000 range are, well, just look at what I typed above for Oceanwaves: same numbers.
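
If you would rather not eyeball the numbers, the rule of thumb above is easy to write down. This is only a sketch: `guess_pcie_gen` is a hypothetical helper, the approximate ranges quoted above are treated here as hard limits, and the bandwidth is assumed to be in MB/s as displayed by the tool.

```python
# Approximate host-to-device bandwidth ranges quoted above (assumed MB/s).
RULE_OF_THUMB = {
    "PCIE 1.0": (2000, 3000),
    "PCIE 2.0": (5000, 6000),
}


def guess_pcie_gen(bandwidth):
    """Map a measured bandwidth onto the quoted ranges, or say it fits neither."""
    for label, (low, high) in RULE_OF_THUMB.items():
        if low <= bandwidth <= high:
            return label
    return "outside the quoted ranges"


print(guess_pcie_gen(2600))
print(guess_pcie_gen(5400))
```

Anything that lands between or above the two ranges is left unclassified on purpose, since the ranges are only rough.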

I will come back and attach screen shots and links. But anyone and everyone is welcome to try this out. Keep in mind that recent NVIDIA drivers enable PCIE 2.0 in OSX on 4,1/5,1 Mac Pros but not on the 3,1. Booting into Windows, it is easier to tell right away. Our EFI'd cards are the only ones running at PCIE 2.0; unflashed NVIDIA cards are all stuck at 1.0.

Note the GTX680 pre-flash: it lists both the potential and actual bus speeds as PCIE 1.0. Once it is flashed it will show PCIE 2.0 as possible with PCIE 1.0 as current, until you hit the little "?" to add 3D stress to the card.



* 290 at 1.gif (21.42 KB, 393x486 - viewed 797 times.)

* 680@1.gif (21.33 KB, 393x486 - viewed 689 times.)

* 680@2.gif (20.98 KB, 393x486 - viewed 658 times.)
« Last Edit: March 05, 2014, 02:23:56 AM by Rominator » Logged

Before asking a question, check your "Personal Settings" and be sure that you have "Brain Services" set to "On".
Rominator
« Reply #1 on: March 05, 2014, 01:45:05 AM »

https://dl.dropbox.com/u/88732606/Apps/lspci%20V1.1.pkg.zip

Install, restart, and run "lspci -vv" in Terminal.

Search for "10DE" if you have NVIDIA, "1002" if AMD/ATI.

02:00.0 Class 0300: Unknown device 10de:100c (rev a1)
   Subsystem: Unknown device 3842:3790
   Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
   Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
   Latency: 0
   Interrupt: pin A routed to IRQ 19
   Region 0: Memory at 8a000000 (32-bit, non-prefetchable)
   Region 1: Memory at <ignored> (64-bit, prefetchable)
   Region 3: Memory at <ignored> (64-bit, prefetchable)
   Region 5: I/O ports at 2000 [disabled]
   Capabilities: [60] Power Management version 3
      Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
      Status: D0 PME-Enable- DSel=0 DScale=0 PME-
   Capabilities: [68] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
      Address: 00000000fee00000  Data: 4072
   Capabilities: [78] Express (v2) Endpoint, MSI 00
      DevCap:   MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
         ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
      DevCtl:   Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
         RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
         MaxPayload 128 bytes, MaxReadReq 512 bytes
      DevSta:   CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
      LnkCap:   Port #8, Speed 5GT/s, Width x16, ASPM unknown, Latency L0 <512ns, L1 <4us
         ClockPM+ Suprise- LLActRep- BwNot-
      LnkCtl:   ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
         ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
      LnkSta:   Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

Note that the first few "10DE"s you find will be the audio portion that isn't used much in OSX.

Look for the device ID of your card.

Here is an unflashed 680:

02:00.0 Class 0300: Unknown device 10de:1180 (rev a1)
   Subsystem: Unknown device 10de:0969
   Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
   Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
   Latency: 0
   Interrupt: pin A routed to IRQ 19
   Region 0: Memory at 8a000000 (32-bit, non-prefetchable)
   Region 1: Memory at <ignored> (64-bit, prefetchable)
   Region 3: Memory at <ignored> (64-bit, prefetchable)
   Region 5: I/O ports at 2000 [disabled]
   Capabilities: [60] Power Management version 3
      Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
      Status: D0 PME-Enable- DSel=0 DScale=0 PME-
   Capabilities: [68] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
      Address: 00000000fee00000  Data: 4072
   Capabilities: [78] Express (v1) Endpoint, MSI 00
      DevCap:   MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
         ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
      DevCtl:   Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
         RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
         MaxPayload 128 bytes, MaxReadReq 512 bytes
      DevSta:   CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
      LnkCap:   Port #8, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <512ns, L1 <4us
         ClockPM+ Suprise- LLActRep- BwNot-
      LnkCtl:   ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
         ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
      LnkSta:   Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
   Capabilities: [b4] Vendor Specific Information <?>
   Capabilities: [100] #10de
   Capabilities: [118] #3f

And here is the GTX680 that has been flashed. Note that, just as GPU-Z shows in Windows, it lists the "capability" (LnkCap) as 5.0 but the current status (LnkSta) as 2.5.

02:00.0 Class 0300: Unknown device 10de:1180 (rev a1)
   Subsystem: Unknown device 3842:0969
   Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
   Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
   Latency: 0
   Interrupt: pin A routed to IRQ 19
   Region 0: Memory at 8a000000 (32-bit, non-prefetchable)
   Region 1: Memory at <ignored> (64-bit, prefetchable)
   Region 3: Memory at <ignored> (64-bit, prefetchable)
   Region 5: I/O ports at 2000 [disabled]
   Capabilities: [60] Power Management version 3
      Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
      Status: D0 PME-Enable- DSel=0 DScale=0 PME-
   Capabilities: [68] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
      Address: 00000000fee00000  Data: 4072
   Capabilities: [78] Express (v2) Endpoint, MSI 00
      DevCap:   MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
         ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
      DevCtl:   Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
         RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
         MaxPayload 128 bytes, MaxReadReq 512 bytes
      DevSta:   CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
      LnkCap:   Port #8, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <512ns, L1 <4us
         ClockPM+ Suprise- LLActRep- BwNot-
      LnkCtl:   ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
         ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
      LnkSta:   Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
   Capabilities: [b4] Vendor Specific Information <?>
   Capabilities: [100] #10de
   Capabilities: [118] #3f
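
For anyone who would rather not scroll through the whole dump, the two interesting lines can be pulled out with a few lines of Python. This is a rough sketch, not a polished tool: `link_speeds` is a made-up helper, and it assumes the output looks like the dumps above, with a "Class xxxx: ... vendor:device" header line per device and a blank line between devices. Other lspci builds may format the header differently.

```python
import re

# Header line of a device block, e.g.
# "02:00.0 Class 0300: Unknown device 10de:1180 (rev a1)"
HEADER_RE = re.compile(
    r"(\S+)\s+Class\s+(\w+):.*?\b([0-9a-f]{4}):([0-9a-f]{4})\b", re.IGNORECASE
)
# "LnkCap: ... Speed 5GT/s" and "LnkSta: Speed 2.5GT/s" lines
SPEED_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed\s+([\d.]+)\s*GT/s")


def link_speeds(lspci_text, vendor="10de"):
    """Return slot, class, device ID and link speeds for each device of a vendor."""
    results = []
    # Device blocks are assumed to be separated by blank lines.
    for block in re.split(r"\n\s*\n", lspci_text.strip()):
        lines = block.splitlines()
        if not lines:
            continue
        header = HEADER_RE.match(lines[0].strip())
        if not header or header.group(3).lower() != vendor.lower():
            continue
        speeds = {name: float(value) for name, value in SPEED_RE.findall(block)}
        results.append({
            "slot": header.group(1),
            "class": header.group(2),   # 0300 is the display part
            "device": header.group(4).lower(),
            "cap": speeds.get("LnkCap"),  # what the card can do, in GT/s
            "sta": speeds.get("LnkSta"),  # what it is doing right now
        })
    return results


# Trimmed copy of the flashed GTX680 dump above.
SAMPLE = """
02:00.0 Class 0300: Unknown device 10de:1180 (rev a1)
   Subsystem: Unknown device 3842:0969
      LnkCap:   Port #8, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <512ns, L1 <4us
      LnkSta:   Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
"""

for card in link_speeds(SAMPLE):
    print(card["slot"], card["device"], "cap", card["cap"], "sta", card["sta"])
```

To use it on a real machine, save the output of "lspci -vv" to a text file and pass the file contents in. Pass vendor="1002" for AMD/ATI.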
« Last Edit: March 05, 2014, 02:29:37 AM by Rominator » Logged

Rominator
« Reply #2 on: March 05, 2014, 01:48:57 AM »

http://cuda-z.sourceforge.net

Look at the "pinned" column in the "Performance" pane.

I will add a screenshot from an unflashed 680 to show the difference.


* Screen Shot 2014-03-04 at 4.42.20 PM.png (75.6 KB, 546x529 - viewed 871 times.)

* Screen Shot 2014-03-04 at 4.56.39 PM.png (71.25 KB, 538x528 - viewed 708 times.)
« Last Edit: March 05, 2014, 02:02:19 AM by Rominator » Logged

Rominator
« Reply #3 on: March 05, 2014, 01:51:20 AM »

Ignore the breasts if they bother you

http://www.datafilehost.com/d/42edaf5d


* Screen Shot 2014-03-04 at 4.43.01 PM.png (56.07 KB, 268x531 - viewed 699 times.)

* Screen Shot 2014-03-04 at 4.57.56 PM.png (56.32 KB, 286x539 - viewed 543 times.)
« Last Edit: March 05, 2014, 02:02:41 AM by Rominator » Logged

kdekid
Newbie

Offline Offline

Posts: 2


« Reply #4 on: March 05, 2014, 04:22:54 AM »

Just FYI, my unflashed EVGA GTX 660 Ti is running at full speed in 10.9.2. Well, maybe full speed isn't entirely correct -- Device-to-Device speed is half of what you have.


* Screenshot 2014-03-04 22.20.28.jpg (236.45 KB, 1404x938 - viewed 597 times.)
« Last Edit: March 05, 2014, 04:24:44 AM by kdekid » Logged
Rominator
« Reply #5 on: March 05, 2014, 05:00:51 AM »

As I noted at least once, a 4,1 or 5,1 Mac Pro will run at 5.0 in OSX. (after 10.8.3 I think, could be wrong)

It is only in Windows that you will find speed stuck at 2.5.

Logged

mysticalos
Hero Member
*****
Offline Offline

Posts: 581


« Reply #6 on: March 05, 2014, 08:12:54 AM »

HACKINTOSH  OS X 10.9.2 Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz 3500 MHz
GPU           GeForce GTX 780    1032 MHz       60.6 fps
AA OFF (default) 
Bandwidthes:  device>host:10571.5MB/s host>device:   8426.2MB/s device >device: 158462.5MB/s

Does this mean I win?

Is this running at 3.0? Seems a bit high for 2.0


* bandwidths.jpg (180.16 KB, 944x770 - viewed 701 times.)
Logged
Rominator
« Reply #7 on: March 05, 2014, 10:37:32 AM »

Yep, you win.

What the Mac Pro 6,1 should have been.
Logged

frankiee
Newbie

Offline Offline

Posts: 21


« Reply #8 on: March 05, 2014, 06:59:52 PM »

OK, I think I win, too:  Grin

Host to Device: ~11GiB/s (Cuda Z)

So that is PCI3? Running a stock TITAN with no modifications at all.
Logged
Sebinouse
Jr. Member
**
Offline Offline

Posts: 63



« Reply #9 on: March 17, 2015, 11:59:37 AM »

I've been playing with my eGPU and I'm trying to understand the results:
GPU>PCIe>CPU: 4.5 GB/s

Hardware:
PNY GTX750ti: PCIe v3.0 x16
EXP GDC adapter to Mini-PCIe: PCIe x16 with DMI 5 GT/s (PCIe v2.0?) to Mini-PCIe x1
Intel NUC D54250: Intel Core i5-4250 (DMI 5 GT/s, PCIe v2.0, 12 lanes: 4 x1 + 2 x4)

It seems that the adapter limits the bandwidth to one lane of PCIe v2.0, so I guess I have no chance of getting faster transfers...

Useful table from Wikipedia
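
As a rough sanity check on the numbers above, the theoretical ceiling of a PCIe link can be worked out from the per-lane transfer rate and the line encoding. The sketch below does that arithmetic. The table name `PCIE_LINE_RATES` and the helper `max_bandwidth_mb_s` are made up for illustration; the rates and encodings are the standard PCIe 1.x/2.0/3.0 figures, and real transfers come in below the ceiling because of protocol overhead.

```python
# generation: (raw rate in MT/s per lane, payload bits, total bits on the wire)
PCIE_LINE_RATES = {
    1: (2500, 8, 10),     # 8b/10b encoding
    2: (5000, 8, 10),     # 8b/10b encoding
    3: (8000, 128, 130),  # 128b/130b encoding
}


def max_bandwidth_mb_s(generation, lanes):
    """Theoretical one-direction ceiling in MB/s, before protocol overhead."""
    rate, payload, total = PCIE_LINE_RATES[generation]
    # Usable bits per second per lane, divided by 8 for bytes, times lane count.
    return rate * payload / total / 8 * lanes


print(max_bandwidth_mb_s(2, 1))   # one PCIe 2.0 lane, as behind the Mini-PCIe adapter
print(max_bandwidth_mb_s(2, 16))  # a full x16 PCIe 2.0 slot
```

One PCIe 2.0 lane tops out at about 500 MB/s, which suggests that a 4.5 GB/s reading cannot be data actually crossing the x1 adapter.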



* Capture d’écran 2015-03-11 à 21.24.03.jpg (189.8 KB, 966x792 - viewed 505 times.)

* Capture d’écran 2015-03-17 à 12.00.00.jpg (56.31 KB, 670x198 - viewed 3970 times.)
« Last Edit: March 24, 2015, 08:29:00 AM by Sebinouse » Logged
Sebinouse
« Reply #10 on: March 19, 2015, 12:39:04 PM »

Look on the "pinned" column in the "Performance" pane

CUDA-Z Pinned Device to Host: 390 MiB/s
OceanWave GPU>PCIe>CPU: 4.5 GB/s
Which one is right?
(I will try lspci next)


* Cuda_Z_Perf.png (83.6 KB, 649x642 - viewed 596 times.)
Logged
Rominator
« Reply #11 on: March 24, 2015, 07:55:41 AM »

The CUDA-Z result is what you would expect.
Logged
