Metastock Plugins
AmiBroker Knowledge Base

From time to time users approach us asking various questions related to multithreading, such as "Why his/her formula does not run 3.", "Will a 16 core processor be twice as fast as an 8 core one?" or "Why their CPU does not show 1.". The reason for all those questions is a lack of understanding of multithreading and of the laws governing computing in general. In this article we will try to address some of those misunderstandings and misconceptions.

We assume that the reader has already read "Efficient use of multithreading" from the AmiBroker User's Guide and is fully aware of how work is distributed among many threads in the Analysis window. We also assume that the reader has already read the "Performance Tuning" chapter of the AmiBroker guide. These two parts of the manual explain fundamental concepts and are essential to understanding what is written below.

Another fundamental reading is the Amdahl's law article on Wikipedia, which explains the theoretical speedup limit of any multi-threaded program. In short, Amdahl's law says that if a fraction p of your code is parallel, the maximum theoretical speedup, regardless of how many CPUs and how many cores you have, is 1/(1 - p). For example, code that is 95% parallel can never run more than 20x faster.

Let us focus on Analysis window performance (Exploration/Scan/Backtesting/Optimization). Any operation in the Analysis window involves:

1. preparing data (this involves reading data from the database, data compression to the selected interval, filtering, padding, etc.)
2. setting up the AFL engine for execution (setting up built-in arrays, stops, parsing of your formula)
3. executing your formula
4. post-processing (collecting signals)
5. in the case of a backtest, the final portfolio backtesting phase, which is done once for all symbols, NOT for every symbol

AmiBroker is a highly parallel multithreading application, so most of the steps are done in multiple threads. Specifically, only the first and last steps (1 and 5) are done in a single thread. It is worth noting that steps 1-4 are done on every symbol, while step 5 is done only once for all symbols. In addition to that, the program spends some time handling the UI: updating controls such as the progress bar and reacting to your mouse and keyboard input, which is of course done in the single main UI thread.
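To make the Amdahl's law limit referenced above concrete, here is a minimal Python sketch. Python is used only for illustration; the function names `amdahl_speedup` and `amdahl_limit` are hypothetical helpers and not part of AmiBroker or AFL. The parallel fraction passed in is an assumption you would have to estimate for your own formula.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Theoretical speedup when `parallel_fraction` of the work is spread
    over `workers` threads and the remainder stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the number of workers grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    if serial_fraction == 0.0:
        return float("inf")
    return 1.0 / serial_fraction


if __name__ == "__main__":
    # Illustrative parallel fraction only; it is not a measurement.
    p = 0.95
    for n in (1, 2, 4, 8, 16, 32):
        print(f"{n:2d} workers: {amdahl_speedup(p, n):5.2f}x")
    print(f"limit for any number of workers: {amdahl_limit(p):.1f}x")
```

Running the loop shows the speedup flattening as workers are added, which is why doubling the core count rarely halves the run time.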
There is one exception, a special case: Individual optimization. In individual optimization step 1 is done only once for one symbol, and all other steps (2-5, so including the last one) are done in multiple threads.

Now is where Amdahl's law kicks in. By adding threads/cores/processors you can only decrease the time of the parallel parts (steps 2-4). You can't backtest faster than you can read/prepare the data.

As for data access, the database is a shared resource, no matter where it resides. If it resides on a hard disk, it is a single physical device that does not speed up with an increasing number of CPUs. If it resides in RAM, it is still a single physical RAM that has a bandwidth limit and a fixed latency regardless of how many processors you throw into the mix. Even if it is in L3 (Level 3) cache on the processor, it is still a single L3 cache shared by multiple cores. And it is worth noting that L3 cache, even on most modern processors, operates at half the speed of the core, so a single core can actually saturate the bandwidth of the L3 cache if it is doing nothing but reading or writing large chunks of data from/to it. In many cases this means that the processor must wait for memory, unless it is doing complex computations involving only a minimum amount of data.

These are, for example, real-world measurement results for a triple channel RAM controller on Intel i. CPU, measured using memtest.

[Table: data location (L1, L2, L3, RAM) vs. bandwidth in MB/sec; the measured values are garbled in this copy]
Only L1 cache runs at full core speed. As you can see, L3 cache has half the bandwidth and RAM has 1/4 of the bandwidth of L1 cache. Of course disk speeds (even SSD) are a far cry behind the 1. GB/sec offered by RAM.

In the case of a portfolio backtest, the final backtest phase (portfolio backtesting) is one per backtest, done once for all symbols, so naturally it is done in a single thread, as opposed to the first phase, which is done on every symbol in parallel.

Now, knowing all this, you may wonder how to use that knowledge in practice. For example, it allows you to understand the limits of achievable speed gains for a given formula and to plan your hardware purchases or find ways to improve run times.

As we learned from the above, the only parts that can be sped up by adding more cores are those that are run in parallel (in multiple threads). In practice this means your AFL formula code. What is more, the more time is spent in the parallel part, the better it scales on multiple cores. This means that simple formulas DO NOT scale too well, because they are too simple to put enough strain on the CPU and are mainly memory (data access) bound. All your simple moving average crossovers are just too simple to keep the CPU busy for a longer time, especially when there is not too much data to process.

Let us take this trivial formula for example:

    // Numeric arguments of Optimize() are truncated in this copy; the range
    // 1..500 in steps of 1 (default 10) is an assumption consistent with the
    // 500 step optimization discussed below.
    period = Optimize( "period", 10, 1, 500, 1 );
    Buy = Cross( C, MA( C, period ) );
    Sell = Cross( MA( C, period ), C );

and run Optimize -> Individual Optimize on a symbol that has 2. Now switch to the Info tab in the Analysis window and you will see this output (this example comes from a 4 core, 8 thread Intel i.):

    Individual optimize started.
    Completed in 0. 4.
    Number of rows 5.
    Timings data 0. UI thread 0.

So our 500 step optimization on 2. What you see there are some cryptic numbers, and you might wonder what they mean.
Here is the explanation for the backtest/optimization:

a) data: time spent accessing/preparing the data
b) setup: time spent preparing the AFL engine
c) afl: time spent executing your formula (first phase of backtest)
d) job: post-processing (here signals are collected and, in the case of individual optimize, the trading simulation is performed)
e) lock: time spent waiting in a critical section (lock) accessing the shared signal table
f) pbt: portfolio backtesting code (not used in individual optimization)
g) UI thread: time spent in the UI thread in total (data + pbt + UI handling), i.e. single-threaded time
h) worker threads: time spent in worker (parallel) threads (setup + afl + job + lock), i.e. multi-threaded time

Firstly, it may look surprising that worker threads time is 3. But this time is a SUM of the times spent in all 8 threads. They ran in parallel. Each was running for 3. Now you suddenly realize the power of multithreading! So now it would seem that our formula run 0. You may ask: why not 8x? We had 8 threads, didn't we?

The first reason is Amdahl's law (serial time 0.). Let us check how much time it would really take if we limited the run to one thread only. Try running with a pragma statement limiting the number of threads:

    #pragma maxthreads 1
    // Optimize() arguments are truncated in this copy; same assumed values as in the first listing.
    period = Optimize( "period", 10, 1, 500, 1 );
    Buy = Cross( C, MA( C, period ) );
    Sell = Cross( MA( C, period ), C );

Suddenly the result is:

    Individual optimize started.
    Completed in 1. 6.
    Number of rows 5.
    Timings data 0. UI thread 0.

What? The entire optimization took just 1. Why is the worker thread time 1.? It was 3.26! What happened? There are a couple of reasons for that:

a) Hyper-threading: as soon as you exceed the CPU core count and start to rely on hyper-threading (running 2 threads on a single core), you find out that hyper-threading does not deliver 2x performance. If your code is NOT doing complicated things, such as lots of trigonometric functions that keep the FPU busy or other number crunching, hyper-threading will not give you 2x performance. On simple tasks it struggles to deliver 3.
b) Turbo boost: modern CPUs have different settings for single-core turbo boost and multi-core turbo boost. The effect is that the CPU can raise its clock to 4. GHz when running on a single core only, but is limited to 3. GHz when running multi-threaded code.
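The relationship between the "UI thread" and "worker threads" timings described above can be sketched as a rough back-of-the-envelope model in Python. This is only an illustration under stated assumptions: the serial (UI thread) part and the parallel part are assumed not to overlap, the worker time is assumed to be spread evenly over the threads, the class and function names are hypothetical, and every number below is made up rather than taken from an actual AmiBroker run.

```python
from dataclasses import dataclass


@dataclass
class RunTimings:
    ui_thread: float       # single-threaded seconds (data + pbt + UI handling)
    worker_threads: float  # seconds SUMMED over all worker threads
    threads: int           # number of worker threads used in the run

    def per_thread_worker_time(self) -> float:
        # The reported worker time is a sum, so each thread ran roughly
        # this long while the others ran alongside it.
        return self.worker_threads / self.threads

    def estimated_elapsed(self) -> float:
        # Simplifying assumption: serial and parallel parts do not overlap.
        return self.ui_thread + self.per_thread_worker_time()


def observed_speedup(single: RunTimings, multi: RunTimings) -> float:
    """How many times faster the multi-threaded run finished."""
    return single.estimated_elapsed() / multi.estimated_elapsed()


if __name__ == "__main__":
    # Hypothetical figures: the summed worker time is larger with 8 threads
    # because hyper-threading and lower multi-core clocks slow each thread.
    multi = RunTimings(ui_thread=0.1, worker_threads=4.0, threads=8)
    single = RunTimings(ui_thread=0.1, worker_threads=2.0, threads=1)
    print(f"8 threads: about {multi.estimated_elapsed():.2f} s")
    print(f"1 thread:  about {single.estimated_elapsed():.2f} s")
    print(f"speedup:   about {observed_speedup(single, multi):.2f}x")
```

With inputs like these the speedup lands well below the thread count, for the two reasons the text gives: the serial part does not shrink, and the total work done by the worker threads grows when threads share cores and run at a lower clock.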