前回、シングルスレッド編
これの続き
いろいろ試して一番早かったのはこれ
//分散 private double Variance(byte[] ary) { var myBag = new ConcurrentBag<long>(); Parallel.ForEach( Partitioner.Create(0, ary.Length, ary.Length / Environment.ProcessorCount), (range) => { long subtotal = 0;//スレッドごとの小計用 for (int i = range.Item1; i < range.Item2; i++) { subtotal += ary[i] * ary[i]; } myBag.Add(subtotal);//排他処理で追加 }); double average = MyAverage(ary);//平均値取得 //分散 = 2乗の平均 - 平均の2乗 return (myBag.Sum() / (double)ary.Length) - (average * average); } //平均値 private double MyAverage(byte[] ary) { ConcurrentBag<long> myBag = new ConcurrentBag<long>(); Parallel.ForEach(Partitioner.Create(0, ary.Length), (range) => { long subtotal = 0; for (int i = range.Item1; i < range.Item2; i++) { subtotal += ary[i]; } myBag.Add(subtotal); }); return myBag.Sum() / (double)ary.Length; }
using System.Collections.Concurrent;
PartitionerとConcurrentBagはどちらもSystem.Collections.Concurrent
分散 = 2乗の平均 - 平均の2乗
これをマルチスレッドで
Partitionerクラスで配列を区分けして、Parallel.ForEachでマルチスレッド処理
スレッドごと計算した小計は、排他処理で要素を追加できるコレクションのConcurrentBagクラスを使って集計
System.Numerics.Vectorなんていらんかったんや(極論)
計測用アプリ
ボタンの一斉テストかそれぞれのtestボタン押して待つだけ
配列の値はアプリの起動時にランダムに設定するので、分散の値は毎回変化する
組み合わせは
- 計算方法:分散の求め方その1がV1 or その2がV2
- スレッド:シングルはForLoop、マルチはParallel.ForかForEach
- Vector:不使用 or 使用
- 計算精度:int(整数) or double(小数点有り)
分散の求め方その1、その2は前回と同じで
その1:偏差の2乗の平均
その2:2乗の平均 - 平均の2乗
Test01_V1_ForLoop_double
求め方その1でシングルスレッドでdouble型で計算
Test19_V2_ParallelForEach_VectorByteWiden
求め方はその2、マルチスレッド、byte型Vectorで計算
ダウンロード先
ファイル名は、20200213_分散を求めるマルチスレッド編.zip
作成と計測環境
- CPU AMD Ryzen 5 2400G(4コア8スレッド)
- MEM DDR4-2666
- Window 10 Home 64bit
- Visual Studio 2019 Community .NET Core 3.1 WPF C#
.NET FrameworkだとSystem.Numericsの参照の追加とかあってめんどくさいけど、.NET Coreならラク
グラフにして
シングルスレッドdouble型で計算するtest01の速度を1にして
数値が赤のテストはシングルスレッド
塗りつぶしのないグラフのテストはint型計算なので精度が落ちるけど誤差程度かも
青色グラフは普通に計算
赤色グラフはSystem.Numerics.Vector(SIMD)を使って計算
緑色グラフはLINQで計算
マルチスレッド化で
どのテストでもシングルからマルチスレッド化で約4倍速くなっている、使ったCPUは4コア8スレッドなので、論理コア数の8じゃなくて物理コア数分速くなったことになる。16コアCPUなら16倍速くなったりするのかなあ
CPUのSIMDを使うSystem.Numerics.Vector
残念ながら今回のテストでは使わないほうが速かった、けどVector.Countの大きいCPUなら逆転するかも?
CPU使用率
一斉テストしたときのCPU使用率、マルチスレッドのときは使用率が上がっているけど100%には届かずに80%くらいだった、なんでかなあ
<Window x:Class="_20200213_分散を求めるマルチスレッド編.MainWindow" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" xmlns:d="http://schemas.microsoft.com/expression/blend/2008" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:local="clr-namespace:_20200213_分散を求めるマルチスレッド編" mc:Ignorable="d" Title="MainWindow" Height="700" Width="614"> <Grid> <StackPanel> <StackPanel.Resources> <Style TargetType="StackPanel"> <Setter Property="Margin" Value="2"/> </Style> <Style TargetType="Button"> <Setter Property="Width" Value="60"/> </Style> <Style TargetType="TextBlock"> <Setter Property="Margin" Value="2,0"/> </Style> </StackPanel.Resources> <TextBlock x:Name="MyTextBlock" Text="text" HorizontalAlignment="Center"/> <TextBlock x:Name="MyTextBlockVectorCount" Text="vectorCount" HorizontalAlignment="Center"/> <TextBlock x:Name="MyTextBlockCpuThreadCount" Text="threadCount" HorizontalAlignment="Center"/> <StackPanel Orientation="Horizontal" HorizontalAlignment="Center"> <Button x:Name="ButtonAll" Content="一斉テスト" Margin="20,0" Width="120"/> <TextBlock x:Name="TbAll" Text="time"/> <Button x:Name="ButtonReset" Content="reset" Margin="20,0"/> </StackPanel> <StackPanel Orientation="Horizontal" Background="#100000FF"> <Button x:Name="Button1" Content="test1"/> <TextBlock x:Name="Tb1" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal" Background="#100000FF"> <Button x:Name="Button2" Content="test2"/> <TextBlock x:Name="Tb2" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal"> <Button x:Name="Button3" Content="test3"/> <TextBlock x:Name="Tb3" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal"> <Button x:Name="Button4" Content="test4"/> <TextBlock x:Name="Tb4" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal"> <Button x:Name="Button5" Content="test5"/> <TextBlock x:Name="Tb5" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal"> <Button x:Name="Button6" Content="test6"/> <TextBlock x:Name="Tb6" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal"> <Button x:Name="Button7" Content="test7"/> <TextBlock x:Name="Tb7" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal" Background="#100000FF"> <Button x:Name="Button8" Content="test8"/> <TextBlock x:Name="Tb8" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal"> <Button x:Name="Button9" Content="test9"/> <TextBlock x:Name="Tb9" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal"> <Button x:Name="Button10" Content="test10"/> <TextBlock x:Name="Tb10" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal"> <Button x:Name="Button11" Content="test11"/> <TextBlock x:Name="Tb11" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal"> <Button x:Name="Button12" Content="test12"/> <TextBlock x:Name="Tb12" Text="time"/> </StackPanel> <Border Height="1" Background="Orange" UseLayoutRounding="True"/> <StackPanel Orientation="Horizontal" Background="#100000FF"> <Button x:Name="Button13" Content="test13"/> <TextBlock x:Name="Tb13" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal"> <Button x:Name="Button14" Content="test14"/> <TextBlock x:Name="Tb14" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal"> <Button x:Name="Button15" Content="test15"/> <TextBlock x:Name="Tb15" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal" Background="#100000FF"> <Button x:Name="Button16" Content="test16"/> <TextBlock x:Name="Tb16" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal"> <Button x:Name="Button17" Content="test17"/> <TextBlock x:Name="Tb17" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal"> <Button x:Name="Button18" Content="test18"/> <TextBlock x:Name="Tb18" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal"> <Button x:Name="Button19" Content="test19"/> <TextBlock x:Name="Tb19" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal"> <Button x:Name="Button20" Content="test20"/> <TextBlock x:Name="Tb20" Text="time"/> </StackPanel> <Border Height="1" Background="Orange" UseLayoutRounding="True"/> <StackPanel Orientation="Horizontal"> <Button x:Name="Button21" Content="test21"/> <TextBlock x:Name="Tb21" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal"> <Button x:Name="Button22" Content="test22"/> <TextBlock x:Name="Tb22" Text="time"/> </StackPanel> <!--<StackPanel Orientation="Horizontal"> <Button x:Name="Button23" Content="test23"/> <TextBlock x:Name="Tb23" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal"> <Button x:Name="Button24" Content="test24"/> <TextBlock x:Name="Tb24" Text="time"/> </StackPanel> <StackPanel Orientation="Horizontal"> <Button x:Name="Button25" Content="test25"/> <TextBlock x:Name="Tb25" Text="time"/> </StackPanel>--> </StackPanel> </Grid> </Window>
using System; using System.Linq; using System.Threading.Tasks; using System.Windows; using System.Windows.Controls; using System.Numerics; using System.Threading; using System.Collections.Concurrent; using System.Diagnostics; using System.Collections.Generic; namespace _20200213_分散を求めるマルチスレッド編 { /// <summary> /// Interaction logic for MainWindow.xaml /// </summary> public partial class MainWindow : Window { private byte[] MyByteAry; private const int LOOP_COUNT = 100; private const int ELEMENT_COUNT = 10_000_000;//1000万 //private const int ELEMENT_COUNT = 3264 * 2448;//7,990,272、今使っているスマカメは800万 //private const int ELEMENT_COUNT = 7680 * 4320;//33,177,600、8K解像度は、3300万 private double MyAverage; public MainWindow() { InitializeComponent(); MyInitialize(); SetRandomByte(); ButtonReset.Click += (s, e) => MyReset(); ButtonAll.Click += (s, e) => MyExeAll(); Button1.Click += (s, e) => MyExe(Test01_V1_ForLoop_double, Tb1, MyByteAry); Button2.Click += (s, e) => MyExe(Test02_V1_ForLoop_int, Tb2, MyByteAry); Button3.Click += (s, e) => MyExe(Test03_V1_ParallelFor_double_ConcurrentBag, Tb3, MyByteAry); Button4.Click += (s, e) => MyExe(Test04_V1_ParallelFor_Integer_ConcurrentBag, Tb4, MyByteAry); Button5.Click += (s, e) => MyExe(Test05_V1_ParallelFor_Integer_InterlockedAdd, Tb5, MyByteAry); Button6.Click += (s, e) => MyExe(Test06_V1_ParallelForEech_double, Tb6, MyByteAry); Button7.Click += (s, e) => MyExe(Test07_V1_ParallelForEech_Integer, Tb7, MyByteAry); //vector Button8.Click += (s, e) => MyExe(Test08_V1_ForLoop_VectorDouble, Tb8, MyByteAry); Button9.Click += (s, e) => MyExe(Test09_V1_ParallelForEech_VectorDouble, Tb9, MyByteAry); Button10.Click += (s, e) => MyExe(Test10_V1_ParallelForEech_VectorInteger, Tb10, MyByteAry); Button11.Click += (s, e) => MyExe(Test11_V1_ParallelForEech_VectorFloat, Tb11, MyByteAry); Button12.Click += (s, e) => MyExe(Test12_V1_ParallelForEech_Vector4, Tb12, MyByteAry); Button13.Click += (s, e) => MyExe(Test13_V2_ForLoop, Tb13, MyByteAry); Button14.Click += (s, e) => MyExe(Test14_V2_ParallelFor, Tb14, MyByteAry); Button15.Click += (s, e) => MyExe(Test15_V2_ParallelForEech, Tb15, MyByteAry); Button16.Click += (s, e) => MyExe(Test16_V2_ForLoop_VectorDouble, Tb16, MyByteAry); Button17.Click += (s, e) => MyExe(Test17_V2_ParallelForEech_VectorDouble, Tb17, MyByteAry); Button18.Click += (s, e) => MyExe(Test18_V2_ParallelForEech_VectorInt, Tb18, MyByteAry); Button19.Click += (s, e) => MyExe(Test19_V2_ParallelForEech_VectorByteWiden, Tb19, MyByteAry); Button20.Click += (s, e) => MyExe(Test20_V2_ParallelForEech_Vector4, Tb20, MyByteAry); Button21.Click += (s, e) => MyExe(Test21_V1_ParallelLinq, Tb21, MyByteAry); Button22.Click += (s, e) => MyExe(Test22_V2_ParallelLinq, Tb22, MyByteAry); //Button24.Click += (s, e) => MyExe(Test22_V2_ParallelLinq, Tb24, MyByteAry); //Button25.Click += (s, e) => MyExe(Test21_V1_ParallelLinq, Tb25, MyByteAry); } private void MyInitialize() { this.Title = this.ToString(); MyTextBlock.Text = $"byte型配列の値の分散、要素数{ELEMENT_COUNT.ToString("N0")}の分散を{LOOP_COUNT}回求める"; MyTextBlockVectorCount.Text = $"Vector<long>.Count = {Vector<long>.Count}"; string str = $"VectorCount : Long={Vector<long>.Count}, Double={Vector<double>.Count}, int={Vector<int>.Count}, flort={Vector<float>.Count}, short={Vector<short>.Count}, byte={Vector<byte>.Count}"; MyTextBlockVectorCount.Text = str; MyTextBlockCpuThreadCount.Text = $"CPUスレッド数:{Environment.ProcessorCount.ToString()} thread"; } private void SetRandomByte() { MyByteAry = new byte[ELEMENT_COUNT]; var r = new Random(); r.NextBytes(MyByteAry); //MyByteAry = new byte[] { 20, 21, 7, 12 }; //Span<byte> span = new Span<byte>(MyByteAry); //span.Fill(255); //MyByteAry = span.ToArray(); //MyByteAry[0] = 0; //var span = new Span<byte>(MyByteAry); //span.Fill(2); //MyByteAry[ELEMENT_COUNT - 1] = 1; MyAverage = GetAverage(MyByteAry);//要素の平均値 } //平均値 private double GetAverage(byte[] ary) { long total = 0; Parallel.ForEach(Partitioner.Create(0, ary.Length), (range) => { long subtotal = 0; for (int i = range.Item1; i < range.Item2; i++) { subtotal += ary[i]; } Interlocked.Add(ref total, subtotal); }); return total / (double)ary.Length; } #region 分散の求め方その1 = 偏差の2乗の平均 //普通のForループ、double型 private double Test01_V1_ForLoop_double(byte[] ary) { //平均との差(偏差)の2乗を合計 double total = 0; for (int i = 0; i < ary.Length; i++) { //total += Math.Pow(ary[i] - MyAverage, 2.0);//遅い33秒 double diff = ary[i] - MyAverage;//速い0.9秒 total += diff * diff; } //合計 / 要素数 = 分散 return total / ary.Length; } //普通のForループ、int型 private double Test02_V1_ForLoop_int(byte[] ary) { //平均との差の2乗を合計 long total = 0; int average = (int)MyAverage; int ii;//ループの外に出したほうが誤差程度に速い for (int i = 0; i < ary.Length; i++) { //total += (int)Math.Pow(ary[i] - average, 2); //total += (ary[i] - average) * (ary[i] - average);//こっちのほうが↑より10倍以上速い ii = ary[i] - average; total += ii * ii; } //合計 / 要素数 = 分散 return total / (double)ary.Length; } //マルチスレッド //Parallel.For、すべてdouble型で計算 private double Test03_V1_ParallelFor_double_ConcurrentBag(byte[] ary) { //平均との差(偏差)の2乗を合計 var myBag = new ConcurrentBag<double>(); Parallel.For<double>(0, ary.Length, () => 0, (j, loopState, subtotal) => { double diff = ary[j] - MyAverage;//差 return subtotal += diff * diff;//差の2乗の小計 }, (x) => myBag.Add(x));//CuncurrentBagは排他処理で要素を追加できる double total = myBag.Sum();//合計 //合計 / 要素数 = 分散 return total / ary.Length; } //Parallel.For、偏差を整数で計算してdouble型との差を見る private double Test04_V1_ParallelFor_Integer_ConcurrentBag(byte[] ary) { var myBag = new ConcurrentBag<long>(); int average = (int)MyAverage;//平均値を整数に Parallel.For<long>(0, ary.Length, () => 0, (j, loopState, subtotal) => { long diff = ary[j] - average;//整数で計算 return subtotal += diff * diff; }, (x) => myBag.Add(x)); long total = myBag.Sum(); //最後だけdouble型 return (double)total / ary.Length; } //Parallel.For、整数で計算 //排他処理での合計をInterlocked.Addにして、ConcurrentBagとの差を見る private double Test05_V1_ParallelFor_Integer_InterlockedAdd(byte[] ary) { //平均との差(偏差)の2乗を合計 long total = 0; int average = (int)MyAverage; Parallel.For<long>(0, ary.Length, () => 0, (j, loopState, subtotal) => { int diff = ary[j] - average;//差 return subtotal += diff * diff;//差の2乗の小計 }, (x) => Interlocked.Add(ref total, x));//排他処理で合計、InterlockedAddは整数しか扱えない //合計 / 要素数 = 分散 return (double)total / ary.Length; } // c# — C#で整数の配列を合計する方法 //https://www.it-swarm.dev/ja/c%23/c%EF%BC%83%E3%81%A7%E6%95%B4%E6%95%B0%E3%81%AE%E9%85%8D%E5%88%97%E3%82%92%E5%90%88%E8%A8%88%E3%81%99%E3%82%8B%E6%96%B9%E6%B3%95/968057375/ //Paralle.ForEach、double型 private double Test06_V1_ParallelForEech_double(byte[] ary) { //平均との差(偏差)の2乗を合計 var myBag = new ConcurrentBag<double>(); Parallel.ForEach( Partitioner.Create(0, ary.Length), (range) => { double subtotal = 0; for (int i = range.Item1; i < range.Item2; i++) { double diff = ary[i] - MyAverage; subtotal += diff * diff; } myBag.Add(subtotal);//小計を追加 }); double total = myBag.Sum();//合計 //合計 / 要素数 = 分散 return total / ary.Length; } //Paralle.ForEach、整数型で計算 private double Test07_V1_ParallelForEech_Integer(byte[] ary) { //平均との差(偏差)の2乗を合計 long total = 0; int average = (int)MyAverage; Parallel.ForEach( Partitioner.Create(0, ary.Length), (range) => { long subtotal = 0; int diff; for (int i = range.Item1; i < range.Item2; i++) { diff = ary[i] - average; subtotal += diff * diff; } Interlocked.Add(ref total, subtotal); }); //合計 / 要素数 = 分散 return (double)total / ary.Length; } //ここからVector //Vector<double>で計算、シングルスレッド private double Test08_V1_ForLoop_VectorDouble(byte[] ary) { //var vAverage = new Vector<double>(MyAverage); int simdLength = Vector<double>.Count; int lastIndex = ary.Length - (ary.Length % simdLength); Vector<double> v; var ss = new double[simdLength]; double total = 0; for (int i = 0; i < lastIndex; i += simdLength) { for (int j = 0; j < simdLength; j++) { //ss[j] = ary[i + j]; ss[j] = ary[i + j] - MyAverage;//平均との差 } v = new Vector<double>(ss); //平均との差はVector.Subtractを使うより個別に引き算したほうが気持ち速い //v = System.Numerics.Vector.Subtract(vAverage, new Vector<double>(ss)); //差の2乗を合計 total += System.Numerics.Vector.Dot(v, v); } //配列とVectorCountの剰余分も加算 total += MySuquareSumOfDeviation偏差の2乗和(ary, lastIndex, ary.Length, MyAverage); return total / ary.Length; } //マルチスレッド //Vector<double>で計算、スレッドごとの集計もdouble型 //ConcurentBagならdouble型も扱える //パーティション区切りはCPUスレッド数で分けている //分けた後の要素数がVectorCountより小さいとエラーになる //例えばCPUスレッド数8、Vector<double>.Countが4の環境だと //8*4=32、要素数32以上の配列じゃないとエラーになる private double Test09_V1_ParallelForEech_VectorDouble(byte[] ary) { int simdLength = Vector<double>.Count; var bag = new ConcurrentBag<double>(); //パーティションサイズ、要素数をCPUスレッド数で割ったサイズに設定した int rangeSize = ary.Length / Environment.ProcessorCount; Parallel.ForEach(Partitioner.Create(0, ary.Length, rangeSize), (range) => { double subtotal = 0; int lastIndex = range.Item2 - (range.Item2 - range.Item1) % simdLength; double[] aa = new double[simdLength]; Vector<double> v; for (int i = range.Item1; i < lastIndex; i += simdLength) { //偏差の配列作成 for (int j = 0; j < simdLength; j++) { aa[j] = ary[i + j] - MyAverage;//偏差 } //偏差の配列からVector作成 v = new Vector<double>(aa); subtotal += System.Numerics.Vector.Dot(v, v); //VectorCountで割り切れなかった余り要素の2乗和も加算 subtotal += MySuquareSumOfDeviation偏差の2乗和(ary, lastIndex, range.Item2, MyAverage); } //排他処理でスレッドごとの小計をコレクションに追加 bag.Add(subtotal); }); return bag.Sum() / ary.Length; } //Vector<int>で計算、小計、合計はlongで計算、intだと小計でもオーバーフローした private double Test10_V1_ParallelForEech_VectorInteger(byte[] ary) { int simdLength = Vector<int>.Count; long total = 0; int average = (int)MyAverage; int rangeSize = ary.Length / Environment.ProcessorCount; Parallel.ForEach(Partitioner.Create(0, ary.Length, rangeSize), (range) => { MyVariance(range, simdLength, ary, average, ref total); //全く同じ処理なんだけど↓より↑のほうが速い //int subtotal = 0; //int lastIndex = range.Item2 - (range.Item2 - range.Item1) % simdLength; //int[] aa = new int[simdLength]; //Vector<int> v; //for (int i = range.Item1; i < lastIndex; i += simdLength) //{ // //偏差の配列作成 // for (int j = 0; j < simdLength; j++) // { // aa[j] = ary[i + j] - average;//偏差 // } // //偏差の配列からVector作成 // v = new Vector<int>(aa); // subtotal += System.Numerics.Vector.Dot(v, v); // //VectorCountで割り切れなかった余りの2乗和も加算 // subtotal += (int)MySuquareSumOfDeviation偏差の2乗和(ary, lastIndex, range.Item2, MyAverage); //} ////排他処理でスレッドごとの小計を加算、InterlockedAddは整数しか扱えない //Interlocked.Add(ref total, subtotal); }); return (double)total / ary.Length; } private void MyVariance(Tuple<int, int> range, int simdLength, byte[] ary, int average, ref long total) { long subtotal = 0; int lastIndex = range.Item2 - (range.Item2 - range.Item1) % simdLength; int[] aa = new int[simdLength]; Vector<int> v; for (int i = range.Item1; i < lastIndex; i += simdLength) { //偏差の配列作成 for (int j = 0; j < simdLength; j++) { aa[j] = ary[i + j] - average;//偏差 } //偏差の配列からVector作成 v = new Vector<int>(aa); subtotal += System.Numerics.Vector.Dot(v, v); //VectorCountで割り切れなかった余りの2乗和も加算 subtotal += (int)MySuquareSumOfDeviation偏差の2乗和(ary, lastIndex, range.Item2, MyAverage); } //排他処理でスレッドごとの小計を加算、InterlockedAddは整数しか扱えない Interlocked.Add(ref total, subtotal); } //Vector<float>で計算、小計はdouble、合計もdouble //ConcurentBagで排他処理合計 private double Test11_V1_ParallelForEech_VectorFloat(byte[] ary) { int simdLength = Vector<float>.Count; var bag = new ConcurrentBag<double>(); float average = (float)MyAverage; int rangeSize = ary.Length / Environment.ProcessorCount; Parallel.ForEach(Partitioner.Create(0, ary.Length, rangeSize), (range) => { double subtotal = 0; int lastIndex = range.Item2 - (range.Item2 - range.Item1) % simdLength; float[] aa = new float[simdLength]; Vector<float> v; for (int i = range.Item1; i < lastIndex; i += simdLength) { for (int j = 0; j < simdLength; j++) { aa[j] = ary[i + j] - average; } v = new Vector<float>(aa); subtotal += System.Numerics.Vector.Dot(v, v); subtotal += MySuquareSumOfDeviation偏差の2乗和(ary, lastIndex, range.Item2, MyAverage); } //排他処理でスレッドごとの小計をコレクションに追加 bag.Add(subtotal); }); double total = bag.Sum(); return total / ary.Length; } //Vector4 private double Test12_V1_ParallelForEech_Vector4(byte[] ary) { int simdLength = 4; var bag = new ConcurrentBag<double>(); var vAverage = new Vector4((float)MyAverage); int rangeSize = ary.Length / Environment.ProcessorCount; Parallel.ForEach(Partitioner.Create(0, ary.Length, rangeSize), (range) => { double subtotal = 0; int lastIndex = range.Item2 - (range.Item2 - range.Item1) % simdLength; float[] aa = new float[simdLength]; Vector4 v; for (int i = range.Item1; i < lastIndex; i += simdLength) { v = new Vector4(ary[i], ary[i + 1], ary[i + 2], ary[i + 3]); v = Vector4.Subtract(v, vAverage); subtotal += Vector4.Dot(v, v); subtotal += MySuquareSumOfDeviation偏差の2乗和(ary, lastIndex, range.Item2, MyAverage); } bag.Add(subtotal); }); double total = bag.Sum(); return total / ary.Length; } #endregion 分散の求め方その1ここまで //------------------------------------------------------------- #region 分散の求め方その2 = 2乗の平均 - 平均の2乗 //分散 = 2乗の平均 - 平均の2乗 //シングルスレッド private double Test13_V2_ForLoop(byte[] ary) { //2乗和 double total = 0; for (int i = 0; i < ary.Length; i++) { total += ary[i] * ary[i]; } //2乗の平均 total /= ary.Length; //2乗の平均 - 平均の2乗 return total - (MyAverage * MyAverage); } //マルチスレッドここから private double Test14_V2_ParallelFor(byte[] ary) { //2乗和 long total = 0; //var options = new ParallelOptions(); //options.MaxDegreeOfParallelism = Environment.ProcessorCount; Parallel.For<long>(0, ary.Length, () => 0, (j, state, subtotal) => { return subtotal += ary[j] * ary[j]; }, (x) => Interlocked.Add(ref total, x)); return ((double)total / ary.Length) - (MyAverage * MyAverage);//2乗の平均 - 平均の2乗 } //整数で計算 private double Test15_V2_ParallelForEech(byte[] ary) { //2乗和 long total = 0; var partition = Partitioner.Create(0, ary.Length, ary.Length / Environment.ProcessorCount); Parallel.ForEach(partition, (range) => { long subtotal = 0; for (int i = range.Item1; i < range.Item2; i++) { subtotal += ary[i] * ary[i]; } Interlocked.Add(ref total, subtotal);//スレッドごとの小計 }); return ((double)total / ary.Length) - (MyAverage * MyAverage);//2乗の平均 - 平均の2乗 } //ここからVector private double Test16_V2_ForLoop_VectorDouble(byte[] ary) { int simdLength = Vector<double>.Count; int lastIndex = ary.Length - (ary.Length % simdLength); Vector<double> v; var ss = new double[simdLength]; double total = 0; for (int i = 0; i < lastIndex; i += simdLength) { for (int j = 0; j < simdLength; j++) { ss[j] = ary[i + j]; } v = new Vector<double>(ss); total += System.Numerics.Vector.Dot(v, v); } //配列とVectorCountの剰余分も加算 total += MySuquareSum2乗和(ary, lastIndex, ary.Length, simdLength); return (total / ary.Length) - (MyAverage * MyAverage); } //Vector<double>で計算、byte型の2乗和を求めるからdouble型はあんまり意味ない private double Test17_V2_ParallelForEech_VectorDouble(byte[] ary) { var myBag = new ConcurrentBag<double>(); var partition = Partitioner.Create(0, ary.Length, ary.Length / Environment.ProcessorCount); int simdLength = Vector<double>.Count; Parallel.ForEach(partition, (range) => { double subtotal = 0; var aa = new double[simdLength]; int lastIndex = range.Item2 - (range.Item2 - range.Item1) % simdLength; Vector<double> v; for (int i = range.Item1; i < lastIndex; i += simdLength) { for (int j = 0; j < simdLength; j++) { aa[j] = ary[i + j]; } v = new Vector<double>(aa); subtotal += System.Numerics.Vector.Dot(v, v); } //配列とVectorCountの剰余分も加算 subtotal += MySuquareSum2乗和(ary, range.Item1, range.Item2, simdLength); myBag.Add(subtotal); }); double total = myBag.Sum(); return (total / ary.Length) - (MyAverage * MyAverage); } //Vector<int>で計算、整数で計算 private double Test18_V2_ParallelForEech_VectorInt(byte[] ary) { long total = 0; int simdLength = Vector<int>.Count; Parallel.ForEach( Partitioner.Create(0, ary.Length, ary.Length / Environment.ProcessorCount), (range) => { long subtotal = 0; var aa = new int[simdLength]; int lastIndex = range.Item2 - (range.Item2 - range.Item1) % simdLength; Vector<int> v; for (int i = range.Item1; i < lastIndex; i += simdLength) { for (int j = 0; j < simdLength; j++) { aa[j] = ary[i + j]; } v = new Vector<int>(aa); subtotal += System.Numerics.Vector.Dot(v, v); } //配列とVectorCountの剰余分も加算 subtotal += MySuquareSum2乗和(ary, range.Item1, range.Item2, simdLength); Interlocked.Add(ref total, subtotal); }); return ((double)total / ary.Length) - (MyAverage * MyAverage); } //Widen //uint 4,294,967,295まで、ushort 65535まで //ushortでもオーバーフローするのでuintまで伸長してからドット積 private double Test19_V2_ParallelForEech_VectorByteWiden(byte[] ary) { long total = 0; var partition = Partitioner.Create(0, ary.Length, ary.Length / Environment.ProcessorCount); int simdLength = Vector<byte>.Count; Parallel.ForEach(partition, (range) => { long subtotal = 0; int lastIndex = range.Item2 - ((range.Item2 - range.Item1) % simdLength); for (int i = range.Item1; i < lastIndex; i += simdLength) { System.Numerics.Vector.Widen(new Vector<byte>(ary, i), out Vector<ushort> v1, out Vector<ushort> v2); System.Numerics.Vector.Widen(v1, out Vector<uint> vv1, out Vector<uint> vv2); System.Numerics.Vector.Widen(v2, out Vector<uint> vv3, out Vector<uint> vv4); subtotal += System.Numerics.Vector.Dot(vv1, vv1); subtotal += System.Numerics.Vector.Dot(vv2, vv2); subtotal += System.Numerics.Vector.Dot(vv3, vv3); subtotal += System.Numerics.Vector.Dot(vv4, vv4); } //配列とVectorCountの剰余分も加算 subtotal += MySuquareSum2乗和(ary, range.Item1, range.Item2, simdLength); Interlocked.Add(ref total, subtotal); }); return ((double)total / ary.Length) - (MyAverage * MyAverage); } //Vector4はfloat型で計算 private double Test20_V2_ParallelForEech_Vector4(byte[] ary) { var myBag = new ConcurrentBag<double>(); var partition = Partitioner.Create(0, ary.Length, ary.Length / Environment.ProcessorCount); int simdLength = 4;//Vector4は4固定 Parallel.ForEach(partition, (range) => { double subtotal = 0; int lastIndex = range.Item2 - (range.Item2 - range.Item1) % simdLength; Vector4 v; for (int i = range.Item1; i < lastIndex; i += simdLength) { v = new Vector4(ary[i], ary[i + 1], ary[i + 2], ary[i + 3]); subtotal += Vector4.Dot(v, v); } //配列とVectorCountの剰余分も加算 subtotal += MySuquareSum2乗和(ary, range.Item1, range.Item2, simdLength); myBag.Add(subtotal); }); double total = myBag.Sum(); return (total / ary.Length) - (MyAverage * MyAverage); } #endregion 分散の求め方その2ここまで //Linq private double Test21_V1_ParallelLinq(byte[] ary) { //return ary.AsParallel().Select(x => Math.Pow(x - MyAverage, 2)).Sum() / ary.Length;//遅い11秒 return ary.AsParallel().Select(x => (x - MyAverage) * (x - MyAverage)).Sum() / ary.Length;//3.5秒 } private double Test22_V2_ParallelLinq(byte[] ary) { return (ary.AsParallel().Select(x => x * x).Sum(x => (long)x) / (double)ary.Length) - (MyAverage * MyAverage);//こっちのほうが↓より3割速い、精度は同じ //return (ary.AsParallel().Select(x => x * x).Sum(x => (double)x) / ary.Length) - (MyAverage * MyAverage); } private void MyReset() { var neko = EnumerateDescendantObjects2<TextBlock>(this); foreach (var item in neko) { item.Text = ""; } MyInitialize(); } #region コントロールの列挙 private static List<T> EnumerateDescendantObjects2<T>(DependencyObject obj) where T : DependencyObject { var l = new List<T>(); foreach (object child in LogicalTreeHelper.GetChildren(obj)) { if (child is T) { l.Add((T)child); } if (child is DependencyObject dobj) { foreach (T cobj2 in EnumerateDescendantObjects2<T>(dobj)) { l.Add(cobj2); } } } return l; } #endregion //指定インデックスから最後までの偏差の2乗和を返す //VectorCountで割り切れなかった余り用、分散の求め方その1用 private double MySuquareSumOfDeviation偏差の2乗和(byte[] ary, int beginIndex, int endIndex, double average) { double total = 0; for (int i = beginIndex; i < endIndex; i++) { //total += Math.Pow(ary[i] - average, 2.0);//MathPowは遅い var deviation = ary[i] - average;//偏差 total += deviation * deviation; } return total; } /// <summary> /// byte型配列用、分散の求め方その2用、VectorCountで割り切れなかった余りの要素の2乗和を返す、要素数10でVectorCountが4のとき10%4=2なので、最後の2つの要素が対象になる /// </summary> /// <param name="ary">配列</param> /// <param name="beginIndex">範囲の開始インデックス</param> /// <param name="endIndex">範囲の終了インデックス</param> /// <param name="simdLength">Vector[T].Count</param> /// <returns></returns> private int MySuquareSum2乗和(byte[] ary, int beginIndex, int endIndex, int simdLength) { //あまりの位置のインデックス int lastIndex = endIndex - (endIndex - beginIndex) % simdLength; //2乗して合計 int total = 0; for (int i = lastIndex; i < endIndex; i++) { total += ary[i] * ary[i]; } return total; } private void MyExe(Func<byte[], double> func, TextBlock tb, byte[] ary) { double total = 0; var sw = new Stopwatch(); sw.Start(); for (int i = 0; i < LOOP_COUNT; i++) { total = func(ary); } sw.Stop(); //tb.Text = $"処理時間:{sw.Elapsed.TotalSeconds.ToString("00.000")}秒 分散 = {total.ToString("F10")} {System.Reflection.RuntimeReflectionExtensions.GetMethodInfo(func).Name}"; this.Dispatcher.Invoke(() => { tb.Text = $"処理時間:{sw.Elapsed.TotalSeconds.ToString("00.000")}秒 分散 = {total.ToString("F15")} {System.Reflection.RuntimeReflectionExtensions.GetMethodInfo(func).Name}"; }); } private async void MyExeAll() { var sw = new Stopwatch(); sw.Start(); this.IsEnabled = false; await Task.Run(() => MyExe(Test01_V1_ForLoop_double, Tb1, MyByteAry)); await Task.Run(() => MyExe(Test02_V1_ForLoop_int, Tb2, MyByteAry)); await Task.Run(() => MyExe(Test03_V1_ParallelFor_double_ConcurrentBag, Tb3, MyByteAry)); await Task.Run(() => MyExe(Test04_V1_ParallelFor_Integer_ConcurrentBag, Tb4, MyByteAry)); await Task.Run(() => MyExe(Test05_V1_ParallelFor_Integer_InterlockedAdd, Tb5, MyByteAry)); await Task.Run(() => MyExe(Test06_V1_ParallelForEech_double, Tb6, MyByteAry)); await Task.Run(() => MyExe(Test07_V1_ParallelForEech_Integer, Tb7, MyByteAry)); await Task.Run(() => MyExe(Test08_V1_ForLoop_VectorDouble, Tb8, MyByteAry)); await Task.Run(() => MyExe(Test09_V1_ParallelForEech_VectorDouble, Tb9, MyByteAry)); await Task.Run(() => MyExe(Test10_V1_ParallelForEech_VectorInteger, Tb10, MyByteAry)); await Task.Run(() => MyExe(Test11_V1_ParallelForEech_VectorFloat, Tb11, MyByteAry)); await Task.Run(() => MyExe(Test12_V1_ParallelForEech_Vector4, Tb12, MyByteAry)); await Task.Run(() => MyExe(Test13_V2_ForLoop, Tb13, MyByteAry)); await Task.Run(() => MyExe(Test14_V2_ParallelFor, Tb14, MyByteAry)); await Task.Run(() => MyExe(Test15_V2_ParallelForEech, Tb15, MyByteAry)); await Task.Run(() => MyExe(Test16_V2_ForLoop_VectorDouble, Tb16, MyByteAry)); await Task.Run(() => MyExe(Test17_V2_ParallelForEech_VectorDouble, Tb17, MyByteAry)); await Task.Run(() => MyExe(Test18_V2_ParallelForEech_VectorInt, Tb18, MyByteAry)); await Task.Run(() => MyExe(Test19_V2_ParallelForEech_VectorByteWiden, Tb19, MyByteAry)); await Task.Run(() => MyExe(Test20_V2_ParallelForEech_Vector4, Tb20, MyByteAry)); await Task.Run(() => MyExe(Test21_V1_ParallelLinq, Tb21, MyByteAry)); await Task.Run(() => MyExe(Test22_V2_ParallelLinq, Tb22, MyByteAry)); //await Task.Run(() => MyExe(Test08_V1ST_VectorDouble, Tb23, MyByteAry)); //await Task.Run(() => MyExe(Test08_V1ST_VectorDouble, Tb24, MyByteAry)); this.IsEnabled = true; sw.Stop(); TbAll.Text = $"一斉テスト時間:{sw.Elapsed.TotalSeconds.ToString("000.000")}秒"; } } }
関連記事
次は3日後
12日後はIntrinsicsでSIMDでドット積
ポインタ使用はめんどくさいけど、これを使えば分散も今回より速くなるはず
前回2日前はシングルスレッドで分散
10日前は配列の合計値
配列のマルチスレッド処理はここ
1ヶ月前は配列の最大値、最小値