library(plyr) シミュレーションをする場合は、r_plyを使う。単純に式評価を繰り返して、結果を得る。 こっちは、自分の理解用なので、端折りが多い。 サイコロを10回振った場合の偏向分散を、1000回調べて、統計数字を見る。 summary(raply(1000, function(){data <- sample(1:6,10, replace=T); sum( (data-mean(data))^2 )/(length(data))})) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.410 2.050 2.600 2.618 3.200 5.560 > summary(raply(1000, function(){data <- sample(1:6,10, replace=T); sum( (data-mean(data))^2 )/(length(data)-1)})) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.7222 2.2330 2.9330 2.9080 3.5560 5.8220 > 10回だと、結構違う。 meanで、2.618 <=> 2.908 標準偏差だと、このルートなので、1.6 <=> 1.7 くらい。サイコロの出目の振れが。 sapplyで囲む(apply属復活、、、) 平均値のみを採用。 不偏分散 var.nb <- sapply(5:100,function(n){mean(raply(1000, + function(){ + data <- sample(1:6, n, replace=T); + sum((data-mean(data))^2 )/(length(data)-1); + } + ))}) [1] 2.931600 2.929233 2.911286 2.950946 2.880556 2.932211 2.927891 2.863106 [9] 2.895897 2.921995 2.934257 2.938175 2.939331 2.883641 2.896994 2.924153 [17] 2.924633 2.900058 2.936791 2.908478 2.926730 2.908208 2.938328 2.921710 [25] 2.921217 2.898013 2.907714 2.926938 2.926216 2.921503 2.922926 2.920809 [33] 2.888413 2.906972 2.940397 2.929619 2.942357 2.934956 2.896492 2.935275 [41] 2.899258 2.904971 2.919401 2.911190 2.935952 2.909979 2.938710 2.931902 [49] 2.903561 2.925579 2.942227 2.919331 2.911908 2.910317 2.910058 2.934226 [57] 2.916754 2.920607 2.898412 2.921514 2.921038 2.914850 2.909988 2.898586 [65] 2.917582 2.916998 2.919342 2.905161 2.910937 2.925926 2.917979 2.932928 [73] 2.918373 2.928385 2.930993 2.939203 2.926241 2.919778 2.926027 2.918926 [81] 2.921166 2.910206 2.910255 2.902594 2.910219 2.910978 2.911598 2.912503 [89] 2.910899 2.927821 2.928744 2.917812 2.924014 2.917722 2.910959 2.928153 偏向分散 var.n <- sapply(5:100,function(n){mean(raply(1000, + function(){ + data <- sample(1:6, n, replace=T); + sum((data-mean(data))^2 )/(length(data)); + } + ))}) [1] 2.399360 2.406611 2.480694 2.530422 2.546741 2.596840 2.679736 2.706236 [9] 2.700462 2.738546 2.710356 2.701363 2.728865 2.729630 2.766853 2.754800 [17] 2.757510 2.790182 2.794363 2.795488 2.820093 2.827675 2.811097 2.832218 [25] 2.805039 2.810347 2.824429 2.851028 2.823164 2.838049 2.842198 2.833519 [33] 2.811055 2.836715 2.831904 2.829951 2.834606 2.859405 2.839651 2.866138 [41] 2.839880 2.834746 2.851852 2.862608 2.862696 2.835504 2.872491 2.869796 [49] 2.859652 2.860018 2.866114 2.863238 2.863912 2.876199 2.868884 2.876741 [57] 2.872930 2.851649 2.852424 2.883022 2.878173 2.882389 2.874185 2.887665 [65] 2.880503 2.880378 2.858376 2.877339 2.887407 2.869049 2.864188 2.874948 [73] 2.873556 2.890123 2.871637 2.873974 2.888595 2.890701 2.888224 2.880468 [81] 2.890426 2.880222 2.900448 2.878312 2.869455 2.891076 2.894434 2.881902 [89] 2.890678 2.884995 2.894999 2.880912 2.874608 2.892088 2.891683 2.893812 グラフ化 data <- rbind(data.frame( bias="yes",分散=var.b,試行回数=5:100), data.frame(試行回数=5:100,分散=var.nb,bias="no")) > qplot(試行回数,分散,data=data, geom="line",colour=bias) |
R関連 >