Evaluating the Ruby Enterprise GC (Part 3)

My apologies. The previous evaluation was wrong.

GC.copy_on_write_friendly = true

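It seems that unless you write the line above, the CoW (copy-on-write) support that is Ruby Enterprise's signature feature stays disabled. So I added
GC.copy_on_write_friendly = true
and ran the test again.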
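Since the same benchmark script is also run on stock Ruby 1.8.7, where this setter does not exist, one way to keep a single script is to guard the assignment. This is only a sketch: the `cow_friendly` variable is a hypothetical name introduced here for illustration.

```ruby
# Enable the CoW-friendly GC only when the running interpreter provides the
# setter (Ruby Enterprise does; stock Ruby does not).
cow_friendly = false  # hypothetical flag, used only to report what happened

if GC.respond_to?(:copy_on_write_friendly=)
  GC.copy_on_write_friendly = true
  cow_friendly = true
end

puts "CoW-friendly GC enabled: #{cow_friendly}"
```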

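The result... it got slower (tears).
In total run time, the previous results were 307 seconds for Ruby Enterprise and 292 seconds for Ruby 1.8.7-p22. This time, Ruby Enterprise with the CoW-friendly setting took 309 seconds.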

Here is the profile:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 33.29    102.98   102.98 58787647     0.00     0.00  rb_eval
 13.16    143.68    40.70 692211053     0.00     0.00  rb_call0
  6.90    165.01    21.33 368068019     0.00     0.00  st_lookup
  6.31    184.52    19.51 692211053     0.00     0.00  rb_call
  3.21    194.44     9.92 341393389     0.00     0.00  rb_newobj
  2.68    202.72     8.28    16990     0.00     0.00  garbage_collect
  2.60    210.77     8.05 375211939     0.00     0.00  gc_mark
  2.26    217.76     6.99 431608775     0.00     0.00  rb_bf_mark_table_contains
  2.09    224.21     6.45 475725978     0.00     0.00  rb_bf_mark_table_heap_contains
  1.76    229.67     5.46    33980     0.00     0.00  mark_locations_array
  1.76    235.10     5.43 22716253     0.00     0.00  st_foreach
  1.48    239.68     4.58 62669730     0.00     0.00  gc_mark_children
  1.19    243.36     3.68 226673991     0.00     0.00  ivar_get
  1.15    246.92     3.56 49484355     0.00     0.00  st_insert
  1.13    250.41     3.49 221030022     0.00     0.00  rb_bf_mark_table_add
  1.04    253.63     3.22 68483553     0.00     0.00  pointer_set_insert
  0.98    256.67     3.04 80600372     0.00     0.00  ruby_xmalloc
  0.96    259.63     2.96 113526801     0.00     0.00  flo_mul
  0.93    262.52     2.89 221030038     0.00     0.00  rb_bf_mark_table_heap_remove
  0.84    265.11     2.59 11631995     0.00     0.00  st_free_table
  0.83    267.68     2.57 29872924     0.00     0.00  rb_obj_is_kind_of
  0.75    270.00     2.32 86679635     0.00     0.00  rb_bf_mark_table_remove
  0.72    272.23     2.23 220312076     0.00     0.00  rb_float_new
  0.71    274.42     2.19 10805853     0.00     0.00  rb_yield_0
  0.66    276.47     2.05 417544996     0.00     0.00  numhash
  0.61    278.35     1.88                             __chkstk
  0.58    280.14     1.79 49482156     0.00     0.00  rb_ivar_set

Next, authorNari (http://d.hatena.ne.jp/authorNari/20080625/1214315712) has pointed out that one operation will be slow because it loops over all the heaps, so I checked how often that actually happens.

static inline struct heaps_slot *
find_heap_slot_for_object(RVALUE *object)
{
	register int i;

	/* Look in the cache first. */
	if (last_heap != NULL && object >= last_heap->slot
	 && object < last_heap->slotlimit) {
		cnthit++;	/* added: count cache hits */
		return last_heap;
	}
	cntmiss++;	/* added: count cache misses */
	for (i = 0; i < heaps_used; i++) {
		struct heaps_slot *heap = &heaps[i];
		if (object >= heap->slot
		 && object < heap->slotlimit) {
			/* Cache this result. According to empirical evidence, the chance is
			 * high that the next lookup will be for the same heap slot.
			 */
			last_heap = heap;
			return heap;
		}
	}
	return NULL;
}

As shown, I added counters to measure the hit rate of the cache.

Here are the results:

Heap Cache hit: 554237669
Heap Cache miss: 185080763

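The numbers are large and hard to read, but hit / (hit + miss) comes to 0.749, so the hit rate is about 75%. I think that is a reasonably good result. The current CRuby is_pointer_to_heap loops over the heaps in the same way, so a similar cache might be effective there too. That said, unlike in Ruby Enterprise, it appears to be called far less often.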
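To double-check the arithmetic and illustrate the caching idea, here is a small Ruby sketch. The `hit_rate` helper, the `HeapLookup` class, and the address ranges are all hypothetical names and values made up for illustration. The class is a toy model of a "check the last matched heap first" lookup, not the Ruby Enterprise implementation.

```ruby
# Hit rate as a fraction of all lookups; 0.0 when nothing was looked up.
def hit_rate(hits, misses)
  total = hits + misses
  return 0.0 if total.zero?
  hits.to_f / total
end

# Counters reported in the post.
reported_rate = hit_rate(554_237_669, 185_080_763)
puts format("reported hit rate: %.4f", reported_rate)  # roughly 0.75

# Toy model: each heap is an address range. A lookup checks the most recently
# matched heap first and only scans every heap on a miss.
class HeapLookup
  attr_reader :hits, :misses

  def initialize(heaps)
    @heaps = heaps       # Array of Range (hypothetical address ranges)
    @last_heap = nil
    @hits = 0
    @misses = 0
  end

  def find(address)
    if @last_heap && @last_heap.cover?(address)
      @hits += 1
      return @last_heap
    end
    @misses += 1
    found = @heaps.find { |heap| heap.cover?(address) }
    @last_heap = found if found  # remember the match for the next lookup
    found
  end
end

lookup = HeapLookup.new([0...100, 100...200, 200...300])
[5, 7, 150, 160, 170, 20, 999].each { |address| lookup.find(address) }
puts "toy model: #{lookup.hits} hits, #{lookup.misses} misses"
```

In the toy model, consecutive lookups that land in the same range are hits, while a jump to another range (or an address outside every range) costs a full scan. That is the same locality the cache in find_heap_slot_for_object relies on.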