Class: PtOnlineSchemaChange::Monitor
- Inherits: Object
- Includes: ElasticAPM::SpanHelpers
- Defined in: app/my_lib/pt_online_schema_change/monitor.rb
Constant Summary
- PROGRESS_LOG_FILE = Rails.root.join('log/pt_osc_progress.log')
- DEFAULT_POLL_INTERVAL = 30 (seconds)
- DEFAULT_LAG_THRESHOLD = 10 (seconds)
Class Method Summary
- .check_replication_lag(table_name, threshold) ⇒ Object
  Check and warn about replication lag.
- .monitor_table(table_name, poll_interval, lag_threshold, log_to_file) ⇒ Object
  Internal monitoring loop.
- .pt_osc_running?(table_name) ⇒ Boolean
  Check if PT-OSC is currently running on a table.
- .replication_lag ⇒ Integer?
  Get current replication lag in seconds.
- .start_monitoring(table_name, options = {}) ⇒ Thread
  Simple monitoring setup for use in migrations.
- .stop_monitoring(monitor_thread, timeout = 30) ⇒ Boolean
  Stops monitoring and waits for thread completion.
Class Method Details
.check_replication_lag(table_name, threshold) ⇒ Object
Check and warn about replication lag
# File 'app/my_lib/pt_online_schema_change/monitor.rb', line 155

def self.check_replication_lag(table_name, threshold)
  lag = replication_lag
  return unless lag && lag > threshold

  BUSINESS_LOGGER.warn('High replication lag detected during PT-OSC', {
    table_name: table_name,
    lag_seconds: lag,
    threshold_seconds: threshold,
  })

  APMErrorHandler.report('High replication lag during PT-OSC', context: {
    table_name: table_name,
    lag: lag,
    threshold: threshold,
  })
end
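A minimal usage sketch from a Rails console, assuming the class is loaded; the table name 'orders' and the threshold value are illustrative, matching the documented (table_name, threshold) signature:

# Hypothetical example: 'orders' and 10 are illustrative values.
PtOnlineSchemaChange::Monitor.check_replication_lag('orders', 10)
# Logs a warning and reports to APM only when the measured lag exceeds 10 seconds;
# the early return means nothing is logged when lag is nil or within the threshold.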
.monitor_table(table_name, poll_interval, lag_threshold, log_to_file) ⇒ Object
Internal monitoring loop
# File 'app/my_lib/pt_online_schema_change/monitor.rb', line 117

def self.monitor_table(table_name, poll_interval, lag_threshold, log_to_file)
  start_time = Time.current

  while pt_osc_running?(table_name)
    break if Thread.current[:stop_requested]

    # Check replication lag
    check_replication_lag(table_name, lag_threshold)

    # Log progress
    elapsed = Time.current - start_time
    message = "PT-OSC running on #{table_name} for #{elapsed.round}s"

    BUSINESS_LOGGER.info(message, {
      table_name: table_name,
      elapsed_seconds: elapsed.round,
      operation: 'pt_osc_progress',
    })

    # Log to file if requested
    if log_to_file
      File.open(PROGRESS_LOG_FILE, 'a') do |f|
        f.puts "#{Time.current.iso8601} - #{message}"
      end
    end

    sleep poll_interval
  end

  total_time = Time.current - start_time
  BUSINESS_LOGGER.info('PT-OSC monitoring completed', {
    table_name: table_name,
    total_duration_seconds: total_time.round,
  })
end
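When log_to_file is left enabled, each poll appends one line to PROGRESS_LOG_FILE in the format built above. A hypothetical way to peek at it from a console; the timestamps and table name shown in the comment are illustrative:

# Illustrative only: assumes a pt-osc run on 'orders' has been in flight for about a minute.
File.readlines(PtOnlineSchemaChange::Monitor::PROGRESS_LOG_FILE).last(2)
# => ["2024-05-02T10:15:30Z - PT-OSC running on orders for 30s\n",
#     "2024-05-02T10:16:00Z - PT-OSC running on orders for 60s\n"]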
.pt_osc_running?(table_name) ⇒ Boolean
Check if PT-OSC is currently running on a table
# File 'app/my_lib/pt_online_schema_change/monitor.rb', line 81

def self.pt_osc_running?(table_name)
  # Check for the _<table>_new copy table PT-OSC creates during the operation
  trigger_table = "_#{table_name}_new"

  result = ActiveRecord::Base.connection.execute(
    "SHOW TABLES LIKE '#{trigger_table}'",
  )

  result.count > 0
rescue StandardError => e
  BUSINESS_LOGGER.error('Failed to check PT-OSC status', {
    table_name: table_name,
    error: e.message,
  })
  false
end
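An illustrative check, assuming an ALTER on a hypothetical orders table is in progress:

# Hypothetical example: 'orders' is illustrative.
PtOnlineSchemaChange::Monitor.pt_osc_running?('orders')
# => true while pt-online-schema-change's `_orders_new` copy table exists,
#    false once the swap has completed (or if the status query itself raised).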
.replication_lag ⇒ Integer?
Get current replication lag in seconds
# File 'app/my_lib/pt_online_schema_change/monitor.rb', line 103

def self.replication_lag
  result = ActiveRecord::Base.connection.execute('SHOW SLAVE STATUS')
  row = result.first

  return nil unless row && row['Seconds_Behind_Master']

  row['Seconds_Behind_Master'].to_i
rescue StandardError => e
  BUSINESS_LOGGER.error('Failed to check replication lag', { error: e.message })
  nil
end
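A hedged console sketch; the numeric result is illustrative and depends on whether the connected server is a replica:

# Hypothetical example from a Rails console.
PtOnlineSchemaChange::Monitor.replication_lag
# => 3    Seconds_Behind_Master as an Integer when SHOW SLAVE STATUS returns a row
# => nil  when there is no status row, the lag column is NULL, or the query raised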
.start_monitoring(table_name, options = {}) ⇒ Thread
Simple monitoring setup for use in migrations
# File 'app/my_lib/pt_online_schema_change/monitor.rb', line 23

def self.start_monitoring(table_name, options = {})
  poll_interval = options[:poll_interval] || DEFAULT_POLL_INTERVAL
  lag_threshold = options[:lag_threshold] || DEFAULT_LAG_THRESHOLD
  log_to_file = options.fetch(:log_to_file, true)

  BUSINESS_LOGGER.set_business_context({ table_name: table_name, operation: 'pt_osc_monitoring' })

  BUSINESS_LOGGER.info('Starting PT-OSC monitoring', {
    table_name: table_name,
    poll_interval: poll_interval,
    lag_threshold: lag_threshold,
  })

  Thread.new do
    Thread.current[:name] = "pt_osc_monitor_#{table_name}"
    monitor_table(table_name, poll_interval, lag_threshold, log_to_file)
  rescue StandardError => e
    BUSINESS_LOGGER.error('PT-OSC monitoring failed', {
      table_name: table_name,
      error: e.message,
    })
    APMErrorHandler.report(e, context: { table_name: table_name, operation: 'pt_osc_monitoring' })
  end
end
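A usage sketch with explicit options; the table name and option values are illustrative, while the option keys match those read above:

# Hypothetical example: values shown are illustrative, not recommendations.
monitor = PtOnlineSchemaChange::Monitor.start_monitoring(
  'orders',
  poll_interval: 60,   # log progress every 60s instead of the 30s default
  lag_threshold: 5,    # warn once replication lag exceeds 5s
  log_to_file: false,  # skip appending to log/pt_osc_progress.log
)
# `monitor` is the Thread returned above; keep a reference so it can be
# handed to .stop_monitoring once the ALTER finishes.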
.stop_monitoring(monitor_thread, timeout = 30) ⇒ Boolean
Stops monitoring and waits for thread completion
# File 'app/my_lib/pt_online_schema_change/monitor.rb', line 55

def self.stop_monitoring(monitor_thread, timeout = 30)
  return true unless monitor_thread&.alive?

  monitor_thread[:stop_requested] = true
  monitor_thread.join(timeout)

  if monitor_thread.alive?
    BUSINESS_LOGGER.warn('PT-OSC monitoring thread did not stop gracefully', { timeout: timeout })
    monitor_thread.kill
    false
  else
    BUSINESS_LOGGER.info('PT-OSC monitoring stopped successfully')
    true
  end
rescue StandardError => e
  BUSINESS_LOGGER.error('Error stopping PT-OSC monitoring', { error: e.message })
  APMErrorHandler.report(e, context: { operation: 'stop_pt_osc_monitoring' })
  false
end
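Putting the two entry points together, a hedged end-to-end sketch for use in a migration; the migration class, Rails version tag, table name, and the run_pt_osc_alter helper are hypothetical stand-ins for however the ALTER itself is executed:

# Hypothetical migration: only start_monitoring/stop_monitoring are from this class.
class AddStatusToOrders < ActiveRecord::Migration[7.0]
  def up
    monitor = PtOnlineSchemaChange::Monitor.start_monitoring('orders')
    run_pt_osc_alter  # hypothetical helper that shells out to pt-online-schema-change
  ensure
    # Waits up to 10s for the loop to notice :stop_requested, then kills it;
    # returns false if the thread had to be killed or an error was raised.
    PtOnlineSchemaChange::Monitor.stop_monitoring(monitor, 10)
  end
end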