For one of my projects, we have a list of around 100 domains in a ‘blacklist’. Because of the way the blacklist works, we need to allow non-programmers to enter sites using simple wildcards. I then convert each entry into a regular expression:
Regexp.new("^#{regexp.gsub(".","\\.").gsub("*",".*")}$",Regexp::IGNORECASE)
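The conversion above only escapes the dot, so any other regex metacharacter in an entry (a `+` or `?`, say) would pass through unescaped. A slightly more defensive variant might look like the sketch below. The helper name `wildcard_to_regexp` is made up for illustration, and it uses `\A`/`\z` rather than `^`/`$` on the assumption that each entry should match the whole domain string.

```ruby
# Hypothetical helper: turns a wildcard entry such as "*.example.com"
# into an anchored, case-insensitive Regexp.
def wildcard_to_regexp(pattern)
  # Escape every regex metacharacter, including the "*" itself.
  escaped = Regexp.escape(pattern)
  # Turn the escaped "\*" back into a "match anything" wildcard.
  body = escaped.gsub('\*', '.*')
  # \A and \z anchor to the whole string rather than to a single line.
  Regexp.new("\\A#{body}\\z", Regexp::IGNORECASE)
end
```

With this, `wildcard_to_regexp("*.example.com")` should match `www.example.com` in any letter case, but not `example.com` on its own, since the entry requires something followed by a dot in front.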
When spidering sites, we exclude any site we find that is in this list. I was building a regular expression for each of these sites and ‘unioning’ them all together into one mega-regular expression. Then, to see if a domain is on the ‘blacklist’, I just do:
domain =~ my_mega_regex
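To make that flow concrete, here is a rough sketch of how the pieces could fit together. The `patterns` array and the `blacklisted?` helper are placeholders for illustration, not the project's actual code.

```ruby
# Hypothetical wildcard entries; the real list has around 100 domains.
patterns = ["*.example.com", "spam.test"]

# Apply the same wildcard-to-regex conversion to each entry.
regexes = patterns.map do |pattern|
  Regexp.new("^#{pattern.gsub(".", "\\.").gsub("*", ".*")}$", Regexp::IGNORECASE)
end

# Regexp.union keeps each pattern's own flags, so IGNORECASE survives.
my_mega_regex = Regexp.union(*regexes)

# Hypothetical helper: true when the domain matches any blacklist entry.
def blacklisted?(domain, regex)
  !(domain =~ regex).nil?
end
```

A call such as `blacklisted?("www.example.com", my_mega_regex)` then answers the blacklist question with a single match.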
However, what I found is that a regular expression longer than about 159 separate predicates causes Ruby to segfault. This happens on Ruby 1.8.4 and 1.8.5. Here’s the simplest code I can repro this with:
r=Regexp.new(/^$/);1.upto(1000) { |i| r= Regexp.union(r,Regexp.new("^#{i}$"));puts i unless "foo" =~ r }
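One possible way to sidestep the giant union is to keep the regexes in an array and test them one at a time. This is only a sketch: I have not verified it against the affected Ruby versions, and the `regexes` array and `blacklisted?` helper below are illustrative stand-ins.

```ruby
# Hypothetical stand-ins for the individually compiled blacklist regexes.
regexes = (1..1000).map { |i| Regexp.new("^#{i}$") }

# Test each small regex in turn instead of building one huge union.
def blacklisted?(domain, regexes)
  regexes.any? { |regex| domain =~ regex }
end
```

This trades one large match for up to one small match per entry, which should be acceptable for a list of around 100 domains.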