Skip to content

pmozil/zot

Folders and files

NameName
Last commit message
Last commit date

Latest commit

History

3 Commits

Repository files navigation

Zot - a basic DFA regex implementation in zig

It's speed is comparable to POSIX C regex!

This project uses a nondeterministic FA to create the regex, and then converts it into a deterministic one.

Here's an example of how to use the regex structure:

conststd=@import("std"); constregex=@import("regex"); pubfnmain() !void{varbuf: [131072]u8=undefined; varalloc=std.heap.FixedBufferAllocator.init(&buf); varallocator=alloc.allocator(); varx: [24]u8="0x(0|1|2|3|4|5|6|7|8|8)+".*; vary: [6]u8="0x1223".*; varw: [4]u8="1223".*; varre=regex.Regex.new(allocator); std.debug.print("\n{s}\n", .{tryre.infixToPostfix(&x)}); tryre.compile(&x); std.debug.print("Match{s}:{}\n", .{y, re.match(&y) }); std.debug.print("Match{s}:{}\n", .{w, re.match(&w) }); std.debug.print("Total states:{}\n", .{re.autom.?.states.items.len}); for (0.., re.autom.?.states.items) |item, state|{std.debug.print( "Item #{} transitions (start ={}, finish ={}):\n", .{item, state.start, state.finish }, ); for (state.transitions.items) |trans|{std.debug.print( " - From:{} to{}, symbol ={s}\n", .{item, trans.dest, [1]u8{trans.symbolorelse121} }, )} std.debug.print("\n", .{})} }

And here's wat the program will output:

0x.01|2|3|4|5|6|7|8|8|+. Match 0x1223: true Match 1223: false Total states: 4 Item #0 transitions (start = true, finish = false): - From: 0 to 1, symbol = 0 Item #1 transitions (start = false, finish = false): - From: 1 to 2, symbol = x Item #2 transitions (start = false, finish = false): - From: 2 to 3, symbol = 0 - From: 2 to 3, symbol = 7 - From: 2 to 3, symbol = 6 - From: 2 to 3, symbol = 5 - From: 2 to 3, symbol = 4 - From: 2 to 3, symbol = 3 - From: 2 to 3, symbol = 2 - From: 2 to 3, symbol = 1 - From: 2 to 3, symbol = 8 Item #3 transitions (start = false, finish = true): - From: 3 to 3, symbol = 0 - From: 3 to 3, symbol = 7 - From: 3 to 3, symbol = 6 - From: 3 to 3, symbol = 5 - From: 3 to 3, symbol = 4 - From: 3 to 3, symbol = 3 - From: 3 to 3, symbol = 2 - From: 3 to 3, symbol = 1 - From: 3 to 3, symbol = 8 

It even works with most unicode strings (though this was accidental)

conststd=@import("std"); constregex=@import("regex"); pubfnmain() !void{varbuf: [131072]u8=undefined; varalloc=std.heap.FixedBufferAllocator.init(&buf); varallocator=alloc.allocator(); varx="ä(bc|d)+".*; vary="äbcbcd".*; varw: [4]u8="1223".*; varre=regex.Regex.new(allocator); std.debug.print("\n{s}\n", .{tryre.infixToPostfix(&x)}); tryre.compile(&x); std.debug.print("Match{s}:{}\n", .{y, re.match(&y) }); std.debug.print("Match{s}:{}\n", .{w, re.match(&w) })}
ä.bc.d|+. Match äbcbcd: true Match 1223: false 

About

A basic DFA-based regex written in zig

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages