(VB6) Parser Combinators

最近花相當多時間研究如何在VB6這種環境中實作Parser Combinators。相關論文如 “Higher-Order Functions for Parsing" https://pdfs.semanticscholar.org/6669/f223fba59edaeed7fabe02b667809a5744d9.pdf 經常用函數語言做例子,我看了論文之後,得到許多概念。目前最後做到的VB6 Parser相當受那些論文影響。

首先定義ParseTree。一段文字輸入之後,會剖析成parsed和rest二部份。ParseTree定義為鏈接串列,節點中有欄位parsed代表一個剖析出來的數字、rest代表剖析之後剩下的數字、然後有successor和predecessor代表下一個或上一個部份。可能還需要有個type表示parsed的類型,而這個type跟未來要實作的語意很有關係。ParseTree的用法,如果節點有successor,就要到下一個節點找其他剖析結果;而如果節點沒有successor,就要在rest找剖析之後剩下的文字。

''ParseTree.cls
Public Parsed As String
Public Rest As String
Public Predecessor As ParseTree
Public Successor As ParseTree
Public Type As String

Public Sub Layout()
    Debug.Print "P[" & Me.Parsed & "]",
    If Me.Successor Is Nothing Then
        Debug.Print "R[" & Me.Rest & "]"
    Else
        Call Me.Successor.Layout
    End If
End Sub

Parser程式。首先是Parser的最基本單元,fail_parsing/1,判定一個ParseTree是否剖析失敗。

''Parse.cls
Private Function fail_parsing(Parsed As ParseTree) As Boolean
    fail_parsing = Parsed Is Nothing
End Function

一開始想要剖析一個詞或是字串。一個剖析程式只是將輸入文字中想拿出的一段文字內容取出視為parsed,剩下的放在rest。主要的剖析程式是以下這樣子:其中主要判定字母或字元是不是英文字母的判斷式,定義在別的地方;而當輸入是空字串時,先認定為剖析失敗。目前還沒處理語意,所以還沒使用到parsed type;至於將來使用到BNF並且語意明確時,這裡的parser全都要加上parsed type。

''Parse.cls
Public Function ParseTerm(cmd As String) As ParseTree
    Dim i As Long
    Dim symbol1 As String
    Dim parsed1 As ParseTree
    cmd = Trim(cmd)
    If cmd = "" Then
        Set parsed1 = Nothing
    Else
        For i = 1 To Len(cmd)
            symbol1 = Mid(cmd, i, 1)
            If Not category.InTerm(symbol1) Then
                Exit For
            End If
        Next i
        If i = 1 Then
            Set parsed1 = Nothing
        Else
            Set parsed1 = New ParseTree
            parsed1.Parsed = Left(cmd, i - 1)
            parsed1.Rest = Mid(cmd, i)
        End If
    End If
    Set ParseTerm = parsed1
End Function

Public Function ParseString(cmd As String) As ParseTree
    Dim i As Long
    Dim parsed1 As ParseTree
    cmd = Trim(cmd)
    If cmd = "" Then
        Set parsed1 = Nothing
    Else
        Set parsed1 = ParseTerm(cmd)
        If fail_parsing(parsed1) Then
            Set parsed1 = Nothing
        Else
            For i = 1 To Len(parsed1.Parsed)
                If Not category.IsAlphabet(Mid(parsed1.Parsed, i, 1)) Then
                    Exit For
                End If
            Next i
            If i = 1 Then
                Set parsed1 = Nothing
            End If
        End If
    End If
    Set ParseString = parsed1
End Function

如下,剖析數字也相當類似剖析文字那樣。數字也是term,所以ParseNumber/1裡頭借用了ParseTerm/1。

''Parse.cls
Public Function ParseNumber(cmd As String) As ParseTree
    Dim i As Long
    Dim parsed1 As ParseTree
    cmd = Trim(cmd)
    Set parsed1 = ParseTerm(cmd)
    If cmd = "" Then
        Set parsed1 = Nothing
    Else
        If fail_parsing(parsed1) Then
            Set parsed1 = Nothing
        Else
            For i = 1 To Len(parsed1.Parsed)
                If Not category.IsNumber(Mid(parsed1.Parsed, i, 1)) Then
                    Exit For
                End If
            Next i
            If i = 1 Then
                Set parsed1 = Nothing
            End If
        End If
    End If
    Set ParseNumber = parsed1
End Function

至於以下,剖析一個符號,要先指定要剖析哪個符號。

''Parse.cls
Public Function ParseSymbol(Symbol As String, cmd As String) As ParseTree
    Dim parsed1 As ParseTree
    cmd = Trim(cmd)
    If cmd = "" And Symbol <> "" Then
        Set parsed1 = Nothing
    Else
        Set parsed1 = New ParseTree
        With parsed1
            Select Case Left(cmd, 1)
            Case Symbol
                .Parsed = Symbol
                .Rest = Mid(cmd, Len(Symbol) + 1)
            Case Else
                Set parsed1 = Nothing
            End Select
        End With
    End If
    Set ParseSymbol = parsed1
End Function

以上有了基本單元,ParseString/1剖析一段文字、ParseNumber/1剖析數字、ParseTerm/1剖析文字或數字、ParseSymbol/2剖析符號。

接下來要定義parser combinator。首先我們需要ParseSequence/2,它拿到一串parser命令,就會把輸入文字先用第一個parser剖析,剖析剩下的文字用第二個parser剖析,然後是第三個parser,以此類推。如果任何一個parser剖析失敗,則ParseSequence/2必須傳回Nothing。如果全都剖析成功,最後是將全部的parsed合併為一個字串,也將最後的rest拿出來,一起放到一個節點中傳回。

''Parse.cls
Public Function ParseSequence(cmd As String, ParseFns() As String) As ParseTree
    Dim i As Long
    Dim fn1 As String
    Dim arg1 As String
    Dim parsed1 As ParseTree
    Dim p As ParseTree
    Dim q As ParseTree
    Dim parser1 As New Parser
    Dim cmd1 As String
    cmd = Trim(cmd)
    If cmd = "" Then
        Set parsed1 = Nothing
    Else
        cmd1 = cmd
        Set parsed1 = New ParseTree
        Set p = parsed1
        For i = LBound(ParseFns) To UBound(ParseFns)
            arg1 = ""
            If InStr(ParseFns(i), "[") = 0 Then
                fn1 = ParseFns(i)
            Else
                Dim pos1 As Long
                pos1 = InStr(ParseFns(i), "[")
                fn1 = Left(ParseFns(i), pos1 - 1)
                arg1 = Mid(ParseFns(i), pos1 + 1, InStr(pos1 + 1, ParseFns(i), "]") - pos1 - 1)
            End If
            If arg1 = "" Then
                Set p.Successor = CallByName(Me, fn1, VbMethod, cmd)
            Else
                Set p.Successor = CallByName(Me, fn1, VbMethod, arg1, cmd)
            End If
            If fail_parsing(p.Successor) Then
                Exit For
            Else
                cmd = p.Successor.Rest
                Set q = p
                Set p = p.Successor
            End If
        Next i
        If i < UBound(ParseFns) Then
            Set parsed1 = Nothing
        Else
            With parsed1
                .Parsed = MergeParsed(parsed1)
                .Rest = EventualRest(parsed1)
                Set .Successor = Nothing
            End With
        End If
    End If
    Set parser1 = Nothing
    Set ParseSequence = parsed1
End Function

另一個重要的parser combinator是ParseAlternative/2。如果有ParseAlternative/2加上ParseSequence/2,就可以在程式中寫BNF。注意:ParseAlternative/2接受ParamArray參數ListOfParseFns() As Variant,而ParseSequence/2接受陣列參數ParseFns() As String。

''Parse.cls
Public Function ParseAlternative(cmd As String, ParamArray ListOfParseFns() As Variant) As ParseTree
    ''缺漏待補
End Function

我需要的BNF是像Excel的函數一樣。

Expression ::= String ( Equation-Sequence )

Equation ::= String ( Equation-Sequence ) Equation-Sequence
           | Number Equation-Sequence

Equation-Sequence ::= , Equation
                    | Equation
                    | empty-string

Number ::= Digit Number
         | Number

Digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

我的BNF核心部份的程式如下:ParseEquation/1和ParseEquSeq/1是交互遞迴程式,撰寫及測試上有一點點難度。目前的程式碼還沒有用到ParseAlternative/n,而是用巢狀If-Else處理。

''Parse.cls
Public Function ParseEquation(cmd As String) As ParseTree
    Dim i As Long
    Dim count_symbol As Long
    Dim args() As String
    Dim parsed1 As ParseTree
    Dim parser1 As New Parser
    cmd = Trim(cmd)
    If cmd = "" Then
        Set parsed1 = Nothing
    Else
        args = Split("ParseString::ParseSymbol[(]::ParseEquSeq::ParseSymbol[)]::ParseEquSeq", "::")
        Set parsed1 = ParseSequence(cmd, args)
        If fail_parsing(parsed1) Then
            args = Split("ParseNumber::ParseEquSeq", "::")
            Set parsed1 = ParseSequence(cmd, args)
        End If
    End If
    Set parser1 = Nothing
    Set ParseEquation = parsed1
End Function

Public Function ParseEquSeq(cmd As String) As ParseTree
    Dim i As Long
    Dim count_symbol As Long
    Dim args() As String
    Dim parsed1 As ParseTree
    Dim parser1 As New Parser
    cmd = Trim(cmd)
    args = Split("ParseSymbol[,]::ParseEquation", "::")
    Set parsed1 = ParseSequence(cmd, args)
    If fail_parsing(parsed1) Then
        args = Split("ParseEquation", "::")
        Set parsed1 = ParseSequence(cmd, args)
        If fail_parsing(parsed1) Then
            Set parsed1 = New ParseTree
            parsed1.Rest = cmd
        End If
    End If
    Set parser1 = Nothing
    Set ParseEquSeq = parsed1
End Function

以上,是我最近在努力撰寫的VB6 Parser。所需應用的場合是在VB6程式中使用domain-specific language,而目前暫時將DSL設計為看起來像Excel函數的語言。

廣告

About 黃耀賢 (Yau-Hsien Huang)

熱愛 Erlang ,並且有相關工作經驗。喜歡程式語言。喜歡邏輯。目前用 Python 工作。
本篇發表於 Programming。將永久鏈結加入書籤。

2 Responses to (VB6) Parser Combinators

  1. swoods 說道:

    What is category?

  2. 黃耀賢 (Yau-Hsien Huang) 說道:

    I recall it. The ‘category’ means a set of functions for checking types or kinds of data, and it’s slightly related to the Category Theory. And the “category.bas" is a VB6 module.

迴響已關閉。