Introduction To Computer Theory Daniel I A Cohen Solution Manual

introduction to computer theory 2nd edition pdf download

Introduction To Computer Theory 2nd Edition Textbook. May 2nd, - Automata Theory By Daniel Cohen pdf Free Download Here Automata Theory. Download & View Daniel I. A. Cohen - Introduction To Computer Theory - 2nd Edition - Chapter 2 as PDF for free. More details. Pages: 7. Preview; Full text. Introduction to Computer Theory By Sir Daniel A Cohen 2nd Edition Pdf Free Download. and also free download solution manual or solved Introduction to.

Introduction to computer theory 2nd edition pdf download - site, with

Introduction to computer theory 2nd edition pdf download - clearly sorry

Introduction to Computer Theory by Daniel I. a. Cohen - Second Edition

Short Description

Download Introduction to Computer Theory by Daniel I. a. Cohen - Second Edition

Description

INTRODUCTION TO COMPUTER THEORY

INTRODUCTION TO COMPUTER THEORY SECOND EDITION

Daniel I. A. Cohen Hunter College City Unil•ersity f New York

John Wiley & Sons, Inc.

ACQUISITIONS EDITOR

Regina Brooks

MARKETING MANAGER

Jay Kirsch

SENIOR PRODUCTION EDITOR DESIGN SUPERVISOR

Tony VenGraitis

Anne Marie Renzi

MANUFACTURING MANAGER

Mark Cirillo

ILLUS TRATION COORDINATOR PRODUCTION MANAGEMENT

Rosa Bryant

1. Carey Publishing Service

Recognizing the importance of preserving what has been written. it is a policy of John Wiley

& Sons. Inc. to have books of enduring value published

in the United States printed on acid-free paper. and we exert our best efforts to that end.

The paper in this book was manufactured by a mill whose forest management programs include sustained yield harvesting of its timberlands. Sustained yield harvesting principles ensure that the number of trees cut each year does not exceed the amount of new growth. Copyright© , , by John Wiley

& Sons, Inc.

All rights reserved. Published simultaneously in Canada. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections or of the United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Rosewood Drive, Danvers, MA , () ,

fax

() Requests to the Publisher for permission should be addressed to the

Permissions Department, John Wiley () ,

& Sons, Inc., River Street, Hoboken, NJ ,

fax () , E-Mail: [email&#;protected]

To order books or for customer service please, call 1() -CALL -WILEY ().

20

19

18

17

16

15

14

13

12

Au Professeur M.-P. Schiitzenberger rnmme un temoignage de profonde et ajjectueuse reconnaissance

During the preparation of this sernnd edition Alonzo Church has passed away at the age of As a mathematical logician he was a theoretician par excellence and preeminent in the developmenr

FAs and Their Languages

65

Starting at the start state, anything but the sequence baa will drop down into the collecting bucket at the bottom, never to be seen again. Even the word baabh will fail. It will reach the final state marked with a + , but then the next letter will suicide over the edge. The language accepted by this FA is L = ( baa }

•

EXAMPLE

The FA below accepts exactly the two strings baa and ab:

(I ,

h

Big machine, small language.

•

EXAMPLE Let us take a

trickier ex am p l e Consider the FA shown below: .

a

What is the language accepted by this machine? We start at state I , and if we are read ing a word starting with an a, we go straight to the final state 3. We can stay at state 3 as long as we continue to read onl y a 's. Therefore, a l l words of the form aa *

66

CHAPTER

5

Finite Automata

are accepted by this machine. What if we began with some a 's that take us to state 3 but then we read a b? This then transports us to state 2. To get back to the final state, we must pro ceed to state 4 and then to state 3. These trips require two more b's to be read as input. No tice that in states 2, 3, and 4 all a 's that are read are ignored. Only b's cause a change of state. Recapitulating what we know : If an input string begins with an a and then has some h's, it must have 3 h's to return us to state 3, or 6 h's to make the trip (state 2, state 4, state 3) twice, or 9 b's, or 1 2 h's and so on. In other words, an input string starting with an a and having a total number of h's divisible by 3 will be accepted. If it starts with an a and has a total number of h's not divisible by 3, then the input is rejected because its path through the machine ends at state 2 or 4. What happens to an input string that begins with a h? It finds itself in state 2 and needs two more b's to get to state 3 (these h's can be separated by any number of a's). Once in state 3, it needs no more h's, or three more h's, or six more h's, and so on. All in all , an input string, whether beginning with an a or a h, must have a total number of b's divisible by 3 to be accepted. It is also clear that any string meeting this requirement will reach the final state. The language accepted by this machine can be defined by the regular expression a*(a*ba*ba*ba*)*(a + a*ba*ba*ba*)

The only purpose for the last factor is to guarantee that A is not a possibility because it is not accepted by the machine. If we did not mind A being included in the language, we could have used this simpler FA: (/

(/

(/

The regular expression (a + ba* ba*b)+

also defines the original (non-A) language, whereas the regular expression (a*ba*ba*ba*)*

•

defines the language of the second machine. EXAMPLE

The following FA accepts only the word A : a. h

~

67

FAs and Their Languages

Notice that the left state is both a start and a final state. All words other than A go to the right state and stay there. • EXAMPLE

Consider the following FA: h

a

(!

h

No matter which state we are in, when we read an a, we go to the right-hand state, and when we read a h, we go to the left-hand state. Any input string that ends in the + state must end in the letter a, and any string ending in a must end in + . Therefore, the language ac cepted by this machine is (a + b)*a

•

EXAMPLE

The language in the example above does not include A. I f we add A, we get the language of all words that do not end in h. This is accepted by the FA below: a

h

h

a

• EXAMPLE

Consider the following FA:

a

The only letter that causes motion between the states is a; h s leave the machine in the same state. We start at . If we read a first a, we go to + . A second a takes us back. A third a takes us to + again. We end at + after the first, third, fifth, seventh, a. The language accepted by this machine is all words with an odd number of a 's, which could also be de fi ned by the regular expression '

-

.

b*ab*(ab*ab*)*

.

.

•

68

CHAPTER

5

Finite Automata

EXAMPLE

Consider the following FA: a. b

b a a

b

This machine will accept the language of all words with a double a in them somewhere. We stay in the start state until we read our first a. This moves us to the middle state. If the very next letter is another a, we move to the + state, where we must stay and eventually be accepted. If the next letter is a b, however, we go back to - to wait for the next a. We can identify the purposes that these states serve in the machine as follows: Start state The previous input letter (if there was one) was not an a. Middle state We have just read an a that was not preceded by an a . Final state We have already encountered a double a and we are going to sit here un til the input is exhausted. Clearly, if we are in the start state and we read an a, we go to the middle state, but if we read a h, we stay in the start state. When in the middle state, an a sends us to nirvana, where ultimate acceptance awaits us, whereas a h sends us back to start, hoping for the fi rst a of a double letter. The l anguage accepted by this machine can also be defined by the regular expression (a + b)*aa(a + b)*

•

EXAMPLE

The following FA accepts all words that have different first and last letters. If the word be gins with an a, to be accepted it must end with a h and vice versa. a

h

a

If we start with an a, we take the high road and jump back and forth between the two top states ending on the right (at + ) only if the last letter read is a h. If the first letter read is a h, we go south. Here, we get to the + on the bottom only when we read a as the last letter.

69

EVEN-EVEN Revisited

This can be better understood by examining the path through the FA of the input string aabhaahh, as shown below:

0 It will be useful for us to consider this FA as having a primitive memory device. For the top two states, no matter how much bouncing we do between them, remember that the first letter read from the input string was an a (otherwise, we would never have gotten up here to begin with). For the bottom two states, remember that the first input letter was a h. Lower non + state The input started with a h and the last letter we have read from the input string is also a h. Lower + state The input started with a h and the last letter read so far is an a . •

1} EVEN-EVEN REVISITED EXAMPLE As the next example of an FA in this chapter, let us consider the picture below: h

a

a

h

To process a string of letters, we start at state I , which is in the upper left of the picture. Every time we encounter a letter a in the input string, we take an a train. There are four edges labeled a. All the edges marked a go either from one of the upper two states (states I and 2) to one of the lower two states (states 3 and 4 ), or else from one of the lower two states

70

CHAPTER

5

Finite Automata

to one of the upper two states. If we are north and we read an a, we go south. If we are south and we read an a, we go north. The letter a reverses our up/down status. What happens to a word that gets accepted and ends up back in state 1 ? Without know ing anything else about the string, we can say that it must have had an even number of a 's in it. Every a that took us south was balanced by some a that took us back north. We crossed the Mason - Dixon line an even number of times, one for each a. So, every word in the lan guage of this FA has an even number of a 's in it. Also, we can say that every input string with an even number of a 's will finish its path in the north (state 1 or 2). There is more that we can say about the words that are accepted by this machine. There are four edges labeled h. Every edge labeled h either takes us from one of the two states on the left of the picture (states I and 3) to one of the two states on the right (states 2 and 4), or else takes us from one of the two states on the right to one of the two states on the left. Every h we encounter in the input is an east/west reverser. If the word starts out in state 1 , which is on the left, and ends up back in state 1 (on the left), it must have crossed the Mississippi an even number of times. Therefore, all the words in the language accepted by this FA have an even number of h 's as well as an even number of a 's. We can also say that every input string with an even number of h's will leave us in the west (state 1 or 3 ). These are the only two conditions on the language. All words with an even number of a's and an even number of h's must return to state I . All words that return to state 1 are in EVEN-EVEN. All words that end in state 2 have crossed the Mason - Dixon line an even number of times but have crossed the Mississippi an odd number of times; therefore, they have an even number of a's and an odd number of b's. All the words that end in state 3 have an even number of h's but an odd number of a 's. All words that end in state 4 have an odd number of a's and an odd number of h's. So again, we see that all the EVEN-EVEN words must end in state I and be accepted. One regular expression for the language EVEN-EVEN was discussed in detail in the previous chapter. • Notice how much easier it is to understand the FA than the regular expression. Both methods of defining languages have advantages, depending on the desired application. But in a theory course we rarely consider applications except in the following example. EXAMPLE

We are programmers hired to write a word processor. As part of this major program, we must build a subroutine that scans any given input string of English letters and spaces and lo cates the first occurrence of the substring cat whether it is a word standing alone or part of a longer word such as abdicate. We envision the need for four states: State 1 State 2 State 3 State 4

We have not just read a c; this is the start state. The last letter read was a c. The last letter read was an a that came after a c. We have just encountered the substring cat and control of this program must transfer somewhere else.

If we are in state 1 and read anything but a c, we stay there. In state 1 if we read a c, we go unconditionally to state 2.

71

Problems

I f w e are in state 2 and we read a n a, w e g o t o state 3. I f w e read another c , w e stay in state 2 because this other c may be the beginning of the substring cat. If we read anything else, we go back to state I . If we are in state 3 and we read a t, then we go to state 4. If we read any other letter ex cept c, we have to go back to state I and start all over again, but if we read a c, then we go to state 2 because this could be the start of something interesting. The machine looks like this: any letter

a l l except c

all except c and t

The input Boccaccio will go through the sequence of states 1 - l - 1 l - 1 and the input will not be accepted. The input desiccate will go through the states: l - 1 - l - 1 - 1 and terminate (which • in this example is some form of acceptance) before reading the final e.

1} PROBLEMS 58

1.

Write out the transition tables for the FAs on pp. 56, defined by pictures.

2.

Build an FA that accepts only the language of all words with h as the second letter. Show both the picture and the transition table for this machine and find a regular expres sion for the language.

3.

Build an FA that accepts only the words baa, ah, and ahh and no other strings longer or shorter.

4.

(i) Build an FA with three states that accepts all strings. (ii) Show that given any FA with three states and three + 's, it accepts all input strings. (iii) If an FA has three states and only one + , must it reject some inputs?

5.

(i) Build an FA that accepts only those words that have more than four letters. (ii) Build an FA that accepts only those words that have fewer than four letters. (iii) Build an FA that accepts only those words with exactly four letters.

6.

Build an FA that accepts only those words that do not end with ha.

7.

Build an FA that accepts only those words that begin or end with a double letter.

8.

Build an FA that accepts only those words that have an even number of substrings ah.

9.

(both), 63, 64, and 69 that were

(i) Recall from Chapter 4 the language of all words over the alphabet l a h } that have both the letter a and the letter h in them, but not necessarily in that order. Build an FA that accepts this language.

72

CHAPTER 5

Finite Automata

(ii) Build an FA that accepts the language of all words with only a 's or only h's in them. Give a regular expression for this language.

Consider all the possible FAs over the alphabet ( a h I that have exactly two states. An FA must have a designated start state, but there are four possible ways to place the + 's:

type 1

type 2

type 3

type 4

Each FA needs four edges (two from each state), each of which can lead to either of the states. There are 24 = 1 6 ways to arrange the labeled edges for each of the four types of FAs. Therefore, in total there are 64 different FAs of two states. However, they do not represent 64 nonequivalent FAs because they are not all associated with different lan guages. All type l FAs do not accept any words at all, whereas all FAs of type 4 accept all strings of a 's and h's. (i) Draw the remaining FAs of type 2. (ii) Draw the remaining FAs of type 3. (iii) Recalculate the total number of two-state machines using the transition table defin ition. Show that there are exactly different finite automata with three states the alphabet ( a h ) , where x is always the start state.

.r,

y,

:

over

Suppose a particular FA, called FIN, has the property that it had only one final state that was not the start state. During the night, vandals come and switch the + sign with the sign and reverse the direction of all the edges. (i) Show that the picture that results might not actually be an FA at all by giving an example. (ii) Suppose, however, that in a particular case what resulted was, in fact, a perfectly good FA. Let us call it NIF. Give an example of one such machine. (iii) What is the relationship between the language accepted by FIN and the language accepted by NIF as described in part (ii)? Why? (iv) One of the vandals told me that if in FIN the plus state and the minus state were the same state, then the language accepted by the machine could contain only palin dromic words. Defeat this vandal by example.

We define a removable state as a state such that if we erase the state itself and the edges that come out of it, what results is a perfectly good-looking FA. (i) Give an example of an FA that contains a removable state. (ii) Show that if we erase a removable state the language defined by the reduced FA is exactly the same as the language defined by the old FA.

(i) Build an FA that accepts the language of all strings of a's and h's such that the next-to-last letter is an a. (ii) Build an FA that accepts the language of all strings of length 4 or more such that the next-to-last letter is equal to the second letter of the input string.

Problems

73

Build a machine that accepts all strings that have an even length that is not divisible by 6.

Build an FA such that when the labels a and b are swapped the new machine is different from the old one but equivalent (the language defined by these machines is the same).

Describe in English the languages accepted by the following FAs: (i) b

a, b

a

a, b

(ii)

a. h

(iii)

"·

a. h h

a. h

(iv) Write regular expressions for the languages accepted by these three machines.

74

CHAPTER 5

Finite Automata

The following is an FA over the alphabet � = { a b c } . Prove that it accepts all strings that have an odd number of occurrences of the substring abc.

a

b, ('

Consider the following FA:

a. b

a . h a. h a. b a. b a. b a, h a . b

Problems

75

(i) Show that any input string with more than three letters is not accepted by this FA. (ii) Show that the only words accepted are a, aab, and bah. (iii) Show that by changing the location of + signs alone, we can make this FA accept the language { bb aha bba } . (iv) Show that any language in which the words have fewer than four letters can be accepted by a machine that looks like this one with the + signs in different places. (v) Prove that if l is a finite language, then there is some FA that accepts l extending the binary-tree part of this machine several more layers if necessary.

Let us consider the possibility of an infinite automaton that starts with this infin ite bi nary tree:

Let l be any infinite language of strings of a's and h's whatsoever. Show that by the ju dicious placement of + 's, we can tum the picture above into an infinite automaton to accept the language l. Show that for any given finite string, we can determine from this machine, in a finite time, whether it is a word in l. Discuss why this machine would not be a satisfactory language-definer for l.

CHAPTER 6

Transition Graphs

� RELAXING THE RESTRICTION O N INPUTS We saw in the last chapter that we could build an FA that accepts only the word baa. The ex ample we gave required five states primarily because an FA can read only one letter from the input string at a time. Suppose we designed a more powerrul machine that could read either one or two letters of the input string at a time and could change its state based on this input information. We might design a machine like the one below:

Because when we say "build a machine," all we have to do is scribble on paper- we do not have to solder, weld, and screw - we could easily change the rules of what constitutes a machine and allow such pictures as the one above. The objects we deal with in this book are only mathematical models. In general, practically anything can be a mathematical model so long as it is a well-defined set of rules for playing with some abstract constructs, but the ob vious question remains: a mathematical model of what? The FAs defined in the previous chapter started out on a dubious note when they were analogized to being mathematical models of children's games. However, we did later pro duce some reasons for thinking that they were of use to computer science because they rep resent, in a meaningful way, states in certain programmable algorithms. The mathematical models that we shall introduce in this chapter will differ in a significant way. We cannot as of yet explain the direct application of these entities to the normal experience of a program76

77

Relaxing the Restriction on Inputs

m ing student. That does not mean that their importance must be accepted on blind faith merely patience. They will be of utmost practical value for us in the all-important next chap ter. Beyond that service, the underlying special features that distinguish them from FAs will introduce us to a theme that will recur often in our study of computer theory. As for the mo ment, we are proposing to investigate a variation of FAs. There are still states and edges that consume input letters, but we have abandoned the requirement that the edges eat just one let ter at a time. As we shall see soon, this is accompanied by several other coordinated adjust ments. If we are interested in a machine that accepts only the word baa, why stop at assuming that the machine can read just two letters at a time? A machine that accepts this word and that can read up to three letters at a time from the input string could be built with even fewer states:

or even

� baa � +

If we interpret the picture on the right as an FA-like machine, we see that not only does

baa alone get to the final state, but all other input strings end up actually nowhere. If we start in the minus state and the first letter of the input is an a, we have no direction as to what to do. The picture on the left at least tells us that when the input fails to be of the desired form, we must go to the garbage collection state and read through the rest of the input string in the full knowledge that we can never leave there. The picture on the right gives us another problem with the input baabb. The first three letters take us to the accept state, but then something undetermined (presumably bad) hap pens when we read any more of the input letters. According to the rules of FAs, one cannot stop reading input letters until the input string completely runs out. The picture on the right does not tell us where to go for most of the situations we may have to face while reading in puts. By convention, we shall assume that there is associated with the picture, but not drawn, some trash-can state that we must go to when we fail to be able to make any of the allowable indicated legal edge crossings in the picture. Once in this state, we must abandon all hope of ever leaving and getting to acceptance. Many of the FAs in the previous chapter had such in escapable nonacceptance black holes that had to be drawn in detail. We now consider the two pictures above to be equivalent for all practical purposes. They are only distinguishable in trivial ways, such as by having a different number of states, but they accept the exact same language. Rather than an imaginary hell-state as we have described just now, it is more stan dard to introduce a new term to describe what happens when an input is running on a ma chine and gets into a state from which it cannot escape though it has not yet been ful l y read.

78

CHAPTER 6

Transition Graphs

DEFINITION

When an input string that has not been completely read reaches a state (final or otherwise) that it cannot leave because there is no outgoing edge that it may follow, we say that the input (or the machine) crashes at that state. Execution then terminates and the input must be rejected. • Let us make note of the fact that on an FA it is not possible for any input to crash be cause there is always an outgoing a-edge and an outgoing b-edge from each state. As long as there remain letters unread, progress is possible. There are now two different ways that an input can be rejected: It could peacefully trace a path ending a nonfinal state, or it could crash while being processed. These two different ways of being unsuccessful are the experience of all programmers. If we hypothesize that a machine can read one or two letters at a time, then one can be built using only two states that can recognize all words that contain a double letter: a. h

n. h a a . hh

If we are going to bend the rules to allow for a machine like the last one, we must real ize that we have changed something more fundamental than just the way the edges are la beled or the number of letters read at a time. This last machine makes us exercise some choice in its running. We must decide how many letters to read from the input string each time we go back for more. This decision is quite important. Let us say, for example, that the input string is baa. It is easy to see how this string can be accepted by this machine. We first read the letter b, which leaves us back at the start state by taking the loop on the left. Then we decide to read both letters aa at once, which allows us to take the highway to the final state where we end. However, if after reading the single character b, we then decided to read the single character a, we would loop back and be stuck at the start state again. When the third letter is read, we would stil l be at the starting post. We could not then accept this string. There are two different paths that the input baa can take through this machine. This is totally different from the situation we had before, especially because one path leads to acceptance and one to rejection. Another bad thing that might have happened is that we could have started processing the string baa by reading the first two letters at once. Because ba is not a double letter, we could not move to the final state. In fact, when we read ba, no edge tells us where to go, because ba is not the label of any edge leaving the start state. The processing of this string breaks down at this point and the machine crashes. So, there is the inherent possibility of reading variable amounts of letters from the input at each state. Therefore, the input string can follow a variety of paths through the machine, differing not only in their edge-length but also in their final disposition. Some paths may lead to acceptance the usual way and some to rejec tion two ways: either by ending in a nonfinal state or by causing the whole machine to crash. What shall we say? Is this input string part of the language of this machine or not? It cannot be made to depend on the cleverness or whim of the machine operator and the number of let ters he or she feels like inputting at each state - it must be an absolute yes or no, or else the language is not well defined in the sense that we have been using. The result of these considerations is that if we are going to change the definition of our abstract machine to allow for more than one letter to be read at a time, we must also change

79

Relaxing the Restriction on Inputs

the definition of acceptance. We shall say that a string is accepted by a machine if there is way it could be processed so as to arrive at a final state. There may also be ways in which this string does not get to a final state, but we ignore all failures. We are about to create machines in which any edge in the picture can be labeled by any string of alphabet letters, but first we must consider some additional consequences. We could now encounter the following problem:

some

On this machine, we can accept the word baab in two different ways. First, we could take ba from the start state to state l and then ab would take us to the final state. Or else we could read the three letters baa and go to state 2 from which the final letter, b, would take us to the final state. Previously, when we were dealing only with FAs, we had a unique path through the ma chine for every input string. Now some strings have no paths at all, while some have several. We now have observed many of the difficulties inherent in expanding our definition of "machine" to allow word-labeled edges (or, equivalently, to reading more than one letter of input at a time). We shall leave the definition of the finite automaton alone and call these new machines transition graphs because they are more easily understood when defined di rectly as graphs than as tables later turned into pictures. DEFINITION A transition graph, abbreviated

TG, is a collection of three things:

1. A finite set of states, at least one of which is designated as the start state ( - ) and some (maybe none) of which are designated as final states ( + ) .

An alphabet I of possible input letters from which input strings are fonned. 3. A finite set of transitions (edge labels) that show how to go from some states to some others, based on reading speci fied substrings of input letters (possibly even the null • string A ). 2.

When we give a pictorial representation of a transition graph, clause 3 in the definition means that every edge is labeled by some string or strings of letters, not necessarily only one letter. We are also not requiring that there be any specific number of edges emanating from any state. Some states may have no edge coming out of them at all , and some may have thousands (e.g., edges labeled a, aa, aaa, aaaa, . . . ). Transition graphs were invented by John Myhill in 1 for reasons revealed in the next chapter. A successful path through a transition graph is a series of edges fonning a path begin ning at some start state (there may be several) and ending at a final state. If we concatenate

80

CHAPTER 6

Transition Graphs

in order the string of letters that label each edge in the path, we produce a word that is ac cepted by this machine. For example, consider the following TG:

The path from state l to state 2 to state 3 back to state l then to state 4 corresponds to the string (abh)(A)(aa)(h). This is one way of factoring the word abbaab, which, we now see, is accepted by this machine. Some other words accepted are abba, abbaaabba, and b. When an edge is labeled with the string A, it means that we can take the ride it offers free (without consuming any letters from the input string). Remember that we do not ha\'e to follow that edge, but we can if we want to. If we are presented with a particular string of a 's and h's to run on a given TG, we must decide how to break the word into substrings that might correspond to the labels of edges in a path. If we run the input string abbab on the machine above, we see that from state l , where we must start, we can proceed along the outgoing edge labeled abb or the one labeled b. This word then moves along the edge from state l to state 2. The input letters abb are read and consumed. What is left of the input string is ab, and we are now in state 2. From state 2, we must move to state 3 along the A-edge. At state 3 , we cannot read aa, so we must read only a and go to state 4. Here, we have a b left in the input string but no edge to follow, so despite our best efforts we still must crash and reject the input string abbah. Because we have allowed some edges to be traversed for free, it is logical to allow for

the possibil ity of more than one start state. The reason we say that these two points are re lated is that we could always introduce more start states if we wanted to, simply by connect ing them to the original start state by edges labeled A. This point is illustrated by the follow ing example. There is no real difference between the TG

and

the TG

81

Looking at TGs

in the sense that all the strings accepted by the first are accepted by the second and vice versa. There are differences between the two machines such as the total number of states they have, but as language-acceptors they are equivalent. It is extremely important for us to notice that every FA is also a TG. This means that any picture that represents an FA can be interpreted as a picture of a TG. Of course, not every TG satisfies the definition of an FA.

'1f LOOKING AT TGs Let us consider some more examples of TGs.

0 The picture above represents a TG that accepts nothing, not even the null string A. To be able to accept anything, it must have a final state. The machine

accepts only the string A. Any other string cannot have a successful path to the final state through labels of edges because there are no edges (and hence no labels). Any TG in which some start state is also a final state will always accept the stri ng A; this is also true of FAs. There are some other TGs that accept the word A. For example,

a b ba

82

CHAPTER 6

Transition Graphs

This machine accepts only the words A, baa, and abba. Anything read while in the + state will cause a crash, because the + state has no outgoing edges. EXAMPLE

The following TGs also only accept A :

bb

� • EXAMPLE

Consider the following TG : (I .

h

We can read all the input letters one at a time and stay in the left-side state. When we read a b in the - state, there are two possible edges we can follow. If the very last letter is a b, we can use it to go to the + state. This b must be the very last letter, because once in the right-side state, if we try to read another letter, we crash. Notice that it is also possible to start with a word that does end with a b, but to follow an unsuccessful path that does not lead to acceptance. We could either make the mistake of fol lowing the nonloop b-edge too soon (on a nonfinal b), in which case we crash on the next letter, or else we might make the mistake of looping back to - when we read the last b, in which case we reject without crashing. But still, all words that end in b can be accepted by some path, and that is all that is required. The language accepted by this TG is all words ending in b. One regular expression for this language is (a + b)*b and an FA that accepts the same language is h

a

a

•

83

Looking at TGs

EXAMPLE

The following TG: a. h

h

a

accepts the language of all words that begin and end with different letters. This follows as a • logical extension of the reasoning for the previous example. EXAMPLE

The following TG: h

aa

aa

accepts the language of all words in which the a 's occur only in even clumps and that end in three or more h's. There is never an edge that reads a single a and it takes bhh at the end to • get to + . EXAMPLE

Consider the following TG: aa, b b

ab, ba

aa. bb

a b, ba

In this TG, every edge is labeled with a pair of letters. This means that for the string to be ac cepted, it must have an even number of letters that are read and processed in groups of two's.

84

CHAPTER 6

Transition Graphs

Let us call the left state the balanced state and the right state the unbalanced state. If the first pair of letters that we read from the input string is a double (aa or bb), then the machine stays in the balanced state. In the balanced state, the machine has read an even number of a's and an even number of b's. However, when a pair of unmatched letters is read (either ab or ba), the machine flips over to the unbalanced state, which signifies that it has read an odd number of a's and an odd number of b's. We do not return to the balanced state until another "corresponding" un matched pair is read (not necessarily the same unmatched pair but any unequal pair). The dis covery of two unequal pairs makes the total number of a 's and the total number of b 's read from the input string even again. This TG is an example of a machine that accepts exactly the familiar language EVEN-EVEN of all words with an even number of a's and an even number of b's. Of the three examples of definitions or descriptions of this language we have reviewed • (the regular expression, the FA, and the TG), this last is the most understandable. There is a practical problem with TGs. There are occasionally so many possible ways of grouping the letters of the input string that we must examine many possibilities before we know whether a given string is accepted or rejected. EXAMPLE

Consider this TG: h

a

Is the word abbbabbbabba accepted by this machine? (Yes, in three ways.)

•

When we allow A-edges, we may have an infinite number of ways of grouping the let ters of an input string. For example, the input string ab may be factored as (a)

(b)

(A) (b) (a) (A) (A) (b) (a ) (A) (A) (A) (b) (a)

85

Looking at TGs

Instead of presenting a definite algorithm right now for determining whether a partic ular string is accepted by a particular TG, we shall wait until Chapter 1 1 when the task will be easier. There are, of course, difficult algorithms for performing this task that are within our abilities to analyze at this moment. One such algorithm is outlined in Problem 20 on page 9 1 . The existence of A-edges also allows for a new and completely unsettling set of possi bilities - it allows infinite things to happen in seemingly finite situations. Consider the following TG:

a

Obviously, the only word accepted by this machine is the single word aa , but it can be accepted by infinitely many different paths. It is even possible to conceive that this ma chine accepts the word aa through paths of infinite length by looping infinitely many times before moving to the next state. But by our understanding, "paths" of necessity mean only "finite paths." A-loop-edges can make life difficult, and just as obviously their util ity is nil. If we take any TG with A-loops and trim away these loops, the resultant pic ture is still a TG and accepts the same set of input strings. Why did we ever allow A-loops in the first place? One answer is so that we leave our definition as simple and universal-sounding as possible ("any edges, anywhere, with any labels") and another is that A-loops are not the only way of getting an infinite path out of a finite input string. Behold the A-circuit:

a

a

It is obvious how to eliminate this particular A-circuit, but with the machine a, A a

a

b,

A

if any A option is erased, the resultant language is changed. Yet, another reason for not adding extra clauses to the definition of the TG to avoid this problem is that A-edges, as we shall see in Chapter 7, are never necessary at all , in the sense that any language that can be accepted by a TG with A-edges can be accepted by some dif ferent TG without A-edges.

86

CHAPTER 6

Transition Graphs

{f GENERALIZED TRANSITION GRAPHS The ultimate step liberating state-to-state transitions is to allow the input to progress from one place to another by contributing a substring restricted to being a word in a predeter mined language. For example,

We can travel from start to state 2 by reading any (of course finite) word from the (possibly infinite) set of choices l1 and, similarly, between all other states. For the moment, we will not be so arbitrary as to allow just any language to be used as labels, not only those languages defined by regular expressions. This gives us a new concept of a transition graph. DEFINITION A

generalized transition graph (GTG) is a collection of three things:

1.

A finite set of states, of which at least one is a start state and some (maybe none) are fi nal states. 2. An alphabet I of input letters. 3. Directed edges connecting some pairs of states, each labeled with a regular expres sion. • EXAMPLE a*

a*

(b + A}

This machine accepts all strings without a double b. Notice that the word b takes a A-edge from start to middle. • In a very real sense, there is no difference between the Kleene star closure for regular expressions and a loop in our previous transition graphs, or FAs for that matter. Compare

87

Nondeterminism a, b

b a, b

and

(a + b ) * (a + b)

b*

a

In the first picture, we may loop in the middle state as many times as we want or go straight to the third state. To not loop corresponds to taking the A choice from the b* in the second example.

1} NONDETERMINISM Generalized transition graphs force us to face a deep but subtle and disturbing fact that sneaked past us quietly with TGs. Just as the * and the + in a regular expression represent a potential multiplicity of choices, so does the possible multiplicity of paths to be selected from a TG. In the GTG, the choices are both static and dynamic. We often have a range of choices of edges, each labeled with an infinite language of alternatives. The number of ways of going from state 1 to state 4 might be infinite. A blatant example of the inherent need for choice is offered in the fragment of the TG shown below:

If we tried to forbid people from writing this directly, they could sti ll sneak it into their TGs in other ways:

88

CHAPTER 6

Transition Graphs b

Even if we restrict labels to strings of only one letter or A, we may indirectly permit these two equivalent situations:

•

•

•

equals

b

We have already seen that in a TG a particular string of input letters may trace through the machine on different paths, depending on our choice of grouping. For instance, abb can go from state 3 to 4 or 5 in the middle of the three preceding examples, depending on whether we read the letters two and one or all three at once. The ultimate path through the machine is not determined by the input alone. Therefore, we say this machine is nondeter ministic. Human choice becomes a factor in selecting the path; the machine does not make all its own determinations.

y

PROBLEMS 1.

For each of the five FAs pictured in Problems 1 7, I 9, and 20 in Chapter 5, build a transi tion graph that accepts the same language but has fewer states.

2.

For each of the next I 0 words, decide which of the six machines on the next page accept the given word. (i) A (ii) a (iii) b (iv) aa (v) ab (vi) aba (vii) abba

(viii) bab (ix) baab (x) abbb

89

Problems TG, h

h

h TG ,

(/

(/

TG , h

I>

h

TG , flh, h(I

(/(/ , hh

(/ h. h(I

(/(/ , hh

3.

Show that any language that can be accepted by a TG can be accepted by a TG with an even number of states.

4.

How many different TGs are there over the alphabet l a

b } that have two states?

5. Prove that for every TG there is another TG that accepts the same language but has only

one + state. 6.

B uild a TG that accepts the language L 1 of all words that begin and end with the same double letter, either of the form aa . . . aa or bb . . . bb. Note : aaa and bbb are not words in this language.

7.

If OURSPONSOR is a language that is accepted by a TG called Henry, prove that there is a TG that accepts the language of all strings of a's and b's that end in a word from OURSPONSOR.

8.

(i) Suppose that L is a finite language whose words are that there is a TG that accepts exactly the language L.

w 1 , w 2 , w3 ,

•

•

•

, w83 •

Prove

90

CHAPTER 6

Transition Graphs

(ii) Of all TGs that accept exactly the language L, what is the one with the fewest num ber of states? 9.

Given a TG, called TG 1 , that accepts the language L 1 and a TG, called TG 2 , that accepts the language L2 , show how to build a new TG (called TG3 ) that accepts exactly the lan guage L 1 + L2 • Given TG 1 and TG 2 as described in Problem 9, show how to build TG 4 that accepts ex actly the language L 1 L2 • 1 1 . Given a TG for some arbitrary language L, what language would it accept if every + state were to be connected back to every - state by A-edges? For example, by this method, ha

ha

becomes

.\

Hint: Why is the answer not always L*?

(i) Let the language L be accepted by the transition graph T and let L not contain the word A. Show how to build a new TG that accepts exactly all the words in L and the word A. (ii) Given TG 1 that accepts the language L " show how to build a TG that accepts the language L*. (Hint: Use Problems 1 1 and 1 2(i) and sound authoritative.)

Using the results of Problems 8, 9, 1 0, and 12 in an organized fashion, prove that if L is any language that can be defined by a regular expression, then there is a TG that accepts exactly the language L * .

Verify that there are indeed three and only three ways for the TG on p. 84 to accept the word abbbabbbabba.

An FA with four states was sitting unguarded one night when vandals came and stole an edge labeled a. What resulted was a TG that accepted exactly the language b * . In the morning the FA was repaired, but the next night vandals stole an edge labeled h and what resulted was a TG that accepted a*. The FA was again repaired, but this time the vandals stole two edges, one labeled a and one labeled b, and the resultant TG accepted the language a* + b*. What was the original FA?

91

Problems

Let the language l be accepted by the transition graph T and let l not contain the word ba. We want to build a new TG that accepts exactly l and the word ba. (i) One suggestion is to draw an edge from - to + and label it ba. Show that this does not always work. (ii) Another suggestion is to draw a new + state and draw an edge from any - state to it labeled ba. Show that this does not always work. (iii) What does work?

Let l be any language. Let us define the transpose of l to be the language of exactly those words that are the words in L spelled backward. If w El, then reverse(w) El. For example, if l then

=

Ia

transpose(l)

abb

=

Ia

bbaab

bba

bbbaa I

baabb

aabbb I

(i) Prove that if there is an FA that accepts L, then there is a TG that accepts the trans pose of l. (ii) Prove that if there is a TG that accepts l, then there is a TG that accepts the trans pose of l. Note: It is true, but much harder to prove, that if an FA accepts l, then some FA ac cepts the transpose of l. However, after Chapter 7 this will be trivial to prove. (iii) Prove that transpose(l 1 l2 ) transpose(l2 ) · transpose(l 1 ). Transition graph T accepts language L. Show that if L has a word of odd length, then T has an edge with a label with an odd number of letters.

=

A student walks into a classroom and sees on the blackboard a diagram of a TG with two states that accepts only the word A. The student reverses the direction of exactly one edge, leaving all other edges and all labels and all + 's and - 's the same. But now the new TG accepts the language a* . What was the original machine?

Let us now consider an algorithm for determining whether a specific TG that has no A-edges accepts a given word: Step l Step 2 Step 3 Step 4

Step 5

Number each edge in the TG in any order with the integers l , 2, 3, . . . , x, where x is the number of edges in the TG. Observe that if the word has y letters and is accepted at all by this machine, it can be accepted by tracing a path of not more than y edges. List all strings of y or fewer integers, each of which :5 x. This is a finite list. Check each string on the list in step 3 by concatenating the labels of the edges involved to see whether they make a path from a - to a + corresponding to the given word. If there is a string in step 4 that works, the word is accepted. If none work, the word is not in the language of the machine.

(i) Prove that this algorithm does the job. (ii) Why is it necessary to assume that the TG has no A-edges.

CHAPTER 7

Kleene 's Theorem

1f UNIFICATION In the last three chapters, we introduced three separate ways of defining a language: genera tion by regular expression, acceptance by finite automaton, and acceptance by transition graph. In this chapter, we will present a theorem proved by Kleene in 1 , which (in our version) says that if a language can be defined by any one of these three ways, then it can also be defined by the other two. One way of stating this is to say that all three of these methods of defining languages are equivalent. THEOREM 6

Any language that can be defined by regular expression, or finite automaton, or transition graph can be defined by all three methods. This theorem is the most important and fundamental result in the theory of finite au tomata. We are going to take extreme care with its proof. In the process, we shall introduce four algorithms that have the practical value of enabling us actually to construct the corre sponding machines and expressions. More than that, the importance of this chapter lies in its value as an illustration of thorough theoretical thinking in this field. The logic of this proof is a bit involved. If we were trying to prove the mathematical theorem that the set of all ZAPS (whatever they are) is the same as the set of all ZEPS, we could break the proof into two parts. In Part l , we would show that all ZAPS are also ZEPS. In Part 2, we would show that all ZEPS are also ZAPS. Together, this would demonstrate the equivalence of the two sets. Here, we have a more ambitious theorem. We wish to show that the set of ZAPS, the set of ZEPS, and the set of ZIPS are all the same. To do this, we need three parts. In Part I , we shall show that all ZAPS are ZEPS. In Part 2, we shall show that all ZEPS are ZIPS. Finally, in Part 3 , we shall show that all ZIPS are ZAPS. Taken together, these three parts will estab lish the equivalence of the three sets: 92

93

Turning TGs into Regular Expressions

[ZAPS

c

ZEPS c ZIPS

c

ZAPS] - [ZAPS = ZEPS = ZIPS]

PROOF

The three sections of our proof will be: Part I

Every language that can be defined by a finite automaton can also be defined by a transition graph. Part 2 Every language that can be defined by a transition graph can also be defined by a regular expression. Part 3 Every language that can be defined by a regular expression can also be defined by a finite automaton. When we have proven these three parts, we have finished our theorem.

Proof of Part 1 This is the easiest part. Every finite automaton is itself already a transition graph. Therefore, any language that has been defined by a finite automaton has already been defined by a tran sition graph. Done.

r-Qr TURNING TGs INTO REGULAR EXPRESSIONS Proof of Part 2 The proof of this part will be by constructive algorithm. This means that we present a pro cedure that starts out with a transition graph and ends up with a regular expression that de fines the same language. To be acceptable as a method of proof, any algorithm must satisfy two criteria. It must work for every conceivable TG, and it must guarantee to finish its job in a finite time (a finite number of steps). For the purposes of theorem-proving alone, it does not have to be a good algorithm (quick, least storage used, etc.). It just has to work in every case. Let us start by considering an abstract transition graph T. T may have many start states. We first want to simplify T so that it has only one start state that has no incoming edges. We do this by introducing a new state that we label with a minus sign and that we connect to all the previous start states by edges labeled with A . Then we drop the minus signs from the previous start states. Now all inputs must begin at the new unique start state. From there, they can proceed free of charge to any of the old start states. If the word w used to be ac cepted by starting at previous start state 3 and proceeding through the machine to a final state, it can now be accepted by starting at the new unique start state and progressing to the old start state 3 along the edge labeled A. This trip does not use up any of the input letters. The word then picks up its old path and becomes accepted. This process is illustrated below on a fragment of a TG that has three start states: I , 3, and 5 :

� ah

. . .

94

CHAPTER

7

Kleene's Theorem

This becomes

. . .

The ellipses i n the pictures above indicate other sections of the TG that are irrelevant because they contain no start states. Another simplification we can make in T is that it can be modified to have a unique un exitable final state without changing the language it accepts. If T had no final states to begin with, then it accepts no strings at all and has no language and we need produce no regular expression other than the null, or empty, expression cf> (see p. 36). If T has several final states, let us un-final them and instead introduce a new unique final state labeled with a plus sign. We draw new edges from all the former final states to the new one, dropping the old plus signs, and labeling each new edge with the null string A. When an input string runs out of letters and it is in an old final state, it can now take a free A-edge ride to the new unique final state. This process is depicted below:

. . .

becomes

. . .

h

The new final state has no outgoing edges.

95

Turning TGs into Regular Expressions

We shall require that the unique final state be a different state from the unique start state. If an old state used to have ± , then both signs are removed from it to newly created states. It should be clear that the addition of these two new states does not affect the language that T accepts. Any word accepted by the old T is also accepted by the new T, and any word rejected by the old T is also rejected by the new T. Furthermore, the machine now has the following shape:

where there are no other - or + states. If the TG was already in this shape, this step could have been skipped but, even then, executing it could not have hurt either. We are now going to build piece by piece the regular expression that defines the same language as T. To do so, we will change T into a GTG. Let us suppose that T has some state (called state x) inside it (not the - or + state) that has more than one loop circling back to itself:

where r " r2, and r 3 are all regular expressions or simple strings. In this case, we can replace the three loops by one loop labeled with a regular expression: r1

r2

r3

(] +

+

The meaning here is that from state x we can read any one string from the input that fits the regular expression r 1 + r2 + r3 and return to the same state. Similarly, suppose two states are connected by more than one edge going in the same direction: r1

. .. a==B

where the labels r 1 and r2 are each regular expressions or simple strings. We can replace this with a single edge that is labeled with a regular expression:

�

. ..

96

CHAPTER 7

Kleene's Theorem

We can now define the bypass and state elimination operation. In some cases, if we have three states in a row connected by edges labeled with regular expressions (or simple strings), we can eliminate the middleman and go directly from one outer state to the other by a new edge labeled with a regular expression that is the concatenation of the two previous labels. For example, if we have 3

2

we can replace this with

. . .

�

. . .

We say "replace" because we no longer need to keep the old edges from state 1 to state 2 and state 2 to state 3 unless they are used in paths other than the ones from state l to state 3 . The elimination of edges and states is our goal. We can do this trick only as long as state 2 does not have a loop going back to itself. If state 2 does have a loop, we must use this model:

becomes

. . .

�

. . .

We have had to introduce the * because once we are at state 2, we can loop the loop edge as many times as we want, or no times at all, before proceeding to state 3. Any string that fits the description r 1 rz*r3 corresponds to a path from state 1 to state 3 in either picture. If state I is connected to state 2 and state 2 is connected to more than one other state (say, to states 3, 4, and 5), then when we eliminate the edge from state I to state 2, we have to add edges that show how to go from state I to states 3, 4, and 5. We do this as in the fol lowing pictures:

. . .

becomes

Turning TGs into Regular Expressions

We see that in this way we can eliminate the edge from state I to state

97

2, bypassing state

2 altogether. In fact, every state that leads into state 2 can be made to bypass state 2. If state 9 leads into state 2, we can eliminate the edge from state 9 to state 2 by adding edges from state 9 to states 3, 4, and 5 directly. We can repeat this process until nothing leads into state 2. When this happens, we can eliminate state 2 entirely, because it then cannot be in a path that ac cepts a word. We drop the whole state, and the edges leading from it, from the picture for T. What have we done to transition graph T? Without changing the set of words that it ac cepts, we have eliminated one of its states. We can repeat this process again and again until we have eliminated all the states from T except for the unique start state and the unique final state. (We shall illustrate this presently.) What we come down to is a picture that looks like this:

with each edge labeled by a regular expression. We can then combine this once more to pro duce

�

�

The resultant regular expression is then the regular expression that defines the same lan guage T did originally. For example, if we have

98

CHAPTER 7

Kleene's Theorem

we can bypass state 2 by introducing a path from state I to state 4 labeled aba * ba, a path from state I to state 5 labeled aba * b, a path from state 3 to state 4 labeled bbba * ba, and a path from state 3 to state 5 labeled bbba * b. We can then erase the edges from state I to state 2 and from state 3 to state 2. Without these edges, state 2 becomes unreachable. The edges from state 2 to states 4 and 5 are then unless because they cannot be part of any path from - to + . Dropping this state and these edges will not affect whether any word is ac cepted by this TG. The machine that results from this operation is

If there had previously been any edges from state I to state 5, we leave these alone. If we wish to eliminate a given state, say, state 2, we must first list all the edges going into that state from other states (say, from states 7 and 9) and also make a list of all the states that could be reached from state 2 by an edge (say, states 1 1 , 4, and 5). If state 2 were to dis appear, it would interrupt all the paths input strings could have taken that pass through it on their way to +. We do not wish to destroy any possible paths input strings might take be cause that could change the language by killing some input string 's only path to acceptance, which would eliminate it from the language of the machine. It is too hard for us to check whether all the accepted input strings have some alternate paths to acceptance that do not go through state 2, so we make a careful point of replacing all destroyed routes with equivalent detours. It is our requirement to be sure that whatever change we make in the machine, all the strings that could have previously been accepted can still be accepted by the modified ma chine. In order to safely eliminate state 2 without disturbing any routes from - to + , we must install bypass roads going from each incoming state to every outgoing state and be sure that the labels of the bypass road correspond to the trips obliterated. In this hypothetical example, we must replace routes from state 7 to states 1 1 , 4 , and 5 and from state 9 to states I I , 4, and 5 . When we draw these new edges, we must label them with the appropriate tolls that are the charges of going into state 2, around state 2, and from state 2. If the machine segment we are analyzing started by looking like:

Turning TGs into Regular Expressions

99

it must become

Before we claim to have finished describing thi s algorithm, there are some special cases that we must examine more carefully. In the picture

we might want to eliminate state 2. This is an illustration of the possibility that one of the source states to the prospective bypassed state is also a destination state from that state. This case is really not different from the general situation described above. We still need to replace all the paths through the machine that previously went through state 2. The incom ing states are 1 and 3 and the outgoing state is only I . Therefore, we must add edges con necting state 3 to state I and state I to state I . The edge we shall add to connect state I to it self is a loop that summarizes and replaces the trip from 1 to 2 to I . The machine then becomes

Originally, it was possible to take a path from state 3 to state 2 to state I to state 2 and back to state I again at the cost of r4r /r 3r 1 r2 *r 3• This path is still represented in the reduced machine. It is reflected in the 3- 1 edge r4r 2 *r 3 followed by the loop at state I , r 1 r2 *r 3• There fore, no real problem arises even when the sets of incoming states and the set of outgoing states have some overlap.

CHAPTER 7

Kleene's Theorem

Even the complicated

rg

is algorithmically reduced to this equivalent form:

The path 1 1 - 1 1 in the original picture could be thought of as a looping twice at I , followed by a trip to 3, followed by a trip to 1 , then back to 3 and a loop at 3. All these edges traveled are still represented in the modified machine . Whenever we remove an edge or a state, we must be sure that we have not destroyed any paths through T that may previously have existed or create new paths that did not exist before. We now have a well-described method of producing regular expressions equivalent to given transition graphs. All words accepted by T are paths through the picture of T. If we change the picture but preserve all paths and their labels, we must keep the language unchanged. This algorithm terminates in a finite number of steps, because T has only finitely many states to being with, and one state is eliminated with each iteration of the bypass procedure. The other important observation is that the method works on all transition graphs. Therefore, this algorithm provides a satisfactory proof that there is a regular expression for each transi tion graph. Before detailing the steps of this procedure, let us illustrate the algorithm on a particular example.

EXAMPLE

(Inside the proof)

The TG we shall consider is the one below, which accepts all words that begin and end with double letters (having at least length 4). This is by no means the only TG that accepts this language :

Turning TGs into Regular Expressions

As it stands, this machine has only one start state with no incoming edges, but it has two fi nal states, so we must introduce a new unique final state following the method prescribed by the algorithm:

The next modification we perform is to note that the edge from the start state to state 1 is a double edge - we can travel over it by an aa or a bb. We replace this by the regular ex pression aa + bb. We also note that there is a double loop at state I . We can loop back to state I on a single a or on a single b. The algorithm says we are supposed to replace this double loop by a single loop labeled with the regular expression a + b . The picture of the machine has now become

The algorithm does not actually tell us which state of the TG we must bypass next. The order of elimination is left up to our own discretion. The algorithm (when we formally state

1 02

CHAPTER 7

Kleene's Theorem

it) implies that it really does not matter. As long as we continue to eliminate states, we shall be simplifying the machine down to a single regular expression representation. Let us choose state 2 for elimination. The only path we are now concerned with is .\

The algorithm says we can replace this with one edge from state 1 to state + that bears the label that is the concatenation of the regular expressions on the two parts of the path. In this case, aa is concatenated with A, which is only aa again. Once we have eliminated the edge from state 1 , we can eliminate state 2 entirely. The machine now looks like this:

It seems reasonable now for us to choose to eliminate state 3 next. But the algorithm does not require us to be reasonable, and because this is an illustrative example and we have al ready seen something like this path, we shall choose a different section of T to modify. The technique described above does not require us to choose the order of eliminating states in a logical, efficient, intelligent, or aesthetic manner. All these considerations are completely inappropriate to the consideration of what is an algorithm. An algorithm must be so clearly stated that it works successfully no matter how little forethought, experience, clev erness, or artistic sensibility the applier of the procedure possesses. The algorithm must be able to be completely and successfully executed by a dimwit, a half-wit, or even a no-wit such as a computer. To execute an algorithm, all we are allowed to presume on the part of the executing agent is tireless diligence and immaculate precision. If we could presume that gifted insight on the part of the executor was routinely avail able, the algorithm would be much simpler: Step

I

Look at the machine, figure out its language, and write down an equivalent reg ular expression.

Unfortunately, people are not as reliably creative as they are rel iable drones, and the whole purpose of an algorithm is so that we can get some jobs done on a daily basis without wait ing for DaVinci to be in the suitable mood. All the requisite cleverness must be incorporated into the algorithm itself by the creator of the algorithm. If we want the algorithm to be efficient, we must design one that will force the drone to tum out efficient products. If we want the output to be aesthetic, we must build that in, too. Computer science courses that are concerned with how good an algorithm is are fundamen tally different from this course. We are primarily concerned with whether an algorithm to ac complish a certain task exists or not- we are never in search of the "best" one by any stan dards of what it means to be best. That said, we shall, however, occasionally present more than one algorithm for accomplishing a certain task, but the reason for this will always be that each of the algorithms we develop can be generalized to other tasks in different ways.

Turning TGs into Regular Expressions

As such, they are each the seed of different classes of procedures and each deserves individ ual attention. Let us continue with the example of the TG we are in the process of reducing to a regu lar expression. Let us stubbornly insist on bypassing state 1 before eliminating state 3 . Only one edge comes into state 1 and that i s from state - . There is a loop a t state l with the label (a + b ) State l has edges coming out of it that lead to state 3 and state + . The algorithm explains that we can eliminate state 1 and replace these edges with an edge from state - to state 3 labeled (aa + bb)(a + b) * (bb) and an edge from state - to state + labeled (aa + bb)(a + b) * (aa). After we eliminate state 1 , the machine looks l ike this: .

It is obvious that we must now eliminate state 3, because that is the only bypassable state left. When we concatenate the regular expression from state - to state 3 with the regu lar expression from state 3 to state +, we are left with the machine

(aa + bb) (a + b) *bb

Now by the last rule of the algorithm, this machine defines the same language as the regular expression (aa

+ bb)(a + b)*(aa) + (aa + bb)(a + b)*(bb)

It is entirely conceivable that if we eliminated the states in a different order, we could end up with a different-looking regular expression. But by the logic of the elimination process, these expressions would all have to represent the same language. If we had to make up a regular expression for the language of all strings that begin and end with double letters, we would probably have written (aa + bb)(a + b) * (aa + bb) which is equivalent to the regular expression that the algorithm produced because the alge • braic distributive law applies to regular expressions. Without going through lengthy descriptions, let us watch the algorithm work on one more example. Let us start with the TG that accepts strings with an even number of a 's and an even number of b's, the language EVEN-EVEN. (We keep harping on these strings not because they are so terribly important, but because it is the hardest example we thoroughly understand to date, and rather than introduce new hard examples, we keep it as an old con quest. )

CHAPTER 7

Kleene's Theorem a a . hh

a 11 . hh

a h . ha

a h . ha

becomes first

When we eliminate state 2, the path from 1 to 2 to 1 becomes a loop at state 1 : aa + bb A

(ab + ba) (aa + bb) * (ab + ba)

which becomes (aa + bb) + (ab + ba) (aa + bb) * (ab + ba) A

which becomes [(aa + bb) + (ab + ba) (aa + bb) * (ab + ba) ] *

which reduces to the regular expression [(aa + bb) + (ab + ba)(aa + bb) * (ab + ba)] * which is exactly the regular expression we used to define this language before. Anyone who was wondering how we could have thought up that complicated regular expression we pre sented in Chapter 4 can see now that it came from the obvious TG for this language by way of our algorithm. We still have one part of Kleene's theorem yet to prove. We must show that for each regular expression we can build a finite automaton that accepts the same language. We have s o far tacitly maintained that we can consider the state being bypassed without regard to any extra compl ications in the rest of the TG. Is this really so? It is often hard to tell whether we have accounted for all the exceptional situations that might arise. Remem-

1 05

Turning TGs into Regular Expressions

ber, it is not a complete algorithm if it breaks down in any case no matter how remote or freakish an occurrence. How can we tell when we have covered all possibilities? Who knows? There is no algorithm to tell whether the algorithm we have proposed has omitted an important case -but here is a surprise - this very statement about the limitations of analyz ing algorithms by other algorithms will be proven later on in this book. Let us consider a complicated, most general-looking case and see whether our simple rules work on it without the introduction of any new difficulties. Consider the TG fragment below:

rg

Our state targeted for bypass is state 2. Proceeding in an orderly fashion, we list all the states connected to state 2 by incoming and outgoing edges. The incoming edges are from states 1 and 3, the outgoing are to states 3, 4, and 5. Because each previously possible path must still exist, we need to introduce six new edges (including the loop at 3): From

To

Labeled

3

r 1 r2 *r6

4

r 1 r2 *r5

1

5

3

3

r 1 r2 *r7 r3 r2 *r6

3

4

r 3 r2 *rs

3

5

r3 r2 * r1

Because there is already a loop at state 3, we can add this regular expression to the existing one and the resultant picture is this:

CHAPTER 7

Kleene's Theorem

State 2 has disappeared but all paths that used to travel through it remain possible and, equally i mportant, no new paths are possible in this new TG that were not possible for the same cost of input letters in the original TG. For example, the old trip through states 1 1 can stil l be made. It now, however, travels through the state sequence 1 1 whose concatenation of regular expressions is exactly the same as before. ALGORITHM

Now that we already have a fairly good idea of what the state-elimination algorithm is all about, we are ready to present a semiformal statement of the general rules defining the con structive algorithm that proves that all TGs can be turned into regular expressions that define the exact same language: Step l Create a unique, unenterable minus state and a unique, unleaveable plus state. Step 2 One by one, in any order, bypass and eliminate all the non - or + states in the TG. A state is bypassed by connecting each incoming edge with each outgoing edge. The label of each resultant edge is the concatenation of the label on the incoming edge with the label on the loop edge if there is one and the label on the outgoing edge. Step 3 When two states are joined by more than one edge going in the same direction, unify them by adding their labels. Step 4 Finally, when all that is left is one edge from - to + , the label on that edge is a regular expression that generates the same language as was recognized by the • original machine. We have waffled about calling this representation a "semiformal" description of the pro cedure. The addition of phrases (or symbols) that say things like "for all states qx that enter state qy by a single directed edge (qx, qy ) labeled r(x, y), and for all states q= such that (q.r, q, ) is a single directed edge labeled r(y, z), create the directed edge (q,, q, ) and label it [r(x, y) r(y, y)*r(y, z)], where r(y, y) is the regular expression labeling the possible loop at state q, while deleting the state qy and all its associated edges," and so on, would please some peop ie ..

more, but would not help anyone go from a state of not understanding the algorithm to a state of understanding it. There is one logical possibility that we have not accounted for in the description of the algorithm given above; that is, when we finish step 3, there may be no path left at all that connects - to + . In this case, we say that the original machine accepted no words, which means that it accepted only the null language whose regular expression has no symbols. We shall consider the logical consequences of this possibility in a later chapter; at the mo ment, all it means is that completing the algorithm guarantees producing a regular expres sion for all machines that accept a language and no expression for those that do not. EXAMPLE

Consider the TG

Turning TGs into Regular Expressions b

a

Eliminating the states in the order l , 2, 3 gives this procession of TGs: b

then ab*a

a + bb*a

then

o

ab*a + Cb + ab*a][a + bb*al*CA + bb*al

)o

8

Eliminating the states in the order 3, 2, l gives this procession of TGs:

CHAPTER 7

Kleene's Theorem b + aa*b

ha*

then [a + ba*b][b + aa*bJ *[a + aa* J ha*

then

0 -

o

ba* + [a + ba*b][b + aa*bJ*[a + aa*J �-»

+

If we had not seen how they were derived, we might have no clue as to whether these two • regular expressions define the same language.

1} CONVERTING REGULAR EXPRESSIONS INTO FAs Proof of Part 3 The proof of this part will be by recursive definition and constructive algorithm at the same time. This is the hardest part of our whole theorem, so we shall go very slowly. We know that every regular expression can be built up from the letters of the alphabet I and A by repeated application of certain rules: addition, concatenation, and closure. We shall see that as we are building up a regular expression, we could at the same time be building up an FA that accepts the same language. We present our algorithm recursively. Rule 1

There is an FA that accepts any particular letter of the alphabet. There is an FA that accepts only the word A.

Proof of Rule 1 If x is in I, then the FA all � all � except x

a ll �

Converting Regular Expressions into FAs

accepts only the word x. One FA that accepts only A is all r. all r.

• It would be easier to design these machines as TGs, but it is important to keep them as FAs. If there is an FA called FA , that accepts the language defined by the regular ex pression r1 and there is an FA called FA2 that accepts the language defined by the regular expression r2, then there is an FA that we shall call FA3 that accepts the language defined by the regular expression (r 1 + r2).

Rule 2

Proof of Rule 2 We are going to prove Rule 2 by showing how to construct the new machine in the most rea sonable way from the two old machines. We shall prove FA3 exists by showing how to con struct it. Before we state the general principles, let us demonstrate them in a specific example. Suppose we have the machine FA , pictured below, which accepts the language of all words over the alphabet I, = ( a b } that have a double a somewhere in them ,,

a

" · ,,

a

x2 X3 X3

-x , X2 + x3

a

h

x,

x,

X3

,,

and the familiar machine FA2, which accepts all words that have both an even number of to tal a's and an even number of total b's (EVEN-EVEN) b

±y , Y2 Y3 Y4

b

a

a

a

a

a

h

Y3 Y4 Y1 Y2

Y2 Y1 Y4 Y_�

b

b

We shall show how to design a machine that accepts both sets. That is, we shall build a ma chine that accepts all words that either have an aa or are in EVEN-EVEN and rejects all strings with neither characteristic. The language the new machine accepts will be the union of these two languages. We shall call the states in this new machine z " z2, z 3 , and so on, for as many as we need. We shall define this machine by its transition table.

1 10

CHAPTER 7

Kleene's Theorem

Our guiding principle is this: The new machine will simultaneously keep track of where the input would be if it were running on FA 1 alone and where the input would be if it were running on FA2 alone. First of all, we need a start state. This state must combine x " the start state for FA " and y " the start state for FA2• We call it z 1 • If the string were running on FA 1 , it would start in x 1 and if on FA2 in y 1 • All z-states in the FA3 machine carry with them a double meaning - they keep track of which x state the string would be in and which y state the string would be in. It is not as if we are uncertain about which machine the input string is running on - it is running on both FA 1 and FA2, and we are keeping track of both games simultaneously. What new states can occur if the input letter a is read? If the string were being run on the first machine, it would put the machine into state x2• If the string were running on the second machine, it would put the machine into state y3 ' Therefore, on our new machine an a puts us into state z2, which means either x2 or y3 , in the same way that z 1 means either x 1 or y 1 • Because y 1 is a final state for FA2, z 1 is also a final state in the sense that any word whose path ends there on the z-machine would be accepted by FA2• ± z 1 = x 1 or y1 Z z = X2 or Y3 On the machine FA3, we are following both the path the input would make on FA 1 and the in put's path on FA2 at the same time. By keeping track of both paths, we know when the input string ends, whether or not it has reached a final state on either machine. Let us not consider this "x or y" disjunction as a matter of uncertainty. We know for a fact that the same input is running on both machines; we might equivalently say "x and y. " We may not know whether a certain person weighed I 00 or lb to start with, but we are certain that after gaining 20 lb, then losing 5 , and then gaining l , his total weight is now ex actly either 1 1 6 or 2 1 6 lb. So, even if we do not know in which initial state the string started, we can still be certain that given a known sequence of transformations, it is now definitely in either one of two possible conditions. If we are in state z1 and we read the letter b, then being in x 1 on FA 1 and reading a b, we return to x " whereas being in y 1 on FA2 and reading a b send us to y2 • z3 = x 1 or y2 The beginning of our transition table for FA3 is a

, ·2

b

•3

Suppose that somehow we have gotten into state z2 and then we read an a. If we were in FA 1 , we would now go to state x3, which is a final state. If we were in FA2, we would now go back to y " which is also a final state. We will call this condition z4, meaning either x3 or y 1 • Be cause this string could now be accepted on one of these two machines, :4 is a final state for FA3• As it turns out, in this example the word is accepted by both machines at once, but this is not necessary. Acceptance by either machine FA 1 or FA2 is enough for acceptance by FA3• Membership in either language is enough to guarantee membership in the union. If we are in state z 2 and we happen to read a b, then in FA 1 we are back to x" whereas in FA2 we are in y4 • Call this new condition z5 = state x1 or y4 • + z4 = x3 or Y i z5 = x 1 or y4

Converting Regular Expressions into FAs

At this point, our transition table looks l ike this: a

b

What happens if we start from state z3 and read an a ? I f w e were i n FA 1 , we are now in xi ; if in FA i , we are now in y4 . This is a new state in the sense that we have not encountered this combination of x and y before; call it state z6 . z6 = Xi or Y4 What if we are in z3 and we read a b? In FA 1 we stay in x 1 , whereas in FA i we return to y 1 • This means that if we are in z3 and we read a b, we return to state z 1 • This is the first time that we have not had to create a new state. If we never got any use out of the old states, the machine would grow ad infinitum. Our transition table now looks like this: a ±: z 1

22 Z3

Z2 Z4 z6

b

Z3 Z5 Z1

What if we are in z4 and we read an a? If we are tracing FA 1 , the input remains in Xv whereas if we are tracing the input on FA i , it goes to h This is a new state; call it zr If we are in z4 and we read a b, the FA 1 part stays at x3 , whereas the FA i part goes to Yi - This is also a new state; call it z8 • + z1 = X3 or Y3 + zs = X3 or Yi Both of these are final states because a string ending here on the z-machine will be accepted by FA 1 , because x3 is a final state for FA 1 • If we are in Zs and we read an a, we go to Xi or Yi · which we shall call z9• If we are in Zs and we read a b, we go to x 1 or y3 , which we shall call z i w

Z9 = Xi or Yi z 1 o = x 1 or Y3 If we are in z6 and we read an a, we go to x3 or Yi · which is our old zw If we are in z6 and we read a b, we go to x 1 or y3 , which is z 1 0 again. If we are in z7 and we read an a, we go to x3 or y " which is z4 again. If we are in z7 and we read a b, we go to x3 or y4 , which is a new state, : 1 1 • + z 1 1 = X3 or J4 If we are If we are If we are If we are If we are

in z8 and we read an a, we go to x3 or y4 = z 1 , . in z8 and we read a b, we go to x3 or y 1 = z4 • in z9 and we read an a, we go to x3 or y4 = z 1 1 • in z9 and we read a b, we go to x 1 or y 1 = z 1 • in z 1 0 and we read an a, we go to Xi or y 1 , which is our last new state, : 1 2• + z 1 i = Xi or Y i If we are in z 1 0 and we read a b, we go to x 1 or y4 = z5 •

1 12

CHAPTER 7

Kleene's Theorem

If we are in z1 1 and we read an a, we go to x3 or y2 = z8• If we are in z1 1 and we read a b, we go to x3 or y3 = zr If we are in z 12 and we read an a, we go to x3 or y3 = zr If we are in z 1 2 and we read a b, we go to x1 or y2 = zy Our machine is now complete. The full transition table is

:t z , z2 Z3 + z4 Zs z6 +z 1 + zs Z9 Z IO +z 11 +z, 2

a

b

Z2 Z4 z6 Z7 Z9 Zs Z4 Z1 1 Z1 1 Z12 Zs Z7

Z3 Zs z, Zs Z IO z ,o zl l Z4 z, Zs �

-7 Z3

Here is what FA 3 may look like:

If a string traces through this machine and ends up at a final state, it means that it would also end at a final state either on machine FA 1 or on machine FA2• Also, any string accepted by ei ther FA 1 or FA2 will be accepted by this FA3•

Converting Regular Expressions into FAs ALGORITHM

The general description of the algorithm we employed earlier is as follows. Starting with two machines, FA 1 with states y3 , , build a new Xi• 3 , and FA i with states machine FA 3 with states z l Zi• z3 , , where each z is of the form or ' The combination state or is the - state of the new FA. If either the part or the y part is a final state, then the corresponding z is a final state. To go from one z to another by reading a letter from the input string, we see what happens to the part and the y part and go to the new z accordingly. We could write this as a formula:

xi '

x,tart

x

Ystart

•

•

•

•

Yp Yi·

•

•

x

•

•

•

"xsomething x

Ysomethin/ '

znew after letter p = [xnew after letter p on FA 1 J or [ynew after letter p on FA i J Because there are only finitely many x's and y's, there can be only fi nitely many possi ble z's. Not all of them will necessarily be used in FA 3 if no input string beginning at - can get to them. In this way, we can build a machine that can accept the sum of two regular ex pressions if we already know machines to accept each of the component regular expressions separately. • EXAMPLE

(Inside the proof of Theorem 6)

Let us go through this very quickly once more on the two machines:

b

a, b

a

a

b

b

b

a

FA 1 accepts all words with a double a in them, and FA i accepts all words ending in rushbrookrathbone.co.uk machine that accepts the union of the two languages for these two machines begins: - z1

= X1

or

i

Y1

In z 1 if we read an a, we go to Xi or y 1 = z In z 1 if we read a b, we go to x 1 or Yi = z3 , which is a final state since Yi is. The partial picture of this machine is now a

b

1 14

CHAPTER 7

Kleene's Theorem

In z2 if we read an a, we go to x3 or y1 = z4, which is a final state because x3 is. In z2 if we read a b, we go to x1 or y2 = zr

In z3 if we read an a, we go to x2 or y1 = z2• In z3 if we read a b, we go to x1 or y2 = z3" In z4 if we read an a, we go to x3 or y1 = z4• In z4 if we read a b, we go to x3 or y2 = z5, which is a final state. In z5 if we read an a, we go to x3 or y 1 = z4• In z5 if we read a b, we go to x3 or y2 z5•

=

The whole machine looks like this:

a

h

This machine accepts all words that have a double a or that end in b. The seemingly logical possibility or y2

z6 = x2

does not arise. This is because to be in x2 on FA 1 means the last letter read is an a. But to be in y2 on FA2 means the last letter read is a b. These cannot both be true at the same time, so • no input string ever has the possibility of being in state z6 . EXAMPLE

(Inside the proof of Theorem 6)

Let FA 1 be the machine below that accepts all words that end in a: b

a

a

1 15

Converting Regular Expressions into FAs

and let FA2 be the machine below that accepts all words with an odd number of letters (odd length): a, b

� a, b

Using the algorithm produces the machine below that accepts all words that either have an odd number of letters or end in a: a

b

b

b

a

a

b

The only state that is not a + state is the - state. To get back to the start state, a word must have an even number of letters and end in b. • EXAMPLE

(Inside the proof of Theorem 6)

Let FA 1 be b a

a

which accepts all words ending in a, and let FA 2 be a

b b

a

which accepts all words ending in b.

Using the algorithm, we produce

1 16

CHAPTER 7

Kleene's Theorem

which accepts all words ending in a or b, that is, all words except A. Notice that the state x, or y2 cannot be reached because x2 means "we have just read an a" and y2 means "we hav e • just read a b." There is an alternate procedure for producing the union-machine form two-component machines that has a more compact mathematical description, but whose disadvantages are well illustrated by the example we have just considered. Let FA , have states x" x2, • • • and FA2 have states y " y2 , Then we can define FA3 initially as having all the possible states X; or yj for all combinations of i and j. The number of states in FA3 would always be the prod uct of the number of states in FA 1 and the number of states in FA2• For each state in FA3 we could then, in any order, draw its a-edge and b-edge because they would go to already exist ing states. What we have done before is create new z-states as the need arose, as in the Japan ese "just in time" approach to automobile manufacturing. This may seem a little haphazard, and we never really know when or whether the need for a new combination of x and y would arise. This alternate, more organized, approach has the advantage of knowing from the begin ning just how many states and edges we will need to draw, always the pessimistic estimate of the largest possible number. For the example above, we would start with four possible states: •

•

•

•

For each of these four states we would draw two edges, producing

a

h

1 17

Converting Regular Expressions into FAs

This is a perfectly possible FA for the union language FA 1 + FA2• However, on inspection we see that its lower right-hand state is completely useless because it can never be entered by any string starting at - . It is not against the definition of an FA to have such a useless state, nor is it a crime. It is simply an example of the tradeoff between constructing states in our need-to-have policy versus the more universal-seeming all-at-once strategy. By either algorithm, this concludes the proof of Rule 2. We still have two rules to go. Rule 3

If there is an FA 1 that accepts the language defined by the regular expression r1 and an FA2 that accepts the language defined by the regular expression r2, then there is an FA3 that accepts the language defined by the concatenation r 1 r2, the product language.

Источник: [rushbrookrathbone.co.uk]

Introduction to computer theory 2nd edition pdf download

Introduction to computer theory 2nd edition pdf download - site, with

Introduction to computer theory 2nd edition pdf download - clearly sorry

Introduction to Computer Theory by Daniel I. a. Cohen - Second Edition

Short Description

Description

2 thoughts to “Introduction to computer theory 2nd edition pdf download”

Leave a Reply Cancel reply