1
00:00:01,000 --> 00:00:02,400
Welcome back.

2
00:00:02,410 --> 00:00:07,270
Now that we have our pattern ready for the e-mails, we need to tell the program that, right after it scans the

3
00:00:07,270 --> 00:00:10,930
first link, it should go to the second link and do it all over again,

4
00:00:10,930 --> 00:00:12,200
up to 100 links.

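The overall loop described here (scan one URL, queue the links found on it, move on to the next one, stop after 100) can be sketched as a breadth-first crawl. This is a minimal sketch, not the lesson's code: `crawl` and `get_links` are hypothetical names, and `get_links` stands in for downloading a page and extracting its links, so the example runs without network access.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def crawl(start_url: str,
          get_links: Callable[[str], Iterable[str]],
          limit: int = 100) -> List[str]:
    """Breadth-first crawl: scan a URL, queue its new links, repeat until `limit` URLs are scanned."""
    urls = deque([start_url])       # URLs still waiting to be scanned
    scraped_urls: Set[str] = set()  # URLs that have already been scanned
    scanned: List[str] = []         # scan order, returned for inspection
    while urls and len(scanned) < limit:
        url = urls.popleft()        # oldest queued URL first (FIFO)
        scraped_urls.add(url)
        scanned.append(url)
        for link in get_links(url):
            # Queue each link only once.
            if link not in urls and link not in scraped_urls:
                urls.append(link)
    return scanned


# Hypothetical four-page site: each URL maps to the links on that page.
site = {"a": ["b", "c"], "b": ["a", "d"], "c": [], "d": ["e"]}
print(crawl("a", lambda u: site.get(u, []), limit=3))  # ['a', 'b', 'c']
print(crawl("a", lambda u: site.get(u, [])))           # ['a', 'b', 'c', 'd', 'e']
```

Using a deque makes `popleft()` cheap, so pages are visited in the order they were discovered.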
5
00:00:12,260 --> 00:00:12,590
OK.

6
00:00:13,090 --> 00:00:17,230
To do that, we're going to use the Beautiful Soup library that we imported.

7
00:00:17,290 --> 00:00:20,620
So let's create a BeautifulSoup object and call it soup.

8
00:00:20,620 --> 00:00:27,700
It will be equal to BeautifulSoup applied to response.text, with features

9
00:00:28,000 --> 00:00:31,370
equal to "lxml" in double quotes.

10
00:00:32,290 --> 00:00:32,970
All right.

11
00:00:33,010 --> 00:00:34,240
Now, for each anchor:

12
00:00:34,270 --> 00:00:42,760
for anchor in soup.find_all, and we are searching for the "a" tag. Right now we're entering

13
00:00:42,760 --> 00:00:43,720
the for statement.

14
00:00:43,720 --> 00:00:45,630
Make sure you add a colon.

15
00:00:45,730 --> 00:00:52,870
Press Enter, and now we will set link equal to anchor.attrs, and

16
00:00:52,870 --> 00:00:58,900
we're looking up "href", because the href attribute holds the link. Then we're going to use that

17
00:00:58,900 --> 00:01:02,370
link to run the next scan for e-mails.

18
00:01:03,190 --> 00:01:04,420
Let's add the if statement.

19
00:01:04,420 --> 00:01:15,690
So, if "href" is in anchor.attrs, we take its value; in any other case, we use an empty string.

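To see what `soup.find_all("a")` plus the `href` check produce, here is a rough standard-library equivalent. It is a substitute, not Beautiful Soup: it swaps in Python's built-in `html.parser.HTMLParser` so it runs without installing `bs4` or `lxml`, and `AnchorCollector` is a hypothetical name.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects one link per <a> tag: its href value, or '' when there is none."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; attrs is a list of (name, value) pairs.
        if tag == "a":
            # Same idea as: anchor.attrs['href'] if 'href' in anchor.attrs else ''
            # (`or ""` also turns a valueless <a href> into '').
            self.links.append(dict(attrs).get("href") or "")


parser = AnchorCollector()
parser.feed('<p><a href="/about">About</a> <a name="top">Top</a> '
            '<A HREF="contact.html">Contact</A></p>')
print(parser.links)  # ['/about', '', 'contact.html']
```

The empty string for an anchor without `href` matches the "specify nothing" branch, so later string checks such as `startswith` never see `None`.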
20
00:01:15,700 --> 00:01:27,360
We also want to check: if link.startswith a slash, then link will be equal to base_url

21
00:01:27,820 --> 00:01:32,360
plus link. Else, if the link doesn't start

22
00:01:32,370 --> 00:01:43,930
with "http", so elif not link.startswith("http"), then link will be equal to

23
00:01:43,930 --> 00:01:46,000
path plus link.

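The two `startswith` checks turn relative links into absolute ones. The sketch below reproduces them; how `base_url` and `path` are computed is an assumption (scheme plus host for `base_url`, and the current page's directory for `path`), since that part of the script is not shown in this lesson. `normalize` is a hypothetical name. For comparison, it also calls the standard library's `urllib.parse.urljoin`, which applies RFC 3986 resolution.

```python
from urllib.parse import urljoin, urlsplit


def normalize(url: str, link: str) -> str:
    """Turn a link found on page `url` into an absolute URL using the two startswith rules."""
    parts = urlsplit(url)
    base_url = f"{parts.scheme}://{parts.netloc}"  # assumed: e.g. 'https://ex.com'
    # Assumed: directory of the current page, 'https://ex.com/docs/' for 'https://ex.com/docs/a.html'
    path = url[:url.rfind("/") + 1] if "/" in parts.path else url
    if link.startswith("/"):
        return base_url + link      # root-relative link
    elif not link.startswith("http"):
        return path + link          # page-relative link
    return link                     # already absolute


page = "https://ex.com/docs/a.html"
print(normalize(page, "/about"))    # https://ex.com/about
print(normalize(page, "b.html"))    # https://ex.com/docs/b.html
# With no path at all, the simple rules glue the link on directly,
# while urljoin inserts the missing slash.
print(normalize("https://ex.com", "b.html"))  # https://ex.comb.html
print(urljoin("https://ex.com", "b.html"))    # https://ex.com/b.html
```

The hand-written rules are easy to follow, but `urljoin` also handles edge cases such as `../` segments, so it is worth considering if the crawler produces odd URLs.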
24
00:01:46,210 --> 00:01:49,140
And the last if statement checks that the link is new.

25
00:01:49,570 --> 00:02:00,130
So if our link is not in urls, meaning if not link in urls and not link in scraped_urls, then

26
00:02:00,130 --> 00:02:08,050
we want to append that link, so link.append, or pardon me, urls.append, and the link is

27
00:02:08,050 --> 00:02:12,970
what we want to append, so urls.append(link).

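The duplicate check can be isolated in a small helper. This sketch assumes `urls` is a `collections.deque` and `scraped_urls` is a `set`, which is one common way to write this script; `enqueue` is a hypothetical name.

```python
from collections import deque


def enqueue(link, urls, scraped_urls):
    """Append `link` to the queue only if it is neither queued nor already scanned."""
    if link not in urls and link not in scraped_urls:
        urls.append(link)
        return True
    return False


urls = deque(["https://ex.com/a"])
scraped_urls = {"https://ex.com/"}
print(enqueue("https://ex.com/a", urls, scraped_urls))  # False: already queued
print(enqueue("https://ex.com/", urls, scraped_urls))   # False: already scanned
print(enqueue("https://ex.com/b", urls, scraped_urls))  # True
print(list(urls))  # ['https://ex.com/a', 'https://ex.com/b']
```

Note that `link not in urls` scans the whole deque. At 100 URLs this is fine, but for much larger crawls a single `seen` set checked before queuing keeps each lookup constant-time.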
28
00:02:15,790 --> 00:02:19,460
OK, so that should be everything we need.

29
00:02:19,630 --> 00:02:26,200
We have a few red underlines, and I'm not really sure why we have them. Maybe we need to import

30
00:02:26,230 --> 00:02:27,600
BeautifulSoup like this.

31
00:02:27,790 --> 00:02:31,440
Let's use bs4.BeautifulSoup.

32
00:02:31,480 --> 00:02:33,340
Does that fix the problem?

33
00:02:33,340 --> 00:02:34,330
No, it doesn't.

34
00:02:34,330 --> 00:02:41,160
So let's delete this. Maybe it is just an editor bug; whether these red underlines matter, we're going to see

35
00:02:41,160 --> 00:02:42,470
once we start the program.

36
00:02:42,780 --> 00:02:47,760
But before we do that, we want to add an except statement because, if you remember, at the beginning we

37
00:02:47,760 --> 00:02:49,000
had a try statement.

38
00:02:49,020 --> 00:02:51,450
So at the end we need to have except

39
00:02:54,460 --> 00:02:55,730
KeyboardInterrupt.

40
00:02:55,750 --> 00:03:01,640
So in case we interrupt the program from the keyboard (Ctrl+C), we're simply going to print "Closing".

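Wrapping the loop in `try`/`except KeyboardInterrupt` means Ctrl+C ends the crawl early without losing what was already collected. A minimal sketch, where `scan` and `fake_fetch` are hypothetical names and `fake_fetch` stands in for downloading a page and running the e-mail regex:

```python
def scan(urls, fetch_emails):
    """Collect e-mails from each URL; on Ctrl+C, stop early but keep what was found."""
    emails = set()
    try:
        for url in urls:
            emails.update(fetch_emails(url))
    except KeyboardInterrupt:
        print("Closing")
    return emails


def fake_fetch(url):
    # Hypothetical stand-in for fetching a page and extracting addresses.
    if url == "stop":
        raise KeyboardInterrupt  # simulate the user pressing Ctrl+C
    return {f"admin@{url}"}


found = scan(["a.com", "b.com", "stop", "c.com"], fake_fetch)
print(sorted(found))  # ['admin@a.com', 'admin@b.com']
```

Because the addresses live outside the `try` block, any printing that follows it still runs after an interrupt.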
41
00:03:04,250 --> 00:03:06,440
And we are forgetting the most important part:

42
00:03:06,440 --> 00:03:10,310
after the program finishes, we want to print all of the e-mails.

43
00:03:10,400 --> 00:03:15,150
So let's do that: for mail in emails,

44
00:03:15,260 --> 00:03:20,160
print the mail, and it will print them one by one.

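Assuming the earlier part of the script stores addresses in a set (an assumption; that code is not shown here), an address found on several pages is printed only once. The sketch below adds sorting so the output order is stable between runs; `format_emails` is a hypothetical helper.

```python
def format_emails(emails):
    """Return the collected addresses one per line, sorted so the output is stable."""
    return "\n".join(sorted(emails))


emails = set()
emails.update({"info@ex.com", "sales@ex.com"})  # found on one page
emails.update({"info@ex.com"})                  # duplicate from another page is ignored
print(format_emails(emails))
```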
45
00:03:20,900 --> 00:03:23,970
So let's give it a try and see whether this program works.

46
00:03:23,970 --> 00:03:32,220
If we open up our terminal and run python3 with our e-mail scraper .py file, we get a syntax error, so we do

47
00:03:32,220 --> 00:03:33,340
have a syntax error somewhere.

48
00:03:33,360 --> 00:03:34,950
Let's see why we have it.

49
00:03:35,100 --> 00:03:42,550
Let's go all the way down. It seems that we are missing one bracket, so let's open it up.

50
00:03:42,580 --> 00:03:44,500
Let me just see where we need to open it.

51
00:03:45,550 --> 00:03:46,390
Oh, never mind.

52
00:03:46,390 --> 00:03:47,430
This is the mistake.

53
00:03:47,470 --> 00:03:49,720
We do not need the brackets right here.

54
00:03:49,720 --> 00:03:51,200
So we can remove that.

55
00:03:51,400 --> 00:03:53,940
And now let's see whether we have any other errors.

56
00:03:53,950 --> 00:03:54,220
No.

57
00:03:54,220 --> 00:03:55,530
Everything seems to be fixed.

58
00:03:55,540 --> 00:04:02,000
So now we can run our program with python3 and our e-mail scraper .py file.

59
00:04:02,870 --> 00:04:04,450
And for the target URL to scan,

60
00:04:05,290 --> 00:04:08,350
I will use this one: http:// followed by the site's address.

61
00:04:08,500 --> 00:04:19,090
It will process this URL, and it will give us an error.

62
00:04:19,090 --> 00:04:19,960
Let's see what it says.

63
00:04:19,960 --> 00:04:23,260
Couldn't find a tree builder with the features you requested.

64
00:04:23,260 --> 00:04:27,190
Do you need to install a parser library?

65
00:04:27,250 --> 00:04:34,960
If we get this error, we most likely need to run this command: pip3 install lxml. Press

66
00:04:34,960 --> 00:04:39,070
Enter, and this will collect and install it for you. Right after that,

67
00:04:39,070 --> 00:04:41,070
we should no longer have that problem.

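Installing lxml with pip3 is the fix used here. If you would rather have the script keep working when lxml is missing, one option (a hypothetical helper, not part of the lesson) is to check whether the package is importable and otherwise fall back to Python's built-in `"html.parser"`, which Beautiful Soup also accepts as a `features` value. The error above is Beautiful Soup's `FeatureNotFound` exception.

```python
import importlib.util


def choose_parser() -> str:
    """Pick a Beautiful Soup parser name: 'lxml' if installed, else the built-in 'html.parser'."""
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    return "html.parser"


print(choose_parser())
# Hypothetical usage in the scraper:
# soup = BeautifulSoup(response.text, features=choose_parser())
```

The built-in parser is slower and more lenient in different ways than lxml, so results on badly formed pages may differ slightly.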
68
00:04:41,080 --> 00:04:46,940
So let's try it once again and enter the same URL.

69
00:04:47,860 --> 00:04:55,060
It will process the first URL, the second URL, and so on, and it will finish after it hits 100 URLs. Right after

70
00:04:55,060 --> 00:04:55,240
that,

71
00:04:55,270 --> 00:05:00,020
it should print all the emails that it managed to find in the links it scraped.

72
00:05:01,030 --> 00:05:06,420
So I will wait for this to finish, and we are going to see the output once it reaches 100.

73
00:05:06,780 --> 00:05:11,590
Okay, so our program has finished, and we can already see all of the emails in the output.

74
00:05:11,590 --> 00:05:14,980
Let me enlarge this terminal so we can see everything.

75
00:05:14,980 --> 00:05:20,430
And if we scroll all the way up, you can see all of the emails with the exact same domain name that

76
00:05:20,440 --> 00:05:22,120
our program managed to find

77
00:05:22,120 --> 00:05:27,800
by scanning the first 100 links, so you can see there are a lot of them.

78
00:05:28,150 --> 00:05:34,240
Scrolling all the way up, there are at least 50 emails that we managed to find.

79
00:05:34,420 --> 00:05:40,720
And here is the end of the program, where we processed the first 100 URLs, so all of these

80
00:05:40,720 --> 00:05:44,740
emails were found in these 100 URLs.

81
00:05:44,780 --> 00:05:47,410
OK, so our program works really well.

82
00:05:47,710 --> 00:05:53,020
Now you can modify it if you want; for example, you can make it scan the first 1000 URLs, and you will receive even

83
00:05:53,020 --> 00:05:54,220
more results.

84
00:05:54,220 --> 00:06:01,000
But there is really no need; you can already see how many potential emails we got for a further attack.

85
00:06:01,000 --> 00:06:05,010
Now, keep in mind that this is not a website that I own or that I am allowed to attack.

86
00:06:05,020 --> 00:06:10,410
So make sure that you do not use this website to plan any further attack.

87
00:06:10,450 --> 00:06:16,270
We only scanned this website to show how our program works, and so far it seems to work really

88
00:06:16,270 --> 00:06:16,660
well.

89
00:06:18,540 --> 00:06:20,550
Now let's try a different website as well.

90
00:06:20,550 --> 00:06:25,140
For example, let's see whether we can find something on Google.

91
00:06:25,140 --> 00:06:26,420
You know, I've never really tried it.

92
00:06:26,430 --> 00:06:28,520
So let's try it together.

93
00:06:28,530 --> 00:06:34,290
Let's type https://www.google.com and try it.

94
00:06:34,290 --> 00:06:40,590
It will process the first 100 URLs, and let's see whether it manages to find any emails inside of those

95
00:06:40,590 --> 00:06:43,320
100 URLs. Okay.

96
00:06:43,350 --> 00:06:48,210
So the program has finished once again and these are all of the e-mails that we managed to find from

97
00:06:48,210 --> 00:06:49,600
these 100 links.

98
00:06:49,610 --> 00:06:50,500
Really cool, right?

99
00:06:51,210 --> 00:06:56,160
So now you have a program that can gather the e-mail addresses it finds on a specified domain, and then you

100
00:06:56,160 --> 00:06:58,310
can use them to plan a further attack.

101
00:06:58,350 --> 00:07:01,420
All right, so that's about it for this project.

102
00:07:01,440 --> 00:07:04,920
I hope you enjoyed it, and I hope to see you in the next project.

103
00:07:04,920 --> 00:07:05,640
Take care.

104
00:07:05,640 --> 00:07:05,900
Bye.
