Python @ DjangoSpin

PyPro #43 Extract all links from anchor tags


Extract all links from anchor tags in Python

Extract all links from anchor tags: Write a Python program to extract all hyperlinks from anchor tags in an HTML source string.

Extract all links from anchor tags

import re

sourceString = '''
<p>Paragraph 1</p>
<a href = "link_one">Link 1</a><br />
<a href = "link_two">Link 2</a><br />
<a href = "link_three">Link 3</a><br />

<p>Paragraph 2</p>
<a href = "link_four">Link 4</a><br />
<a href = "link_five">Link 5</a><br />
<a href = "link_six">Link 6</a><br />
'''

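# Capture the quoted value of each href attribute in an opening <a> tag.
# [^>]*? keeps the match inside a single tag. The raw string (r'...') passes \s
# to the regex engine unchanged; a plain literal containing '\s' raises an
# invalid escape sequence warning in recent Python versions.
print(re.findall(r'<a\s[^>]*?href\s*=\s*["\'](.+?)["\']', sourceString))  # ['link_one', 'link_two', 'link_three', 'link_four', 'link_five', 'link_six']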
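The pattern above accepts either quote character at both ends of the value, so it stops at the first quote of any kind: a value such as `it's.html` written inside double quotes would be cut short at `it`. A slightly stricter sketch, assuming the same kind of small, well-formed input, captures the opening quote in its own group and uses the backreference `\1` to require the same quote to close the value; `re.IGNORECASE` also accepts upper-case tags like `<A HREF=...>`. The names `ANCHOR_HREF` and `extract_links` are hypothetical, chosen for illustration only.

```python
import re

html = '''
<a href = "link_one">Link 1</a>
<a class="nav" href='link_two'>Link 2</a>
<A HREF="link_three">Link 3</A>
<a href="it's.html">Link 4</a>
<abbr title="x">not a link</abbr>
'''

# Group 1 captures the opening quote; \1 requires the same quote to close the
# value, so a single quote inside a double-quoted value does not end it early.
# <a\s ensures tags such as <abbr> are not mistaken for anchors.
ANCHOR_HREF = re.compile(r'<a\s[^>]*?\bhref\s*=\s*(["\'])(.*?)\1', re.IGNORECASE)

def extract_links(source):
    """Return the href value of every <a> tag in source, in document order."""
    # Group 2 holds the attribute value without its surrounding quotes.
    return [match.group(2) for match in ANCHOR_HREF.finditer(source)]

print(extract_links(html))  # ['link_one', 'link_two', 'link_three', "it's.html"]
```

Because the pattern now has two groups, `finditer` with `group(2)` is used instead of `findall`, which would return `(quote, value)` tuples.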

To learn more about regular expressions in Python, click here.
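A regular expression is fine for a short, predictable snippet like `sourceString`, but it does not understand HTML: it still matches links inside `<!-- ... -->` comments, misses unquoted values such as `href=page.html`, and leaves character references like `&amp;` undecoded. As an alternative technique (not the approach used in this program), here is a minimal sketch built on the standard library's `html.parser.HTMLParser`, which tokenizes the markup and reports each start tag with its attributes. The class `AnchorLinkParser` and the helper `extract_links` are hypothetical names for illustration.

```python
from html.parser import HTMLParser

class AnchorLinkParser(HTMLParser):
    """Collect the href attribute of every <a> start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lower-cased; attrs is a list of
        # (name, value) pairs with quotes stripped and entities decoded.
        if tag == "a":
            for name, value in attrs:
                # value is None for a bare attribute such as <a href>.
                if name == "href" and value is not None:
                    self.links.append(value)

def extract_links(source):
    """Parse source as HTML and return the href values in document order."""
    parser = AnchorLinkParser()
    parser.feed(source)
    parser.close()
    return parser.links

sourceString = '''
<p>Paragraph 1</p>
<a href = "link_one">Link 1</a><br />
<A HREF=link_two>Link 2</A><br />
<!-- <a href="commented_out">ignored</a> -->
<a class="nav" href='link_three'>Link 3</a><br />
'''

print(extract_links(sourceString))  # ['link_one', 'link_two', 'link_three']
```

The commented-out anchor is delivered to `handle_comment`, not `handle_starttag`, so it never reaches `links`.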
