Parsing question

gerry.odea
Posts: 96
Joined: Fri Sep 19, 2008 9:33 am

Post by gerry.odea »

I'm trying to parse

<article class="story ">
<div class="story-photo lazy-photo ">
<a href="/article/us-wirecard-accounts-scholz/german-minister-demands-regulatory-rethink-after-wirecard-collapse-idUSKBN23W346">
<img src="https://s2.reutersmedia.net/resources_v ... atured.png" org-src="https://s4.reutersmedia.net/resources/r ... XMPEG5O28Y" border="0" alt=""/></a>
</div>
<div class="story-content">
<a href="/article/us-wirecard-accounts-scholz/german-minister-demands-regulatory-rethink-after-wirecard-collapse-idUSKBN23W346">
<h3 class="story-title">German minister demands regulatory rethink after Wirecard collapse</h3></a>
<div class="contributor"></div>
<p>German Finance Minister Olaf Scholz described as a scandal the collapse of Wirecard after the payments company disclosed a hole in its books that left it owing creditors almost $4 billion, adding it was a wake-up call for supervision.</p>
<time class="article-time"><span class="timestamp">3:01pm EDT</span></time>
</div>
</article>

with something like
<$imports=
'recdelim >><item
multiple
field Title varchar />><title>\P=!</title>+ ""
field Link varchar />><link>\P=!</link>+ ""
'>

but I'm not sure how to get the title from <h3 class="story-title"> and the URL from the <a href=""> that follows <div class="story-content">.

Do you have a link to your parser or could you explain how I would write it?

Thanks,
Gerry
mark
Site Admin
Posts: 5495
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

The parser is REX.

Title
/>><h3\x20class\="story-title">\P=!</h3>*

Link
/>><div\x20class\="story-content">=\space*<a\x20href\="\P=[^"]*
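
For anyone who wants to check what those two expressions are meant to pull out, here is a rough Python analogue. It is only a sketch: Python's re module is not REX, the pattern and function names (TITLE_RE, LINK_RE, extract_story) are made up for illustration, and the sample markup is a trimmed copy of the article block from the question.

```python
import re

# Trimmed copy of the <article> block from the question
# (photo div and summary paragraph omitted for brevity).
html = '''<article class="story ">
<div class="story-content">
<a href="/article/us-wirecard-accounts-scholz/german-minister-demands-regulatory-rethink-after-wirecard-collapse-idUSKBN23W346">
<h3 class="story-title">German minister demands regulatory rethink after Wirecard collapse</h3></a>
</div>
</article>'''

# Title: everything between <h3 class="story-title"> and the next </h3>.
TITLE_RE = re.compile(r'<h3 class="story-title">(.*?)</h3>', re.S)

# Link: the href of the <a> that directly follows <div class="story-content">,
# allowing whitespace in between; the value runs up to the closing quote.
LINK_RE = re.compile(r'<div class="story-content">\s*<a href="([^"]*)"')

def extract_story(block):
    """Return the Title and Link fields for one article block (hypothetical helper)."""
    title = TITLE_RE.search(block)
    link = LINK_RE.search(block)
    return {
        "Title": title.group(1).strip() if title else "",
        "Link": link.group(1) if link else "",
    }

print(extract_story(html))
```

Anchoring the link on story-content rather than on the first <a href in the article skips the earlier anchor inside story-photo, which points at the same URL but wraps the image.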