Posted in Data Imports
11:48 pm, February 22, 2022
 

Importing CSS Reference

<p>I thought it would be kinda cool to have the CSS reference as a kind of database that I could refer to.</p>
<p>This is how to import it somewhere else.</p>
<h3>Import CSS Reference Function</h3>
<p>This is the initial function, which I had to modify later because it was crashing due to too many HTTP requests.</p>
<h4>PHP</h4>
<pre><code class="php hljs">/* Import CSS Reference */
public function importcssreference(
    $loop_max = 10
) {
    /* This will be run on a loaded import item, so there's no need to pass variables to the function */
    global $db;
    global $functions;
    $out = "";
    $for_counter = 0;
    $main_loop_tag = "#sect2 li";

    $db_table_name = $this-&gt;db-&gt;escapeString($this-&gt;dbtablename);
    $loop_max = $this-&gt;db-&gt;escapeString($loop_max);

    require_once("lib/simple_html_dom.php");
    $html = file_get_html($this-&gt;importurl);

    foreach ($html-&gt;find($main_loop_tag) as $item) {

        $for_counter++;
        // note: == with continue only skips item number $loop_max, it doesn't stop the loop
        if ($for_counter == $loop_max) {
            continue;
        }

        $css_link = $item-&gt;find('a', 0)-&gt;href;
        $css_title = $item-&gt;find('a', 0)-&gt;plaintext;

        $out .= "\$css_link:$css_link&lt;br /&gt;";
        $out .= "\$css_title:$css_title&lt;br /&gt;";
        // this is all we need for stage one, then open the page link and process.

        $html_source = file_get_html($css_link);

        $reply_count = 0;

        $out .= "&lt;hr /&gt;";

        // returning inside the loop stops after the first item (handy while testing)
        return $out;

    }
}</code></pre>
<p>This returns after the first item, so it's working as intended so far.</p>
<p>Find each <code>li</code> inside the element with the id <code>sect2</code> (the selector <code>#sect2 li</code>) and use it as the loop item.</p>
<p>Then, for each of them, grab the link and the title.</p>
<p>It's finding the first item and getting its link and title.</p>
<p><img src="https://i.imgur.com/Tp7X6Cy.png" /></p>
<p><img src="https://i.imgur.com/KAHrUQK.png" /></p>
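<p>If you'd rather not depend on simple_html_dom, PHP's built-in DOM extension can do the same <code>#sect2 li</code> loop. This is a minimal sketch, not the code used above: <code>extract_links()</code> is a hypothetical helper, the CSS selector is translated into an XPath query by hand, and the sample markup is made up to resemble the listing page.</p>

```php
<?php
// Sketch: the "#sect2 li" loop with PHP's DOM extension instead of simple_html_dom.
// extract_links() is a hypothetical helper, not part of any library.
function extract_links(string $html, string $containerId = 'sect2'): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $items = [];
    // XPath equivalent of the CSS selector "#sect2 li"
    $query = sprintf('//*[@id="%s"]//li', $containerId);
    foreach ($xpath->query($query) as $li) {
        // first <a> inside the <li>, like find('a', 0)
        $a = $xpath->query('.//a', $li)->item(0);
        if ($a === null) {
            continue; // no link in this item, nothing to import
        }
        $items[] = [
            'title' => trim($a->textContent), // like ->plaintext, then trim()
            'link'  => $a->getAttribute('href'),
        ];
    }
    return $items;
}

// Made-up markup shaped like the listing page
$html = '<div id="sect2"><ul>'
    . '<li><a href="/css/color">color</a></li>'
    . '<li><a href="/css/margin"> margin </a></li>'
    . '<li>no link here</li>'
    . '</ul></div>';
$links = extract_links($html);
```

<p>Items without a link are skipped rather than causing an error, which is one difference from calling <code>find('a', 0)-&gt;href</code> directly.</p>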
<h3>Prepend the Domain to the Link</h3>
<p>The links usually don't include the domain, so it has to be added to the front manually to get the full URL.</p>
<h4>PHP</h4>
<pre><code class="php hljs">$xelementlink = "https://the-domain.org".$xelementlink;</code></pre>
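<p>Plain concatenation only works for root-relative links like <code>/css/color</code>; an absolute href or a page-relative one such as <code>color.html</code> would come out wrong. A minimal sketch of a safer join, kept PHP 7.4 compatible to match this server; <code>absolute_url()</code> is a hypothetical helper and deliberately simple.</p>

```php
<?php
// Join an href from the listing page onto the site's base URL.
// absolute_url() is a hypothetical helper (PHP 7.4 compatible). It covers absolute,
// protocol-relative, root-relative and simple page-relative links; it does not
// resolve "../" segments or keep a port number from $base.
function absolute_url(string $base, string $href): string
{
    if (parse_url($href, PHP_URL_SCHEME) !== null) {
        return $href; // already absolute, e.g. https://...
    }
    $scheme = parse_url($base, PHP_URL_SCHEME) ?? 'https';
    $host   = parse_url($base, PHP_URL_HOST) ?? '';
    if (strpos($href, '//') === 0) {
        return $scheme . ':' . $href; // protocol-relative: //host/path
    }
    if (strpos($href, '/') === 0) {
        return $scheme . '://' . $host . $href; // root-relative: the case the concatenation above handles
    }
    // page-relative: resolve against the directory of the base URL's path
    $path = parse_url($base, PHP_URL_PATH) ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $scheme . '://' . $host . $dir . $href;
}
```

<p>For anything more unusual than these cases, a full URL-resolution library is the safer choice.</p>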
<h3>Grab the Content</h3>
<p>This part is causing a timeout on the server: making too many HTTP requests in one loop can crash the server, so this part needs to move into a separate function and request.</p>
<h4>PHP</h4>
<pre><code class="php hljs">$html_source = file_get_html($xelementlink);</code></pre>
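<p>Besides splitting the work into separate requests, a loop can also be given a time budget so it stops cleanly before the server gives up, leaving the rest for the next run. A minimal sketch; <code>run_with_budget()</code> and the 20-second budget are assumptions, and it can't interrupt a single fetch that hangs, so each request still needs its own timeout.</p>

```php
<?php
// Process items until a time budget runs out; the rest wait for the next run.
// run_with_budget() is a hypothetical helper; pick a budget well under the server timeout.
function run_with_budget(array $items, callable $work, float $budgetSeconds): array
{
    $start = microtime(true);
    $done = [];
    foreach ($items as $key => $item) {
        if (microtime(true) - $start >= $budgetSeconds) {
            break; // out of time: stop here instead of running into the server timeout
        }
        $work($item);
        $done[] = $key;
    }
    return $done; // keys handled in this run
}

// Example: a 20-second budget for work that normally takes milliseconds
$handled = run_with_budget(['color', 'margin', 'padding'], function ($name) {
    usleep(1000); // stand-in for fetching and parsing one page
}, 20.0);
```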
<h3>Importing Problem: Server 504 Timeout</h3>
<p>The problem is that it's only importing 11 items, so each link will need to go into a temp table first, and then the details can be fetched as a separate import for each item.</p>
<p>Split the import: the first part just adds the titles and links, then another import goes through each of them and adds the missing details, which is the second half of the function.</p>
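<p>A rough sketch of what the first part could store, assuming one row per item keyed by <code>md5()</code> of the trimmed title (the same hash the import passes to <code>md5exists()</code>). A plain array stands in for the database table, <code>stage_one()</code> is a hypothetical helper, and the blank <code>other</code> field marks rows the second part still has to fill in.</p>

```php
<?php
// Sketch of part 1's output: title + link only, keyed by md5(title).
// An array stands in for the database table; stage_one() is hypothetical.
function stage_one(array $pairs, array $table = []): array
{
    foreach ($pairs as [$title, $link]) {
        $title = trim($title);
        $key = md5($title); // same idea as the md5exists() check in the post
        if (isset($table[$key])) {
            continue; // already queued, so re-running the import adds nothing
        }
        $table[$key] = [
            'title'      => $title,
            'category'   => 'CSS',
            'sourcelink' => $link,
            'other'      => '', // blank = part 2 hasn't filled this row in yet
        ];
    }
    return $table;
}

$table = stage_one([
    ['color', 'https://the-domain.org/css/color'],
    [' color ', 'https://the-domain.org/css/color'], // duplicate after trim()
    ['margin', 'https://the-domain.org/css/margin'],
]);
```

<p>Because the key comes from the title, running part 1 again only adds items that weren't there before.</p>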
<p>I increased PHP's memory limit from 128 MB to 256 MB, but the import is still timing out.</p>
<p><code>/etc/php/7.4/fpm$ sudo nano php.ini</code></p>
<p><code># find memory_limit and change it to 256M</code></p>
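<p>The same limits can also be raised from inside the import script, which only affects that one script instead of everything PHP-FPM runs. A minimal sketch using the standard <code>ini_set()</code> and <code>set_time_limit()</code> functions, with the 256M memory limit and the 60-second execution time mentioned in this post:</p>

```php
<?php
// Raise the limits for this script only, instead of editing php.ini.
// Values match the ones tried in this post.
ini_set('memory_limit', '256M');
set_time_limit(60); // sets max_execution_time and restarts the timer from zero

echo 'memory_limit: ', ini_get('memory_limit'), PHP_EOL;
echo 'max_execution_time: ', ini_get('max_execution_time'), PHP_EOL;
```

<p>Neither setting changes how long the web server in front of PHP-FPM waits for a response: a 504 Gateway Time-out is reported by that proxy, so its own timeout may need raising as well. Edits to the FPM <code>php.ini</code> also only take effect after PHP-FPM is reloaded or restarted.</p>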
<h3>Import Full</h3>
<p>This currently crashes the server with a 504 timeout error.</p>
<h4>PHP</h4>
<pre><code class="php hljs">/* Import CSS Reference */
/* this import crashes after 10 items - so need to split into smaller import chunks */

public function importcssreference(
    $loop_max = 10
) {
    /* This will be run on a loaded import item, so there's no need to pass variables to the function */
    global $db;
    global $functions;
    $out = "";
    $for_counter = 0;
    $main_loop_tag = "#sect2 li";

    $db_table_name = $this-&gt;db-&gt;escapeString($this-&gt;dbtablename);
    $loop_max = $this-&gt;db-&gt;escapeString($loop_max);

    require_once("lib/simple_html_dom.php");
    $html = file_get_html($this-&gt;importurl);

    foreach ($html-&gt;find($main_loop_tag) as $item) {

        $for_counter++;
        // note: == with continue only skips item number $loop_max, it doesn't stop the loop
        if ($for_counter == $loop_max) {
            continue;
        }

        $xelementlink = $item-&gt;find('a', 0)-&gt;href;
        $xelementtitle = $item-&gt;find('a', 0)-&gt;plaintext;
        $xelementtitle = trim($xelementtitle);

        $out .= "\$xelementlink:$xelementlink&lt;br /&gt;";
        $out .= "\$xelementtitle:$xelementtitle&lt;br /&gt;";

        $xelementlink = "https://the-domain.org" . $xelementlink;
        // one extra HTTP request per item: this is what makes the full import time out
        $html_source = file_get_html($xelementlink);
        $reply_count = 0;
        $out .= "&lt;hr /&gt;";

        foreach ($html_source-&gt;find(".main-content") as $main_content) {

            $xtitle = $main_content-&gt;find("h1", 0)-&gt;plaintext;
            $xtitle = trim($xtitle);
            $out .= "\$xtitle:$xtitle&lt;br /&gt;";

            $xsummary = $main_content-&gt;find("p", 0)-&gt;innertext;
            $xsummary = trim($xsummary);
            $out .= "\$xsummary:$xsummary&lt;br /&gt;";

            $xsummary2 = $main_content-&gt;find("p", 1)-&gt;innertext;
            $xsummary2 = trim($xsummary2);
            $out .= "\$xsummary2:$xsummary2&lt;br /&gt;";

            $xmd5 = md5($xelementtitle);
            $out .= "\$xmd5:$xmd5&lt;br /&gt;";

            $xcategory = "CSS";
            $out .= "\$xcategory:$xcategory&lt;br /&gt;";

            $xadditional = $main_content-&gt;innertext;
            $out .= "\$xadditional:$xadditional&lt;br /&gt;";

            // start the class
            $linked_class = new $this-&gt;linkedclass;
            $linked_class-&gt;addtomenu = false;
            $linked_class-&gt;start();

            // assign all vars
            $linked_class-&gt;title = $xtitle;
            $linked_class-&gt;additional = $xadditional;
            $linked_class-&gt;category = $xcategory;
            $linked_class-&gt;md5 = $xmd5;
            $linked_class-&gt;summary = $xsummary;
            $linked_class-&gt;summary2 = $xsummary2;
            $linked_class-&gt;elementtitle = $xelementtitle;
            $linked_class-&gt;sourcelink = $xelementlink;

            // check if title md5 exists
            if (!$linked_class-&gt;md5exists($xmd5)) {
                if ($linked_class-&gt;add()) {
                    $out .= "Item $linked_class-&gt;title Added&lt;br&gt;";
                }
            }

        }

    }

    return $out;

}
/* Import CSS Reference */</code></pre>
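<p>Each <code>find(..., 0)</code> here assumes the element exists. In simple_html_dom, <code>file_get_html()</code> returns <code>false</code> when a page can't be loaded (calling <code>-&gt;find()</code> on that is a fatal error), and <code>find()</code> with an index returns <code>null</code> when nothing matches. A minimal null-safe sketch using PHP's built-in DOM extension instead; <code>extract_details()</code>, <code>first_text()</code> and <code>has_class()</code> are hypothetical helpers, and they return plain text rather than the <code>innertext</code> HTML stored above.</p>

```php
<?php
// Null-safe version of the detail-page lookups, using PHP's DOM extension.
// extract_details(), first_text() and has_class() are hypothetical helpers.

// Text of the first node matching $query under $context, or '' if there is none.
function first_text(DOMXPath $xpath, string $query, DOMNode $context): string
{
    $node = $xpath->query($query, $context)->item(0);
    return $node === null ? '' : trim($node->textContent);
}

// XPath test for "has this CSS class" (a class attribute can hold several names).
function has_class(string $class): string
{
    return sprintf('contains(concat(" ", normalize-space(@class), " "), " %s ")', $class);
}

function extract_details(string $html): ?array
{
    if (trim($html) === '') {
        return null; // nothing was downloaded
    }
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $main = $xpath->query('//*[' . has_class('main-content') . ']')->item(0);
    if ($main === null) {
        return null; // not a reference page (e.g. a wrong source link): caller can skip it
    }
    return [
        'longtitle' => first_text($xpath, './/h1', $main),
        'summary'   => first_text($xpath, './/p', $main),
        'summary2'  => first_text($xpath, './/*[' . has_class('code-example') . ']', $main),
    ];
}
```

<p>Returning <code>null</code> for a page without <code>.main-content</code> gives the caller a way to skip or flag a wrong page instead of crashing on it.</p>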
<p>I split the import into part 1 and part 2, so the first part should only be loading one page, but it's still giving me a 504 Gateway Time-out and only adding 4 items for some reason.</p>
<p>That's even fewer items than the more complicated import.</p>
<h2>Import Part 1</h2>
<p>This is a smaller import that only grabs the first page and doesn't follow the URLs, so it should work better than the full import, but it only adds 4 items. Hmm...</p>
<h4>PHP</h4>
<pre><code class="php hljs">/* Import CSS Reference - Part 1 */
/* this import crashes after 10 items - so need to split into smaller import chunks */

public function importcssreferencepart1(
    $loop_max = 10
) {
    /* This will be run on a loaded import item, so there's no need to pass variables to the function */
    global $db;
    global $functions;
    $out = "";
    $for_counter = 0;
    $main_loop_tag = "#sect2 li";

    $db_table_name = $this-&gt;db-&gt;escapeString($this-&gt;dbtablename);
    $loop_max = $this-&gt;db-&gt;escapeString($loop_max);

    require_once("lib/simple_html_dom.php");
    $html = file_get_html($this-&gt;importurl);

    foreach ($html-&gt;find($main_loop_tag) as $item) {

        $for_counter++;
        // note: == with continue only skips item number $loop_max, it doesn't stop the loop
        if ($for_counter == $loop_max) {
            continue;
        }

        $xelementlink = $item-&gt;find('a', 0)-&gt;href;
        $xelementtitle = $item-&gt;find('a', 0)-&gt;plaintext;
        $xelementtitle = trim($xelementtitle);

        $out .= "\$xelementlink:$xelementlink&lt;br /&gt;";
        $out .= "\$xelementtitle:$xelementtitle&lt;br /&gt;";

        $xelementlink = "https://the-domain.org" . $xelementlink;
        $html_source = file_get_html($xelementlink);
        $reply_count = 0;
        $out .= "&lt;hr /&gt;";

        // start the class
        $linked_class = new $this-&gt;linkedclass;
        $linked_class-&gt;addtomenu = false;
        $linked_class-&gt;start();

        $xmd5 = md5($xelementtitle);
        $xcategory = "CSS";

        // assign items
        $linked_class-&gt;title = $xelementtitle;
        $linked_class-&gt;category = $xcategory;
        $linked_class-&gt;md5 = $xmd5;
        $linked_class-&gt;sourcelink = $xelementlink;

        /* these following items can come from part 2 of the import */

        // $linked_class-&gt;additional = $xadditional;
        // $linked_class-&gt;summary = $xsummary;
        // $linked_class-&gt;summary2 = $xsummary2;
        // $linked_class-&gt;longtitle = $longtitle;

        // check if title md5 exists
        if (!$linked_class-&gt;md5exists($xmd5)) {
            if ($linked_class-&gt;add()) {
                $out .= "Item $linked_class-&gt;title Added&lt;br&gt;";
            }
        }

    }

    return $out;

}
/* Import CSS Reference - Part 1 */</code></pre>
<p>Still causing this timeout.&nbsp;</p>
<p><img src="https://i.imgur.com/CcakklX.png" /></p>
<h3>Timeout Fixed</h3>
<p>Actually, I see the issue now: I left the line that downloads the source page in there. Doh!</p>
<h4>PHP</h4>
<pre><code class="php hljs">// remove this line from part 1 and it should run OK
$html_source = file_get_html($xelementlink);</code></pre>
<h3>Import Stage 1 Working Now</h3>
<p>Just the titles and the links for now.&nbsp;</p>
<p><img src="https://i.imgur.com/b0kE0lL.png" /></p>
<p><img src="https://i.imgur.com/lsSJ1aM.png" /></p>
<p>Woo, 695 CSS attribute items with no crash.</p>
<p>Now to get the second part of the import done.</p>
<h3>Part 2 Import</h3>
<p>The import needs a way to check whether an item has already been processed.</p>
<p>Check each item in css_reference: if its other flag is blank, process it and set the flag to processed; otherwise skip it. Just do one at a time and add it to a 1-minute cron, so in 695 minutes everything should be processed. That's a long time, so maybe run it every 5 seconds and stop it after 700 × 5 seconds (although cron's smallest interval is one minute).</p>
<p>Added the cron job; remove it after a day or so.</p>
<p><code>*/3 * * * * wget --spider https://theimporturl/ &gt; /dev/null 2&gt;&amp;1</code></p>
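<p>The one-item-per-run idea is essentially a work queue. A minimal sketch with a plain array in place of the CSS reference table (in SQL this would roughly be a <code>WHERE other = ''</code> filter with a <code>LIMIT</code>): <code>process_next()</code> and <code>runs_needed()</code> are hypothetical helpers, <code>$fetch</code> stands in for loading and parsing the page, and the <code>invalid</code> flag value is an assumption, since the post only uses <code>processed</code>. Flagging bad rows means a broken link can't block the queue.</p>

```php
<?php
// Sketch of part 2 as a queue: each run handles up to $batch rows whose 'other'
// flag is blank, then flags them so they're never picked again.
// process_next(), runs_needed() and the 'invalid' value are hypothetical;
// the array stands in for the CSS reference table.
function process_next(array $table, callable $fetch, int $batch = 1): array
{
    $handled = 0;
    foreach ($table as $key => $row) {
        if ($handled >= $batch) {
            break; // this run's share is done
        }
        if ($row['other'] !== '') {
            continue; // already processed, or already known to be bad
        }
        $details = $fetch($row['sourcelink']); // null = page couldn't be used
        if ($details === null) {
            $table[$key]['other'] = 'invalid'; // flag it so it isn't retried forever
        } else {
            $table[$key] = array_merge($row, $details, ['other' => 'processed']);
        }
        $handled++;
    }
    return $table;
}

// How many runs it takes to work through $total rows at $batch rows per run.
function runs_needed(int $total, int $batch): int
{
    return (int) ceil($total / $batch);
}
```

<p>With one row per run, 695 rows take 695 runs; handling 5 per run would cut that to 139 runs, provided each run still finishes inside the time limit.</p>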
<h4>PHP</h4>
<pre><code class="php hljs">/* Import CSS Reference - Part 2 */

/*
This one needs to load a single item from the CSS reference table,
grab the url, load the content and populate the missing items.
When loading the item it also needs to add something to the other field, to mark it processed.
*/

public function importcssreferencepart2(
    $loop_max = 10
) {
    /* This will be run on a loaded import item, so there's no need to pass variables to the function */
    global $db;
    global $functions;
    $out = "";
    $for_counter = 0;
    $main_loop_tag = ".main-content";

    $css_reference = new cssreference;
    $css_reference-&gt;addtomenu = false;
    $css_reference-&gt;start();

    // load item - using fields array (a row whose "other" field is still blank)
    $fields_array = [
        "other" =&gt; "",
    ];
    if (!$css_reference-&gt;loadfromfieldsarray($fields_array, 1)) { // 1 = max items to load
        return "nothing to load";
    }

    // new item should now be loaded

    $out .= $css_reference-&gt;title . "&lt;br /&gt;";

    $db_table_name = $this-&gt;db-&gt;escapeString($this-&gt;dbtablename);
    $loop_max = $this-&gt;db-&gt;escapeString($loop_max);

    require_once("lib/simple_html_dom.php");
    // file_get_html() returns false if the page can't be loaded; -&gt;find() on false is a fatal error,
    // and because the row is then never marked processed, the next run picks the same row again
    $html = file_get_html($css_reference-&gt;sourcelink); // new url based on loaded item

    foreach ($html-&gt;find($main_loop_tag) as $main_content) {

        if ($for_counter == $loop_max) {
            continue;
        }
        $for_counter++;

        $xtitle = $main_content-&gt;find("h1", 0)-&gt;plaintext;
        $xtitle = trim($xtitle);
        $out .= "\$xtitle:$xtitle&lt;br /&gt;";
        $css_reference-&gt;longtitle = $xtitle;

        $xsummary = $main_content-&gt;find("p", 0)-&gt;innertext;
        $xsummary = trim($xsummary);
        $out .= "\$xsummary:$xsummary&lt;br /&gt;";
        $css_reference-&gt;summary = $xsummary;

        //$xsummary2 = $main_content-&gt;find("p", 1)-&gt;innertext;
        $xsummary2 = $main_content-&gt;find(".code-example", 0)-&gt;innertext;
        $xsummary2 = trim($xsummary2);
        $out .= "\$xsummary2:$xsummary2&lt;br /&gt;";
        $css_reference-&gt;summary2 = $xsummary2;

        $xadditional = $main_content-&gt;innertext;
        $out .= "\$xadditional:$xadditional&lt;br /&gt;";
        $css_reference-&gt;additional = $xadditional;

        $css_reference-&gt;other = "processed";

        if ($css_reference-&gt;update()) {
            $out .= "Item $css_reference-&gt;title Updated&lt;br&gt;";
        }

        // check if title md5 exists
        /*
        if (!$css_reference-&gt;md5exists($xmd5)) {
            if ($css_reference-&gt;add()) {
                $out .= "Item $css_reference-&gt;title Updated&lt;br&gt;";
            }
        }
        */

    }

    return $out;

}
/* Import CSS Reference - Part 2 */</code></pre>
<p>I ran this overnight and some of the items imported, then the importer started timing out again, so I increased PHP's maximum script execution time to 60 seconds, and now it seems to be working again, if slowly. Maybe the source site is slow.</p>
<p>I found the reason it was crashing: the source link wasn't the correct doc link, so it was trying to import from the wrong page, which was crashing the script somehow.</p>
<p>So go through and delete, or mark as processed, the items with incorrect source links, and the import should continue.</p>
<p>I think the issue was that some items had disabled links that were still being used as the link source; removing these disabled links stopped the crashing. Yay!</p>
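<p>Those links could also be filtered out in the first part, so they never reach the queue. A minimal sketch; what a "disabled" link looks like on the source site isn't shown in this post, so the checks below (no <code>href</code>, an empty or <code>#</code> href, a <code>javascript:</code> href, or a <code>disabled</code> / <code>aria-disabled="true"</code> attribute) are assumptions to adjust against the real markup. <code>is_usable_link()</code> is a hypothetical helper working on an element from PHP's built-in DOM extension.</p>

```php
<?php
// Decide in part 1 whether a listing link is worth queueing.
// is_usable_link() is hypothetical, and what counts as "disabled" is an assumption:
// adjust these checks to match the source site's real markup.
function is_usable_link(?DOMElement $a): bool
{
    if ($a === null || !$a->hasAttribute('href')) {
        return false; // no link at all
    }
    $href = trim($a->getAttribute('href'));
    if ($href === '' || $href === '#' || stripos($href, 'javascript:') === 0) {
        return false; // placeholder link that doesn't lead to a page
    }
    if ($a->hasAttribute('disabled') || $a->getAttribute('aria-disabled') === 'true') {
        return false; // explicitly marked as disabled
    }
    return true;
}

// Example: keep only the usable links from some made-up listing markup
$doc = new DOMDocument();
$doc->loadHTML('<ul><li><a href="/css/color">color</a></li><li><a>margin</a></li>'
    . '<li><a href="#" aria-disabled="true">padding</a></li></ul>');
$usable = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    if (is_usable_link($a)) {
        $usable[] = $a->getAttribute('href');
    }
}
```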
