Importing CSS Reference
<p>I thought it would be kind of cool to have the CSS reference as some kind of database that I could refer to.</p>
<p>This is how to import it somewhere else.</p>
<h3>Import CSS Reference Function</h3>
<p>This is the initial function, which I had to modify later because it was crashing due to too many HTTP requests. </p>
<h4>PHP</h4>
<pre><code class="php hljs">/* Import CSS Reference */
public function importcssreference (
    $loopmax = 10
) {
    /* This will be run on a loaded import item, so don't need to pass variables to the function */
    global $db;
    global $functions;
    $out = "";
    $forcounter = 0;
    $mainlooptag = "#sect2 li";
    $dbtablename = $this->db->escapeString($this->dbtablename);
    $loopmax = $this->db->escapeString($loopmax);
    require_once("lib/simplehtmldom.php");
    $html = file_get_html($this->importurl);
    foreach($html->find($mainlooptag) as $item) {
        $forcounter++;
        // note: continue only skips the item at position $loopmax, it does not end the loop
        if($forcounter == $loopmax) {
            continue;
        }
        $csslink = $item->find('a',0)->href;
        $csstitle = $item->find('a',0)->plaintext;
        $out .= "\$csslink:$csslink<br />";
        $out .= "\$csstitle:$csstitle<br />";
        // this is all we need for stage one, then open the page link and process.
        $htmlsource = file_get_html($csslink);
        $replycount = 0;
        $out .= "<hr />";
        // returns inside the loop, so only the first item is output for now
        return $out;
    }
}</code></pre>
<p>This returns the first item, so it's working as intended so far. </p>
<p>The selector <code>#sect2 li</code> finds each list item inside the element with the id <code>sect2</code> and uses it as a loop item. </p>
<p>Then for each of them, grab the link and the title. </p>
<p>It's finding the first item here and getting its link and title.</p>
<p><img src="https://i.imgur.com/Tp7X6Cy.png" /></p>
<p><img src="https://i.imgur.com/KAHrUQK.png" /></p>
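<p>For comparison, here is a minimal sketch of the same "find each list item, grab its link and title" step using PHP's built-in DOM extension instead of simple_html_dom. The sample markup and the function name <code>extract_reference_links</code> are made up for illustration, and it assumes the DOM extension is installed; the real page structure will differ.</p>

```php
<?php
// Sketch using PHP's built-in DOM extension instead of simple_html_dom.
// The markup below is invented for illustration; the real page will differ.
$sampleHtml = <<<HTML
<div id="sect2">
  <ul>
    <li><a href="/docs/css/color"> color </a></li>
    <li><a href="/docs/css/margin">margin</a></li>
    <li>no link here</li>
  </ul>
</div>
HTML;

/**
 * Return [['link' => ..., 'title' => ...], ...] for every <li> inside #sect2
 * that contains an <a> element. (Hypothetical helper, not part of the importer.)
 */
function extract_reference_links(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $items = [];
    // XPath equivalent of the CSS selector "#sect2 li".
    foreach ($xpath->query('//*[@id="sect2"]//li') as $li) {
        $anchor = $xpath->query('.//a', $li)->item(0);
        if ($anchor === null) {
            continue; // list item without a link, nothing to import
        }
        $items[] = [
            'link'  => $anchor->getAttribute('href'),
            'title' => trim($anchor->textContent),
        ];
    }
    return $items;
}

$links = extract_reference_links($sampleHtml);
foreach ($links as $row) {
    echo $row['title'] . ' => ' . $row['link'] . PHP_EOL;
}
```

<p>The null check on the anchor means a list item without a link is skipped instead of being read as if it had one.</p>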
<h3>Prepend the Domain to the Link</h3>
<p>Usually the links will not include the full domain, so it needs to be added manually to the front to get the full link.</p>
<h4>PHP</h4>
<pre><code class="php hljs">$xelementlink = "https://the-domain.org".$xelementlink;</code></pre>
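<p>A slightly more defensive version of that step could check whether the link is already absolute before adding the domain. This is only a sketch: <code>to_absolute_url</code> is a hypothetical helper, and it does not resolve paths like <code>../page</code>.</p>

```php
<?php
// Hypothetical helper: only add the domain when the link is relative.
function to_absolute_url(string $link, string $base = 'https://the-domain.org'): string
{
    // Already absolute (has a scheme such as https://), leave it alone.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $link) === 1) {
        return $link;
    }
    // Protocol-relative link such as //the-domain.org/page.
    if (strncmp($link, '//', 2) === 0) {
        return 'https:' . $link;
    }
    // Join base and path with exactly one slash between them.
    return rtrim($base, '/') . '/' . ltrim($link, '/');
}

echo to_absolute_url('/docs/css/color') . PHP_EOL;
echo to_absolute_url('https://the-domain.org/docs/css/margin') . PHP_EOL;
```

<p>This avoids ending up with a doubled domain if some of the links on the page turn out to be absolute already.</p>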
<h3>Grab the Content</h3>
<p>This part is causing a timeout on the server. Processing too many HTML requests in one loop can crash the server, so this part needs to move to a separate function and request.</p>
<h4>PHP</h4>
<pre><code class="php hljs">$htmlsource = file_get_html($xelementlink);</code></pre>
<p> </p>
<h3>Importing Problem: Server 504 Timeout</h3>
<p>The problem is that it's only importing 11 items, so each link will need to be added to a temp table first, and then the details run as a separate import for each item.</p>
<p>Split the import, so the first part just adds the titles and links, then another import can go through each of them and add the missing details from the second part of the function.</p>
<p>Increased the PHP memory limit from 128 MB to 256 MB, but it's still timing out on import.</p>
<p><code>/etc/php/7.4/fpm$ sudo nano php.ini</code></p>
<p><code># find memory_limit and change it to 256M</code></p>
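<p>To confirm which limits a script is actually running with, something like the sketch below can help. <code>ini_size_to_bytes</code> is a hypothetical helper written for this example. The CLI and FPM usually read different php.ini files, so it would need to run through the web server to show the FPM values. A 504 is a gateway timeout, so the execution time is probably the more relevant number here than the memory limit.</p>

```php
<?php
// Convert a php.ini shorthand size such as "256M" into bytes.
// Hypothetical helper for this example only.
function ini_size_to_bytes(string $value): int
{
    $value = trim($value);
    if ($value === '-1') {
        return -1; // -1 means "no limit"
    }
    $number = (int) $value;                  // (int) "256M" yields 256
    $unit = strtoupper(substr($value, -1));  // last character is the unit, if any
    switch ($unit) {
        case 'G':
            return $number * 1024 * 1024 * 1024;
        case 'M':
            return $number * 1024 * 1024;
        case 'K':
            return $number * 1024;
        default:
            return $number; // plain number of bytes
    }
}

$limit = ini_get('memory_limit');
echo "memory_limit: $limit (" . ini_size_to_bytes($limit) . " bytes)" . PHP_EOL;
echo "max_execution_time: " . ini_get('max_execution_time') . " seconds" . PHP_EOL;
echo "peak memory so far: " . memory_get_peak_usage(true) . " bytes" . PHP_EOL;
```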
<h3>Import Full</h3>
<p>Currently crashing the server, causing a 504 timeout error. </p>
<h4>PHP</h4>
<pre><code class="php hljs">/* Import CSS Reference */
/* this import crashes after 10 items - so need to split into smaller import chunks */
public function importcssreference (
    $loopmax = 10
) {
    /* This will be run on a loaded import item, so don't need to pass variables to the function */
    global $db;
    global $functions;
    $out = "";
    $forcounter = 0;
    $mainlooptag = "#sect2 li";
    $dbtablename = $this->db->escapeString($this->dbtablename);
    $loopmax = $this->db->escapeString($loopmax);
    require_once("lib/simplehtmldom.php");
    $html = file_get_html($this->importurl);
    foreach($html->find($mainlooptag) as $item) {
        $forcounter++;
        // note: continue only skips the item at position $loopmax, it does not end the loop
        if($forcounter == $loopmax) {
            continue;
        }
        $xelementlink = $item->find('a',0)->href;
        $xelementtitle = $item->find('a',0)->plaintext;
        $xelementtitle = trim($xelementtitle);
        $out .= "\$xelementlink:$xelementlink<br />";
        $out .= "\$xelementtitle:$xelementtitle<br />";
        $xelementlink = "https://the-domain.org".$xelementlink;
        // one extra http request per list item
        $htmlsource = file_get_html($xelementlink);
        $replycount = 0;
        $out .= "<hr />";
        foreach($htmlsource->find(".main-content") as $maincontent) {
            $xtitle = $maincontent->find("h1",0)->plaintext;
            $xtitle = trim($xtitle);
            $out .= "\$xtitle:$xtitle<br />";
            $xsummary = $maincontent->find("p",0)->innertext;
            $xsummary = trim($xsummary);
            $out .= "\$xsummary:$xsummary<br />";
            $xsummary2 = $maincontent->find("p",1)->innertext;
            $xsummary2 = trim($xsummary2);
            $out .= "\$xsummary2:$xsummary2<br />";
            $xmd5 = md5($xelementtitle);
            $out .= "\$xmd5:$xmd5<br />";
            $xcategory = "CSS";
            $out .= "\$xcategory:$xcategory<br />";
            $xadditional = $maincontent->innertext;
            $out .= "\$xadditional:$xadditional<br />";
            // start the class
            $linkedclass = new $this->linkedclass;
            $linkedclass->addtomenu = false;
            $linkedclass->start();
            // assign all vars
            $linkedclass->title = $xtitle;
            $linkedclass->additional = $xadditional;
            $linkedclass->category = $xcategory;
            $linkedclass->md5 = $xmd5;
            $linkedclass->summary = $xsummary;
            $linkedclass->summary2 = $xsummary2;
            $linkedclass->elementtitle = $xelementtitle;
            $linkedclass->sourcelink = $xelementlink;
            // check if title md5 exists
            if(!$linkedclass->md5exists($xmd5)) {
                if($linkedclass->add()) {
                    $out .= "Item $linkedclass->title Added<br>";
                }
            }
        }
    }
    return $out;
}
/* Import CSS Reference */</code></pre>
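<p>The md5 check at the end of the loop above is what makes the import safe to run again after a timeout, since items that were already added get skipped. Here is a small self-contained sketch of that idea, with a plain array standing in for the database table. <code>add_if_new</code> is a hypothetical stand-in for the <code>md5exists()</code> and <code>add()</code> pair, not the real class.</p>

```php
<?php
// Stand-in for the database table: md5 of the title => stored row.
$store = [];

// Hypothetical helper mirroring the md5exists() + add() pair in the import.
function add_if_new(array &$store, string $title, string $link): bool
{
    $key = md5(trim($title));
    if (isset($store[$key])) {
        return false; // already imported, skip
    }
    $store[$key] = ['title' => trim($title), 'sourcelink' => $link];
    return true;
}

$first  = add_if_new($store, 'color', '/docs/css/color');
$second = add_if_new($store, ' color ', '/docs/css/color'); // same title once trimmed
$third  = add_if_new($store, 'margin', '/docs/css/margin');

echo count($store) . " items stored" . PHP_EOL;
```

<p>Keying on the trimmed title means two different pages with the same title would collide, so hashing the source link instead might be worth considering.</p>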
<p> </p>
<p>Split the import into part 1 and part 2, so the first part should just be loading one page, but it's still giving me a 504 Gateway Time-out and only adding 4 items for some reason. </p>
<p>Even fewer items than the more complicated import.</p>
<p> </p>
<h2>Import Part 1</h2>
<p>This is a smaller import that only grabs the first page and doesn't follow URLs, so it should work better than the full import, but it only adds 4 items. Hmm... </p>
<h4>PHP</h4>
<pre><code class="php hljs">/* Import CSS Reference - Part 1 */
/* this import crashes after 10 items - so need to split into smaller import chunks */
public function importcssreferencepart1 (
    $loopmax = 10
) {
    /* This will be run on a loaded import item, so don't need to pass variables to the function */
    global $db;
    global $functions;
    $out = "";
    $forcounter = 0;
    $mainlooptag = "#sect2 li";
    $dbtablename = $this->db->escapeString($this->dbtablename);
    $loopmax = $this->db->escapeString($loopmax);
    require_once("lib/simplehtmldom.php");
    $html = file_get_html($this->importurl);
    foreach($html->find($mainlooptag) as $item) {
        $forcounter++;
        // note: continue only skips the item at position $loopmax, it does not end the loop
        if($forcounter == $loopmax) {
            continue;
        }
        $xelementlink = $item->find('a',0)->href;
        $xelementtitle = $item->find('a',0)->plaintext;
        $xelementtitle = trim($xelementtitle);
        $out .= "\$xelementlink:$xelementlink<br />";
        $out .= "\$xelementtitle:$xelementtitle<br />";
        $xelementlink = "https://the-domain.org".$xelementlink;
        $htmlsource = file_get_html($xelementlink);
        $replycount = 0;
        $out .= "<hr />";
        // start the class
        $linkedclass = new $this->linkedclass;
        $linkedclass->addtomenu = false;
        $linkedclass->start();
        $xmd5 = md5($xelementtitle);
        $xcategory = "CSS";
        // assign items
        $linkedclass->title = $xelementtitle;
        $linkedclass->category = $xcategory;
        $linkedclass->md5 = $xmd5;
        $linkedclass->sourcelink = $xelementlink;
        /* these following items can come from part 2 of the import */
        // $linkedclass->additional = $xadditional;
        // $linkedclass->summary = $xsummary;
        // $linkedclass->summary2 = $xsummary2;
        // $linkedclass->longtitle = $longtitle;
        // check if title md5 exists
        if(!$linkedclass->md5exists($xmd5)) {
            if($linkedclass->add()) {
                $out .= "Item $linkedclass->title Added<br>";
            }
        }
    }
    return $out;
}
/* Import CSS Reference - Part 1 */</code></pre>
<p>Still causing this timeout. </p>
<p><img src="https://i.imgur.com/CcakklX.png" /></p>
<h3>Timeout Fixed</h3>
<p>Actually I see the issue now: I left the download source line in there. Doh!</p>
<h4>PHP</h4>
<pre><code class="php hljs">// get rid of this line and it should run ok
$htmlsource = file_get_html($xelementlink);</code></pre>
<p> </p>
<h3>Import Stage 1 Working Now</h3>
<p>Just the titles and the links for now. </p>
<p><img src="https://i.imgur.com/b0kE0lL.png" /></p>
<p><img src="https://i.imgur.com/lsSJ1aM.png" /></p>
<p>Woo 695 CSS Attribute Items, with no crash. </p>
<p>Now to get the second part of the import done.</p>
<p> </p>
<h3>Part 2 Import</h3>
<p>The import will need a way to check whether an item has already been processed.</p>
<p>Check through each item in cssreference: if the <code>other</code> flag is blank, process it and then set it to processed. Just do one at a time on a 1 minute cron, and in 695 minutes (about 11.6 hours) it should all be processed. That's a long time, so maybe run it every 5 seconds and stop it after 700 x 5 seconds (just under an hour).</p>
<p>Added the cron. Remove this after a day or so.</p>
<p><code>*/3 * * * * wget --spider https://theimporturl/ > /dev/null 2>&1</code></p>
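<p>The "one unprocessed item per run" idea can be sketched on its own like this, with an array standing in for the cssreference table and each call to the hypothetical <code>process_next</code> standing in for one cron run. The rows and intervals are made-up examples. Standard cron cannot schedule anything more often than once a minute, so a 5 second interval would need a loop inside the script or some other scheduler.</p>

```php
<?php
// Stand-in for the cssreference table; "other" stays empty until processed.
$rows = [
    ['title' => 'color',   'sourcelink' => 'https://the-domain.org/docs/css/color',   'other' => ''],
    ['title' => 'margin',  'sourcelink' => 'https://the-domain.org/docs/css/margin',  'other' => 'processed'],
    ['title' => 'padding', 'sourcelink' => 'https://the-domain.org/docs/css/padding', 'other' => ''],
];

// Process the first unprocessed row and mark it as done.
// Returns the title that was handled, or null when nothing is left.
function process_next(array &$rows): ?string
{
    foreach ($rows as $i => $row) {
        if ($row['other'] !== '') {
            continue; // already handled on an earlier run
        }
        // The real import would fetch $row['sourcelink'] here and fill in the details.
        $rows[$i]['other'] = 'processed';
        return $row['title'];
    }
    return null;
}

// Each call stands for one cron run.
$handled = [];
while (($title = process_next($rows)) !== null) {
    $handled[] = $title;
}
echo implode(', ', $handled) . PHP_EOL;

// Rough time estimate at one item per run (example intervals only).
$items = 695;
foreach ([60, 180] as $intervalSeconds) {
    printf("%d s interval: about %.1f hours\n", $intervalSeconds, $items * $intervalSeconds / 3600);
}
```

<p>Marking the row as processed even when the fetch fails would stop one bad link from blocking the queue, at the cost of leaving that item without details.</p>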
<h4>PHP</h4>
<pre><code class="php hljs">/* Import CSS Reference - Part 2 */
/*
This one needs to load a single item from the cssreference,
grab the url, load the content and populate the missing items.
When loading the item it also needs to add something to the other field, to mark it processed
*/
public function importcssreferencepart2 (
    $loopmax = 10
) {
    /* This will be run on a loaded import item, so don't need to pass variables to the function */
    global $db;
    global $functions;
    $out = "";
    $forcounter = 0;
    $mainlooptag = ".main-content";
    $cssreference = new cssreference;
    $cssreference->addtomenu = false;
    $cssreference->start();
    // load item - using fields array
    $fieldsarray = [
        "other" => "",
    ];
    if(!$cssreference->loadfromfieldsarray($fieldsarray, $max = 1)) {
        return "nothing to load";
    }
    // new item should now be loaded
    $out .= $cssreference->title . "<br />";
    $dbtablename = $this->db->escapeString($this->dbtablename);
    $loopmax = $this->db->escapeString($loopmax);
    require_once("lib/simplehtmldom.php");
    $html = file_get_html($cssreference->sourcelink); // new url based on loaded item
    foreach($html->find($mainlooptag) as $maincontent) {
        if($forcounter == $loopmax) {
            continue;
        }
        $forcounter++;
        $xtitle = $maincontent->find("h1",0)->plaintext;
        $xtitle = trim($xtitle);
        $out .= "\$xtitle:$xtitle<br />";
        $cssreference->longtitle = $xtitle;
        $xsummary = $maincontent->find("p",0)->innertext;
        $xsummary = trim($xsummary);
        $out .= "\$xsummary:$xsummary<br />";
        $cssreference->summary = $xsummary;
        //$xsummary2 = $maincontent->find("p",1)->innertext;
        $xsummary2 = $maincontent->find(".code-example",0)->innertext;
        $xsummary2 = trim($xsummary2);
        $out .= "\$xsummary2:$xsummary2<br />";
        $cssreference->summary2 = $xsummary2;
        $xadditional = $maincontent->innertext;
        $out .= "\$xadditional:$xadditional<br />";
        $cssreference->additional = $xadditional;
        $cssreference->other = "processed";
        if($cssreference->update()) {
            $out .= "Item $cssreference->title Updated<br>";
        }
        // check if title md5 exists
        /*
        if(!$cssreference->md5exists($xmd5)) {
            if($cssreference->add()) {
                $out .= "Item $cssreference->title Updated<br>";
            }
        }
        */
    }
    return $out;
}
/* Import CSS Reference - Part 2 */</code></pre>
<p>Ran this overnight and some of the items imported, then the importer started timing out again, so I increased the PHP script execution time to 60 seconds, and now it seems to be working again, slowly. Maybe the source site is slow. </p>
<p>Found that the reason it was crashing is that the source link on some items was not the correct doc link, so it was trying to import from an incorrect page, which was crashing the script somehow. </p>
<p>So go through and delete, or mark as processed, the ones with incorrect source links and it should continue.</p>
<p>I think the issue was that some items had disabled links that it was still using as a link source. Removing these disabled links stopped the crashing. Yay!</p>
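<p>A guard along these lines could be used to skip bad source links before trying to fetch them. It is a sketch only: <code>is_usable_link</code> is a hypothetical helper, and what a "disabled" link actually looks like on the source site is an assumption here (empty, <code>#</code>, or <code>javascript:</code>). It is meant to run on the raw href before the domain is added.</p>

```php
<?php
// Hypothetical check for whether a source link is worth fetching.
function is_usable_link(?string $link): bool
{
    if ($link === null) {
        return false; // no <a> element was found at all
    }
    $link = trim($link);
    if ($link === '' || $link[0] === '#') {
        return false; // empty or in-page anchor, assumed typical for disabled links
    }
    if (stripos($link, 'javascript:') === 0) {
        return false;
    }
    // A relative path is fine, the domain gets added later.
    if ($link[0] === '/') {
        return true;
    }
    // Otherwise it must parse as an http(s) URL with a host.
    $parts = parse_url($link);
    return is_array($parts)
        && isset($parts['scheme'], $parts['host'])
        && in_array(strtolower($parts['scheme']), ['http', 'https'], true);
}

$candidates = [
    '/docs/css/color',
    'https://the-domain.org/docs/css/margin',
    '#',
    '',
    'javascript:void(0)',
];
foreach ($candidates as $link) {
    echo var_export($link, true) . ' => ' . (is_usable_link($link) ? 'fetch' : 'skip') . PHP_EOL;
}
```

<p>Items that fail the check could be marked as processed straight away so the cron moves on to the next one.</p>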