Automating OLDaily for LinkedIn

LinkedIn can be a pain to work with. I have long wanted to publish OLDaily, my daily newsletter, on it, but the API doesn't support this.

This week I came up with a new plan: use browser automation software running on my desktop at home to download my newsletter XML, format it, and submit it as a new issue of my LinkedIn newsletter. I turned to ChatGPT for help and, after a couple of days of iteration and testing, made it work (it would have taken me a long time to do this without that support).

Obviously, I needed both the OLDaily XML file and my newsletter on LinkedIn set up to make this work. Here they both are:

   - Newsletter: https://www.linkedin.com/newsletters/oldaily-7369381037719646208/

   - OLDaily XML: https://www.downes.ca/news/OLDaily.xml 
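Before any browser automation, it can help to look at what the script will actually consume. Here's a minimal stdlib-only sketch that lists the title and link of every <item> in the XML. The helpers list_items, child_text and local_name are hypothetical, not part of the project (the real script uses feedparser), and the sketch assumes an RSS-style file; it ignores XML namespaces so that both RSS 2.0 and RSS 1.0/RDF layouts match:

```python
# Hypothetical helper: peek at the <item> elements in an RSS-style XML file
# using only the standard library (li_newsletter_selenium.py uses feedparser).
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix so RSS 2.0 and RSS 1.0/RDF both match."""
    return tag.rsplit("}", 1)[-1]


def child_text(el: ET.Element, name: str) -> str:
    """Text of the first direct child called `name`, or '' if there is none."""
    for child in el:
        if local_name(child.tag) == name:
            return (child.text or "").strip()
    return ""


def list_items(xml_bytes: bytes) -> list[tuple[str, str]]:
    """Return (title, link) for every <item> element, in document order."""
    root = ET.fromstring(xml_bytes)
    return [
        (child_text(el, "title"), child_text(el, "link"))
        for el in root.iter()
        if local_name(el.tag) == "item"
    ]
```

To try it on the live feed, pass it the bytes returned by urllib.request.urlopen("https://www.downes.ca/news/OLDaily.xml").read(). Each (title, link) pair corresponds to one paragraph in the LinkedIn issue.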

I set up a single directory on my Windows 11 computer at home for the scripts (I was originally going to run it on the server, but that gets a bit complicated from an SSH command line). There are three main files:

   .env

   li_newsletter_selenium.py

   run_olddaily.ps1

The first file, .env, defines settings such as the feed URL and the login credentials (if you were doing this yourself, you'd set your own values here):

LINKEDIN_EMAIL=s***a
LINKEDIN_PASSWORD=***
NEWSLETTER_NAME=OLDaily   # must match exactly in LinkedIn’s modal
FEED_XML_URL=https://www.downes.ca/news/OLDaily.xml
MAX_ISSUES_PER_RUN=1                   # not read by li_newsletter_selenium.py (it always posts one issue)
HEADLESS=false                         # set true for silent runs
TITLE_PREFIX="OLDaily - "              # quoted so the trailing space is kept
TITLE_DATE_FORMAT=%Y-%m-%d
TIMEZONE=America/Toronto
# optional:
# EDITION_DATE=2025-09-05
PROFILE_DIR=E:\Websites\downes\chrome_profile
COMPOSER_URL=https://www.linkedin.com/article/new/?author=urn%3Ali%3Afsd_profile%3AACoAAAAI52YBB6qnG3mdwHncS6-Lx5nnkx5Rz8I


Get the composer URL from LinkedIn by creating a newsletter, then starting an article for that newsletter and copying the address from the browser's address bar.
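The composer URL carries your profile identifier in an author query parameter. Here's a small hypothetical helper (not part of the script) for checking the URL you copied before pasting it into .env. It returns None when there is no ?author= query string, for example if the ? is missing and author= ends up as part of the path:

```python
# Hypothetical helper: extract the author URN from a LinkedIn composer URL.
from typing import Optional
from urllib.parse import parse_qs, urlparse


def author_urn(composer_url: str) -> Optional[str]:
    """Return the decoded ?author= value, or None if the URL has no author query parameter."""
    values = parse_qs(urlparse(composer_url).query).get("author")
    return values[0] if values else None
```

A correctly copied URL should give a value starting with urn:li:fsd_profile: (parse_qs decodes the %3A sequences into colons).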

I had to install a number of libraries for the Python script, especially Selenium. So I updated my Python installation, created the project directory (E:\Websites\downes), and then created a Python virtual environment (I've always hated Python environments, which is why I spent so many years as a Perl coder):

(In PowerShell)
python -m venv .\venv
.\venv\Scripts\Activate.ps1


Then I installed my dependencies (in PowerShell, with the virtual environment active):

python -m pip install -U pip setuptools wheel
pip install selenium feedparser beautifulsoup4 python-dotenv tzdata requests
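If the script later fails with ModuleNotFoundError, one common cause is that the virtual environment wasn't active when pip ran. A quick hypothetical check (not one of the project files) that the packages can be found by the current interpreter; note that two import names differ from the pip package names (beautifulsoup4 is imported as bs4, python-dotenv as dotenv):

```python
# Hypothetical check: confirm the dependencies are importable from the active venv.
import importlib.util

# Import names, not pip names: beautifulsoup4 -> bs4, python-dotenv -> dotenv
REQUIRED = ["selenium", "feedparser", "bs4", "dotenv", "tzdata", "requests"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("All dependencies found." if not missing else f"Missing: {', '.join(missing)}")
```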

 

Then I put my Python script into the project directory (this took a *lot* of iteration to get right): 

# li_newsletter_selenium.py
# Create ONE LinkedIn newsletter issue from a single XML page.
# Title: "OLDaily - <today's date>"
# Body: each <item> becomes a paragraph: <p><a href="LINK">TITLE</a> DESCRIPTION</p>

import os, json, time, sys, html
from pathlib import Path
from dataclasses import dataclass
from datetime import datetime
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

import feedparser
from bs4 import BeautifulSoup
from dotenv import load_dotenv
from urllib.parse import urljoin



# --- Selenium (standard) ---
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# ===================== CONFIG & ENV =====================

load_dotenv()

# LinkedIn auth / target
LINKEDIN_EMAIL = os.getenv("LINKEDIN_EMAIL")        # you@example.com (optional; you can log in manually)
LINKEDIN_PASSWORD = os.getenv("LINKEDIN_PASSWORD")  # (optional)
NEWSLETTER_NAME = os.getenv("NEWSLETTER_NAME")      # must match exactly in LinkedIn UI

# Source XML page (the whole page is one edition)
FEED_XML_URL = os.getenv("FEED_XML_URL")            # e.g. https://example.com/olddaily.xml

# Title settings
TITLE_PREFIX = os.getenv("TITLE_PREFIX", "OLDaily - ")
TITLE_DATE_FORMAT = os.getenv("TITLE_DATE_FORMAT", "%Y-%m-%d")  # change to "%B %d, %Y" if you prefer
TIMEZONE = os.getenv("TIMEZONE", "America/Toronto")
# Optional: override edition date (YYYY-MM-DD); otherwise "today" in TIMEZONE
EDITION_DATE = os.getenv("EDITION_DATE", "").strip()

# Chrome profile & headless
HEADLESS = os.getenv("HEADLESS", "false").lower() == "true"
PROFILE_DIR = str(Path(os.getenv("PROFILE_DIR", r"E:\Websites\downes\chrome_profile")).resolve())

# LinkedIn Article composer URL (your author URN embedded)
COMPOSER_URL = os.getenv(
    "COMPOSER_URL",
    "https://www.linkedin.com/article/new/?author=urn%3Ali%3Afsd_profile%3AACoAAAAI52YBB6qnG3mdwHncS6-Lx5nnkx5Rz8I"
)

POSTED_PATH = Path("posted.json")

# Sanity checks
assert NEWSLETTER_NAME, "Set NEWSLETTER_NAME in .env"
assert FEED_XML_URL, "Set FEED_XML_URL in .env"

# ===================== HELPERS =====================

def sanitize_keep_links(html_in: str, base_url: str) -> str:
    """
    Keep only <a> and <br>. For <a>, keep a safe absolute href.
    Strip all other tags (but keep their text).
    """
    soup = BeautifulSoup(html_in or "", "html.parser")
    for tag in soup.find_all(True):
        if tag.name == "a":
            href = tag.get("href") or ""
            if href:
                href = urljoin(base_url or "", href)
                if href.startswith(("http://", "https://", "mailto:", "tel:")):
                    tag.attrs = {"href": href}
                else:
                    tag.unwrap()
            else:
                tag.unwrap()
        elif tag.name == "br":
            # keep line breaks
            continue
        else:
            tag.unwrap()
    return (str(soup) or "").strip()


def _find_headline_element(drv):
    """Return the best guess for the headline element (WebElement) or None."""
    # Inputs / textareas that look like title/headline
    input_like = [
        "//input[( @placeholder or @aria-label ) and (contains(translate(@placeholder,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'),'headline') or contains(translate(@placeholder,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'),'title') or contains(translate(@aria-label,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'),'headline') or contains(translate(@aria-label,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'),'title'))]",
        "//textarea[( @placeholder or @aria-label ) and (contains(translate(@placeholder,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'),'headline') or contains(translate(@placeholder,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'),'title') or contains(translate(@aria-label,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'),'headline') or contains(translate(@aria-label,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'),'title'))]",
    ]
    # Contenteditable candidates
    editable_like = [
        "//div[@contenteditable='true' and (@data-placeholder='Add a headline' or @data-placeholder='Add headline' or contains(translate(@data-placeholder,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'),'title'))]",
        "//h1[@contenteditable='true']",
        "//div[@role='textbox' and @contenteditable='true' and (contains(translate(@aria-label,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'),'headline') or contains(translate(@aria-label,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'),'title'))]",
        "//header//*[(@contenteditable='true') or self::h1[@contenteditable='true']]",
        "(//div[@contenteditable='true'])[1]"
    ]
    for xp in input_like + editable_like:
        try:
            el = WebDriverWait(drv, 10).until(EC.element_to_be_clickable((By.XPATH, xp)))
            return el
        except:
            pass
    return None

def _get_textlike_value(drv, el):
    """Read current value/text from input/textarea/contenteditable via JS."""
    return drv.execute_script("""
        const el = arguments[0];
        const tn = el.tagName.toLowerCase();
        if (tn === 'input' || tn === 'textarea') return el.value || '';
        if (el.getAttribute('contenteditable') === 'true') return el.innerText || el.textContent || '';
        return '';
    """, el) or ""

def _set_via_exec_command(drv, el, text):
    """Use execCommand insertText which triggers input events in most editors."""
    return drv.execute_script("""
        const el = arguments[0];
        const text = arguments[1];
        el.focus();
        try { document.execCommand('selectAll', false, null); document.execCommand('delete', false, null); } catch(e){}
        const ok = document.execCommand('insertText', false, text);
        el.dispatchEvent(new InputEvent('input', {bubbles:true}));
        el.dispatchEvent(new Event('change', {bubbles:true}));
        return ok;
    """, el, text)

def _set_value_and_events(drv, el, text):
    """Directly set value/textContent and fire events."""
    return drv.execute_script("""
        const el = arguments[0];
        const text = arguments[1];
        const tn = el.tagName.toLowerCase();
        el.focus();
        if (tn === 'input' || tn === 'textarea') {
            el.value = text;
        } else if (el.getAttribute('contenteditable') === 'true') {
            el.textContent = text;
        } else {
            return false;
        }
        el.dispatchEvent(new InputEvent('input', {bubbles:true}));
        el.dispatchEvent(new Event('change', {bubbles:true}));
        el.blur();
        return true;
    """, el, text)


def find_clickable(drv, xps, timeout_each=10):
    """Try a list of XPaths; return the first clickable WebElement or None."""
    for xp in xps:
        try:
            el = WebDriverWait(drv, timeout_each).until(EC.element_to_be_clickable((By.XPATH, xp)))
            return el
        except:
            pass
    return None

def ensure_modal(drv, timeout=60):
    """Wait for a modal/dialog to be present."""
    try:
        WebDriverWait(drv, timeout).until(EC.presence_of_element_located(
            (By.XPATH, "//div[contains(@role,'dialog') or contains(@class,'artdeco-modal')]")
        ))
        return True
    except:
        return False


def load_posted():
    if POSTED_PATH.exists():
        try:
            return set(json.loads(POSTED_PATH.read_text()))
        except Exception:
            return set()
    return set()

def save_posted(s):
    POSTED_PATH.write_text(json.dumps(sorted(list(s)), indent=2))

def debug_dump(drv, stem="debug"):
    try:
        png = f"{stem}.png"
        html_path = f"{stem}.html"
        drv.save_screenshot(png)
        Path(html_path).write_text(drv.page_source, encoding="utf-8", errors="ignore")
        print(f"[debug] Saved {png} and {html_path}")
    except Exception as e:
        print(f"[debug] Could not save debug artifacts: {e}")

@dataclass
class NewsItem:
    link: str
    title: str
    description_html: str  # sanitized HTML with <a> preserved


def fetch_news_items_from_xml(url: str) -> list[NewsItem]:
    """Parse the XML page; each <item> becomes a NewsItem (preserving <a> links in description)."""
    d = feedparser.parse(url)
    items: list[NewsItem] = []
    for e in d.entries:
        base_url = (e.get("link") or url or "").strip()
        link = (e.get("link") or "").strip()
        title = (e.get("title") or "").strip()

        # Prefer full content, else description/summary
        desc_html_raw = ""
        if getattr(e, "content", None):
            try:
                desc_html_raw = e.content[0].value or ""
            except Exception:
                desc_html_raw = ""
        if not desc_html_raw:
            desc_html_raw = e.get("description") or e.get("summary") or ""

        desc_html = sanitize_keep_links(desc_html_raw, base_url)

        if link or title or desc_html:
            items.append(NewsItem(link=link, title=title, description_html=desc_html))
    return items



def make_title() -> str:
    tz = None
    try:
        tz = ZoneInfo(TIMEZONE)
    except ZoneInfoNotFoundError:
        try:
            import tzdata  # ensure package is present
            tz = ZoneInfo(TIMEZONE)
        except Exception:
            tz = datetime.now().astimezone().tzinfo  # fallback to local tz

    if EDITION_DATE:
        try:
            dt = datetime.strptime(EDITION_DATE, "%Y-%m-%d")
            if dt.tzinfo is None:
                dt = dt.replace(tzinfo=tz)
        except ValueError:
            dt = datetime.now(tz)
    else:
        dt = datetime.now(tz)

    return f"{TITLE_PREFIX}{dt.strftime(TITLE_DATE_FORMAT)}"

def build_issue_html(items: list[NewsItem]) -> str:
    """
    Render each item as:
      <p><a href="LINK">TITLE</a> DESCRIPTION_HTML</p>
    DESCRIPTION_HTML is sanitized but preserves <a> and <br>.
    """
    parts = []
    for it in items:
        link_attr = html.escape(it.link or "", quote=True)
        title_text = html.escape(it.title or "")
        desc_html = it.description_html or ""
        if link_attr and title_text:
            parts.append(f'<p><a href="{link_attr}">{title_text}</a> {desc_html}</p>')
        elif title_text:
            parts.append(f'<p><strong>{title_text}</strong> {desc_html}</p>')
        elif desc_html:
            parts.append(f"<p>{desc_html}</p>")
    return "\n".join(parts)


# ===================== SELENIUM SETUP =====================

def make_driver():
    Path(PROFILE_DIR).mkdir(parents=True, exist_ok=True)
    options = Options()
    options.add_argument(f"--user-data-dir={PROFILE_DIR}")
    options.add_argument("--profile-directory=Default")
    options.add_argument("--disable-notifications")
    options.add_argument("--start-maximized")
    options.add_argument("--disable-blink-features=AutomationControlled")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("useAutomationExtension", False)
    if HEADLESS:
        options.add_argument("--headless=new")
    drv = webdriver.Chrome(options=options)
    drv.set_page_load_timeout(120)
    drv.implicitly_wait(2)
    return drv

def wait(drv, timeout=45):
    return WebDriverWait(drv, timeout)

def logged_in(drv):
    try:
        wait(drv, 8).until(EC.presence_of_element_located((By.CSS_SELECTOR, "input[placeholder*='Search']")))
        return True
    except:
        return False

def ensure_login(drv):
    drv.get("https://www.linkedin.com/login")
    time.sleep(1.5)
    if logged_in(drv):
        print("[login] Already logged in.")
        return
    if not LINKEDIN_EMAIL or not LINKEDIN_PASSWORD:
        print("[login] Waiting for manual login/2FA…")
        for _ in range(150):
            if logged_in(drv):
                print("[login] Detected logged-in state.")
                return
            time.sleep(1)
        print("[login] Proceeding without explicit login check.")
        return
    try:
        email = wait(drv).until(EC.presence_of_element_located((By.ID, "username")))
        pwd = drv.find_element(By.ID, "password")
        email.clear(); email.send_keys(LINKEDIN_EMAIL)
        pwd.clear(); pwd.send_keys(LINKEDIN_PASSWORD); pwd.send_keys(Keys.ENTER)
        for _ in range(120):
            if logged_in(drv):
                print("[login] Success.")
                return
            time.sleep(1)
    except Exception as e:
        print(f"[login] warning: {e}")

# ---------- Composer opening (robust) ----------

def ready_state_complete(drv, timeout=60):
    t0 = time.time()
    while time.time() - t0 < timeout:
        try:
            if drv.execute_script("return document.readyState") == "complete":
                return True
        except:
            pass
        time.sleep(0.25)
    return False

def click_if_visible(drv, xpaths, pause=0.25):
    for xp in xpaths:
        try:
            el = WebDriverWait(drv, 3).until(EC.element_to_be_clickable((By.XPATH, xp)))
            drv.execute_script("arguments[0].click();", el)
            time.sleep(pause)
        except:
            pass

def editor_ready(drv):
    selectors = [
        "//div[@contenteditable='true' and (@data-placeholder='Add a headline' or @data-placeholder='Add headline')]",
        "//h1[@contenteditable='true']",
        "//div[@role='textbox' and contains(@aria-label,'headline')]",
        "//div[@contenteditable='true' and contains(@aria-label,'headline')]",
        "//div[@contenteditable='true' and contains(@data-placeholder,'Start writing')]",
        "//div[@role='textbox' and @contenteditable='true' and not(ancestor::header)]",
        "//div[@contenteditable='true' and not(ancestor::header)]",
        "//*[@data-test-id[contains(.,'editor')]]//div[@contenteditable='true']",
    ]
    for xp in selectors:
        if drv.find_elements(By.XPATH, xp):
            return True
    return False

def try_composer_url(drv):
    drv.get(COMPOSER_URL)
    ready_state_complete(drv, 60)
    # Dismiss banners/coachmarks
    click_if_visible(drv, [
        "//button[.//span[contains(.,'Accept') or contains(.,'Agree')]]",
        "//button[.//span[contains(.,'Got it') or contains(.,'OK') or contains(.,'Ok')]]",
        "//button[.//span[contains(.,'Skip') or contains(.,'Not now')]]",
        "//button[normalize-space()='Accept']",
        "//button[normalize-space()='Agree']",
        "//button[normalize-space()='Got it']",
        "//button[normalize-space()='Skip']",
        "//button[normalize-space()='Not now']",
    ])
    t0 = time.time()
    while time.time() - t0 < 75:
        if editor_ready(drv):
            return True
        time.sleep(0.5)
    return False

def try_feed_then_click_write_article(drv):
    drv.get("https://www.linkedin.com/feed/")
    ready_state_complete(drv, 60)
    click_if_visible(drv, [
        "//button[.//span[contains(.,'Accept') or contains(.,'Agree')]]",
        "//button[.//span[contains(.,'Got it') or contains(.,'OK') or contains(.,'Ok')]]",
        "//button[.//span[contains(.,'Skip') or contains(.,'Not now')]]",
    ])
    candidates = [
        "//a[contains(@href,'/article/new')]",
        "//a[.//span[contains(.,'Write article')]]",
        "//button[.//span[contains(.,'Write article')]]",
        "//div[contains(@data-test-id,'share-box')]//a[contains(@href,'/article/new')]",
    ]
    for xp in candidates:
        try:
            el = wait(drv, 20).until(EC.element_to_be_clickable((By.XPATH, xp)))
            drv.execute_script("arguments[0].scrollIntoView({block:'center'});", el)
            drv.execute_script("arguments[0].click();", el)
            break
        except:
            pass
    if len(drv.window_handles) > 1:
        drv.switch_to.window(drv.window_handles[-1])
    ready_state_complete(drv, 60)
    t0 = time.time()
    while time.time() - t0 < 75:
        if editor_ready(drv):
            return True
        time.sleep(0.5)
    return False

def open_composer(drv):
    if try_composer_url(drv):
        return
    if try_feed_then_click_write_article(drv):
        return
    debug_dump(drv, "debug_composer")
    raise TimeoutError("LinkedIn editor did not appear. See debug_composer.*")

# ---------- Editor actions ----------

def set_headline(drv, headline):
    el = _find_headline_element(drv)
    if not el:
        debug_dump(drv, "debug_headline_not_found")
        raise RuntimeError("Could not locate the headline field.")

    # Scroll into view & focus
    try:
        drv.execute_script("arguments[0].scrollIntoView({block:'center'});", el)
        el.click()
        time.sleep(0.2)
    except:
        pass

    def ok_now():
        val = (_get_textlike_value(drv, el) or "").strip()
        want = (headline or "").strip()
        # Normalize internal whitespace for a fair compare
        val = " ".join(val.split())
        want = " ".join(want.split())
        return val == want

    # Strategy 1: plain key events
    try:
        el.send_keys(Keys.CONTROL, "a"); el.send_keys(Keys.DELETE)
        time.sleep(0.1)
        # type in chunks to better trigger frameworks like Draft.js/Slate
        for chunk in [headline[i:i+20] for i in range(0, len(headline), 20)]:
            el.send_keys(chunk)
            time.sleep(0.02)
        if ok_now():
            return
    except:
        pass

    # Strategy 2: execCommand('insertText') (fires beforeinput/input)
    try:
        _set_via_exec_command(drv, el, headline)
        time.sleep(0.15)
        if ok_now():
            return
    except:
        pass

    # Strategy 3: set value/textContent and dispatch input/change
    try:
        _set_value_and_events(drv, el, headline)
        time.sleep(0.15)
        if ok_now():
            return
    except:
        pass

    # Final attempt: re-focus, type again with keys
    try:
        el.click()
        time.sleep(0.1)
        el.send_keys(Keys.CONTROL, "a"); el.send_keys(Keys.DELETE)
        el.send_keys(headline)
        time.sleep(0.15)
        if ok_now():
            return
    except:
        pass

    debug_dump(drv, "debug_headline_sticky")
    raise RuntimeError("Headline could not be set (framework ignored changes).")


def set_body(drv, html_body):
    candidates = [
        "//div[@contenteditable='true' and contains(@data-placeholder,'Start writing')]",
        "//div[@role='textbox' and @contenteditable='true' and not(ancestor::header)]",
        "//div[@contenteditable='true' and not(ancestor::header)]",
        "(//div[@contenteditable='true'])[last()]",
    ]
    last_err = None
    for xp in candidates:
        try:
            body = wait(drv, 20).until(EC.element_to_be_clickable((By.XPATH, xp)))
            drv.execute_script("arguments[0].scrollIntoView({block:'center'});", body)
            body.click()
            drv.execute_script("""
                const el = arguments[0];
                const html = arguments[1];
                el.focus();
                try { document.execCommand('selectAll', false, null); document.execCommand('delete', false, null); } catch(e){}
                const sel = window.getSelection();
                if (!sel.rangeCount) {
                  const r = document.createRange();
                  r.selectNodeContents(el);
                  r.collapse(false);
                  sel.removeAllRanges();
                  sel.addRange(r);
                }
                const range = sel.getRangeAt(0);
                const tmp = document.createElement('div');
                tmp.innerHTML = html;
                const frag = document.createDocumentFragment();
                while (tmp.firstChild) frag.appendChild(tmp.firstChild);
                range.deleteContents();
                range.insertNode(frag);
            """, body, html_body)
            time.sleep(0.8)
            return
        except Exception as e:
            last_err = e
    debug_dump(drv, "debug_body")
    raise RuntimeError(f"Could not set body content: {last_err}")


def click_next(drv):
    """Click the pre-publish 'Next' step if LinkedIn shows it. If it's not there, do nothing."""
    next_selectors = [
        "//button[.//span[normalize-space()='Next']]",
        "//button[normalize-space()='Next']",
        "//button[contains(@aria-label,'Next')]",
        "//button[contains(., 'Next')]",  # fallback (broader)
        "//div[@role='dialog']//button[.//span[normalize-space()='Next']]",  # if inside dialog
    ]
    btn = find_clickable(drv, next_selectors, timeout_each=5)
    if btn:
        try:
            drv.execute_script("arguments[0].scrollIntoView({block:'center'});", btn)
            time.sleep(0.2)
            drv.execute_script("arguments[0].click();", btn)
            print("[next] Clicked.")
            # Give the UI a beat to transition
            time.sleep(0.8)
        except Exception as e:
            print(f"[next] warning: {e}")
    else:
        print("[next] No 'Next' button visible; continuing.")

def select_newsletter_and_publish(drv, subtitle_text):
    """
    Handles BOTH flows:
      A) If there's a 'Publish' button on the page, click it to open the modal.
      B) If we're already in a modal (after 'Next'), just proceed.
    Then: ensure 'Newsletter' destination, choose the target newsletter, fill subtitle, and click Publish.
    """

    def click_publish_button_on_page():
        publish_selectors = [
            "//button[.//span[normalize-space()='Publish']]", 
            "//button[normalize-space()='Publish']",
            "//*[@data-test-id[contains(.,'publish')]]",
            # Some variants put Publish in a sticky header bar
            "//header//button[.//span[normalize-space()='Publish']]", 
        ]
        btn = find_clickable(drv, publish_selectors, timeout_each=5)
        if btn:
            drv.execute_script("arguments[0].scrollIntoView({block:'center'});", btn)
            time.sleep(0.2)
            drv.execute_script("arguments[0].click();", btn)
            print("[publish] Primary clicked (to open modal).")
            return True
        return False

    # 1) If no modal yet, try to open it via a Publish button on the page
    modal_present = ensure_modal(drv, timeout=4)
    if not modal_present:
        opened = click_publish_button_on_page()
        if opened:
            modal_present = ensure_modal(drv, timeout=20)

    if not modal_present:
        # Last-chance: sometimes 'Next' immediately shows modal content; brief wait:
        modal_present = ensure_modal(drv, timeout=10)

    if not modal_present:
        debug_dump(drv, "debug_publish_open")
        raise RuntimeError("Could not open the publish dialog (modal not found).")

    # 2) Inside the modal, prefer 'Newsletter' destination if shown
    try:
        # Radio / tab labeled "Newsletter"
        click_if_visible(drv, [
            "//label[.//span[contains(.,'Newsletter')]]/preceding-sibling::input[@type='radio']",
            "//button[.//span[contains(.,'Newsletter')]]",
            "//*[contains(@role,'tab') and .//span[contains(.,'Newsletter')]]",
        ], pause=0.2)
    except:
        pass

    # 3) Choose your newsletter by name (works whether it's a list or dropdown)
    picked = False

    # (a) Directly clickable label/list item
    try:
        cand = drv.find_elements(By.XPATH,
            f"//span[normalize-space()='{NEWSLETTER_NAME}']/ancestor::*[(self::label or self::button or self::div or self::li)][1]"
        )
        if cand:
            drv.execute_script("arguments[0].click();", cand[0])
            picked = True
    except:
        pass

    # (b) Open dropdown/combobox and pick from menu
    if not picked:
        try:
            # Try to open any newsletter picker
            click_if_visible(drv, [
                "//*[@role='combobox']",
                "//button[contains(@id,'newsletter') and contains(@aria-expanded,'false')]",
                "//button[.//span[contains(.,'Select') and contains(.,'newsletter')]]",
            ], pause=0.3)
            time.sleep(0.4)
            # Click the item by name in the menu/listbox
            opt = find_clickable(drv, [
                f"//div[@role='listbox']//div[normalize-space()='{NEWSLETTER_NAME}']",
                f"//ul[contains(@role,'listbox')]//li[.//span[normalize-space()='{NEWSLETTER_NAME}']]",
                f"//*[self::div or self::span or self::li][normalize-space()='{NEWSLETTER_NAME}']",
            ], timeout_each=5)
            if opt:
                drv.execute_script("arguments[0].click();", opt)
                picked = True
        except:
            pass

    if not picked:
        print("[publish] Newsletter picker not visible or already selected; continuing.")

    # 4) Subtitle/description (optional field in modal)
    try:
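        # Scope the lookup to the publish dialog so it can't match the article body editor
        sub = drv.find_element(
            By.XPATH,
            "//div[contains(@role,'dialog') or contains(@class,'artdeco-modal')]"
            "//*[self::textarea or (self::div and @role='textbox' and @contenteditable='true')]",
        )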
        drv.execute_script("arguments[0].scrollIntoView({block:'center'});", sub)
        sub.click()
        for _ in range(3):
            sub.send_keys(Keys.CONTROL, "a"); sub.send_keys(Keys.DELETE)
        sub.send_keys(subtitle_text[:250])
    except:
        pass

    # 5) Final 'Publish' inside the modal
    confirm_selectors = [
        "//div[contains(@role,'dialog') or contains(@class,'artdeco-modal')]//button[.//span[normalize-space()='Publish']]",
        "//div[contains(@role,'dialog') or contains(@class,'artdeco-modal')]//button[normalize-space()='Publish']",
        "//div[contains(@role,'dialog') or contains(@class,'artdeco-modal')]//button[.//span[contains(.,'Publish now')]]",
        "//div[contains(@role,'dialog') or contains(@class,'artdeco-modal')]//button[.//span[normalize-space()='Post']]",  # rare variant
        "//button[@data-test-id='confirmPublish']",
    ]

    btn = find_clickable(drv, confirm_selectors, timeout_each=15)
    if not btn:
        debug_dump(drv, "debug_publish_confirm")
        raise RuntimeError("Final Publish confirm not found.")
    drv.execute_script("arguments[0].scrollIntoView({block:'center'});", btn)
    time.sleep(0.2)
    drv.execute_script("arguments[0].click();", btn)
    print("[publish] Confirmed.")



# ===================== MAIN FLOW =====================

def main():
    # Build content from XML
    print("[build] Fetching XML…")
    items = fetch_news_items_from_xml(FEED_XML_URL)
    if not items:
        print("[build] No <item> elements found in XML. Aborting.")
        sys.exit(1)
    title = make_title()
    print(f"[build] Title: {title}")
    body_html = build_issue_html(items)

    # Duplicate guard by edition title (date-based)
    posted = load_posted()
    unique_key = f"issue:{title}"
    if unique_key in posted:
        print(f"[guard] Already posted today: {title}")
        return

    # Start browser
    drv = make_driver()
    try:
        ensure_login(drv)
        # Open editor
        open_composer(drv)
        print(f"[editor] Ready. Setting headline/body…")
        set_headline(drv, title)
        set_body(drv, body_html)
        click_next(drv)  # clicks 'Next' if present (otherwise harmless)
        # If no modal within ~2s, re-check the headline and try Next again once.
        if not ensure_modal(drv, timeout=2):
            try:
                el = _find_headline_element(drv)
                if el and _get_textlike_value(drv, el).strip():
                    click_next(drv)
            except:
                pass

        subtitle = f"Summary for {title}"
        select_newsletter_and_publish(drv, subtitle)  # opens modal (if needed) and clicks Publish

        try:
            WebDriverWait(drv, 45).until(EC.presence_of_element_located(
                (By.XPATH, "//div[contains(.,'Published') or contains(.,'published')] | //a[contains(.,'View') and contains(.,'post')]")
            ))
        except:
            pass
        print("[done] Issue published.")
        posted.add(unique_key)
        save_posted(posted)
    except Exception as e:
        print(f"[fatal] {e}")
        debug_dump(drv, "debug_fatal")
        raise
    finally:
        if HEADLESS:
            drv.quit()

if __name__ == "__main__":
    main()
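The sanitize_keep_links() function in the script relies on BeautifulSoup. For anyone curious how little it takes, here's a stdlib-only sketch of the same idea using html.parser: keep text, <br>, and <a> tags whose resolved href starts with http, https, mailto, or tel, and drop every other tag while keeping its text. LinkKeeper and keep_links are hypothetical names, and this has not been tested against LinkedIn's editor:

```python
# Stdlib-only sketch of the "keep only links and line breaks" sanitizer idea.
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

SAFE_PREFIXES = ("http://", "https://", "mailto:", "tel:")


class LinkKeeper(HTMLParser):
    """Re-emit text plus <a href> and <br>; every other tag is dropped but its text kept."""

    def __init__(self, base_url: str = ""):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.out: list[str] = []
        self.link_stack: list[bool] = []  # True if the matching <a> was emitted

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.out.append("<br>")
        elif tag == "a":
            raw = dict(attrs).get("href") or ""
            # Resolve relative links against the item's URL, as the script does
            href = urljoin(self.base_url, raw) if raw else ""
            keep = href.startswith(SAFE_PREFIXES)
            if keep:
                self.out.append(f'<a href="{escape(href, quote=True)}">')
            self.link_stack.append(keep)

    def handle_endtag(self, tag):
        if tag == "a" and self.link_stack:
            if self.link_stack.pop():
                self.out.append("</a>")

    def handle_data(self, data):
        # Character references were already decoded, so re-escape the text
        self.out.append(escape(data, quote=False))


def keep_links(html_in: str, base_url: str = "") -> str:
    parser = LinkKeeper(base_url)
    parser.feed(html_in or "")
    parser.close()
    # Close any <a> that malformed input left open
    parser.out.extend("</a>" for kept in parser.link_stack if kept)
    return "".join(parser.out).strip()
```

Unlike the BeautifulSoup version, this emits <br> rather than <br/>, which makes no difference once the HTML is pasted into an editor.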

(Here's a downloadable link to the script, so you don't have to mess with copy and paste):

I also created a directory for my Chrome profile: E:\Websites\downes\chrome_profile

Then I ran the script manually for the first time, to log in and create the profile (this is useful if there's a CAPTCHA, 2FA, or something similar to deal with). From the project directory, in PowerShell:

python .\li_newsletter_selenium.py
 

This will open Chrome and allow you to log in if you need to (I didn't need to; it just used my .env values and went straight in).

If there are errors, the script saves a screenshot (.png) and a copy of the page HTML (.html), named debug_*, in the project directory. Here's one:


Heh.
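When a run fails, the quickest way to find the relevant files is to sort them by modification time. A small hypothetical helper (not part of the script) that lists the debug*.png and debug*.html files written by debug_dump(), newest first:

```python
# Hypothetical helper: list the newest debug screenshots/HTML dumps in a directory.
from pathlib import Path


def latest_debug_files(project_dir: str, limit: int = 4) -> list[Path]:
    """Return up to `limit` debug*.png / debug*.html files, most recently modified first."""
    files = [p for p in Path(project_dir).glob("debug*") if p.suffix in (".png", ".html")]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]
```

For example, latest_debug_files(r"E:\Websites\downes") would return the files from the most recent failure at the top of the list.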

To automate the newsletter, I used the built-in Task Scheduler in Windows (on Linux I would just use cron, but there's nothing so simple in Windows).

Here's the PowerShell script it runs (run_olddaily.ps1):

Set-Location E:\Websites\downes
& .\venv\Scripts\Activate.ps1
$env:HEADLESS="true"           # run Chrome headless
$env:TITLE_DATE_FORMAT="%Y-%m-%d"  # or "%B %d, %Y" for "September 05, 2025"
python .\li_newsletter_selenium.py *>> .\run.log
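The issue title is simply TITLE_PREFIX followed by the date formatted with TITLE_DATE_FORMAT. Here's a simplified sketch of what make_title() produces; issue_title is a hypothetical helper, and the real function also handles EDITION_DATE and the TIMEZONE setting:

```python
# Simplified sketch of the title logic: prefix + strftime-formatted date.
from datetime import date


def issue_title(day: date, prefix: str = "OLDaily - ", fmt: str = "%Y-%m-%d") -> str:
    return f"{prefix}{day.strftime(fmt)}"
```

%d zero-pads the day, so "%B %d, %Y" gives "September 05, 2025". On Windows, %#d drops the leading zero (on Linux the equivalent is %-d); both are C-library extensions rather than standard Python format codes. Passing a prefix without its trailing space runs it straight into the date ("OLDaily -2025-09-05").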


Then set up the task scheduler as follows:

Action: Program/script: powershell.exe

Arguments: -ExecutionPolicy Bypass -File "E:\Websites\downes\run_olddaily.ps1" 

Start in: E:\Websites\downes

Trigger: Daily at your preferred time (since I only publish on weekdays, I selected 'Weekly' and then picked the specific days)

Options: Run whether user is logged on or not; configure for Windows 10/11.

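This should work, but the scheduled task hasn't run yet (the script won't publish a second time on the same day: it records each published issue's title in posted.json and checks there before starting).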
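The once-a-day rule comes from the duplicate guard in main(): it builds a key from the date-based title, skips the run if that key is already in posted.json, and adds the key only after the publish step finishes without an error. Here's a minimal standalone sketch of that logic; already_posted and mark_posted are hypothetical names, not functions in the script:

```python
# Sketch of the posted.json duplicate guard, keyed by "issue:<title>".
import json
from pathlib import Path


def _load_keys(path: Path) -> set:
    if not path.exists():
        return set()
    try:
        return set(json.loads(path.read_text()))
    except (ValueError, TypeError):
        return set()  # unreadable file: treat as "nothing posted", like load_posted()


def already_posted(path: Path, title: str) -> bool:
    return f"issue:{title}" in _load_keys(path)


def mark_posted(path: Path, title: str) -> None:
    keys = _load_keys(path)
    keys.add(f"issue:{title}")
    path.write_text(json.dumps(sorted(keys), indent=2))
```

Deleting posted.json, or that date's entry in it, lets the script publish the same date again, for example after a test run.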

 

That's it!

 

 

 
