Commit f919d75f authored by Aida Nikkhah Nasab

Refactor visit analysis: filter URLs with visits >= 500 and add average visit comparison for day and night
parent f433bbd3
Pipeline #54968 failed
Codes/visit_in_24h/visit_in_24h.png

503 KiB

@@ -39,49 +39,42 @@ try:
    # Group by URL and hour to count visits
    df_grouped = df.groupby(["url_hostname", "hour"]).size().reset_index(name="visit_count")
    # Ensure all hours are present (0-24), filling missing hours with 0 visits
    all_hours = pd.DataFrame({"hour": range(25)})
    df_full = pd.DataFrame()
    for url in df_grouped["url_hostname"].unique():
        df_url = df_grouped[df_grouped["url_hostname"] == url]
        df_url_full = pd.merge(all_hours, df_url, on="hour", how="left")
        df_url_full["url_hostname"] = url
        df_url_full["visit_count"] = df_url_full["visit_count"].fillna(0).astype(int)
        df_full = pd.concat([df_full, df_url_full])
    # Filter out entries with less than 500 visits
    df_filtered = df_grouped[df_grouped["visit_count"] >= 500]
    # Define a function for plotting
    def plot_chart(df_subset, title):
        plt.figure(figsize=(14, 7))
        for url in df_subset["url_hostname"].unique():
            df_url = df_subset[df_subset["url_hostname"] == url]
            plt.plot(
                df_url["hour"],
                df_url["visit_count"],
                color="blue", # Same color for all lines
                linewidth=0.8 # Thin lines
            )
    # Plot: Full-Day Line Chart (Filtered URLs)
    plt.figure(figsize=(14, 7))
    for url in df_filtered["url_hostname"].unique():
        df_url = df_filtered[df_filtered["url_hostname"] == url]
        plt.plot(
            df_url["hour"],
            df_url["visit_count"],
            color="lightgreen", # Same light-green color for all lines
            linewidth=1.0
        )
    plt.xticks(range(0, 25), [f"{hour}:00" for hour in range(0, 25)], rotation=45)
    plt.xlabel("Hour of the Day", fontsize=12)
    plt.ylabel("Number of Visits", fontsize=12)
    plt.title("Number of Visit URLs by Hour (Filtered: Visits >= 500)", fontsize=14)
    plt.tight_layout()
    plt.show()
        # Customize x-axis for 24 hours
        plt.xticks(range(df_subset["hour"].min(), df_subset["hour"].max() + 1),
                   [f"{hour}:00" for hour in range(df_subset["hour"].min(), df_subset["hour"].max() + 1)],
                   rotation=45)
        plt.xlabel("Hour of the Day", fontsize=12)
        plt.ylabel("Number of Visits", fontsize=12)
        plt.title(title, fontsize=14)
        plt.tight_layout()
        plt.show()
    # 1. Plot for night data (00:00 to 04:00)
    df_night = df_full[(df_full["hour"] >= 0) & (df_full["hour"] <= 4)]
    plot_chart(df_night, "Number of Visit URLs by Hour (Night: 00:00 - 04:00)")
    # 2. Plot for day data (04:00 to 24:00)
    df_day = df_full[(df_full["hour"] >= 4) & (df_full["hour"] <= 24)]
    plot_chart(df_day, "Number of Visit URLs by Hour (Day: 04:00 - 24:00)")
    # 3. Plot for all data (00:00 to 24:00)
    plot_chart(df_full, "Number of Visit URLs by Hour (All Day: 00:00 - 24:00)")
    # Calculate averages
    day_avg = df_filtered[df_filtered["hour"].between(0, 4)]["visit_count"].mean()
    night_avg = df_filtered[df_filtered["hour"].between(4, 23)]["visit_count"].mean()
    # Plot: Two-Bar Chart for Averages
    plt.figure(figsize=(8, 6))
    plt.bar(
        ["00:00-04:00 (Day Average)", "04:00-24:00 (Night Average)"],
        [day_avg, night_avg],
        color=["skyblue", "orange"],
        alpha=0.7
    )
    plt.ylabel("Average Number of Visits", fontsize=12)
    plt.title("Day vs. Night Average Visits", fontsize=14)
    plt.tight_layout()
    plt.show()
except Exception as e:
    print(f"An error occurred: {e}")
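
Note on the averages in the hunk above: `Series.between` is inclusive on both ends by default, so `between(0, 4)` and `between(4, 23)` both count hour 4. The following is a minimal sketch, not part of the commit, of one way to compute the two period means over non-overlapping half-open windows. The helper name `period_averages`, the `split_hour` parameter, the neutral keys `early`/`late`, and the toy data are all assumptions made for illustration; only the column names `hour` and `visit_count` come from the script.

```python
import pandas as pd


def period_averages(df_counts, split_hour=4):
    """Mean visit_count for hours in [0, split_hour) and [split_hour, 24).

    Hypothetical helper: every row lands in exactly one of the two windows.
    """
    is_early = df_counts["hour"] < split_hour
    return {
        "early": df_counts.loc[is_early, "visit_count"].mean(),
        "late": df_counts.loc[~is_early, "visit_count"].mean(),
    }


# Toy data (invented) shaped like df_filtered in the script.
toy = pd.DataFrame(
    {
        "url_hostname": ["a.example"] * 4,
        "hour": [0, 3, 4, 12],
        "visit_count": [600, 800, 1000, 500],
    }
)

averages = period_averages(toy)
print(averages)
```

With this split, the row for hour 4 contributes only to the second window. Whether the boundary hour belongs to the first or the second period is a choice for the analysis; the sketch just makes that choice explicit.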
Codes/visit_in_24h/visit_in_24h_day.png

503 KiB

Codes/visit_in_24h/visit_in_24h_night.png

188 KiB

No preview for this file type
@@ -348,9 +348,7 @@ Figure \ref{fig:requestcountlinear} provides a linear scale representation of th
Figure \ref{fig:24hvisit} illustrates the number of visits to different URLs over a 24-hour period. The x-axis represents the hours of the day, while the y-axis indicates the number of visits to each URL. This visualization provides a clear overview of the distribution of visits throughout the day, highlighting peak usage times and periods of lower activity. By examining this data, it is possible to identify trends and patterns in user behavior, which can be instrumental in detecting anomalies or suspicious activities. This analysis serves as a foundational step in understanding the dataset's behavior and establishing a baseline for further investigations.
As shown, the distribution of visits predominantly falls within the range of 0--500, which is significantly higher compared to the range of 500--3,500.
From the figure, it is evident that some URLs exhibit high activity levels initially but experience a steep decline, with their visit counts approaching zero around 04:00. Based on this observation, the URL activity was categorized into two distinct periods: day activity and night activity, each represented by separate charts for better analysis.
As shown, the distribution of visits predominantly falls within the range of 0--500, which is significantly higher compared to the rest.
\begin{figure}
\centering
@@ -359,6 +357,8 @@ From the figure, it is evident that some URLs exhibit high activity levels initi
\label{fig:24hvisit}
\end{figure}
From the figure, it is evident that certain URLs exhibit high activity levels during the initial hours but experience a sharp decline, with their visit counts approaching zero around 04:00. This observation led to the categorization of URL activity into two distinct periods: day activity, which begins at 00:00 and ends at 04:00, and night activity, which spans from 04:00 to 24:00. To better understand these terms and analyze the patterns, the average number of visits during each period was calculated. This analysis provides valuable insights into how URL activity fluctuates throughout the day and highlights significant differences in usage between the two time frames, enabling more effective resource allocation and decision-making based on user behavior.
\begin{figure}
\centering
\includegraphics[width=\textwidth]{../Thesis_Docs/media/seconds.png}
......
Thesis_Docs/media/visit_in_24h.png

503 KiB

Thesis_Docs/media/visit_in_24h.png

205 KiB
